Python Program Source Code Protection and Commercial Security Practices

1. Introduction

Purpose

For a long time, programs written in Python have been hard to distribute commercially. The root cause is Python's runtime model: as an interpreted language, Python normally requires the source code or bytecode to be shipped to the user so that the program can run on CPython. This makes the code easy for users to read, which can damage the developer's commercial interests. This article shows how to combine Nuitka, Ghidra, and Enigma Protector to protect Python programs effectively while keeping full CPython compatibility, meeting the needs of commercial distribution.

Background and Environment

This method applies only to Windows, since Windows is the dominant operating system among clients. The scheme requires the following environment and tools:

CPython 3+
Nuitka (Used to compile Python source code into native executable files)
Ghidra (Used for static analysis and locating key functions)
Enigma Protector or other tools with virtual machine protection capabilities (such as VMProtect)

This method is suitable for all Python software released on the Windows platform. Whether it’s a small project by an individual developer or a large-scale enterprise application, this scheme can be used for code protection.

Benefits / Motivation

Perfect CPython Compatibility: Ensures the program runs normally in the standard Python runtime environment.
Performance Improvement: Compiling to native code usually makes the program run faster than it does under the interpreter.
Protect Source Code and Commercial Interests: Prevents source code leakage and maintains intellectual property rights.
Enhanced Security: Virtual machine protection, applied to key functions located through static analysis, makes the program more resistant to reverse engineering and decompilation.

2. How PyInstaller Works, and Why Its Output Is Easy to Extract

When it comes to packaging Python programs, many people first think of PyInstaller. Indeed, PyInstaller can package Python programs for distribution to a certain extent. However, for commercial distribution scenarios, PyInstaller has obvious limitations for the following reasons:

Overview of PyInstaller’s Working Principle

Packaging Stage Process

Dependency Analysis: Parses the import statements of the entry script and recursively collects all dependencies.
Bytecode Compilation: Compiles Python source code into .pyc bytecode files.
Resource Packaging: Packages bytecode, extension libraries, and resource files into an archive.
Generate Executable File: Merges the bootloader, Python interpreter, and archive into an executable (exe).
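The dependency-analysis step can be approximated with the standard library's modulefinder module. This is not PyInstaller's own analyzer (PyInstaller ships its own modulegraph-based implementation); it is a minimal sketch of the same idea: start at the entry script and follow import statements recursively. The file names and contents are hypothetical.

```python
import pathlib
import sys
import tempfile
from modulefinder import ModuleFinder

# Hypothetical project: main.py imports helper.py, which in turn imports json.
FILES = {
    "main.py": "import helper\nprint(helper.dump({'a': 1}))\n",
    "helper.py": "import json\n\ndef dump(obj):\n    return json.dumps(obj)\n",
}

def collect_dependencies(entry_script: pathlib.Path) -> set[str]:
    """Recursively collect every module reachable from the entry script's imports."""
    # Search the script's own directory first, then the normal import path.
    finder = ModuleFinder(path=[str(entry_script.parent)] + sys.path)
    finder.run_script(str(entry_script))
    return set(finder.modules)

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for name, text in FILES.items():
        (root / name).write_text(text)
    deps = collect_dependencies(root / "main.py")

# json was never imported by main.py directly; it was found through helper.py.
print("helper" in deps, "json" in deps)
```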

Why This Is Not True “Compilation”

Feature	Traditional Compilation (C/C++)	PyInstaller “Packaging”
Code Conversion	Source Code → Machine Code	Source Code → Bytecode (Intermediate Form)
Reversibility	Practically Irreversible	Easily Reversible
Runtime Form	Executes Machine Code Directly	Requires Interpreter to Execute Bytecode
Protection Strength	Reverse Engineering at Assembly Level is Difficult	Source Code Level Restoration

Technical Root Causes of Easy Extraction

1. Bytecode Standardization Problem

# The bytecode stored by PyInstaller is the same bytecode standard CPython produces.
# Comparison:
#   Standard CPython: main.py -> py_compile / compileall -> main.pyc
#   PyInstaller:      main.py -> same compiler           -> main.pyc (same format)
#
# This means standard Python decompilation tools can process it directly.
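To see why standard tools can process this bytecode, the sketch below compiles a small hypothetical module with py_compile, reads the resulting .pyc back with marshal, and disassembles it. It assumes CPython 3.7 or later, where the .pyc header is 16 bytes (PEP 552). Function names, variable names, and string constants all survive compilation unchanged, which is exactly what decompilers build on. The function and its "secret" are invented for the demo.

```python
import dis
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile
import types

# Hypothetical "proprietary" code; the secret string is invented for the demo.
SOURCE = '''
def check_license(key):
    secret = "ACME-2025"
    return key == secret
'''

def load_pyc(path: str) -> types.CodeType:
    """Read a .pyc written by this interpreter and return its module code object."""
    data = pathlib.Path(path).read_bytes()
    # Bytes 0-3: magic number; 4-7: flags; 8-15: source mtime and size (or a hash).
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was written by a different Python version")
    return marshal.loads(data[16:])

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "main.py"
    src.write_text(SOURCE)
    pyc_path = py_compile.compile(str(src), cfile=str(src.with_suffix(".pyc")), doraise=True)
    module_code = load_pyc(pyc_path)

# The function body is a nested code object stored in the module's constants.
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
print(func_code.co_name, func_code.co_varnames)
print([c for c in func_code.co_consts if isinstance(c, str)])
dis.dis(func_code)
```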

2. Memory Loading Mechanism Defect

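import sys

# PyInstaller sets sys._MEIPASS to the directory where the bundle's files are
# unpacked (a temporary directory in one-file mode). The files stored there, and
# the bytecode loaded into memory, can be copied or dumped by an attacker.
bundle_dir = getattr(sys, "_MEIPASS", None)
if bundle_dir is not None:
    print(f"All code files located at: {bundle_dir}")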

3. Lack of Effective Protection Layer

No Encryption Protection: Bytecode is stored directly, without encryption measures.
No Code Obfuscation: Variable names and function names remain intact.
No Runtime Protection: The interpreter loads bytecode in the standard way.

Demonstration

[Figure: Bytecode fully exposed after unpacking a PyInstaller bundle.]
[Figure: A PyInstaller program revealing the location of all its code files.]

3. Why Not Choose PyArmor

At this point, some might suggest using PyArmor for Python code protection. However, PyArmor itself has some significant limitations:

PyArmor Function Overview

Encrypts Python scripts and provides license control.
Generates executable files with runtime protection (by bundling the Python interpreter).
Supports license management, expiration control, hardware binding, and other authorization features.
Suitable for simple code protection needs and rapid distribution of small projects.

Limitations and Reasons for Not Suiting This Scheme

1. Compatibility Issues

Platform Dependency: PyArmor encrypted scripts depend on platform-specific dynamic libraries (_pytransform), which are not portable across different architectures (e.g., x86_64 vs. aarch64).
Python Version Restrictions: Advanced encryption features (like RFT, BCC modes) have requirements for Python versions, typically needing Python 3.7+.
Module Compatibility: Some Python modules using C extensions might not run correctly.

2. Security Limitations

Limited Encryption Strength: PyArmor’s encryption algorithms and runtime protection are relatively easy to bypass via reverse engineering.
Memory Exposure Risk: Decrypted bytecode remains visible in memory and can be obtained via memory dumping.
Protection Can Be Bypassed: Experienced attackers can bypass the protection mechanisms using debuggers or specific tools.

3. Performance Impact

Startup Delay: Encrypted scripts have an initialization overhead of about 40ms, which is significant for short-running command-line tools.
Runtime Overhead: Executing encrypted functions incurs additional time cost (approx. 0.002ms per thousand bytecode instructions).
Resource Usage: Advanced encryption modes increase memory consumption.

4. Weaker CPython Compatibility

Interpreter Behavior Alteration: PyArmor’s wrapping method may affect the standard behavior of the native interpreter.
Debugging Difficulty: Encrypted code is hard to debug using standard Python debugging tools.
Complex Dependency Management: Handling third-party dependencies in an encrypted environment may encounter unexpected issues.

Comparative Reference

Feature	PyInstaller	PyArmor	Nuitka + VM Protection
Distribution Method	Packaged Bytecode + Interpreter	Encrypted Bytecode + Runtime Protection	Compiled to Native Executable + VM Protection
CPython Compatibility	Full	Medium (with platform/version limits)	Very High
Source Code Protection Strength	Low (Bytecode directly exposed)	Medium (Encrypted but bypassable)	High (Native code + VM protection)
Reverse Engineering Difficulty	Low (Standard decompilation tools)	Medium (Requires specific decryption tools)	High (Requires reverse engineering skills)
Performance Impact	Similar to interpreter	Has initialization and runtime overhead	Compiled native code, often faster than the interpreter
Applicable Scenarios	Internal tools, demo versions	Small/medium projects, simple protection needs	Commercial software, high-security requirement projects

4. The Nature of Nuitka and Correct Usage Methods

What is Nuitka

Nuitka is a Python compiler that converts Python code into C/C++ code, which is then compiled into native executable files.
The compiled program can run like a regular native program, without relying on source code or bytecode.
It is compatible with CPython syntax and standard libraries, supporting most Python features and third-party modules.

Why Use Nuitka

Source Code Protection: Generates native executables; source code is not directly exposed, increasing reverse engineering difficulty.
Performance Boost: Because the code is compiled via C/C++, it often runs faster than under the interpreter.
Full CPython Compatibility: Behavior is consistent with the native Python environment, preserving original logic and APIs.
Supports Further Protection: Can be combined with virtual machine protection tools (e.g., Enigma Protector) to add a security layer.
Cross-Platform Potential: Although this tutorial targets Windows, Nuitka itself supports multi-platform compilation.


How to Correctly Invoke Nuitka

In most cases, the Nuitka command format is as follows:

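nuitka --standalone --output-dir=output --windows-console-mode=<mode> file.py

Here <mode> is one of force, disable, attach, or hide; for example, disable produces a program that never opens a console window, which suits GUI applications. If the nuitka command is not on your PATH, run python -m nuitka with the same options.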

⚠️ Note: Do not use the --onefile option in this process.
That option wraps everything in a single self-extracting executable that unpacks the real program at runtime, so the functions you need to locate with static analysis and protect are not directly present in the file you distribute.
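If you script your builds, a small wrapper can assemble the standalone command and refuse --onefile. This is a sketch under stated assumptions: the helper names are hypothetical, and the <name>.dist output folder is Nuitka's usual naming for standalone builds but should be checked against your Nuitka version.

```python
import shlex
import sys
from pathlib import Path

CONSOLE_MODES = {"force", "disable", "attach", "hide"}

def nuitka_command(script: str, output_dir: str = "output",
                   console_mode: str = "disable", extra_args=()) -> list[str]:
    """Build a standalone (never onefile) Nuitka command line."""
    if console_mode not in CONSOLE_MODES:
        raise ValueError(f"console_mode must be one of {sorted(CONSOLE_MODES)}")
    if any(arg.startswith("--onefile") for arg in extra_args):
        # A onefile build unpacks the real program at runtime, which defeats
        # the static-analysis and VM-protection steps that follow.
        raise ValueError("do not use --onefile for this workflow")
    return [sys.executable, "-m", "nuitka", "--standalone",
            f"--output-dir={output_dir}",
            f"--windows-console-mode={console_mode}",
            *extra_args, script]

def expected_exe(script: str, output_dir: str = "output") -> Path:
    """Assumed location of the compiled program: <output>/<name>.dist/<name>.exe."""
    stem = Path(script).stem
    return Path(output_dir) / f"{stem}.dist" / f"{stem}.exe"

if __name__ == "__main__":
    cmd = nuitka_command("main.py")
    print(shlex.join(cmd))
    # To compile for real (requires Nuitka and a C compiler):
    #   import subprocess; subprocess.run(cmd, check=True)
    print(expected_exe("main.py"))
```

The executable reported by expected_exe() is the file you would then open in Ghidra.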

5. Why Static Analysis is Needed to Find Python main, and the Nature of Virtual Machine Protection

Why Compilation Alone is Not Enough

Although Nuitka can compile Python code into C/C++ and generate native executables, these executables can still be statically analyzed using reverse engineering tools (like Ghidra, IDA Pro).
Attackers can easily find key Python C-API calls or module initialization logic, thereby inferring the program’s core functionality.
Therefore, relying solely on compilation does not provide sufficient protection; further protection for key logic is still needed.

Why Find Python main

In the output generated by Nuitka, the main function is typically the entry point for Python interpreter initialization, module loading, and main script execution.
Protecting main and the key functions around it is meant to cover as much of the program's business logic as possible.
Locating the address of this function through static analysis and adding it to the protection can significantly increase the difficulty of reverse engineering.

The Nature of Virtual Machine (VM) Protection

Virtual machine (VM) protection translates the machine code of selected functions into bytecode for a custom virtual machine and embeds an interpreter for that bytecode in the program.
In Enigma Protector, this is done by specifying functions using EP_MARKER_BEGIN / EP_MARKER_END or through the visual interface.
The protected code is converted into VM instructions. Even if attackers disassemble it, they only see complex VM handler call flows instead of the original logic.
Advantages:
- Significantly increases the cost of static analysis and debugging.
- Prevents direct modification of key logic.
Disadvantages:
- Protecting too much code can lead to performance degradation.
- The protection scope must be chosen carefully: virtualizing system calls or dynamic-loading logic can make the program crash or behave abnormally.
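To make the idea concrete, here is a toy illustration in Python. It is not how Enigma Protector or VMProtect work internally; it only shows the principle: a tiny stack machine whose opcode numbers are shuffled per build, plus a hand-translated "virtualized" version of a simple function. The opcode names, the seed, and the function are all hypothetical, and real protectors operate on machine code with far more elaborate handlers.

```python
import random

# Toy illustration only: opcode names, seed, and encoding are hypothetical.
OPCODE_NAMES = ["PUSH_ARG", "PUSH_CONST", "ADD", "MUL", "RET"]

def make_vm(seed: int):
    """Create a toy stack VM whose opcode numbers are shuffled for each build."""
    rng = random.Random(seed)
    opcodes = dict(zip(OPCODE_NAMES, rng.sample(range(256), len(OPCODE_NAMES))))

    def run(program: list[int], arg: int) -> int:
        stack: list[int] = []
        pc = 0
        while True:
            op = program[pc]
            pc += 1
            # Each branch is a "handler": a disassembler sees this dispatch
            # loop, not the original arithmetic.
            if op == opcodes["PUSH_ARG"]:
                stack.append(arg)
            elif op == opcodes["PUSH_CONST"]:
                stack.append(program[pc])  # inline operand follows the opcode
                pc += 1
            elif op == opcodes["ADD"]:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == opcodes["MUL"]:
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == opcodes["RET"]:
                return stack.pop()
            else:
                raise ValueError(f"unknown opcode {op}")

    return opcodes, run

def original(x: int) -> int:
    """The 'unprotected' function."""
    return (x + 3) * 7

def virtualize_original(opcodes: dict[str, int]) -> list[int]:
    """Hand-translated VM program equivalent to original()."""
    return [
        opcodes["PUSH_ARG"],
        opcodes["PUSH_CONST"], 3,
        opcodes["ADD"],
        opcodes["PUSH_CONST"], 7,
        opcodes["MUL"],
        opcodes["RET"],
    ]

opcodes, run = make_vm(seed=2025)
program = virtualize_original(opcodes)
print(program)          # opaque numbers that depend on the seed
print(run(program, 5))  # prints 56, the same as original(5)
```

Because the opcode mapping changes with the seed, two builds of the same function produce different bytecode, which is one reason VM-protected code is tedious to analyze. The dispatch loop also shows where the performance cost mentioned above comes from: every original operation now goes through a handler.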

[Figure: The protected program can no longer be analyzed while running.]
[Figure: Analyzing the protected program with Ghidra.]

Protection Boundary Conditions and Precautions

Cannot Insert VM Markers Directly at the Python Layer: Must operate on the final PE/EXE file after compilation is complete.
Bitness Match: The protection tool (e.g., The Enigma Protector, TEP) must match the bitness of the compiled target (32-bit or 64-bit).
Avoid Protecting Too Large a Range: Protecting the entire .text section may prevent the program from starting; priority should be given to protecting key logic functions (like main, initialization functions).
Debugging Assistance: It is recommended to first use Ghidra to locate main and confirm the protection range, then gradually test the protection configuration to avoid debugging difficulties caused by protecting too much code at once.
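Before choosing the 32-bit or 64-bit edition of the protection tool, you can check the compiled executable's bitness by reading the Machine field of its PE file header. Below is a minimal standard-library sketch; fake_pe() builds synthetic header bytes for testing only, and the path in the comment is hypothetical.

```python
import struct

# IMAGE_FILE_HEADER.Machine values from the PE/COFF specification.
MACHINE_TYPES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0xAA64: "ARM64",
}

def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the signature; Machine is its first field.
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return MACHINE_TYPES.get(machine, f"unknown (0x{machine:04X})")

def fake_pe(machine: int) -> bytes:
    """Build a minimal synthetic header for testing (not a runnable executable)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    return bytes(dos) + b"PE\0\0" + struct.pack("<H", machine)

if __name__ == "__main__":
    print(pe_machine(fake_pe(0x8664)))  # prints "x64 (64-bit)"
    # Real use (hypothetical path):
    #   import pathlib
    #   print(pe_machine(pathlib.Path("output/main.dist/main.exe").read_bytes()))
```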

6. Hands-on: How to Locate the Python main in Nuitka Output Using Ghidra (Step Framework)

Prerequisite: Preparation

Download Ghidra from GitHub (it requires a 64-bit JDK) and run ghidraRun.bat.
Create a new project.

Step 1: Import and Preliminary Analysis

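Import the executable generated by Nuitka, for example by dragging main.exe into the project window (or via File > Import File).
Double-click the imported file to open it in the CodeBrowser.
When prompted, let Ghidra analyze the executable automatically.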

Step 2: Locate the Entry Point (Entry)

Find the entry function in the symbol tree on the left.

Step 3: Trace to CRT / Startup Wrapper Function

Starting from entry, follow the calls step by step until you reach the C runtime (CRT) startup wrapper (in MSVC builds this is typically __scrt_common_main_seh).
Inside this wrapper, look for a call whose first two arguments are approximately (&IMAGE_DOS_HEADER_140000000, 0). The first argument is the module base address and the second is normally NULL, which matches the hInstance and hPrevInstance parameters of a WinMain-style entry point.

Step 4: Identify Nuitka’s main

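Double-click this function to open it. Its decompiled body should correspond to Nuitka's main: the code that initializes the Python interpreter, loads modules, and runs the main script.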

Step 5: Confirm and Extract Function Address (RVA/VA)

You can trace deeper into main, but reaching this point is sufficient.
Ghidra names unlabeled functions FUN_ followed by their address, so the address is usually already visible in the name. To confirm it, copy the function name, press G (Go To) in Ghidra, and enter the name; Ghidra jumps to the function, where you can copy its address.
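Depending on the protection tool, you may need either the virtual address (VA) that Ghidra displays or the relative virtual address (RVA), where RVA = VA - ImageBase. For 64-bit executables the default ImageBase is 0x140000000, which is why Ghidra labels the DOS header IMAGE_DOS_HEADER_140000000. The sketch below reads ImageBase from the PE optional header (PE32 or PE32+) and converts; the VA is a made-up placeholder, and fake_pe64() only builds synthetic test bytes.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit optional header
PE32_PLUS_MAGIC = 0x20B  # 64-bit optional header

def image_base(data: bytes) -> int:
    """Read the preferred ImageBase from a PE file's optional header."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # skip the signature (4) and COFF file header (20)
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == PE32_PLUS_MAGIC:
        return struct.unpack_from("<Q", data, opt + 24)[0]  # 8-byte ImageBase
    if magic == PE32_MAGIC:
        return struct.unpack_from("<I", data, opt + 28)[0]  # 4-byte ImageBase
    raise ValueError(f"unknown optional-header magic 0x{magic:X}")

def va_to_rva(va: int, base: int) -> int:
    """Convert a virtual address (as shown in Ghidra) to an RVA."""
    rva = va - base
    if rva < 0:
        raise ValueError("address is below the image base")
    return rva

def fake_pe64(base: int) -> bytes:
    """Synthetic PE32+ headers, just large enough for image_base() to parse."""
    data = bytearray(0x40 + 4 + 20 + 32)
    data[:2] = b"MZ"
    struct.pack_into("<I", data, 0x3C, 0x40)
    data[0x40:0x44] = b"PE\0\0"
    opt = 0x40 + 4 + 20
    struct.pack_into("<H", data, opt, PE32_PLUS_MAGIC)
    struct.pack_into("<Q", data, opt + 24, base)
    return bytes(data)

if __name__ == "__main__":
    base = image_base(fake_pe64(0x140000000))
    va = 0x1400123A0  # hypothetical VA copied from Ghidra
    print(hex(base), hex(va_to_rva(va, base)))  # prints 0x140000000 0x123a0
```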

7. Final Protection

Basic Workflow in Enigma Protector

Finally, enter the function address in Enigma Protector (or another VM protection tool) and generate the protected program.
For stronger protection, you can also virtualize the 620 and c40 functions.

By GreshAnt 2025