Recovering source code from an exe packaged with PyInstaller

Posted by michibk on Thu, 30 Dec 2021 01:19:28 +0100

pyc file (bytecode)

How to generate pyc files

https://www.cnblogs.com/zhangqunshi/p/6657208.html
https://blog.csdn.net/weixin_30614587/article/details/97230135

Method 1: command line
python -m py_compile example.py

Method 2: from Python code
import py_compile
py_compile.compile('example.py')

Method 3: convert all py files in a directory to pyc files
import compileall
compileall.compile_dir(r'/path')

Method 4: command line, convert all py files in a directory to pyc files
python -m compileall <dir>

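However, I found that a pyc file also appears in the __pycache__ folder after simply running the code, and it is the same as the output of the methods above. Python writes these cache files automatically for modules that are imported.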
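As a minimal sketch of Method 2, the snippet below writes a throwaway source file into a temporary directory, compiles it with py_compile, and checks that the bytecode file exists. The file name example.py and its one-line content are placeholders chosen for illustration.

```python
import os
import py_compile
import tempfile

# Write a small source file into a temporary directory (placeholder content).
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "example.py")
with open(source_path, "w", encoding="utf-8") as f:
    f.write("print('hello')\n")

# compile() returns the path of the bytecode file it wrote.
# By default that is a __pycache__ folder next to the source file.
pyc_path = py_compile.compile(source_path, doraise=True)

print(pyc_path)
print(os.path.exists(pyc_path))
```

Passing doraise=True makes a syntax error in the source raise an exception instead of only printing a message.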

A pyo file is an optimized version of a pyc file
python -O -m py_compile example.py
python -OO -m py_compile example.py
The -O option generates optimized bytecode with assert statements removed. In Python versions before 3.5 the resulting file has the suffix .pyo.
The -OO option additionally removes docstrings from the optimized bytecode; before Python 3.5 the suffix is still .pyo.
Since Python 3.5, .pyo files are no longer produced; the optimized files are named .opt-1.pyc and .opt-2.pyc instead.
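To see which file names a given interpreter uses for each optimization level, importlib can compute them without compiling anything. This is a small sketch for Python 3.5 or later; example.py is a placeholder name and the file does not need to exist.

```python
import importlib.util

# cache_from_source() only computes the cache file name; it does not compile.
plain = importlib.util.cache_from_source("example.py", optimization="")
opt1 = importlib.util.cache_from_source("example.py", optimization=1)
opt2 = importlib.util.cache_from_source("example.py", optimization=2)

for name in (plain, opt1, opt2):
    print(name)
```

The exact names include a tag for the running interpreter, so they differ between Python versions.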

How to run pyc files

https://www.pynote.net/archives/2342

Method
python example.cpython-39.pyc
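The following sketch checks that a pyc file runs on its own. It compiles a placeholder script, deletes the source, and then starts the same interpreter on the pyc file. The file names and the printed message are made up for the example.

```python
import os
import py_compile
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "example.py")
with open(source_path, "w", encoding="utf-8") as f:
    f.write("print('hello from pyc')\n")

# Compile to an explicit output path so the file name is predictable.
pyc_path = os.path.join(workdir, "example.pyc")
py_compile.compile(source_path, cfile=pyc_path, doraise=True)

# Delete the source to show that only the bytecode file is needed.
os.remove(source_path)

# Run the .pyc with the same interpreter that compiled it.
result = subprocess.run(
    [sys.executable, pyc_path], capture_output=True, text=True, check=True
)
print(result.stdout.strip())
```

A pyc file is tied to the Python version that produced it, so it should be run with that same version.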

How to convert a pyc file back to a py file

https://blog.csdn.net/duohuanxi/article/details/114799153
https://pypi.org/project/uncompyle6/

uncompyle6 -o example.py example.pyc

Error: uncompyle6 requires Python 2.6-3.8
I am using Python 3.9, so I cannot demonstrate this step.
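When no decompiler supports the installed Python version, the standard library can still load a pyc file and show what it contains. This sketch assumes the 16-byte header used by CPython 3.7 and later. It recovers the code object and disassembles it, which gives a bytecode listing rather than the original source text. The greet function is a placeholder.

```python
import dis
import marshal
import os
import py_compile
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "example.py")
with open(source_path, "w", encoding="utf-8") as f:
    f.write("def greet(name):\n    return 'hello ' + name\n")

pyc_path = os.path.join(workdir, "example.pyc")
py_compile.compile(source_path, cfile=pyc_path, doraise=True)

with open(pyc_path, "rb") as f:
    data = f.read()

# Assumption: CPython 3.7+ layout, where the first 16 bytes are the header
# (magic number, flags, and source mtime/size or a source hash).
code = marshal.loads(data[16:])

# The module-level code object lists the names it defines and uses.
print(code.co_names)

# Human-readable bytecode listing (not the original source text).
dis.dis(code)
```

marshal can only load a pyc written by the same Python version, so this has to run under the interpreter that produced the file.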

Bytecode and machine code

https://blog.csdn.net/qq_35810838/article/details/99294636

Both bytecode and machine code are stored as binary data, that is, sequences of 0s and 1s. When such a file is opened in Notepad, letters appear only because Notepad tries to display the bytes as text.

How python works

Source code (py file) → bytecode (pyc file) → result. The interpreter executes the pyc file to produce the result.
Python does not need to regenerate the bytecode on every run. Before compiling, the interpreter checks whether the modification time of the source file matches the one recorded in the pyc file at the last compilation; if it does not match, the source is compiled again. In other words, the pyc file is generated on the first run, and on later runs only the pyc file is fed to the interpreter as long as the source is unchanged.
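The modification time used for this check is stored in the header of the pyc file. The sketch below reads it back and compares it with the source file. It assumes the CPython 3.7+ header layout and asks py_compile for the timestamp-based format explicitly. The file names are placeholders.

```python
import os
import py_compile
import struct
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "example.py")
with open(source_path, "w", encoding="utf-8") as f:
    f.write("x = 1\n")

pyc_path = os.path.join(workdir, "example.pyc")
py_compile.compile(
    source_path,
    cfile=pyc_path,
    doraise=True,
    # Request the timestamp-based format explicitly.
    invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
)

with open(pyc_path, "rb") as f:
    header = f.read(16)

# Assumed layout (CPython 3.7+): 4-byte magic number, then three
# little-endian 32-bit fields: flags, source mtime, source size.
flags, recorded_mtime, recorded_size = struct.unpack("<III", header[4:16])

stat = os.stat(source_path)
mtime_matches = recorded_mtime == (int(stat.st_mtime) & 0xFFFFFFFF)
size_matches = recorded_size == (stat.st_size & 0xFFFFFFFF)
print(flags, mtime_matches, size_matches)
```

If the source file is edited afterwards, its modification time no longer matches the recorded value, and the import system recompiles it.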

Python translates the source code into bytecode, and the interpreter executes the bytecode to produce the result; no machine code file is generated. A C compiler, by contrast, generates machine code, for example an exe file.
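The two steps can be reproduced by hand with the built-in compile() and exec() functions. This is only an illustration of the idea; the one-line program and the name result are placeholders.

```python
import dis

source = "result = 6 * 7\n"

# Step 1: translate source text into a code object (bytecode).
code = compile(source, "<example>", "exec")

# Step 2: the interpreter executes the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["result"])

# The raw bytecode is a bytes object, not machine code for the CPU.
print(type(code.co_code))
dis.dis(code)
```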

Get code from exe

https://reverseengineering.stackexchange.com/questions/160/how-do-you-reverse-engineer-an-exe-compiled-with-pyinstaller

PE stands for Portable Executable. Common EXE, DLL, OCX, SYS and COM files are PE files. A PE file is a program file on the Microsoft Windows operating system; some PE files, such as DLLs, are executed indirectly.
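A file can be recognised as a PE file from its first bytes. The sketch below checks the two signatures: "MZ" at the start of the file and "PE\0\0" at the offset stored at position 0x3C. The function name looks_like_pe is hypothetical, and the test buffer is synthetic rather than a real executable.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if data has a DOS header that points at a PE signature."""
    # A PE file begins with the DOS header, whose first two bytes are "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 32-bit little-endian field at offset 0x3C holds the PE header offset.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The PE header starts with the 4-byte signature "PE\0\0".
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Synthetic 68-byte buffer for illustration only (not a runnable executable).
fake_pe = bytearray(0x44)
fake_pe[0:2] = b"MZ"
struct.pack_into("<I", fake_pe, 0x3C, 0x40)
fake_pe[0x40:0x44] = b"PE\x00\x00"

print(looks_like_pe(bytes(fake_pe)))
print(looks_like_pe(b"not an executable"))
```

For a real file, the same function can be applied to the bytes read with open(path, "rb").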

Decompiling unmanaged code yields assembly language, while decompiling managed code yields a high-level language.
Unmanaged code is written for a specific system and CPU, so it has to be adapted to each CPU and operating system.
Managed code, such as C#, is compiled into an intermediate language, but the intermediate language still has to be compiled into machine code that the local CPU can execute. This step is performed by a dedicated software system called a virtual machine. Only one virtual machine has to be provided for each operating system and CPU architecture, and an application can then run unmodified on different operating systems and CPU architectures. Code that runs on such a virtual machine is called managed code.

https://blog.csdn.net/m0_37552052/article/details/88093427
https://blog.csdn.net/tymatlab/article/details/80511709
https://blog.csdn.net/ZH013/article/details/105116715
https://pypi.org/project/pydecipher/

Method 1
 Use pydecipher to turn the exe into pyc
 Use uncompyle6 to turn the pyc into py

Method 2
 Use pyinstxtractor to turn the exe into pyc
 Use uncompyle6 to turn the pyc into py

pip install pydecipher fails with: error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
According to pydecipher, it runs on macOS and Linux with Python 3.8; Windows is not confirmed, although in theory it should work.

Example (Method 2)
Write a test.py file

Download pyinstxtractor.py
https://codechina.csdn.net/mirrors/extremecoders-re/pyinstxtractor?utm_source=csdn_github_accelerator

Package the py file into an exe: run pyinstaller test.py. The generated dist folder contains a test folder, and test.exe is inside it.

Exe to pyc: copy the generated exe into the same folder as pyinstxtractor.py and run python pyinstxtractor.py test.exe
The generated test.exe_extracted folder will contain test.pyc

Pyc to py: change into the test.exe_extracted folder and run uncompyle6 -o test.py test.pyc
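The three commands of this walkthrough can be collected in a small helper. The sketch below only builds the command lines and does not run anything. The function name recovery_steps is hypothetical, and the folder names are the ones described in the steps above.

```python
import sys
from pathlib import Path


def recovery_steps(script: str = "test.py"):
    """Return (working directory, command) pairs for the walkthrough.

    Nothing is executed here. The folder names follow the walkthrough:
    dist/<name>/ for PyInstaller and <exe>_extracted for pyinstxtractor.
    """
    name = Path(script).stem
    exe = f"{name}.exe"
    extracted = f"{exe}_extracted"
    return [
        # 1. Package the script; the exe ends up under dist/<name>/.
        (".", ["pyinstaller", script]),
        # 2. Extract the exe; assumes <name>.exe and pyinstxtractor.py
        #    have been copied into the current directory.
        (".", [sys.executable, "pyinstxtractor.py", exe]),
        # 3. Decompile the extracted bytecode back to source.
        (extracted, ["uncompyle6", "-o", f"{name}.py", f"{name}.pyc"]),
    ]


for cwd, command in recovery_steps():
    print(cwd, " ".join(command))
```

Once the three tools are installed, each pair can be passed to subprocess.run(command, cwd=cwd, check=True).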

How to avoid decompilation

http://dt.digitser.cn/zh-CN/applet/cython_file/index.html
https://www.jianshu.com/p/4a0be62ee3e2
https://blog.csdn.net/qq_39498924/article/details/101292339
https://www.zhihu.com/question/347425323/answer/834691490
http://yshblog.com/blog/117

Obfuscate or encrypt the code (for example, install Cython and convert the py file into a pyd file).
However, a pyd file is itself a kind of dll, which can also be decompiled.

Is the pyd file a dll file?

https://docs.python.org/3/faq/windows.html#id6

Yes, .pyd files are dll's, but there are a few differences. If you have a DLL named foo.pyd, then it must have a function PyInit_foo(). You can then write Python "import foo", and Python will search for foo.pyd (as well as foo.py, foo.pyc) and if it finds it, will attempt to call PyInit_foo() to initialize it. You do not link your .exe with foo.lib, as that would cause Windows to require the DLL to be present.
Note that the search path for foo.pyd is PYTHONPATH, not the same as the path that Windows uses to search for foo.dll. Also, foo.pyd need not be present to run your program, whereas if you linked your program with a dll, the dll is required. Of course, foo.pyd is required if you want to say import foo. In a DLL, linkage is declared in the source code with __declspec(dllexport). In a .pyd, linkage is defined in a list of available functions.

In short: yes, but there are a few differences.
foo.pyd must define the function PyInit_foo(). When import foo runs, Python looks for foo.pyd (as well as foo.py and foo.pyc) and, if it finds foo.pyd, calls PyInit_foo() to initialize it.
Do not link the exe with foo.lib; otherwise Windows will require the DLL to be present.
The search path for foo.pyd (PYTHONPATH) is different from the path Windows uses to search for foo.dll.
In a dll, linkage is declared in the source code with __declspec(dllexport); in a pyd, linkage is defined in a list of available functions.
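The file types that import foo looks for can be listed from the standard library. This short sketch prints the suffixes known to the running interpreter. The output depends on the platform.

```python
import importlib.machinery

# File suffixes the import system recognises for each kind of module.
print("source:   ", importlib.machinery.SOURCE_SUFFIXES)
print("bytecode: ", importlib.machinery.BYTECODE_SUFFIXES)
# On Windows this list includes ".pyd"; on Linux and macOS it lists ".so" variants.
print("extension:", importlib.machinery.EXTENSION_SUFFIXES)
```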

CPython and Cython

https://zhuanlan.zhihu.com/p/65512422
https://www.tutorialspoint.com/what-is-the-difference-between-cython-and-cpython
https://cython.org/

CPython is Python implemented in C, Jython is Python implemented in Java, and IronPython is Python implemented on .NET. These are all interpreters.
Cython is a static compiler that works as an extension to CPython. It compiles code for the CPython interpreter and cannot be used to compile code for other interpreters such as Jython.
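To find out which of these interpreters is running a script, the standard library reports the implementation name. A minimal check:

```python
import platform
import sys

# Name of the running implementation, for example "CPython", "PyPy",
# "Jython" or "IronPython".
implementation = platform.python_implementation()
print(implementation)

# Identifier of the implementation as reported by sys.implementation.
print(sys.implementation.name)
```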

LLVM

https://www.infoworld.com/article/3247799/what-is-llvm-the-power-behind-swift-rust-clang-and-more.html

Mozilla Rust
Apple Swift
Jetbrains Kotlin

LLVM is an open source project created by Chris Lattner, who also created the Swift language. LLVM makes it easier to create new languages and to improve existing ones.
Swift uses LLVM as its compiler framework; Rust uses LLVM as a core component of its toolchain; Clang, a C/C++ compiler, is built on LLVM; Mono, a .NET implementation, can optionally compile with LLVM; Kotlin is a JVM language, and Kotlin Native, which compiles using LLVM, is under development.

Compilation principle

Compiler Principles (Northeastern University course)

Conversion between high-level language, assembly language and machine language:
high-level language 1 → high-level language 2
high-level language → machine language
high-level language → assembly language
assembly language → high-level language
assembly language → machine language
machine language → assembly language

Compiler: translates a high-level language into an equivalent low-level language
Source language → compiler → target language → run the program → result. After compiling once, only the target program needs to be run each time, for example after compiling C into an exe
Interpreter: obtains the result by running the high-level language directly
Source language → interpreter → result (for example Python; interpretation can be line by line or of the program as a whole)

Source language → lexical analysis → syntax analysis → semantic analysis → optimization → object code generation → target language. The five phases in the middle interact continuously with the error handler and the symbol table manager.
Source language → lexical analysis → token sequence → syntax analysis → syntax tree → semantic analysis → semantic tree → optimization → optimized semantic tree → object code generation → target language. The target language can be assembly language or machine language.
Taking the semantic tree as the boundary, the whole process splits into two parts: source language → front end → intermediate code → back end → target language

int a,b;
b = a+2*5;

Lexical analysis: identify and classify the words (tokens) in the source program

Keyword: int
Identifier: a, b
Constant: 2, 5
Delimiter: ,  ;  =  +  *
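The classification above can be reproduced with a small regular-expression scanner. This is a sketch that only handles the four token classes of the example. The names KEYWORDS, TOKEN_PATTERN and tokenize are made up for illustration.

```python
import re

KEYWORDS = {"int"}

# One named group per token class; whitespace is matched and skipped.
TOKEN_PATTERN = re.compile(
    r"(?P<word>[A-Za-z_]\w*)"
    r"|(?P<constant>\d+)"
    r"|(?P<delimiter>[,;=+*])"
    r"|(?P<space>\s+)"
)


def tokenize(text):
    """Split text into (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = match.end()
        kind = match.lastgroup
        lexeme = match.group()
        if kind == "space":
            continue
        if kind == "word":
            # A word is a keyword if it is reserved, otherwise an identifier.
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        tokens.append((kind, lexeme))
    return tokens


for token in tokenize("int a,b;\nb = a+2*5;"):
    print(token)
```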

Syntax analysis: group tokens into phrases and statements, and check for syntax errors

Semantic analysis: analyze the semantic properties of the syntactic constructs

Symbol table
 Name   Type   Kind       Address
  a     int    variable   pointer1
  b     int    variable   pointer2

Address: points to the location in the data area where the symbol is stored

Semantic information is described with quadruples (four-element tuples) or a semantic tree
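For the statement b = a+2*5, the quadruple form could look like the output of the sketch below. It borrows Python's own parser as a stand-in front end, which works here because the assignment is also valid Python. The temporary names t1 and t2 and the function to_quadruples are hypothetical.

```python
import ast

OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_quadruples(statement):
    """Translate one assignment into (operator, arg1, arg2, result) tuples."""
    quads = []

    def emit(node):
        # Leaves: a variable name or a literal constant.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            left = emit(node.left)
            right = emit(node.right)
            temp = f"t{len(quads) + 1}"  # hypothetical temporary name
            quads.append((OPERATORS[type(node.op)], left, right, temp))
            return temp
        raise ValueError("unsupported expression")

    assign = ast.parse(statement).body[0]
    if not isinstance(assign, ast.Assign) or not isinstance(assign.targets[0], ast.Name):
        raise ValueError("expected a simple assignment")
    value = emit(assign.value)
    quads.append(("=", value, None, assign.targets[0].id))
    return quads


for quad in to_quadruples("b = a + 2 * 5"):
    print(quad)
```

The multiplication is emitted first because it sits deeper in the syntax tree than the addition.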

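Optimization: improve the quality of the target program, for example by reducing its storage space. A typical optimization is constant folding (constant merging): in the example, 2*5 is replaced by 10 at compile time.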
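Constant merging can be sketched on Python's own syntax tree. The transformer below replaces an operation on two numeric literals with its result, so a + 2 * 5 becomes a + 10. The class name ConstantFolder is made up for the example, and only +, - and * are handled.

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace operations on two numeric literals with their computed value."""

    def visit_BinOp(self, node):
        # Fold the children first so nested constant expressions collapse too.
        self.generic_visit(node)
        if (
            type(node.op) in FOLDABLE
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            value = FOLDABLE[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node


tree = ast.parse("a + 2 * 5", mode="eval")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))

# The right operand of the addition is now a single constant.
print(ast.dump(folded.body))
folded_result = eval(compile(folded, "<folded>", "eval"), {"a": 1})
print(folded_result)
```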

Object code: