I Why a garbage collection mechanism is needed
- While a Python program runs, it allocates memory to hold the temporary variables it creates; when the computation finishes, the results are written to persistent storage. If the data is too large or memory is managed poorly, the program can easily run out of memory (OOM, colloquially a "memory explosion") and may be killed by the operating system.
- For a server, which is designed to run without interruption, memory management matters even more, because even a small memory leak accumulates over time.
- A "leak" here does not mean an information security problem in which your memory is exposed to malicious programs. It means the program itself is poorly designed and fails to release memory it no longer uses.
- Nor does a memory leak mean that memory physically disappears. It means the code has lost control of a region of memory because of a design error, so that memory is wasted.
II Reference counting in Python
description
When an object's reference count (the number of references pointing to it) drops to 0, the object can never be reached again. It has become garbage and its memory needs to be reclaimed.
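A minimal sketch of this rule, assuming CPython: a weak reference lets us observe an object without adding to its reference count, so it shows the moment the object is reclaimed. The class `Payload` is a hypothetical placeholder, used because plain lists cannot be weakly referenced.

```python
import weakref


class Payload:
    """Hypothetical placeholder object."""


obj = Payload()
probe = weakref.ref(obj)  # a weak reference does not raise the reference count

alive_before = probe() is not None
del obj                   # drop the only strong reference; the count reaches 0
alive_after = probe() is not None

print(alive_before, alive_after)
```

On CPython the second value is `False` right after `del obj`, with no explicit collection step, because the object is freed as soon as its count reaches 0.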
example
    import os
    import psutil


    # Display the amount of memory used by the current Python process
    def show_memory_info(hint):
        pid = os.getpid()
        p = psutil.Process(pid)
        info = p.memory_full_info()
        memory = info.uss / 1024. / 1024
        print('{} memory used: {} MB'.format(hint, memory))


    def func():
        show_memory_info('initial')
        a = [i for i in range(10000000)]
        show_memory_info('after a created')


    func()
    show_memory_info('finished')

    '''
    initial memory used: 23.54296875 MB
    after a created memory used: 410.73828125 MB
    finished memory used: 23.8671875 MB
    '''

    '''
    After list a is created, memory usage jumps to about 410 MB; after the function call returns, it falls back to normal.
    This is because the list a declared inside the function is a local variable. When the function returns, the reference held by the local variable is removed;
    the reference count of the list object drops to 0, Python reclaims it, and the large amount of memory it occupied is returned.
    '''


    def func():
        show_memory_info('initial')
        global a
        a = [i for i in range(10000000)]
        show_memory_info('after a created')


    func()
    show_memory_info('finished')

    '''
    initial memory used: 24.3046875 MB
    after a created memory used: 411.32421875 MB
    finished memory used: 411.35546875 MB
    '''

    '''
    global a declares a as a global variable. Even after the function returns, the reference to the list still exists,
    so the object is not garbage collected and continues to occupy a large amount of memory.
    Likewise, if the function returns the list and the main program receives it, the reference still exists,
    nothing is reclaimed, and the memory stays occupied.
    '''
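The last case, returning the list to the caller, can be sketched in the same way. This is an illustration on CPython, not a measurement: `BigList`, `build_local` and `build_and_return` are hypothetical names, and the `list` subclass exists only because plain lists do not support weak references.

```python
import weakref


class BigList(list):
    """list subclass, used only so the object can be weakly referenced."""


def build_local():
    data = BigList(range(1000))
    return weakref.ref(data)        # only a weak reference escapes


def build_and_return():
    data = BigList(range(1000))
    return data, weakref.ref(data)  # the strong reference escapes too


probe_local = build_local()
kept, probe_returned = build_and_return()

local_freed = probe_local() is None            # reclaimed when the function returned
returned_alive = probe_returned() is not None  # still referenced by the name kept

print(local_freed, returned_alive)
```

As long as the caller holds `kept`, the list stays alive; once the caller drops it, the count falls to 0 and the memory can be returned.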
- Reference counting mechanism inside Python
    import sys

    a = []

    # Two references: one from a and one from the getrefcount argument
    print(sys.getrefcount(a))


    def func(a):
        # Four references: a, Python's function call stack, the function parameter, and getrefcount
        # (the exact number may differ between CPython versions)
        print(sys.getrefcount(a))


    func(a)

    # Two references again: one from a and one from getrefcount; the references held by the call to func are gone
    print(sys.getrefcount(a))
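Because the absolute value reported by `sys.getrefcount` includes temporary references and can vary with the interpreter version, it is often easier to look at how the count changes. A small sketch (the name `holder` is just an illustrative container):

```python
import sys

a = []
holder = []

base = sys.getrefcount(a)
holder.append(a)                     # one more strong reference, held by holder
with_container = sys.getrefcount(a)
holder.pop()                         # that reference is removed again
after_pop = sys.getrefcount(a)

print(with_container - base, after_pop - base)
```

Storing the object in a container raises the count by one, and removing it brings the count back to where it started.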
- Manually free memory
    import os
    import psutil
    import gc


    # Display the amount of memory used by the current Python process
    def show_memory_info(hint):
        pid = os.getpid()
        p = psutil.Process(pid)
        info = p.memory_full_info()
        memory = info.uss / 1024. / 1024
        print('{} memory used: {} MB'.format(hint, memory))


    show_memory_info('initial')
    a = [i for i in range(10000000)]
    show_memory_info('after a created')

    del a          # remove the name a, which drops the last reference to the list
    gc.collect()   # explicitly run garbage collection

    show_memory_info('finish')
    print(a)       # a no longer exists, so this raises NameError

    '''
    initial memory used: 23.55859375 MB
    after a created memory used: 411.3203125 MB
    finish memory used: 24.24609375 MB
    Traceback (most recent call last):
      File "C:/Users/14116/Desktop/Exercise items/Wechat reading/ppp/utils/mat_mul.py", line 24, in <module>
        print(a)
    NameError: name 'a' is not defined
    '''
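One caveat worth sketching: `del` removes a name, not the object itself. If another reference to the same object exists, neither `del` nor `gc.collect()` frees it. In this illustration `Buffer` and `alias` are hypothetical names and CPython is assumed:

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for a large object."""


a = Buffer()
alias = a                 # a second strong reference to the same object
probe = weakref.ref(a)

del a
gc.collect()
alive_with_alias = probe() is not None     # still reachable through alias

del alias
alive_without_alias = probe() is not None  # last reference gone, object freed

print(alive_with_alias, alive_without_alias)
```

Memory is released only after every strong reference has been dropped, so when `del` seems to have no effect, the usual cause is a remaining reference somewhere else.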
- Circular reference
    import os
    import psutil


    # Display the amount of memory used by the current Python process
    def show_memory_info(hint):
        pid = os.getpid()
        p = psutil.Process(pid)
        info = p.memory_full_info()
        memory = info.uss / 1024. / 1024
        print('{} memory used: {} MB'.format(hint, memory))


    def func():
        show_memory_info('initial')
        a = [i for i in range(10000000)]
        b = [i for i in range(10000000)]
        show_memory_info('after a, b, created')
        a.append(b)
        b.append(a)


    func()
    show_memory_info('finished')

    '''
    initial memory used: 23.4609375 MB
    after a, b, created memory used: 798.6171875 MB
    finished memory used: 798.7109375 MB
    '''

    '''
    Here a and b reference each other. Both are local variables, so once func returns,
    the names a and b no longer exist as far as the program is concerned.
    Yet the memory is clearly still occupied. Why?
    Because the two lists reference each other, neither reference count drops to 0.
    '''
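The same effect can be observed without measuring memory. The sketch below assumes CPython and temporarily disables the automatic cycle collector so that it cannot interfere; `TrackedList` is a hypothetical `list` subclass, needed only because plain lists cannot be weakly referenced.

```python
import gc
import weakref


class TrackedList(list):
    """list subclass, used only so the object can be weakly referenced."""


def func():
    a = TrackedList([1, 2, 3])
    b = TrackedList([4, 5, 6])
    a.append(b)
    b.append(a)
    return weakref.ref(a)  # lets the caller check whether a is still alive


gc.disable()               # keep the automatic cycle collector out of the demo
try:
    probe = func()
    survives = probe() is not None
finally:
    gc.enable()

print(survives)
```

Although both names are gone after `func` returns, each list is still referenced by the other, so reference counting alone never frees them.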
- Explicitly call gc.collect() to start garbage collection
    import os
    import psutil
    import gc


    # Display the amount of memory used by the current Python process
    def show_memory_info(hint):
        pid = os.getpid()
        p = psutil.Process(pid)
        info = p.memory_full_info()
        memory = info.uss / 1024. / 1024
        print('{} memory used: {} MB'.format(hint, memory))


    def func():
        show_memory_info('initial')
        a = [i for i in range(10000000)]
        b = [i for i in range(10000000)]
        show_memory_info('after a, b created')
        a.append(b)
        b.append(a)


    func()
    gc.collect()
    show_memory_info('finished')

    '''
    initial memory used: 23.52734375 MB
    after a, b created memory used: 798.78125 MB
    finished memory used: 24.359375 MB
    '''
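A smaller sketch of what `gc.collect()` does to a reference cycle, assuming CPython; `Node` and `make_cycle` are hypothetical names. `gc.collect()` returns the number of unreachable objects it found, and `gc.get_threshold()` shows the settings under which CPython also runs the cycle collector automatically, so an explicit call is a way to force collection at a chosen moment rather than the only way cycles are ever freed.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point to another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a   # a and b now reference each other
    return weakref.ref(a)


gc.disable()                  # make the timing of collection deterministic
try:
    probe = make_cycle()
    alive_before = probe() is not None
    unreachable = gc.collect()  # number of unreachable objects found
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after, unreachable >= 2)
print('automatic collection thresholds:', gc.get_threshold())
```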
III Debugging memory leaks
description
objgraph is a very easy-to-use package for visualizing reference relationships. Two of its functions are the most useful here. The first, show_refs(), generates a clear diagram of the objects that a given object refers to. The second, show_backrefs(), generates a slightly more complex diagram of the objects that refer back to it.
example
    import objgraph

    a = [1, 2, 3]
    b = [4, 5, 6]

    a.append(b)
    b.append(a)

    # Draw the objects that a refers to
    objgraph.show_refs([a])

    import objgraph

    a = [1, 2, 3]
    b = [4, 5, 6]

    a.append(b)
    b.append(a)

    # Draw the objects that refer back to a
    objgraph.show_backrefs([a])
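If objgraph is not installed, the same two relationships can be inspected in text form with the standard-library `gc` module. This is a rough substitute under that assumption, not a replacement for the diagrams: `gc.get_referents` lists what an object points to, and `gc.get_referrers` lists what points to it.

```python
import gc

a = [1, 2, 3]
b = [4, 5, 6]

a.append(b)
b.append(a)

# Forward references: does a refer to b? (the direction show_refs draws)
a_points_to_b = any(obj is b for obj in gc.get_referents(a))

# Back references: is b among the objects referring to a? (the direction show_backrefs draws)
b_points_to_a = any(obj is b for obj in gc.get_referrers(a))

print(a_points_to_b, b_points_to_a)
```

When hunting a leak, the back-reference direction is usually the more informative one, because it answers the question of what is still holding on to an object that should have been freed.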