Python: Garbage collection mechanism

Posted by elementz on Mon, 20 Dec 2021 00:09:51 +0100

I Why do we need a garbage collection mechanism

  • A running Python program allocates memory to hold the temporary variables it creates; when a computation finishes, the results are written out to persistent storage. If the data is too large or memory is managed poorly, the program can easily run out of memory (OOM), and the operating system may kill it.
  • Memory management matters even more for a server, which is designed to run without interruption; without it, memory leaks accumulate over time.
  • A "leak" here is not an information security problem in which malicious programs read your memory. It means that the program is poorly designed and fails to release memory it no longer uses.
  • Nor does a memory leak mean that memory physically disappears. It means that, through a design error, the code loses control of a region of memory, which is then wasted.

II Reference counting in Python

Description

When an object's reference count (the number of pointers to it) drops to 0, the object can never be reached again. It has become garbage, and CPython reclaims it right away.
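A minimal sketch of this behavior, assuming CPython: `weakref.finalize` registers a callback that runs when the object is destroyed, so the order of the recorded events shows that reclamation happens at the moment the last reference is deleted. The class `Payload` is a hypothetical stand-in for any object.

```python
import weakref

class Payload:
    """Hypothetical stand-in for any object."""

events = []
obj = Payload()

# The callback runs when obj is destroyed; it does not keep obj alive.
weakref.finalize(obj, events.append, 'reclaimed')

events.append('before del')
del obj  # the last strong reference goes away, so the count reaches 0
events.append('after del')

print(events)
```

On CPython the 'reclaimed' entry lands between the other two, because the object is freed inside the `del` statement itself rather than at some later point.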

Example

import os
import psutil

# Displays the amount of memory occupied by the current python program
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. /1024
    print('{} memory used: {} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    show_memory_info('after a created')

func()
show_memory_info('finished')

'''
initial memory used: 23.54296875 MB
after a created memory used: 410.73828125 MB
finished memory used: 23.8671875 MB
'''
'''
After list a is created, memory usage jumps to about 411 MB; once the function call returns, it drops back to normal.
This is because the list a declared inside the function is a local variable. When the function returns, the local reference is removed;
the reference count of the list object that a pointed to drops to 0, so Python reclaims it and the memory it occupied is released.
'''

def func():
    show_memory_info('initial')
    global a
    a = [i for i in range(10000000)]
    show_memory_info('after a created')
func()
show_memory_info('finished')
'''
initial memory used: 24.3046875 MB
after a created memory used: 411.32421875 MB
finished memory used: 411.35546875 MB
'''
'''
global a declares a as a global variable.
Even after the function returns, the reference to the list still exists, so the object is not garbage collected and keeps occupying a lot of memory.
Likewise, if the function returns the list and the main program receives it, the reference still exists,
so garbage collection is not triggered and the memory stays in use.
'''
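The last point, a returned list staying alive while the caller holds it, can be sketched as follows. This assumes CPython; `BigList` and `make_list` are hypothetical names, and a `list` subclass is used only because plain lists do not support weak references.

```python
import weakref

class BigList(list):
    """Hypothetical list subclass; unlike plain lists it supports weak references."""

watchers = []

def make_list():
    data = BigList(range(1000))
    watchers.append(weakref.ref(data))  # observe the list without keeping it alive
    return data

make_list()  # result discarded: no reference survives the call
discarded_alive = watchers[0]() is not None

kept = make_list()  # result received: the name kept holds a reference
kept_alive = watchers[1]() is not None

print(discarded_alive, kept_alive)
```

When run as a script, the first list is gone as soon as the call returns, while the second one lives for as long as `kept` refers to it.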
  • Reference counting mechanism inside Python
import sys

a = []

# Two references, one from a and one from getrefcount
print(sys.getrefcount(a))

def func(a):
    # Four references: a, Python's function call stack, the function parameter, and getrefcount (the exact number can vary between CPython versions)
    print(sys.getrefcount(a))
func(a)

# Two references again, one from a and one from getrefcount; the references held by the call to func are gone
print(sys.getrefcount(a))
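Because the absolute numbers reported by `sys.getrefcount` include temporary references and may differ between interpreter versions, it can be easier to look at differences. The sketch below, assuming CPython, measures how the count changes as containers start and stop holding the object.

```python
import sys

obj = object()
base = sys.getrefcount(obj)  # baseline, including getrefcount's own temporary reference

container = [obj, obj]   # a list holding the object twice: two more references
holder = {'key': obj}    # a dict value: one more reference
after_add = sys.getrefcount(obj) - base

del container            # the list is freed, dropping its two references
after_del_list = sys.getrefcount(obj) - base

holder.clear()           # the dict lets go of its reference
after_clear = sys.getrefcount(obj) - base

print(after_add, after_del_list, after_clear)
```

Every container slot that points at the object counts as one reference, and the count falls again as soon as the container releases it.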
  • Manually free memory
import os
import psutil
import gc

# Displays the amount of memory occupied by the current python program
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. /1024
    print('{} memory used: {} MB'.format(hint, memory))

show_memory_info('initial')

a = [i for i in range(10000000)]

show_memory_info('after a created')

del a
gc.collect()

show_memory_info('finish')
print(a)  # raises NameError: the name a was deleted above
'''
initial memory used: 23.55859375 MB
after a created memory used: 411.3203125 MB
finish memory used: 24.24609375 MB
Traceback (most recent call last):
  File "C:/Users/14116/Desktop/Exercise items/Wechat reading/ppp/utils/mat_mul.py", line 24, in <module>
    print(a)
NameError: name 'a' is not defined
'''
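Note that `del` removes a name, not the object itself. The sketch below, assuming CPython, shows that the object survives `del a` while another name still refers to it, and that for an object without reference cycles no call to `gc.collect()` is needed once the last reference is gone. The class `Buffer` is hypothetical.

```python
import weakref

class Buffer:
    """Hypothetical object standing in for a large allocation."""

a = Buffer()
alias = a                  # a second name for the same object
watcher = weakref.ref(a)   # observes the object without keeping it alive

del a                      # removes only the name a
alive_after_first_del = watcher() is not None

del alias                  # the last reference is gone
alive_after_second_del = watcher() is not None

print(alive_after_first_del, alive_after_second_del)
```

If memory does not drop after a `del`, a likely cause is that some other variable, container, or closure still holds a reference to the same object.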
  • Circular reference
import os
import psutil

# Displays the amount of memory occupied by the current python program
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. /1024
    print('{} memory used: {} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [i for i in range(10000000)]
    show_memory_info('after a, b, created')
    a.append(b)
    b.append(a)
func()
show_memory_info('finished')

'''
initial memory used: 23.4609375 MB
after a, b, created memory used: 798.6171875 MB
finished memory used: 798.7109375 MB
'''
'''
Here a and b reference each other. Both are local variables, so once the call to func returns,
the names a and b no longer exist as far as the program is concerned. Yet the memory is clearly still occupied!
Why? Because the two lists reference each other, neither reference count ever drops to 0.
'''
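The same effect can be observed on a small scale without measuring process memory. In this sketch, which assumes CPython, `Node` and `make_cycle` are hypothetical names; the automatic cyclic collector is switched off for the duration so that only reference counting is at work.

```python
import gc
import sys
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None

def make_cycle():
    a = Node()
    b = Node()
    before = sys.getrefcount(a)
    a.other = b
    b.other = a  # b now holds a reference to a
    extra = sys.getrefcount(a) - before
    return weakref.ref(a), extra

gc.disable()  # keep the automatic cyclic collector out of the picture
try:
    watcher, extra_refs = make_cycle()
    alive_after_return = watcher() is not None
finally:
    gc.enable()

print(extra_refs, alive_after_return)
```

The local names are gone after the call, but each object is still referenced by the other, so reference counting alone never frees either of them.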
  • Explicitly call gc.collect() to start garbage collection
import os
import psutil
import gc

# Displays the amount of memory occupied by the current python program
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)

    info = p.memory_full_info()
    memory = info.uss / 1024. /1024
    print('{} memory used: {} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [i for i in range(10000000)]
    show_memory_info('after a, b created')
    a.append(b)
    b.append(a)
func()
gc.collect()
show_memory_info('finished')
'''
initial memory used: 23.52734375 MB
after a, b created memory used: 798.78125 MB
finished memory used: 24.359375 MB
'''
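A smaller sketch of the same idea, assuming CPython: `gc.collect()` returns the number of unreachable objects it found, and a weak reference confirms that the cycle is really gone afterwards. `Node` and `make_cycle` are hypothetical names used only here.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None

def make_cycle():
    a = Node()
    b = Node()
    a.other = b
    b.other = a
    return weakref.ref(a)

gc.disable()  # make sure only the explicit call below collects the cycle
try:
    gc.collect()          # clear out any garbage left over from earlier code
    watcher = make_cycle()
    alive_before = watcher() is not None
    found = gc.collect()  # number of unreachable objects found
    alive_after = watcher() is not None
finally:
    gc.enable()

print(alive_before, found >= 2, alive_after)
```

The exact value of `found` depends on the interpreter version, since instance dictionaries may or may not be counted as separate objects, which is why the sketch only checks that at least the two nodes were found.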

III Debugging memory leaks

Description

objgraph is an easy-to-use package for visualizing reference relationships. Two of its functions are the most useful here. The first, show_refs(), generates a clear diagram of the objects that a given object refers to. The second, show_backrefs(), generates a somewhat more complex diagram of the objects that refer back to a given object.

Example

import objgraph

a = [1, 2, 3]
b = [4, 5, 6]

a.append(b)
b.append(a)

objgraph.show_refs([a])

import objgraph

a = [1, 2, 3]
b = [4, 5, 6]

a.append(b)
b.append(a)

objgraph.show_backrefs([a])
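When objgraph is not available, the standard library offers a rough textual counterpart. This sketch is a substitute technique, not what objgraph does internally: `gc.get_referents()` lists the objects an object points to (the forward direction), and `gc.get_referrers()` lists the objects that point at it (the backward direction).

```python
import gc

a = [1, 2, 3]
b = [4, 5, 6]

a.append(b)
b.append(a)

# Forward direction: a directly refers to b.
a_refers_to_b = any(obj is b for obj in gc.get_referents(a))

# Backward direction: b is among the objects that refer to a.
b_refers_back_to_a = any(obj is b for obj in gc.get_referrers(a))

print(a_refers_to_b, b_refers_back_to_a)
```

The referrer list usually also contains the namespace that holds the name a, so some filtering is needed when hunting for the reference that keeps an object alive.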
