Python 3 standard library: nonpermanent references to weakref objects

Posted by mattgleeson on Tue, 25 Feb 2020 12:42:12 +0100

1. Non permanent reference of weakref object

The weakref module supports weak references to objects. Normal references increase the number of references to an object and prevent it from being garbage collected. But the result is not always as expected, for example, a circular reference may appear sometimes, or the cache of objects may be deleted when memory is needed. Weak reference is an object handle that can't avoid objects being automatically cleaned up.

1.1 Quotes

Weak references to objects are managed by ref classes. To get the original object, you can call the reference object.

import weakref

class ExpensiveObject:

    def __del__(self):
        print('(Deleting {})'.format(self))

obj = ExpensiveObject()
r = weakref.ref(obj)

print('obj:', obj)
print('ref:', r)
print('r():', r())

print('deleting obj')
del obj
print('r():', r())

Here, because obj has been deleted before the second call to reference, ref returns None.

1.2 reference callback

The ref constructor takes an optional callback function that is called when the referenced object is deleted.

import weakref

class ExpensiveObject:

    def __del__(self):
        print('(Deleting {})'.format(self))

def callback(reference):
    """Invoked when referenced object is deleted"""
    print('callback({!r})'.format(reference))

obj = ExpensiveObject()
r = weakref.ref(obj, callback)

print('obj:', obj)
print('ref:', r)
print('r():', r())

print('deleting obj')
del obj
print('r():', r())

When the reference has "died" and no longer references the original object, the callback accepts the reference object as a parameter. One use of this feature is to remove weak reference objects from the cache.

1.3 final object

To achieve more robust management of resources when cleaning up weak references, you can use finalize to associate callbacks with objects. The finalize instance remains (until the associated object is deleted), even if the application does not retain a reference to the finalized object.

import weakref

class ExpensiveObject:

    def __del__(self):
        print('(Deleting {})'.format(self))

def on_finalize(*args):
    print('on_finalize({!r})'.format(args))

obj = ExpensiveObject()
weakref.finalize(obj, on_finalize, 'extra argument')

del obj

The parameters of finalize include the object to be tracked, the callable to be called when the object is garbage collected, and all the location or named parameters passed into the callable.

The finalize instance has a writeable attribute atexit, which controls whether the callback is called when the program exits (if not already called).  

import sys
import weakref

class ExpensiveObject:

    def __del__(self):
        print('(Deleting {})'.format(self))

def on_finalize(*args):
    print('on_finalize({!r})'.format(args))

obj = ExpensiveObject()
f = weakref.finalize(obj, on_finalize, 'extra argument')
f.atexit = bool(int(sys.argv[1]))

The default setting is to call this callback. Setting atexit to false disables this behavior.

If you provide a reference to the tracked object to the finalize instance, it will cause a reference to be retained, so the object will never be garbage collected.  

import gc
import weakref

class ExpensiveObject:

    def __del__(self):
        print('(Deleting {})'.format(self))

def on_finalize(*args):
    print('on_finalize({!r})'.format(args))

obj = ExpensiveObject()
obj_id = id(obj)

f = weakref.finalize(obj, on_finalize, obj)
f.atexit = False

del obj

for o in gc.get_objects():
    if id(o) == obj_id:
        print('found uncollected object in gc')

As shown above, although the explicit reference to obj has been removed, the object remains visible to the garbage collector through f.

Using a binding method of the tracked object as a callable can also appropriately avoid object finalization.

import gc
import weakref

class ExpensiveObject:

    def __del__(self):
        print('(Deleting {})'.format(self))

    def do_finalize(self):
        print('do_finalize')

obj = ExpensiveObject()
obj_id = id(obj)

f = weakref.finalize(obj, obj.do_finalize)
f.atexit = False

del obj

for o in gc.get_objects():
    if id(o) == obj_id:
        print('found uncollected object in gc')

Because the callable provided for finalize is a binding method of the instance obj, the final method keeps a reference of obj, which cannot be deleted and garbage collected.

1.4 agency

Sometimes it is more convenient to use a proxy than a weak reference. Using a proxy can be like using the original object, and does not require that the proxy be called before the object is accessed. This means that the agent can be passed to a library that does not know it receives a reference rather than a real object.

import weakref

class ExpensiveObject:

    def __init__(self, name):
        self.name = name

    def __del__(self):
        print('(Deleting {})'.format(self))

obj = ExpensiveObject('My Object')
r = weakref.ref(obj)
p = weakref.proxy(obj)

print('via obj:', obj.name)
print('via ref:', r().name)
print('via proxy:', p.name)
del obj
print('via proxy:', p.name)

If the reference object is deleted before accessing the agent, a ReferenceError exception will be generated.

1.5 cache objects

ref and proxy classes are considered "low-level.". Although they are useful for maintaining weak references to a single object and also support garbage collection for circular references, the WeakKeyDictionary and WeakValueDictionary classes provide a more suitable API for creating caches of multiple objects.

The WeakValueDictionary class uses weak references to the values it contains and allows garbage collection when other code no longer actually uses those values. Using the garbage collector's explicit call, the following shows the difference between using a regular dictionary and a WeakValueDictionary to complete memory processing.  

import gc
from pprint import pprint
import weakref

gc.set_debug(gc.DEBUG_UNCOLLECTABLE)

class ExpensiveObject:

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'ExpensiveObject({})'.format(self.name)

    def __del__(self):
        print('    (Deleting {})'.format(self))

def demo(cache_factory):
    # hold objects so any weak references
    # are not removed immediately
    all_refs = {}
    # create the cache using the factory
    print('CACHE TYPE:', cache_factory)
    cache = cache_factory()
    for name in ['one', 'two', 'three']:
        o = ExpensiveObject(name)
        cache[name] = o
        all_refs[name] = o
        del o  # decref

    print('  all_refs =', end=' ')
    pprint(all_refs)
    print('\n  Before, cache contains:', list(cache.keys()))
    for name, value in cache.items():
        print('    {} = {}'.format(name, value))
        del value  # decref

    # remove all references to the objects except the cache
    print('\n  Cleanup:')
    del all_refs
    gc.collect()

    print('\n  After, cache contains:', list(cache.keys()))
    for name, value in cache.items():
        print('    {} = {}'.format(name, value))
    print('  demo returning')
    return

demo(dict)
print()

demo(weakref.WeakValueDictionary)

If loop variables indicate cached values, they must be explicitly cleared to reduce the number of references to the object. Otherwise, the garbage collector does not delete these objects and they remain in the cache. Similarly, the all_refs variable is used to hold references to prevent them from being garbage collected prematurely.

WeakKeyDictionary works similarly, but uses a weak reference to a key in the dictionary instead of a weak reference to a value.

Topics: Python Attribute