[Data structure] The sequence table in Python: list

Posted by hothientuan on Fri, 18 Feb 2022 07:38:54 +0100

College compulsory courses

"Data Structures and Algorithms" is a required course for computer science students at every university. When I took it, the course was taught in C. The first data structure I came into contact with was the sequence table, a linear list implemented with an array. Its structure looks like this:

#define MAXSIZE 20
typedef int ElemType;
typedef struct
{
	ElemType data[MAXSIZE];
	int length; // current length of the sequence table
}Sqlist;

The struct wraps the data array, whose maximum length is fixed at MAXSIZE, together with a variable that records the current length of the data.

The two most commonly used operations on a sequence table are insert and delete, and code for both is easy to find online. This static sequence table has a problem, however: once the maximum length is reached, no more data can be inserted. That is why there is also a dynamic sequence table structure:

typedef int DataType;
typedef struct SeqList
{
    DataType* _a;
    size_t _size; // Number of valid data 
    size_t _capacity; // capacity 
}SeqList;

Here we use the essence of the C language: the pointer. Because _a points to heap memory instead of a fixed-size array, the capacity can be expanded dynamically. For a concrete implementation, see https://blog.csdn.net/qq_34772530/article/details/78804329
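To make the dynamic version concrete, here is a minimal Python sketch of the same idea. The field names mirror the struct above; the method names and the doubling growth policy are assumptions made for illustration and are not taken from the linked post.

```python
class SeqList:
    """Toy dynamic sequence table: a fixed-size buffer that is regrown on demand."""

    def __init__(self, capacity=4):
        self._capacity = max(1, capacity)   # number of slots in the buffer
        self._size = 0                      # number of valid elements
        self._a = [None] * self._capacity   # stands in for the malloc'ed array

    def _grow(self):
        # Assumed policy: double the capacity and copy the old elements over.
        self._capacity *= 2
        new_a = [None] * self._capacity
        new_a[:self._size] = self._a[:self._size]
        self._a = new_a

    def insert(self, pos, value):
        if not 0 <= pos <= self._size:
            raise IndexError("insert position out of range")
        if self._size == self._capacity:
            self._grow()
        # Shift the tail one slot to the right, starting from the end.
        for i in range(self._size, pos, -1):
            self._a[i] = self._a[i - 1]
        self._a[pos] = value
        self._size += 1

    def delete(self, pos):
        if not 0 <= pos < self._size:
            raise IndexError("delete position out of range")
        value = self._a[pos]
        # Shift the tail one slot to the left to close the gap.
        for i in range(pos, self._size - 1):
            self._a[i] = self._a[i + 1]
        self._size -= 1
        self._a[self._size] = None
        return value

    def to_list(self):
        return self._a[:self._size]


s = SeqList(capacity=2)
for x in (10, 20, 30):       # the third insert exceeds the initial capacity
    s.insert(s._size, x)
s.insert(1, 15)
removed = s.delete(0)
print(s.to_list(), s._capacity, removed)   # [15, 20, 30] 4 10
```

Both insert and delete have to shift the tail of the array, which is why they cost O(n) in the worst case, while growth happens only when the buffer is full.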

Life is short. I use Python

Fellow programmers who have used Python will have a deep appreciation of the saying "life is short, I use Python".

Python has a corresponding sequence table structure too: list. Its methods are already packaged, and although the underlying layer is still implemented in C, it is comfortable to use directly.
But sometimes I wonder: how is the internal structure defined? How does it grow dynamically when elements are inserted?
So I went to GitHub for the source code and, with the help of Python source code analysis and posts shared by other bloggers, wrote down what I understood.
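Before reading the C source, the growth behaviour can already be probed from Python itself. The sketch below assumes CPython, where sys.getsizeof of a list reports a fixed header plus one pointer-sized slot per reserved element; the helper name slots is hypothetical, and the exact numbers depend on the interpreter version.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")   # size of one PyObject* slot in bytes
EMPTY_SIZE = sys.getsizeof([])        # a list with no element buffer


def slots(lst):
    """Estimate how many element slots the list currently has reserved."""
    return (sys.getsizeof(lst) - EMPTY_SIZE) // POINTER_SIZE


lst = []
history = []                          # (len, reserved slots) after each append
for i in range(32):
    lst.append(i)
    history.append((len(lst), slots(lst)))

# The reserved slot count only changes now and then, not on every append.
resizes = sum(1 for a, b in zip(history, history[1:]) if a[1] != b[1])
print(history[:10])
print("resizes after the first append:", resizes)
```

The number of reserved slots is never smaller than the length, and it jumps in steps, which suggests that the list reserves spare room instead of reallocating on every append.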

Structure definition

cpython-master\Include\cpython\listobject.h

typedef struct {
    PyObject_VAR_HEAD 
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;
    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     * list.sort() temporarily sets allocated to -1 to detect mutations.
     *
     * Items must normally not be NULL, except during construction when
     * the list is not yet visible outside the function that builds it.
     */
    Py_ssize_t allocated;
} PyListObject;

The definition of PyObject_VAR_HEAD is in cpython-master\Include\object.h:

/* PyObject_VAR_HEAD defines the initial segment of all variable-size
 * container objects.  These end with a declaration of an array with 1
 * element, but enough space is malloc'ed so that the array actually
 * has room for ob_size elements.  Note that ob_size is an element count,
 * not necessarily a byte count.
 */

ob_size in PyObject_VAR_HEAD records the number of elements currently in the list;
PyObject **ob_item points to the start of the memory block that holds the pointers to the list elements;
allocated records how many element slots have been allocated for the list (a count of slots, not of bytes);
The relationship between ob_size and allocated is as follows:

  0 <= ob_size <= allocated
  len(list) == ob_size
  ob_item == NULL implies ob_size == allocated == 0

List creation

cpython-master\Objects\listobject.c

PyObject *
PyList_New(Py_ssize_t size)
{
    // 1. Parameter check: the list size cannot be negative
    if (size < 0) {
        PyErr_BadInternalCall();
        return NULL;
    }

    struct _Py_list_state *state = get_list_state();
    PyListObject *op;
#ifdef Py_DEBUG
    // PyList_New() must not be called after _PyList_Fini()
    assert(state->numfree != -1);
#endif
    // 2. Allocate the PyListObject itself, reusing one from the free list if available
    if (state->numfree) {
        state->numfree--;
        op = state->free_list[state->numfree];
        _Py_NewReference((PyObject *)op);
    }
    else {
        op = PyObject_GC_New(PyListObject, &PyList_Type);
        if (op == NULL) {
            return NULL;
        }
    }
    // 3. Allocate the array of element pointers held by the PyListObject
    if (size <= 0) {
        op->ob_item = NULL;
    }
    else {
        op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
        if (op->ob_item == NULL) {
            Py_DECREF(op);
            return PyErr_NoMemory();
        }
    }
    // 4. Set ob_size and allocated
    Py_SET_SIZE(op, size);
    op->allocated = size;
    _PyObject_GC_TRACK(op);
    return (PyObject *) op;
}

List creation mainly consists of the following steps:
1. Check the parameter: the list size cannot be negative (0 is allowed and creates an empty list).
2. Create the PyListObject itself. An object-level buffer pool (the free list) is used here, a technique I have not fully understood yet: if free_list holds an available object, it is reused directly; if the pool is empty, PyObject_GC_New allocates memory on the heap and creates a new object.
3. Once the PyListObject exists, allocate the array of element pointers it manages, according to the size parameter. PyMem_Calloc zero-fills the array, so every slot starts out as NULL.
4. Finally, set ob_size and allocated, both to size.
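The buffer pool in step 2 is a free list: list objects that have been destroyed are parked in an array and handed out again by a later PyList_New call, which saves a round trip to the allocator. The following is only a toy Python model of that idea. The class and method names are hypothetical, the pool limit of 80 is an assumption for illustration, and the deallocation side is not shown in the C excerpt above.

```python
class ListShell:
    """Stands in for a PyListObject header without its element array."""

    def __init__(self):
        self.ob_item = None
        self.ob_size = 0
        self.allocated = 0


class FreeListPool:
    MAXFREE = 80                      # assumed upper bound on pooled objects

    def __init__(self):
        self.free_list = []           # shells waiting to be reused
        self.fresh_allocations = 0    # how often a new shell had to be built

    @property
    def numfree(self):
        return len(self.free_list)

    def new(self, size):
        if size < 0:
            raise ValueError("size cannot be negative")
        if self.numfree:              # reuse a pooled shell if there is one
            op = self.free_list.pop()
        else:                         # otherwise allocate a brand-new one
            op = ListShell()
            self.fresh_allocations += 1
        op.ob_item = [None] * size if size > 0 else None
        op.ob_size = op.allocated = size
        return op

    def dealloc(self, op):
        op.ob_item = None             # the element array is always released
        op.ob_size = op.allocated = 0
        if self.numfree < self.MAXFREE:
            self.free_list.append(op) # keep the shell for the next new()


pool = FreeListPool()
a = pool.new(3)
pool.dealloc(a)
b = pool.new(5)                       # served from the free list
print(b is a, pool.fresh_allocations, b.ob_size, b.allocated)   # True 1 5 5
```

Only the small fixed-size header is recycled; the element array is allocated afresh each time because its size depends on the size argument.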

Assignment operation

cpython-master\Objects\listobject.c

int
PyList_SetItem(PyObject *op, Py_ssize_t i,
               PyObject *newitem)
{
    PyObject **p;
    if (!PyList_Check(op)) {
        Py_XDECREF(newitem);
        PyErr_BadInternalCall();
        return -1;
    }
    if (!valid_index(i, Py_SIZE(op))) {
        Py_XDECREF(newitem);
        PyErr_SetString(PyExc_IndexError,
                        "list assignment index out of range");
        return -1;
    }
    p = ((PyListObject *)op) -> ob_item + i;
    Py_XSETREF(*p, newitem);
    return 0;
}

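The assignment operation mainly consists of the following steps:
1. Check the parameter type: op must be a list.
2. Check that the index is valid: it must satisfy 0 <= i < Py_SIZE(op).
3. Assign the element: Py_XSETREF stores newitem in the slot and releases the reference to the old item.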
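From the Python side, the range check in step 2 surfaces as an IndexError. The short demonstration below assumes CPython; an ordinary lst[i] = x statement is served by a closely related code path rather than by PyList_SetItem itself, but it reports the same message.

```python
lst = [1, 2, 3]

lst[1] = "b"          # valid index: the slot is overwritten in place
lst[-1] = "c"         # negative indices are normalised before the range check

try:
    lst[3] = "d"      # index == len(lst) is already out of range
except IndexError as exc:
    message = str(exc)

print(lst, message)   # [1, 'b', 'c'] list assignment index out of range
```

Assignment never grows the list: writing one past the end is an error, and growing is the job of insert and append.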

Insert element

cpython-master\Objects\listobject.c
Elements are inserted with the ins1(PyListObject *self, Py_ssize_t where, PyObject *v) function

static int
ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
    Py_ssize_t i, n = Py_SIZE(self);
    PyObject **items;
    // 1. Parameter check
    if (v == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }
	
    // 2. Adjust the list capacity
    assert((size_t)n + 1 < PY_SSIZE_T_MAX);
    if (list_resize(self, n+1) < 0)
        return -1;
	
    // 3. Determine the insertion point
    if (where < 0) {
        where += n;
        if (where < 0)
            where = 0;
    }
    if (where > n)
        where = n;
    // 4. Shift the tail and insert the element
    items = self->ob_item;
    for (i = n; --i >= where; )
        items[i+1] = items[i];
    Py_INCREF(v);
    items[where] = v;
    return 0;
}

Inserting an element mainly consists of the following steps:
1. Check the parameter: v must not be NULL.
2. Adjust the list capacity. list_resize makes room for n + 1 elements and decides whether new memory actually has to be requested; the details are worth studying in that function.
3. Determine the insertion point. Since a list accepts negative indices, negative values are converted first, and out-of-range values are clamped to 0 or n.
4. Move every element at or after the insertion point back by one position to free the slot, then store v there.
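Steps 3 and 4 can be mirrored line by line in Python and checked against the built-in list.insert. This is a sketch for illustration: the name ins1_model is hypothetical, and a plain append stands in for the list_resize call.

```python
def ins1_model(items, where, v):
    """Pure-Python mirror of the index handling and shifting in ins1."""
    n = len(items)
    items.append(None)            # stands in for list_resize(self, n + 1)
    if where < 0:
        where += n
        if where < 0:
            where = 0
    if where > n:
        where = n
    i = n - 1
    while i >= where:             # same order as: for (i = n; --i >= where; )
        items[i + 1] = items[i]
        i -= 1
    items[where] = v


for where in (-10, -1, 0, 2, 99):
    expected = [1, 2, 3]
    expected.insert(where, "x")   # behaviour of the built-in for comparison
    actual = [1, 2, 3]
    ins1_model(actual, where, "x")
    assert actual == expected, (where, actual, expected)
    print(where, actual)
```

The loop copies from the end towards the insertion point so that no element is overwritten before it has been moved. Out-of-range positions never raise an error; they simply insert at the front or at the end.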

Delete element

cpython-master\Objects\listobject.c
Elements are deleted with the list_remove(PyListObject *self, PyObject *value) function

static PyObject *
list_remove(PyListObject *self, PyObject *value)
/*[clinic end generated code: output=f087e1951a5e30d1 input=2dc2ba5bb2fb1f82]*/
{
    Py_ssize_t i;

    for (i = 0; i < Py_SIZE(self); i++) {
        PyObject *obj = self->ob_item[i];
        Py_INCREF(obj);
        int cmp = PyObject_RichCompareBool(obj, value, Py_EQ);
        Py_DECREF(obj);
        if (cmp > 0) {
            if (list_ass_slice(self, i, i+1,
                               (PyObject *)NULL) == 0)
                Py_RETURN_NONE;
            return NULL;
        }
        else if (cmp < 0)
            return NULL;
    }
    PyErr_SetString(PyExc_ValueError, "list.remove(x): x not in list");
    return NULL;
}

Deleting an element mainly involves the following steps:
1. Traverse the list and compare the value to be deleted with each element of the PyListObject. On the first match, call the list_ass_slice function to remove that element. If nothing matches, a ValueError is raised.
2. list_ass_slice(a, ilow, ihigh, v) replaces the slice a[ilow:ihigh] with v, or deletes the slice when v is NULL.
For example:
l = [1,2,3,4]
ilow = 1,ihigh=3,v=['a','b']
l = [1,'a','b',4]
ilow = 0,ihigh=2,v=[]
l = ['b',4]
The implementation of list_ass_slice is quite complex and I don't fully understand it. You can study it if you are interested, but I feel you need to be very familiar with C to follow it.
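The same replace and delete behaviour can be observed from Python through slice assignment, del and remove. The snippet assumes CPython and only exercises the visible behaviour; it does not show how list_ass_slice works internally.

```python
l = [1, 2, 3, 4]
l[1:3] = ["a", "b"]      # replace: ilow=1, ihigh=3, v=['a', 'b']
step1 = list(l)

l[0:2] = []              # an empty replacement removes the slice
step2 = list(l)

del l[0:1]               # del on a slice corresponds to the v == NULL case
step3 = list(l)

nums = [5, 7, 5]
nums.remove(5)           # only the first match is removed
try:
    nums.remove(42)      # no match at all
except ValueError as exc:
    message = str(exc)

print(step1, step2, step3, nums, message)
```

Because remove deletes a one-element slice, every element behind the match has to move forward by one position, so remove costs O(n) even after the match has been found.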

cpython-master\Objects\listobject.c also contains the implementations of the other list methods; have a look if you are interested. Python is very powerful, but I feel that C is the really powerful one.

References:
Python source code analysis
https://blog.csdn.net/lucky404/article/details/79596319
