Python dictionaries and sets

Posted by youknowho on Fri, 28 Jan 2022 20:39:37 +0100

Dictionaries

A dictionary is a collection of key-value pairs.
In Python 3.7+, dictionaries are guaranteed to preserve insertion order. Their length is variable, and elements can be added, deleted and modified freely.
Compared with lists and tuples, dictionaries perform better for lookups by key: insertion, deletion, modification and lookup all take constant time on average.
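
As a small sketch of the properties described above, the snippet below adds, modifies and deletes entries and then checks the iteration order. The dictionary `d` and its keys are made up for illustration.

```python
# Illustrative dictionary; the keys and values are made up for this example.
d = {'name': 'magic', 'age': 20}

d['city'] = 'Paris'   # add a new key (appended at the end)
d['age'] = 21         # modify an existing key (its position does not change)
del d['name']         # delete a key

# Iteration follows insertion order (guaranteed since Python 3.7).
keys_in_order = list(d)
print(keys_in_order)  # ['age', 'city']
print(len(d))         # 2
```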

# Common ways to create a dictionary
d1 = {'name': 'magic', 'age': 20}
d2 = dict({'name': 'magic', 'age': 20})
d3 = dict([('name', 'magic'), ('age', 20)])
d4 = dict(name='magic', age=20)
print(d2)
print(d1 == d2 == d3 == d4)  # True
print(d1['name'])  # magic
# print(d1['name2']) raises KeyError because the key does not exist
# d1.get('name2', None) returns the given default instead of raising when the key does not exist
sorted(d1.items(), key=lambda x: x[0])  # Sort in ascending order by key
sorted(d1.items(), key=lambda x: str(x[1]))  # Sort in ascending order by value; str() is needed here because d1 mixes str and int values, which cannot be compared directly
'''
lambda defines an anonymous function
lambda x: x[0]
is equivalent to
def f(x):
    return x[0]
'''
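
The difference between indexing and `get()` for a missing key, and the shape of what `sorted()` returns, can be checked with a short sketch like the one below. The default value `'unknown'` is an arbitrary choice for this example.

```python
d1 = {'name': 'magic', 'age': 20}

# Indexing a missing key raises KeyError.
try:
    d1['name2']
    missing_raised = False
except KeyError:
    missing_raised = True

# get() returns the supplied default instead of raising.
fallback = d1.get('name2', 'unknown')

# sorted() over items() returns a new list of (key, value) tuples;
# the dictionary itself is left unchanged.
by_key = sorted(d1.items(), key=lambda x: x[0])

print(missing_raised)  # True
print(fallback)        # unknown
print(by_key)          # [('age', 20), ('name', 'magic')]
```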

Sets

A set is basically the same as a dictionary, except that it stores single elements instead of key-value pairs. It is an unordered collection of unique elements.
Unlike a list, a set does not support index access or index-based operations, because it is essentially a hash table.

# Common ways to create a set
s1 = {1, 2, 3, 4, 'magic', 'two'}
s2 = set([1, 2, 3, 4, 'magic', 'two'])
print(s1 == s2)  # True
# s1[0] raises TypeError: sets do not support indexing
sorted(s1, key=str)  # Sort the elements in ascending order; key=str is needed because s1 mixes int and str, which cannot be compared directly

# Use `value in dict/set` to test whether an element is in a dictionary or set
print(1 in s1)  # True
print('name' in d1)  # True
print('magic' in d1)  # False: for a dictionary, `in` checks the keys, not the values
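
The set behaviour described in this section (uniqueness, no indexing, in-place modification) can be sketched as follows. The element `5` added here is just an example value.

```python
s1 = {1, 2, 3, 4, 'magic', 'two'}

# Duplicates are dropped when a set is built.
unique = set([1, 1, 2, 2, 3])

# Sets cannot be indexed; s1[0] raises TypeError.
try:
    s1[0]
    index_failed = False
except TypeError:
    index_failed = True

# add() and remove() modify the set in place.
s1.add(5)
s1.remove('two')

print(sorted(unique))        # [1, 2, 3]
print(index_failed)          # True
print(5 in s1, 'two' in s1)  # True False
```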

How dictionaries and sets work

Internally, both dictionaries and sets are hash tables.
Each row of a dictionary's hash table stores three items: the hash, the key and the value. A set has no key-value pairing, so each row holds a single element instead.

In older versions of Python, the hash table was structured as follows:

Hash value (hash)    key     value
hash0                key0    value0
hash1                key1    value1
hash2                key2    value2

As more data is stored, the table becomes increasingly sparse, and its storage looks something like this:

data = [
    ['--', '--', '--'],
    ['hash0', 'key0', 'value0'],
    ['--', '--', '--'],
    ['--', '--', '--'],
    ['hash1', 'key1', 'value1'],
    ['--', '--', '--'],
    ['hash2', 'key2', 'value2'],
]

This layout wastes storage space. To use space more efficiently,
the current hash table stores the indices separately from the hash values, keys and values, giving the following new structure:

Index table

None    index    None    index    None    index    None    index    None    index

Data table

Hash value (hash)    key     value
hash0                key0    value0
hash1                key1    value1
hash2                key2    value2

Stored in the new structure, the data takes the following form:

indices = [None, 1, None, None, 0, None, 2]
data = [
    [123123213, 'name', 'magic'],
    ['hash--11', 'name', 'long'],
    [12312323, 'age', 20],
]
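
To make the split layout more concrete, here is a minimal toy model of it, not CPython's actual implementation. The helper `build`, the fixed table size of 8, the linear probing on collisions and the key `'city'` are all assumptions made for this sketch. It shows that the index table stays sparse while the data table stays dense and in insertion order.

```python
def build(pairs, size=8):
    """Toy builder (hypothetical helper): returns (indices, entries).

    Assumes all keys are distinct and size is a power of two.
    """
    indices = [None] * size   # sparse: slot -> position in entries
    entries = []              # dense: (hash, key, value) in insertion order
    mask = size - 1
    for key, value in pairs:
        h = hash(key)
        slot = h & mask
        # Occupied slot: step to the next one (linear probing, a simplification).
        while indices[slot] is not None:
            slot = (slot + 1) & mask
        indices[slot] = len(entries)
        entries.append((h, key, value))
    return indices, entries


indices, entries = build([('name', 'magic'), ('age', 20), ('city', 'Paris')])

keys = [key for _, key, _ in entries]
used = sorted(i for i in indices if i is not None)

print(keys)                 # ['name', 'age', 'city']
print(used)                 # [0, 1, 2]
print(indices.count(None))  # 5
```

Which slots of `indices` are filled depends on the hash values, which vary between runs for strings, so the sketch only prints properties that do not depend on them.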

1. Insert operation

Each time an element is inserted into a dictionary or set, Python first computes the hash of the key, hash(key),
then takes the bitwise AND of that hash with mask = PyDict_MINSIZE - 1 (more generally, the current table size minus 1) to find the slot where the element should go: index = hash(key) & mask.
If the slot is empty, the element is inserted there.
If the slot is occupied, the hash value and key of the two elements are compared for equality:

  1. If both are equal, the element already exists. If the values differ, the value is updated.
  2. If they are not equal, this is a hash collision. In this case, Python keeps probing the table for an empty slot.

There are two main ways to resolve hash collisions: open addressing and separate chaining (linked lists).
Hash collisions tend to slow operations down. Therefore, as elements keep being inserted, once less than 1/3 of the table's space remains free,
Python allocates more memory, expands the hash table, and rearranges all element positions in the table.
Hash collisions and resizing reduce efficiency, but such cases are rare. On average, the time complexity of insertion, lookup and deletion is O(1).
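
The insertion steps above can be sketched with a toy open-addressing table. This is a simplified model under stated assumptions, not CPython's code: the function `insert`, the linear probing and the "double the size" growth policy are choices made for this example only.

```python
# Toy open-addressing table: each slot is None or a (hash, key, value) tuple.
# Linear probing and size doubling are simplifications chosen for this sketch.

def insert(table, key, value):
    """Insert or update key, returning the (possibly new, larger) table."""
    mask = len(table) - 1                  # table length must be a power of two
    h = hash(key)
    slot = h & mask
    while table[slot] is not None:
        slot_hash, slot_key, _ = table[slot]
        if slot_hash == h and slot_key == key:
            table[slot] = (h, key, value)  # same key: update the value
            return table
        slot = (slot + 1) & mask           # collision: try the next slot
    table[slot] = (h, key, value)

    # Grow once fewer than 1/3 of the slots remain free.
    if table.count(None) * 3 < len(table):
        bigger = [None] * (len(table) * 2)
        for item in table:
            if item is not None:
                bigger = insert(bigger, item[1], item[2])
        return bigger
    return table


table = [None] * 8
for k, v in [('name', 'magic'), ('age', 20), ('name', 'long')]:
    table = insert(table, k, v)

stored = {item[1]: item[2] for item in table if item is not None}
print(stored == {'name': 'long', 'age': 20})  # True

# Ten more distinct keys force the table to grow twice (8 -> 16 -> 32).
for n in range(10):
    table = insert(table, n, n * n)
print(len(table))  # 32
```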

2. Search operation
As with insertion, Python computes the hash of the key, hash(key), finds the corresponding slot, and compares the hash value and key of the element stored there
with those of the element being searched for. If they are equal, the value is returned. If not, the search continues until a match is found or an empty slot is reached, at which point an exception is raised (for example, KeyError for d[key]).
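
A lookup along these lines might look like the sketch below. The table is built by hand, the function `find` is hypothetical, and linear probing is again a simplification. Small non-negative integers are used as keys because in CPython their hash equals the integer itself, which makes the collision easy to see.

```python
# Hand-built 8-slot table of (hash, key, value) tuples.
# Keys 1 and 9 both map to slot 1, because 1 & 7 == 9 & 7 == 1.
table = [None] * 8
table[1] = (hash(1), 1, 'one')
table[2] = (hash(9), 9, 'nine')  # collided with key 1, so it sits in the next slot


def find(table, key):
    """Return the value stored for key, or raise KeyError."""
    mask = len(table) - 1
    h = hash(key)
    slot = h & mask
    while table[slot] is not None:
        slot_hash, slot_key, slot_value = table[slot]
        if slot_hash == h and slot_key == key:
            return slot_value
        slot = (slot + 1) & mask  # mismatch: keep probing
    raise KeyError(key)           # reached an empty slot: key is absent


print(find(table, 1))  # one
print(find(table, 9))  # nine

# Key 17 also starts at slot 1, passes two mismatches, then hits an empty slot.
try:
    find(table, 17)
    found_17 = True
except KeyError:
    found_17 = False
print(found_17)        # False
```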

3. Delete operation
For deletion, Python temporarily assigns a special value to the slot at that position, and the entry is actually removed when the hash table is resized.
This lazy approach keeps the cost of each individual deletion low and amortizes the cleanup across operations.
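
The idea of lazy deletion can be illustrated with a toy table that uses a placeholder marker. The names `DELETED`, `delete`, `find` and `rebuild` are hypothetical, and the probing scheme is simplified. One reason a marker is useful in an open-addressing table is that simply emptying the slot would cut the probe chain for keys stored after it.

```python
DELETED = object()  # hypothetical placeholder marker for this sketch

# Hand-built table of (hash, key, value); hash(n) == n for these small ints.
table = [None] * 8
table[1] = (1, 1, 'one')
table[2] = (9, 9, 'nine')  # collided with key 1, so it sits in the next slot


def _probe(table, key):
    """Return the slot holding key, or raise KeyError at the first empty slot."""
    mask = len(table) - 1
    h = hash(key)
    slot = h & mask
    while table[slot] is not None:
        item = table[slot]
        if item is not DELETED and item[0] == h and item[1] == key:
            return slot
        slot = (slot + 1) & mask  # skip mismatches and placeholders
    raise KeyError(key)


def find(table, key):
    return table[_probe(table, key)][2]


def delete(table, key):
    table[_probe(table, key)] = DELETED  # mark only; real cleanup happens later


def rebuild(table):
    """Copy live entries into a fresh table, dropping the placeholders."""
    fresh = [None] * len(table)
    mask = len(fresh) - 1
    for item in table:
        if item is None or item is DELETED:
            continue
        slot = item[0] & mask
        while fresh[slot] is not None:
            slot = (slot + 1) & mask
        fresh[slot] = item
    return fresh


delete(table, 1)
print(table[1] is DELETED)  # True
print(find(table, 9))       # nine (the placeholder keeps the probe chain intact)

table = rebuild(table)
print(table[1])             # (9, 9, 'nine')
```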

Summary

In Python 3.7+, a dictionary is an ordered data structure, whereas a set is unordered. The hash table they are built on makes lookup, insertion and deletion efficient.
Therefore, dictionaries and sets are typically used for tasks such as fast lookup and deduplication of elements.
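
As a short illustration of these two use cases, the sketch below deduplicates a list and filters it by membership. The lists `items` and `allowed` are made-up sample data.

```python
items = ['b', 'a', 'b', 'c', 'a']

# A set removes duplicates, but its iteration order is not guaranteed.
unique_unordered = set(items)

# dict.fromkeys() removes duplicates while keeping first-seen order.
unique_ordered = list(dict.fromkeys(items))

# Membership tests against a set take constant time on average.
allowed = {'a', 'c'}
filtered = [x for x in unique_ordered if x in allowed]

print(sorted(unique_unordered))  # ['a', 'b', 'c']
print(unique_ordered)            # ['b', 'a', 'c']
print(filtered)                  # ['a', 'c']
```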

Topics: Python