Theory: Set

Posted by asy1mpo on Thu, 03 Feb 2022 20:56:08 +0100

You can use set objects when you need to remove duplicates from a sequence or when you intend to perform some mathematical operations. A set is an unordered container of hashable objects. You'll learn more about hashable objects later; for now, remember that only immutable data types can be elements of a set. Because of their structure, sets do not record the position or insertion order of elements, so you cannot retrieve an element by index.

Creating a set

First, we can create a set by listing its elements in curly braces. The only exception is the empty set, which can be created only with the set() function, because empty curly braces produce a dictionary:

empty_set = set()
print(type(empty_set))   # <class 'set'>

empty_dict = {}
print(type(empty_dict))  # <class 'dict'>

If you pass a string or a list to set(), the function returns a set containing all the elements of that string or list:

flowers = {'rose', 'lilac', 'daisy'}

# the order is not preserved
print(flowers)  # {'daisy', 'lilac', 'rose'}  


letters = set('philharmonic')
print(letters)  # {'h', 'r', 'i', 'c', 'o', 'l', 'a', 'p', 'm', 'n'}

Each element can appear in a set only once, so repeated letters are counted as one element:

letters = set('Hello')
print(len(letters))  # the length equals 4
print(letters)       # {'H', 'e', 'o', 'l'}

In addition, sets help you get rid of duplicates:

states = ['Russia', 'USA', 'USA', 'Germany', 'Italy']
print(set(states))  # {'Russia', 'USA', 'Italy', 'Germany'}
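
If a list is needed again after the duplicates are gone, the set can be converted back. The following is a minimal sketch; the name unique_states is illustrative, and sorted() is used only to get a predictable order, since the set itself does not define one.

```python
states = ['Russia', 'USA', 'USA', 'Germany', 'Italy']

# set() drops the duplicate; list() turns the result back into a list.
unique_states = list(set(states))
print(len(states), len(unique_states))  # 5 4

# A set has no defined order, so sort if a predictable order is needed.
print(sorted(unique_states))  # ['Germany', 'Italy', 'Russia', 'USA']
```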

Note that since the order of the elements does not matter, the following two sets are equal:

set1 = {'A', 'B', 'C'}
set2 = {'B', 'C', 'A'}
print(set1 == set2)  # True
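
Recall from the introduction that only immutable objects can be set elements and that elements cannot be retrieved by index. The sketch below illustrates both restrictions; the variable names are made up for this example, and the exact wording of the error messages may vary between Python versions, so it is not shown.

```python
# Tuples, strings and numbers are immutable, so they can be set elements.
points = {(0, 0), (1, 2), 'origin', 3.5}
print(len(points))  # 4

# A list is mutable, so Python refuses to put it into a set.
list_rejected = False
try:
    bad = {[1, 2], [3, 4]}
except TypeError:
    list_rejected = True
print(list_rejected)  # True

# Sets have no positions, so indexing is not supported either.
index_rejected = False
try:
    points[0]
except TypeError:
    index_rejected = True
print(index_rejected)  # True
```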

Working with a set's elements

  • Get the number of elements in a set with the len() function.
  • Use a for loop to iterate over all the elements.
  • Check whether an element belongs to a set with the in / not in operators, which return a Boolean value.
nums = {1, 2, 2, 3}
print(1 in nums, 4 not in nums)  # True True
  • Add a single new element to a set with the add() method, or add all the elements of another collection with the update() method.
nums = {1, 2, 2, 3}
nums.add(5)
print(nums)  # {1, 2, 3, 5}

another_nums = {6, 7}
nums.update(another_nums)
print(nums)  # {1, 2, 3, 5, 6, 7}
 
# we can also add a list
text = ['how', 'are', 'you']
nums.update(text)
print(nums)  # {'you', 1, 2, 3, 5, 6, 7, 'are', 'how'}
 
# or add a whole string as a single element with add()
word = 'hello'
nums.add(word)
print(nums)  # {1, 2, 3, 'how', 5, 6, 7, 'hello', 'you', 'are'}

Note that when we update a set with a list, it is the elements of the list that are added to the set, not the list itself.

  • Remove a specific element from a set with the discard() or remove() method. The only difference between them shows up when the element to delete is not in the set: discard() does nothing, while remove() raises a KeyError exception.
nums = {1, 2, 3, 5}
nums.remove(2)
print(nums)  # {1, 3, 5}

empty_set = set()
empty_set.discard(2)  # nothing happened
empty_set.remove(2)   # KeyError: 2
  • Remove an arbitrary element with the pop() method. Since you cannot choose which element is removed, the method takes no arguments.
nums = {1, 2, 2, 3}
nums.pop()
print(nums)  # for example {2, 3}; the removed element may differ
  • Delete all the elements from a set with the clear() method.
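
The operations from the list that have no snippet of their own (len(), the for loop, and clear()) can be sketched as follows. The example also shows that pop() returns the element it removes; the names total and removed are illustrative only.

```python
nums = {1, 2, 2, 3}

# len() counts unique elements only.
print(len(nums))  # 3

# A for loop visits every element once; the order is not guaranteed.
total = 0
for n in nums:
    total += n
print(total)  # 6

# pop() returns the element it removed.
removed = nums.pop()
print(removed in nums)  # False
print(len(nums))        # 2

# clear() leaves an empty set behind.
nums.clear()
print(nums)  # set()
```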

When to use sets?

An important feature of sets (and of unordered collections in general) is that they allow you to run membership tests much faster than lists. In real life, if you have a list and try to check manually whether a specific item is in it, the only way is to go through the list until you find the item. Python does the same thing: it starts at the beginning of the list and examines the items one by one, because it doesn't know where the desired item might be. If the item is near the end, or is not in the list at all, Python will have traversed most or all of the list by the time the search ends. Therefore, if your program looks for items in a large list many times, it will be slow.

This is where sets come to the rescue! Membership tests on sets work almost instantly, because sets store and arrange their values in a different way. So, depending on the situation, you need to decide what is more important to you: keeping the order of the items or testing membership quickly. In the first case it makes sense to store your items in a list; in the second, it's best to use a set.
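
The difference can be observed with a rough measurement. This is only a sketch: it uses the standard timeit module, the sizes and the names items_list, items_set and missing are arbitrary choices for the example, and the actual timings depend on your machine.

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)
missing = -1  # not present, so the list has to be scanned to the end

# Both containers give the same answer...
print(missing in items_list, missing in items_set)  # False False

# ...but the set answers without scanning every element.
list_time = timeit.timeit(lambda: missing in items_list, number=50)
set_time = timeit.timeit(lambda: missing in items_set, number=50)
print(f'list: {list_time:.5f} s, set: {set_time:.5f} s')
```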

Frozenset

The only difference between set and frozenset is that set is a mutable data type, while frozenset is not. To create a frozenset, we use the frozenset() function.

empty_frozenset = frozenset()
print(empty_frozenset)  # frozenset()

We can also create frozensets from lists, strings, or sets:

frozenset_from_set = frozenset({1, 2, 3})
print(frozenset_from_set)  # frozenset({1, 2, 3})

frozenset_from_list = frozenset(['how', 'are', 'you'])
print(frozenset_from_list)  # frozenset({'you', 'are', 'how'})

As mentioned above, a frozenset is immutable. This means that while the contents of a set can be changed, the contents of a frozenset stay the same after it is created. You cannot add or delete items.

empty_frozenset.add('some_text')  # AttributeError: 'frozenset' object has no attribute 'add'
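
Everything that does not modify the container still works on a frozenset. A short sketch with an illustrative frozenset named vowels:

```python
vowels = frozenset('aeiou')

# Read-only operations behave exactly as they do for a set.
print(len(vowels))    # 5
print('a' in vowels)  # True

# A frozenset and a set with the same elements compare equal.
print(vowels == set('uoiea'))  # True

# The mutating methods are simply not there.
print(hasattr(vowels, 'add'), hasattr(vowels, 'remove'))  # False False
```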

So why do we need frozenset? Since a set is mutable, we cannot make it an element of another set.

text = {'hello', 'world'}
nested_text = {'!'}
nested_text.add(text)  # TypeError: unhashable type: 'set'

However, with frozenset this problem does not occur. Because a frozenset is immutable and hashable, it can be an element of a set or of another frozenset.

some_frozenset = frozenset(text)
nested_text.add(some_frozenset)
print(nested_text)  # {'!', frozenset({'world', 'hello'})}

In addition, these properties of frozensets allow them to be used as keys in a Python dictionary, but you'll learn more about that later.
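
As a small preview, here is a sketch of a dictionary keyed by frozensets. The data and the name games_played are made up for the example.

```python
# Made-up data: number of games played between pairs of players.
games_played = {
    frozenset({'Ann', 'Bob'}): 3,
    frozenset({'Bob', 'Cid'}): 1,
}

# The order inside the pair does not matter for the lookup.
print(games_played[frozenset({'Bob', 'Ann'})])  # 3

# A regular set cannot be used as a key.
try:
    games_played[{'Ann', 'Cid'}] = 2
except TypeError:
    print('a set cannot be a dictionary key')
```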

Summary

All in all, you now know how to work with sets:

  • You know how to create a new set and what you can store in it (immutable data types only).
  • You understand the difference between a set and other Python objects.
  • You can work with the elements of a set: add new elements or delete them, tell the discard and remove methods apart, and so on.
  • You know when to use sets (which can really save you time!).
  • You know that frozenset is an immutable alternative to set.

Topics: set