Iterable objects, iterators and generators in Python

Posted by thetechgeek on Thu, 16 Dec 2021 02:57:15 +0100

1, Concept description

An iterable object is an object that can be iterated over. We can obtain its iterator through the built-in iter function. An iterable needs to implement the __iter__ method, which returns its associated iterator;

An iterator is responsible for traversing the data one element at a time. Its __next__ method returns the associated data elements one by one; it also implements __iter__, so that an iterator can be used wherever an iterable is expected;

A generator is a kind of iterator that produces its data lazily, that is, each element is generated only when it is requested;
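
As a rough illustration of these three concepts, here is a minimal sketch that uses only built-ins and the abstract base classes from collections.abc. The variable names are made up for this example:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]      # a list is an iterable, but not an iterator
it = iter(numbers)       # iter() asks the iterable for an iterator

print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))  # True False
print(isinstance(it, Iterator))                                      # True

print(next(it))          # 1
print(next(it))          # 2
print(next(it))          # 3
try:
    next(it)             # no elements left
except StopIteration:
    print('exhausted')

squares = (n * n for n in numbers)    # a generator: values are computed on demand
print(isinstance(squares, Iterator))  # True
print(list(squares))                  # [1, 4, 9]
```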

2, Iterability of sequences

Python's built-in sequences can be iterated with for. The interpreter calls the iter function to obtain an iterator for the sequence. For compatibility, iter also accepts objects that implement only __getitem__: when __iter__ is missing, an iterator is created automatically that fetches items by index, starting from 0, until IndexError is raised;

Iterating over an object that only implements __getitem__

import re
from dis import dis

class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        self.words = self.__class__.reg_word.findall(text)

    def __getitem__(self, index):
        return self.words[index]


def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)

    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration as e:
            break

iter_word_analyzer()
dis(iter_word_analyzer)

# start for wa
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# this
# is
# mango
# word
# analyzer
#  15           0 LOAD_GLOBAL              0 (WordAnalyzer)
#               2 LOAD_CONST               1 ('this is mango word analyzer')
#               4 CALL_FUNCTION            1
#               6 STORE_FAST               0 (wa)
# 
#  16           8 LOAD_GLOBAL              1 (print)
#              10 LOAD_CONST               2 ('start for wa')
#              12 CALL_FUNCTION            1
#              14 POP_TOP
# 
#  17          16 LOAD_FAST                0 (wa)
#              18 GET_ITER
#         >>   20 FOR_ITER                12 (to 34)
#              22 STORE_FAST               1 (w)
# 
#  18          24 LOAD_GLOBAL              1 (print)
#              26 LOAD_FAST                1 (w)
#              28 CALL_FUNCTION            1
#              30 POP_TOP
#              32 JUMP_ABSOLUTE           20
# 
#  20     >>   34 LOAD_GLOBAL              1 (print)
#              36 LOAD_CONST               3 ('start while wa_iter')
#              38 CALL_FUNCTION            1
#              40 POP_TOP
# 
#  21          42 LOAD_GLOBAL              2 (iter)
#              44 LOAD_FAST                0 (wa)
#              46 CALL_FUNCTION            1
#              48 STORE_FAST               2 (wa_iter)
# 
#  23     >>   50 SETUP_FINALLY           16 (to 68)
# 
#  24          52 LOAD_GLOBAL              1 (print)
#              54 LOAD_GLOBAL              3 (next)
#              56 LOAD_FAST                2 (wa_iter)
#              58 CALL_FUNCTION            1
#              60 CALL_FUNCTION            1
#              62 POP_TOP
#              64 POP_BLOCK
#              66 JUMP_ABSOLUTE           50
# 
#  25     >>   68 DUP_TOP
#              70 LOAD_GLOBAL              4 (StopIteration)
#              72 JUMP_IF_NOT_EXC_MATCH   114
#              74 POP_TOP
#              76 STORE_FAST               3 (e)
#              78 POP_TOP
#              80 SETUP_FINALLY           24 (to 106)
# 
#  26          82 POP_BLOCK
#              84 POP_EXCEPT
#              86 LOAD_CONST               0 (None)
#              88 STORE_FAST               3 (e)
#              90 DELETE_FAST              3 (e)
#              92 JUMP_ABSOLUTE          118
#              94 POP_BLOCK
#              96 POP_EXCEPT
#              98 LOAD_CONST               0 (None)
#             100 STORE_FAST               3 (e)
#             102 DELETE_FAST              3 (e)
#             104 JUMP_ABSOLUTE           50
#         >>  106 LOAD_CONST               0 (None)
#             108 STORE_FAST               3 (e)
#             110 DELETE_FAST              3 (e)
#             112 RERAISE
#         >>  114 RERAISE
#             116 JUMP_ABSOLUTE           50
#         >>  118 LOAD_CONST               0 (None)
#             120 RETURN_VALUE
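
One consequence of the __getitem__ fallback is worth checking by hand: iter accepts such an object, but the collections.abc.Iterable check does not recognize it, because that check only looks for __iter__. A minimal sketch, where the Squares class is a hypothetical example and not part of the original code:

```python
from collections.abc import Iterable

class Squares:
    """Hypothetical class that only implements __getitem__."""

    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)   # signals the end of the sequence
        return index * index

sq = Squares()
print(list(sq))                  # [0, 1, 4, 9, 16]
print(hasattr(sq, '__iter__'))   # False
print(isinstance(sq, Iterable))  # False: the ABC only checks for __iter__

it = iter(sq)                    # still works through the __getitem__ fallback
print(next(it), next(it))        # 0 1
```

Calling iter(obj) is therefore a more reliable test for iterability than the isinstance check.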

3, Classic iterator pattern

A standard iterator needs to implement two interface methods: __next__, which returns the next element, and __iter__, which returns self directly;

When the iterator has gone through all elements, it raises a StopIteration exception, but Python's built-in for loops, list comprehensions, tuple unpacking, etc. handle this exception automatically;

The main purpose of implementing __iter__ on an iterator is convenience: the iterator itself can then be used wherever an iterable is expected, for example directly in a for loop;

An iterator can only be traversed once. To iterate again, call iter on the iterable again to obtain a new iterator. This requires each iterator to maintain its own internal state, which is why an iterable object should not also act as its own iterator;

In terms of the classic object-oriented design pattern, an iterable can create a new associated iterator at any time, and the iterator is responsible for stepping through the actual elements;

import re

class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        self.words = self.__class__.reg_word.findall(text)

    def __iter__(self):
        return WordAnalyzerIterator(self.words)

class WordAnalyzerIterator:

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)

    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break

iter_word_analyzer()

# start for wa
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# this
# is
# mango
# word
# analyzer
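
The single-use behaviour of iterators described in this section can be checked with an ordinary list. This is a minimal sketch with illustrative variable names:

```python
words = ['this', 'is', 'mango']

it = iter(words)
print(iter(it) is it)   # True: an iterator's __iter__ returns itself
print(list(it))         # ['this', 'is', 'mango']
print(list(it))         # []: the iterator is exhausted and cannot be reset

# Each call to iter() on the iterable returns a fresh, independent iterator
it1, it2 = iter(words), iter(words)
next(it1)
print(next(it1), next(it2))   # is this

# Tuple unpacking handles StopIteration internally, just like for does
a, b, c = iter(words)
print(a, b, c)          # this is mango
```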

4, Generators are also iterators

A generator is created by calling a generator function, which is a factory function whose body contains yield;

A generator is itself an iterator: it can be advanced with the next function, and it raises a StopIteration exception once it is exhausted;

When a generator runs, it pauses at each yield statement and returns the value of the expression to the right of yield;

def gen_func():
    print('first yield')
    yield 'first'
    print('second yield')
    yield 'second'

print(gen_func)
g = gen_func()
print(g)

for val in g:
    print(val)

g = gen_func()
print(next(g))
print(next(g))
print(next(g))  # the generator is exhausted, so this raises StopIteration

# <function gen_func at 0x7f1198175040>
# <generator object gen_func at 0x7f1197fb6cf0>
# first yield
# first
# second yield
# second
# first yield
# first
# second yield
# second
# StopIteration
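
The pausing behaviour can also be observed directly with inspect.getgeneratorstate from the standard library. The sketch below reuses a stripped-down gen_func and passes a default value to next to avoid the exception at the end:

```python
import inspect
from collections.abc import Iterator

def gen_func():
    yield 'first'
    yield 'second'

g = gen_func()
print(isinstance(g, Iterator))        # True
print(iter(g) is g)                   # True: a generator is its own iterator
print(inspect.getgeneratorstate(g))   # GEN_CREATED: the body has not started
print(next(g))                        # first
print(inspect.getgeneratorstate(g))   # GEN_SUSPENDED: paused at a yield
print(next(g))                        # second
print(next(g, 'done'))                # done: default returned instead of raising
print(inspect.getgeneratorstate(g))   # GEN_CLOSED
```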

We can implement __iter__ as a generator function

import re

class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        self.words = self.__class__.reg_word.findall(text)

    def __iter__(self):
        for word in self.words:
            yield word



def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)

    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break

iter_word_analyzer()

# start for wa
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# this
# is
# mango
# word
# analyzer
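
When __iter__ only forwards the elements of another iterable, the explicit loop can be replaced with yield from. This is an alternative sketch, not the version used above, and it should behave the same way:

```python
import re

class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        self.words = self.reg_word.findall(text)

    def __iter__(self):
        # Delegate to the list's own iterator instead of looping by hand
        yield from self.words

wa = WordAnalyzer('this is mango word analyzer')
print(list(wa))   # ['this', 'is', 'mango', 'word', 'analyzer']
print(list(wa))   # same result: every call to __iter__ creates a new generator
```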

5, Implement lazy iterators

One of the highlights of iterators is that __next__ returns elements one at a time, which makes it possible to traverse very large data containers;

Our previous implementation calls re.findall during initialization to collect all the elements up front, which is not a good implementation; we can use re.finditer to fetch the data lazily during traversal;

import re

class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        # self.words = self.__class__.reg_word.findall(text)
        self.text = text

    def __iter__(self):
        g = self.__class__.reg_word.finditer(self.text)
        print(g)
        for match in g:
            yield match.group()



def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)

    print('start while wa_iter')
    wa_iter = iter(wa)
    wa_iter1 = iter(wa)  # a second, independent iterator (not used below)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break

iter_word_analyzer()

# start for wa
# <callable_iterator object at 0x7feed103e040>
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# <callable_iterator object at 0x7feed103e040>
# this
# is
# mango
# word
# analyzer
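
To make the benefit of finditer concrete, the sketch below takes only the first few matches from a large text with itertools.islice. The input text is a made-up example:

```python
import re
from itertools import islice

reg_word = re.compile(r'\w+')
text = 'word ' * 1_000_000            # hypothetical large input

matches = reg_word.finditer(text)     # returns immediately, nothing is matched yet
first_three = [m.group() for m in islice(matches, 3)]
print(first_three)                    # ['word', 'word', 'word']

# Only three matches were consumed; the next one is the fourth word
print(next(matches).start())          # 15
```

With findall, the same call would first build a list of one million strings before returning anything.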

6, Simplifying lazy iterators with generator expressions

A generator expression is a declarative way to define a generator. Its syntax is similar to that of a list comprehension, except that the elements are produced lazily;

def gen_func():
    print('first yield')
    yield 'first'
    print('second yield')
    yield 'second'

l = [x for x in gen_func()]
for x in l:
    print(x)

print()

ge = (x for x in gen_func())
print(ge)
for x in ge:
    print(x)

# first yield
# second yield
# first
# second
#
# <generator object <genexpr> at 0x7f78ff5dfd60>
# first yield
# first
# second yield
# second
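
The difference in memory use between the two forms can be sketched with sys.getsizeof. The exact sizes depend on the Python version, so only the comparison is shown:

```python
import sys

n = 100_000
as_list = [x * x for x in range(n)]   # all values are stored in memory
as_gen = (x * x for x in range(n))    # values are computed one at a time

print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))   # True
print(sum(as_gen) == sum(as_list))                      # True
print(sum(as_gen))                                      # 0: the generator is exhausted
```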

Implementing the word analyzer with a generator expression

import re

class WordAnalyzer:
    reg_word = re.compile(r'\w+')

    def __init__(self, text):
        # self.words = self.__class__.reg_word.findall(text)
        self.text = text

    def __iter__(self):
        # g = self.__class__.reg_word.finditer(self.text)
        # print(g)
        # for match in g:
        #     yield match.group()
        ge = (match.group() for match in self.__class__.reg_word.finditer(self.text))
        print(ge)
        return ge



def iter_word_analyzer():
    wa = WordAnalyzer('this is mango word analyzer')
    print('start for wa')
    for w in wa:
        print(w)

    print('start while wa_iter')
    wa_iter = iter(wa)
    while True:
        try:
            print(next(wa_iter))
        except StopIteration:
            break

iter_word_analyzer()

# start for wa
# <generator object WordAnalyzer.__iter__.<locals>.<genexpr> at 0x7f4178189200>
# this
# is
# mango
# word
# analyzer
# start while wa_iter
# <generator object WordAnalyzer.__iter__.<locals>.<genexpr> at 0x7f4178189200>
# this
# is
# mango
# word
# analyzer
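
One subtle difference from the generator-function version: here __iter__ is an ordinary function that returns a generator expression, so its body (including the print call) runs as soon as iter is called. In a generator function, the body only starts running on the first next. A minimal sketch with two hypothetical helper functions:

```python
def gen_style():
    print('body runs')        # executed only when next() is first called
    yield 1

def genexp_style():
    print('body runs')        # executed as soon as the function is called
    return (x for x in [1])

a = gen_style()               # prints nothing
b = genexp_style()            # prints: body runs
print(next(a))                # prints: body runs, then 1
print(next(b))                # prints: 1
```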

Topics: Python Back-end