Python advanced notes

Posted by Teaky on Fri, 04 Mar 2022 11:32:25 +0100

brief introduction

  • The basic notes introduce the basic problems such as data type and file operation. If you want to know more, you can take a look at the interview
  • The advanced part mainly includes object-oriented idea, exception handling, iterator, generator and coroutine

object-oriented

  • Python has been an object-oriented language since its design
  • Class: used to describe a collection of objects with the same attributes and methods. It mainly includes the following concepts:
    • Methods: functions defined in classes
    • Class variable: class variable is common in the whole instantiated object and can be understood as static type
    • Local variable: a variable defined in a method
    • Instance variable: also known as class attribute, that is, variable decorated with self
    • Class and derived methods inherit from a class
    • Method override: if the method inherited from the parent class cannot meet the needs of the child class, it can be overridden, also known as method override
    • Objects: instances of classes
    • Construction method:__ init__, The method that will be called automatically when the class is instantiated
    class MyClass:
    	def __init__(self):
    		self.i = 12345
        def f(self):		# Note that self should be added to member functions
            return self.i
    
    # Instantiation class
    x = MyClass()
     
    # Access the properties and methods of the class
    print("MyClass Properties of class i For:", x.i)
    print("MyClass Class method f Output is:", x.f())
    
  • It still needs to be used. You can take a look at the basic small project and deeply understand the advantages of this idea

exception handling

  • An exception will occur when Python cannot handle the program normally. We need to catch and handle it, otherwise the program will terminate execution
  • Common exception handling methods:
    • Use try/except: detect errors in the try statement block, so that the exception statement can catch exception information and handle it
    • You can also use the raise statement to trigger exceptions yourself
    • It can be a python standard exception, or it can be inherited and customized
  • see course , just go through it
  • In my Python interview (I) The inheritance of exceptions is also mentioned in
    # It's usually written like this
    try:
        fh = open("testfile", "w")
        fh.write("This is a test file for testing exceptions!!")
    except IOError:		# This is a standard exception
        print("Error: File not found or failed to read")
    else:
        print("Content written to file successfully")
        fh.close()
    # There is no message attribute in Python 3. You can directly str() or borrow exc of sys_ info
    try:
        a = 1/0
    except Exception as e:  # This e is the exception message
        exc_type, exc_value, exc_traceback = sys.exc_info()
        print(str(e))       # division by zero
        print(exc_value)    # division by zero
    finally:	# The last code will be executed whether or not an exception occurs
        print('over!')
    
    # Trigger exception
    def func(level):
        if level < 1:
            raise Exception("Invalid level!", level)	# Can be output, you can write it
            # After the exception is triggered, the following code will not be executed
    func()	# Exception: ('Invalid level!', 0)
    
    # Custom exception
    class Networkerror(RuntimeError):	# Inherit RuntimeError
        def __init__(self, arg):
            self.args = arg
            # self.message = arg
            
    try:
        raise Networkerror("Bad hostname")
    except Networkerror as e:	# This e is the exception message
        print(e.args)	# You can also output format (E) str (E)
    
  • Python 3 will use Traceback to track exception information by default, from bottom to top!

Custom package / module

  • Large programs need to customize many packages (modules), which not only enhance the readability of the code, but also facilitate maintenance

  • Use import to import built-in or custom modules, which is equivalent to include

    import my	# Import my Py mode 1
    from my import test	# Import a function test() 	 Mode 2
    
  • When importing, the system will search from the set path to view the paths included in the system:

    • '' indicates the current path
  • The current program will prevent repeated import of imported modules, that is, the modules are modified after import and cannot be re imported directly

    # Need to use reload module to reboot
    from importlib import reload	# imp deprecated
    reload(module_name)
    
    # view help
    help(reload)	# The module must have been successfully imported before.
    
  • Points for attention in multi module development

    • Different import methods determine whether the variable is global or local
    • As shown in the figure, the direction changes:
    • If you import a list, append it with the append() method, and the variable will not be redefined
    • If you directly let HADNLE_FLAG = xxx, which is equivalent to the newly defined variable and does not change the list value in common
    • Therefore, it can only be imported and used in the first way:
  • Note:

    • __ name__ attribute
    # If we want a program block in the module not to execute when the module is introduced, we can use__ name__ attribute
    # Make the program block execute only when the module itself is running
    
    # Filename: using_name.py
    if __name__ == '__main__':
       print('The program itself is running%s'%__name__)
    else:
       print('The first mock exam is from another module.%s'%__name__)
    
    $ python using_name.py
     The program itself is running__main__	# Display main module
    
    $ python
    >>> import using_name
     The first mock exam is from another module. using_name	# Display file name
    

encapsulation

  • Encapsulation, inheritance and polymorphism are the three characteristics of object-oriented (class)
  • As shown in the figure__ class__ Attribute is equivalent to the function that the subclass can call the parent class (analogy C + +)

polymorphic

  • Analogy C++

    • Virtual function Rewriting: virtual void func(int a) {}, which is characterized by not compiling first and not sure which class called it
    • When the passed in function pointer refers to the subclass of the parent object (the subclass of the function that has been overridden), you can use the passed in function pointer (the subclass of the function that has been overridden) to determine the subclass of the function that has been overridden in the global function table
    • See my C++ note
  • It's similar in Python. Let's see an example

    # Inheritance in python is written in parentheses
    class MiniOS(object):	# Base class for all classes
        """MiniOS Operating system class """
        def __init__(self, name):	# Constructor
            self.name = name
            self.apps = []  # List of installed application names 	 list()
    
        def __str__(self):	# __ xxx__ It's called magic method
            """Returns the description of an object, print Object"""
            return "%s The list of installed software is %s" % (self.name, str(self.apps))
    
        def install_app(self, app):	# Pass in parent class pointer
            # Determine whether the software has been installed
            if app.name in self.apps:
                print("Already installed %s,No need to install again" % app.name)
            else:
                app.install()
                self.apps.append(app.name)
    
    
    class App(object):
        def __init__(self, name, version, desc):
            self.name = name
            self.version = version
            self.desc = desc
    
        def __str__(self):
            return "%s The current version of is %s - %s" % (self.name, self.version, self.desc)
    
        def install(self):	# Equivalent to virtual function
            print("take %s [%s] Copy the executed program to the program directory..." % (self.name, self.version))
    
    
    class PyCharm(App):	# Subclass inherits App
        pass
    
    # The same scope is called overloading
    class Chrome(App):
        def install(self):	# Equivalent to virtual function rewriting; 	 Ordinary functions in a class are called redefinition
            print("Extracting setup...")
            super().install()	# You need to call it through super instead of using it directly
    
    
    linux = MiniOS("Linux")
    print(linux)
    
    pycharm = PyCharm("PyCharm", "1.0", "python Developed IDE environment")
    chrome = Chrome("Chrome", "2.0", "Google browser")	# Incoming subclass object
    
    linux.install_app(pycharm)	# Equivalent to a global function, which subclass is passed in to execute the virtual method of which subclass
    linux.install_app(chrome)
    linux.install_app(chrome)
    
    print(linux)	# The list of software installed on Linux is ['pycharm ',' chrome ']
    
  • Experience it carefully!

thread

  • Using threading to create child threads
    import threading
    import time
    
    def func1(num1):
        for i in range(18):
            print(num1)
            time.sleep(0.1)
    
    def func3(str):
        for i in range(18):
            print(str)
            time.sleep(0.1)
    
    def func2():
        for i in range(20):
            print('Main thread',i)
            time.sleep(0.1)
    
    if __name__ == '__main__':
        thread = threading.Thread(target=func1, args=(555,))    # List parameters
        thread2 = threading.Thread(target=func3, kwargs={'str':'roy'})	# Keyword parameters
        thread.start()
        thread2.start()
        func2()     # The main thread is usually placed later, otherwise the main process will be executed first
    
  • The main thread usually waits for the child thread to finish before exiting
  • You can set the daemon thread to exit all the threads when the main thread ends
    if __name__ == '__main__':
        thread = threading.Thread(target=func1, args=(555,))    # Yuanzu formal biography parameter
        thread2 = threading.Thread(target=func3, kwargs={'str':'roy'})# Dictionary form
        # Daemon
        thread.setDaemon(True)  
        thread.start()
        # The daemon thread must be set to exit at the end of the main thread
        thread2.setDaemon(True)
        thread2.start()
        func2()
    
  • Mutex -- thread synchronization
    # If multiple threads operate global variables at the same time, there will be problems and need to be locked
    lock = threading.Lock()     # mutex 
    arr = 0
    
    def lockfunc1():
        lock.acquire()	# lock up
        global arr		# You need to get the global variable arr
        for i in range(500):
            arr += 1
        print('Process 1:',arr)
        lock.release()	# Release lock
    
    def lockfunc2():
        lock.acquire()
        global arr
        for i in range(400):
            arr += 1
        print('Process 2:',arr)
        lock.release()
    

process

  • Every time a process is created, the operating system will allocate running resources. What really works is threads. Each process will create a thread by default
  • Multi process can be multiple CPU cores, but generally refers to single core CPU concurrency, while multithreading is resource scheduling in one core, which can be understood in combination with the concept of parallel concurrency
    import multiprocessing
    
    def func1(num1):
        for i in range(18):
            print(num1)
            time.sleep(0.1)
    
    def func2(str):
        for i in range(18):
            print(str)
            time.sleep(0.1)
            
    if __name__ == '__main__':
    	multi1 = multiprocessing.Process(target=func1)
        multi2 = multiprocessing.Process(target=func2)# Each process has its own thread
        multi1.start()
        multi2.start()
    
  • Similarly, you can use a daemon to exit a child process
    • The daemon is a special background process. If the child process starts with the daemon, it will act according to the eyes of the main process. Do you understand!
    • Generally, the daemon will be created when the system is turned on and will exit when it is turned off. It will monitor silently
  • You can also use precess Terminate
  • Processes are independent (the basic unit of resource allocation) and do not share global variables
  • So how do processes communicate? Message queue
    queue = multiprocessing.Queue(3)    # Any number of data can be saved by default
    
    • Of course, there are shared memory, pipes, etc. you can see my operating system note
    • Personally, I think this little brother summed up very well, recommend!

Multitasking

  • Understand the process and thread, but there is another feature in python: CO process
  • The interview may ask three questions: what is an iterator? What is a generator? What is a collaborative process?

iterator

  • Python three gentlemen: iterator, generator, decorator

  • Iterators are generally used for iteratable objects, including lists, dictionaries, primitives and collections. They are generally used in for loops

    # Determine whether it can be iterated
    from collections import Iterable
    # Use function: as long as you can iterate, you can trace the source to Iterable
    isinstance([], Iterable)	# isinstance: is it an example, that is, the previous genus does not belong to the latter
    isinstance(a,A)		# Is object a an instance of class a
    
    • First, judge whether the object can be iterated: see if there is__ iter__ method
    • Then call the iter() function: automatically invoke the above magic method and return the iterator (object).
    • By calling the next() function, you can continuously call the of the iteratable object__ next__ Method to get the next iteration value of the object
  • Iterators are also used in classes. How to make a class an iteratable object?

    • Implement directly in class__ iter__ And__ next__ method
    • Call the next method through the for loop
  • You can also return the class itself as an iterator

    • The above custom class Classmate is implemented first__ iter__ Method to make it an iteratable object and return the ClassIterator iterator
    • Iterators must also be iteratable objects, so they must also be implemented__ iter__ method
    • Similarly, when the class is instantiated, the ret value can be obtained by calling the next method.
    • It also shows that iterators must be iteratable, and iterators are not necessarily iteratable (depending on whether there is a _ next _)
  • The following example also illustrates the principle of iterators:

    class MyIterator:
        def __iter__(self): # Return iterator object (initialization)
            self.a = 1		# Mark iteration position
            return self
    
        def __next__(self): # Return to next object
            x = self.a
            self.a += 1	# As you can see, here is the way to get the next value
            return x
    
    myclass = MyIterator()
    myiter = iter(myclass)	# Get iterator
    
    print(next(myiter))	# 1
    print(next(myiter)) # 2
    print(next(myiter)) # 3
    print(next(myiter)) # 4
    print(next(myiter)) # 5
    
    • Here, the iter() method is also called, which is related to the definition, because it needs to initialize self a. So it's not necessary
    • The point is: the iterator returns the way to get the data, which can save memory
  • For example: the difference between range() and = = xrange() = =

    # In python2
    range(100) # Return the list from 0 to 99, which takes up more memory
    xrange(100)	# Returns how data is generated
    
    # In py3, range() is equivalent to xrange()
    range()
    
  • In practice, accept iterators thoroughly:

    • The Fibonacci sequence is obtained in the form of iterators
      • You don't know, do you? 0, 1, 1, 2, 3, 5, 8, 13, 21, 34... Fibonacci sequence
    • The key points are: call the next method by default in the form of a for loop; This is also why it is used in the for loop
    • There is also the writing method of swap in Python. It is very common. Remember!
  • Of course, not only the for loop can receive iterative objects, but also the essence of type conversion is an iterator

    li = list(FibIterator(15))
    print(li)
    tp = tuple(FibIterator(6))
    print(tp)
    
  • What is an iterator?

    • The iterator supports the next() method to get the next value of the iteratable object
    • Generally used in for loops and classes, the essence is to implement iter and next magic methods
    • The iterator saves the memory space occupied by the data when the program is running by giving the way of data generation

generator

  • When implementing an iterator, we need to return manually and implement the next method to generate the next data

  • A simpler generator syntax can be adopted, that is, a generator is a special kind of iterator

  • Mode 1:

    L = [ x*2 for x in range(5)]	# [0, 2, 4, 6, 8]
    G = ( x*2 for x in range(5))	# <generator object <genexpr> at ...
    # The only difference is that the () of the outer layer can be used according to the use of iterators
    next(G)	# 0 	 G is now an iterator
    next(G)	# 1
    next(G)	# 2
    
    • As long as it is an iterator, it can be started with next()
  • Mode 2:

    • Use the yield keyword to create a generator, which is characterized by returning the following value when it is executed to yield
    • The next iteration can then be executed, that is, special process control
    • Let's take an example: we still get the Fibonacci sequence
    def create_num(all_num):
        print('------1------')
        a, b = 0, 1
        cur = 0
        while(cur<all_num):
            print('------2------')
            yield a # Return to a
            print('Return and execute')
            a, b = b, a+b
            cur += 1
    
    if __name__ == '__main__':
        obj = create_num(5)
        for num in obj:
            print(num)
    
    • It can be found that compared with the direct implementation with iterators, return and__ next__ method
    • yield directly returns the following data, and can come back after this cycle, and then execute downward
  • Process control of data processing commonly used in crawlers:

    • As shown in the figure, after using the css selector to obtain all web page links (urls), you need to initiate a request to crawl the source code for each link
    • Use yield to return the execution of the Request to the object that calls the parse function at this point. This object loops (next), and yield returns to continue the loop
    • Of course, the crawler framework can implement this behavior asynchronously without blocking and waiting (or it returns a function that you can handle there)
    • In deep learning training, we need to feed data in batches, so we can use yield; Get a batch of data each time and return it to the model. The model calls the next() method / loop to get the next batch of data
  • Use send() instead of next() to wake up. The difference is that parameters can be passed

  • Now, I can finally return to the headline: multitasking

    • The generator implements alternating tasks, that is, simple coprocesses
  • What is a generator?

    • Generators are a special class of iterators
    • Generally, yield is used to create a generator. The function containing yield keyword is called generator function
    • The feature is that when it is executed to yield, it returns the following value, which can be followed by the function
    • Because it can be executed after returning, it is a special process control, which is generally used in crawler and deep learning training
    • Using yield between multiple functions is equivalent to switching between functions, which is also the basic principle of coprocessing
  • Note: iterations of iterators and generators can only go back, not forward

Synergetic process

  • Implement simple co process code

    import time
    
    def work1():
        while True:
            print("----work1---")
            yield
            time.sleep(0.5)
    
    def work2():
        while True:
            print("----work2---")
            yield
            time.sleep(0.5)
    
    def main():
        w1 = work1()
        w2 = work2()
        while True:
            next(w1)
            next(w2)
    
    if __name__ == "__main__":
        main()
    
  • What is a coroutine: coroutine is another way to implement multitasking in python, but it takes up less execution units than threads

  • It has its own CPU context, so that we can switch one collaboration process to another at the right time;

  • Difference from thread:

    • When implementing multitasking, the operating system has its own Cache and other data for the efficiency of program operation
    • The operating system will also help you recover these data, so thread switching consumes more performance
    • However, coprocess switching is only the context of operating the CPU, so it can switch systems millions of times a second
      • CPU context is the CPU register and program counter PC, which is the necessary dependent environment before running any task
      • That is: the cooperative process loads lightly and throws away the redundant state, which is equivalent to only switching between functions in the program, with its own logic, taking the data from elsewhere, returning the result after execution, and ending!
  • In order to better use the collaborative process to complete multitasking, the Green let module in python encapsulates it

    # sudo pip3 install greenlet	# Install Python 2
    
    from greenlet import greenlet
    import time
    
    def test1():
        while True:
            print "---A--"
            gr2.switch()
            time.sleep(0.5)
    
    def test2():
        while True:
            print "---B--"
            gr1.switch()
            time.sleep(0.5)
    
    gr1 = greenlet(test1)
    gr2 = greenlet(test2)
    
    #Switch to run in gr1
    gr1.switch()	# This function encapsulates yield
    
    # In fact, this is a fake multitasking, completely alternating execution
    
  • More commonly used is gevent

    • Install PIP3 -- default timeout = 100 install gevent http://pypi.douban.com/simple/ --trusted-host pypi. douban. com
    • Or use http://mirrors.aliyun.com/pypi/simple/
    • However, if an error is reported, use sudo pip3 install gevent. Slow down
    • You can use pip3 list to view installed libraries
    # Take the greenlet and further encapsulate it
    import gevent
    
    def f(n):
        for i in range(n):
            print(gevent.getcurrent(), i)
    
    g1 = gevent.spawn(f, 5)
    g2 = gevent.spawn(f, 5)
    g3 = gevent.spawn(f, 5)
    g1.join()
    g2.join()
    g3.join()
    # Running discovery is running in sequence
    
  • Multitasking: in a single core, tasks are executed concurrently and alternately

    import gevent
    
    def f1(n):
        for i in range(n):
            print(gevent.getcurrent(), i)
            #It is used to simulate a time-consuming operation. Note that it is not sleep in the time module
            gevent.sleep(1)	# All modules in gevent should be used
            
    def f2(n):
        for i in range(n):
            print(gevent.getcurrent(), i)
            gevent.sleep(1)	# Switch in case of time-consuming operation
            
    def f3(n):
        for i in range(n):
            print(gevent.getcurrent(), i)
            gevent.sleep(1)
    
    g1 = gevent.spawn(f1, 5)	# Create a collaboration
    g2 = gevent.spawn(f2, 5)	# Objective function, parameter
    g3 = gevent.spawn(f3, 5)
    g1.join()	# join blocks time-consuming
    g2.join()	# Join and execute
    g3.join()	# Will wait for all functions to execute
    # It is equivalent to switching between functions, that is, the so-called built-in CPU context, which saves resources
    
  • Threads depend on processes, and coroutines depend on threads; Minimum synergy

    from gevent import monkey	# Patch, automatically convert time to gevent
    import gevent
    import random
    import time
    
    def coroutine_work(coroutine_name):
        for i in range(10):
            print(coroutine_name, i)
            time.sleep(random.random())
    
    gevent.joinall([
            gevent.spawn(coroutine_work, "work1"),
            gevent.spawn(coroutine_work, "work2")
    ])
    
  • What is a collaborative process?

    • A coroutine is equivalent to a microthread
    • GIL locks and inter thread switching consume more resources
    • The coroutine has its own CPU context, which can rely on one thread to realize the switching between coroutines, which is faster and less expensive
    • Equivalent to switching between program functions

Concurrent Downloader

  • Using coprocess to realize a picture downloader
    from gevent import monkey	# That is, runtime replacement, the embodiment of python dynamics!
    import gevent
    import urllib.request
    import random
    
    # Required when there are time-consuming operations
    monkey.patch_all()
    
    def my_downLoad(url):
        print('GET: %s' % url)
        resp = urllib.request.urlopen(url)
        # file_name = random.randint(0,100)
        data = resp.read()
        with open(file_name, "wb") as f:
            f.write(data)
    
    def main():
        gevent.joinall([
            gevent.spawn(my_downLoad, "1.jpg", 'https://rpic.douyucdn.cn/live-cover/appCovers/2021/01/10/9315811_20210110043221_small.jpg'),
            gevent.spawn(my_downLoad, "2.jpg", 'https://rpic.douyucdn.cn/live-cover/appCovers/2021/01/04/9361042_20210104170409_small.jpg'),
            gevent.spawn(my_downLoad, "3.jpg", 'https://rpic.douyucdn.cn/live-cover/roomCover/2020/12/01/5437366001ecb82edfe1e098d28ebc36_big.png'),
    	])
        
    if __name__ == "__main__":
        main()
    

summary

  • Process is the unit of resource allocation. Processes are independent, so they are stable, but switching requires the most resources and is inefficient
  • Thread is the basic unit of CPU scheduling. The amount of resources required for thread switching is general and the efficiency is general (without considering GIL)
  • Coprocess switching consumes very little resources and is highly efficient. It is used when there are more network requests (more blocking). It can be understood as thread enhancement
  • Multi process and multi thread may be parallel according to the different number of CPU cores, that is, each thread of a process can use multi-core parallelism; But the coroutine is in a thread, so it is concurrent
  • The next article introduces the advanced operations of python

Topics: Python Multithreading OOP Polymorphism multiple processes