Python productivity tips

Posted by nloding on Tue, 21 Dec 2021 10:13:48 +0100

Preface:

This article assumes that the reader already has a working knowledge of Python and has read a few introductory books. It is not a quick reference manual (for that, you can look up individual functions in a module's documentation). Instead, it focuses on several of the most important topics and demonstrates some efficient solutions, as a record of my own progress.

1, Built-in modules

1. deque (collections module)

deque([iterable[, maxlen]]) returns a new deque object initialized left to right (using append()) with data from iterable. If iterable is not specified, the new deque is empty.
A deque is a generalization of stacks and queues (the name is pronounced "deck" and is short for "double-ended queue"). Deques support thread-safe, memory-efficient appends and pops from either side, with approximately the same O(1) performance in either direction.
Although list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory-movement costs for pop(0) and insert(0, v), which change both the size and the position of the underlying data representation.
If maxlen is not specified or is None, a deque may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded deque is full, adding new items discards a corresponding number of items from the opposite end. Bounded deques provide functionality similar to the Unix tail filter. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.
Application: keep the last N elements
Because deque accepts a maxlen parameter, once that maximum is reached, the oldest items are discarded and only the last N are retained.

from collections import deque
items = deque([0, 1, 2, 3, 4, 5, 6])
items
>>>deque([0, 1, 2, 3, 4, 5, 6])
items.append(7)  # append() adds to the right end; there is no appendright() method
items
>>>deque([0, 1, 2, 3, 4, 5, 6, 7])
items.appendleft(7)  # appendleft() adds to the left end
items
>>>deque([7, 0, 1, 2, 3, 4, 5, 6, 7])
# pop() and popleft() remove and return items from the right and left ends
items.pop()
>>>7
items.popleft()
>>>7
items
>>>deque([0, 1, 2, 3, 4, 5, 6])
# Keep the latest N records
A = deque([1, 2, 3], maxlen=3)
A
>>>deque([1, 2, 3], maxlen=3)
A.append(4)
A
>>>deque([2, 3, 4], maxlen=3)
# A deque is more flexible than a list for adding and removing items at both ends,
# but indexing into the middle is O(n); if you need fast random access, use a list.

2. heapq module

How do you find the N largest or smallest items in a collection?
The heapq module has two functions, nlargest() and nsmallest(), that solve this problem neatly.

import heapq
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums))
>>>[42, 37, 23]
print(heapq.nsmallest(3, nums))
>>>[-4, 1, 2]

Both functions also accept a key parameter, so they can be used with more complex data structures:

portfolio = [ {'name': 'IBM', 'shares': 100, 'price': 91.1},
{'name': 'AAPL', 'shares': 50, 'price': 543.22},
{'name': 'FB', 'shares': 200, 'price': 21.09},
{'name': 'HPQ', 'shares': 35, 'price': 31.75},
{'name': 'YHOO', 'shares': 45, 'price': 16.35},
{'name': 'ACME', 'shares': 75, 'price': 115.65} ]
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
cheap
>>>[{'name': 'YHOO', 'shares': 45, 'price': 16.35},
 {'name': 'FB', 'shares': 200, 'price': 21.09},
 {'name': 'HPQ', 'shares': 35, 'price': 31.75}]
 
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])
expensive
>>>[{'name': 'AAPL', 'shares': 50, 'price': 543.22},
 {'name': 'ACME', 'shares': 75, 'price': 115.65},
 {'name': 'IBM', 'shares': 100, 'price': 91.1}]
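These functions are built on a heap. To see the idea, here is a sketch that calls heapq.heapify() and heapq.heappop() directly: a heap keeps its smallest item at index 0, and each heappop() removes and returns the smallest remaining item. It illustrates the data structure, not the exact internals of nsmallest().

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]

heap = list(nums)    # copy, so nums itself is left unchanged
heapq.heapify(heap)  # reorder in place (O(n)) so that heap[0] is the smallest item
print(heap[0])       # -4

# Pop the three smallest items one at a time (each pop is O(log n))
smallest_three = [heapq.heappop(heap) for _ in range(3)]
print(smallest_three)  # [-4, 1, 2]
```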

Note: nlargest() and nsmallest() are most appropriate when the number of items you are looking for is small relative to the size of the collection. If you only want the single smallest or largest item (N=1), min() and max() are faster. Similarly, if N is close to the size of the collection, it is usually faster to sort the collection first and then take a slice (sorted(items)[:N] or sorted(items)[-N:]).
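The sketch below checks that these alternatives give the same answers on the example list. Which one is fastest depends on N and the collection size, so measure (for example with timeit) if it matters. Note that sorted(items)[-N:] is in ascending order, whereas nlargest() returns items in descending order.

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]

# N == 1: min() and max() give the same answer as nsmallest()/nlargest()
one_small = heapq.nsmallest(1, nums)
one_large = heapq.nlargest(1, nums)
print(one_small == [min(nums)], one_large == [max(nums)])  # True True

# N close to len(nums): sort once, then slice
n = len(nums) - 1
by_heap = heapq.nlargest(n, nums)  # descending order
by_sort = sorted(nums)[-n:]        # ascending order
print(by_heap == by_sort[::-1])                       # True
print(heapq.nsmallest(n, nums) == sorted(nums)[:n])   # True
```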

3. Using the zip() function

How do you perform calculations (minimum, maximum, sorting, and so on) on the values in a dictionary?
To compute on dictionary values, you usually first use zip() to swap the keys and values, so that each value is paired with its key.
For example, the following code finds the minimum and maximum stock prices along with the corresponding stock names:

prices = { 
'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75
}
min_price = min(zip(prices.values(), prices.keys()))
min_price
>>>(10.75, 'FB')
max_price = max(zip(prices.values(), prices.keys()))
max_price
>>>(612.78, 'AAPL')
Sorting works the same way. When two values are equal, the tuples are ordered by their second element, the key:
>>> prices = { 'AAA' : 45.23, 'ZZZ': 45.23 }
>>> sorted(zip(prices.values(), prices.keys()))
[(45.23, 'AAA'), (45.23, 'ZZZ')]
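Two details are worth knowing here. zip() returns an iterator that can be consumed only once, so create a new one for each calculation. Alternatively, min() and max() accept a key function, which avoids zip() but returns only the key. A short sketch:

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}

pairs = zip(prices.values(), prices.keys())
min_pair = min(pairs)
print(min_pair)  # (10.75, 'FB')

# The same iterator is now exhausted, so a second pass sees no items
try:
    max(pairs)
    exhausted = False
except ValueError:
    exhausted = True
print(exhausted)  # True

# Alternative without zip(): compare keys by their values.
# This returns only the key, so look up the value separately if you need it.
cheapest = min(prices, key=prices.get)
print(cheapest, prices[cheapest])  # FB 10.75
```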

4. Named slice

The built-in slice() function creates a slice object that can be used where any slice is allowed.
For example:

>>> items = [0, 1, 2, 3, 4, 5, 6]
>>> a = slice(2, 4)
>>> items[2:4]
[2, 3]
>>> items[a]
[2, 3]
>>> items[a] = [10,11]
>>> items
[0, 1, 10, 11, 4, 5, 6]
>>> del items[a]
>>> items
[0, 1, 4, 5, 6]

items[2:4] and items[a] give the same result, but a named slice replaces hard-coded indices whose meaning is unclear, which makes your code clearer and easier to read.
A slice object s exposes its parameters through the attributes s.start, s.stop, and s.step.
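Named slices pay off when parsing fixed-width records. The sketch below uses a made-up record layout (the field positions are assumptions for illustration) and also shows the slice attributes and the indices() method, which fits a slice to a sequence of a given length:

```python
# Hypothetical fixed-width layout: name in columns 0-5, shares in 6-10, price in 11-18
record = f"{'ACME':<6}{100:>5}{513.25:>8}"

NAME = slice(0, 6)
SHARES = slice(6, 11)
PRICE = slice(11, 19)

name = record[NAME].strip()
cost = int(record[SHARES]) * float(record[PRICE])  # int()/float() ignore surrounding spaces
print(name, cost)  # ACME 51325.0

# The slice parameters are available as attributes
print(PRICE.start, PRICE.stop, PRICE.step)  # 11 19 None

# indices() clamps a slice to a sequence length and returns (start, stop, step)
s = 'HelloWorld'
a = slice(5, 50, 2)
print(a.indices(len(s)))  # (5, 10, 2)
print([s[i] for i in range(*a.indices(len(s)))])  # ['W', 'r', 'd']
```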

5. Counter in collections

How to find the most frequent elements in a sequence?
The collections.Counter class is designed for exactly this kind of problem, and its most_common() method gives you the answer directly.

from collections import Counter
words = [ 'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes', 'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into', 'my', 'eyes', "you're", 'under' ]
word_counts = Counter(words)
# Three words with the highest frequency
top_three = word_counts.most_common(3)
print(top_three)
>>>[('eyes', 8), ('the', 5), ('look', 4)]

A Counter is a dictionary that maps each element to its count, and most_common() returns a list of (element, count) tuples ordered from most to least common.
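Because a Counter is a dictionary, you can look up individual counts, add more data with update(), and combine counters with arithmetic operators. A small sketch with made-up word lists:

```python
from collections import Counter

a = Counter(['eyes', 'the', 'eyes', 'look'])
b = Counter(['eyes', 'my', 'look', 'look'])

print(a['eyes'])     # 2
print(a['missing'])  # 0: missing elements count as zero instead of raising KeyError

a.update(['my', 'eyes'])  # add more counts in place
print(a['eyes'])          # 3

combined = a + b    # add counts element by element
difference = a - b  # subtract counts, dropping results that are zero or negative
print(combined)
print(difference)
```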

2, String / text processing

1. Split string

Requirement: you need to split a string into fields, but the separators (and the whitespace around them) are not consistent.

The split() method of string objects is only suitable for very simple cases. It does not allow multiple separators or account for varying whitespace around them. When you need more flexibility, use the re.split() function.

import re
line = 'asdf fjdk; afed, fjek,asdf, foo'
# A separator is a comma, a semicolon, or a whitespace character (\s matches spaces,
# tabs, newlines, form feeds, etc.), followed by any amount of whitespace. Wherever the
# pattern matches, the text on either side becomes an element of the result.
re.split(r'[;,\s]\s*', line)
>>>['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
# Equivalent, using alternation inside a non-capturing group:
re.split(r'(?:;|,|\s)\s*', line)
>>>['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
# Note: if the pattern contains a capture group, the matched separators also appear in the result list
fields = re.split(r'(;|,|\s)\s*', line)
fields
>>>['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']
# If you need parentheses but don't want the separators in the result, use a non-capturing group (?:...)
re.split(r'(?:,|;|\s)\s*', line)
>>>['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
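Capturing the separators is useful when you want to keep them, for example to reassemble the line after normalizing the fields. A minimal sketch using the capturing pattern (note that the whitespace consumed by \s* outside the group is not captured, so it is lost):

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'
fields = re.split(r'(;|,|\s)\s*', line)

values = fields[::2]              # the text between separators
delimiters = fields[1::2] + ['']  # the captured separators, padded to match values
print(values)  # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

# Rebuild the line; you could transform each value here before joining
rebuilt = ''.join(v + d for v, d in zip(values, delimiters))
print(rebuilt)  # asdf fjdk;afed,fjek,asdf,foo
```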

2. Replace string

Requirement: suppose you want to convert dates in the form 11/27/2012 into the form 2012-11-27.

For simple literal patterns, you can use the str.replace() method directly, but it cannot handle patterns like the one above. For those, use re.sub():

>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> import re
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'

The first argument to sub() is the pattern to match, the second is the replacement pattern, and the third is the text to search. Backslashed numbers such as \3 refer to capture group numbers in the pattern.

If you also want to know how many substitutions were made, use re.subn() instead. For example:

>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> newtext, n = datepat.subn(r'\3-\1-\2', text)
>>> newtext
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> n 
2
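For substitutions that a replacement pattern cannot express, re.sub() (and the compiled pattern's sub() method) also accepts a function: it is called with each match object, and its return value is used as the replacement. The sketch below uses a hypothetical to_iso() helper that also zero-pads the month and day, which the \3-\1-\2 pattern cannot do:

```python
import re

datepat = re.compile(r'(\d+)/(\d+)/(\d+)')


def to_iso(m):
    """Hypothetical helper: turn a month/day/year match into YYYY-MM-DD."""
    month, day, year = (int(g) for g in m.groups())
    return f'{year:04d}-{month:02d}-{day:02d}'


text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
newtext = datepat.sub(to_iso, text)
print(newtext)  # Today is 2012-11-27. PyCon starts 2013-03-13.
```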

For case-insensitive matching, pass the re.IGNORECASE flag. The matching functions in the re module, such as search(), findall(), sub(), and split(), all accept it through their flags argument.
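A short sketch of re.IGNORECASE with findall() and sub(), on a made-up string:

```python
import re

text = 'UPPER PYTHON, lower python, Mixed Python'

found = re.findall('python', text, flags=re.IGNORECASE)
print(found)  # ['PYTHON', 'python', 'Python']

replaced = re.sub('python', 'snake', text, flags=re.IGNORECASE)
print(replaced)  # UPPER snake, lower snake, Mixed snake
```

Note that the replacement text is inserted as written; it does not adapt its case to the text it replaces.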

3. Newline matching

Requirement: you are using a regular expression to match a block of text, and the match needs to span multiple lines.

import re
comment = re.compile(r'/\*(.*?)\*/')
text1 = '/* this is a comment */'
text2 = '''/* this is a
multiline comment */
'''
comment.findall(text1)
>>>[' this is a comment ']
comment.findall(text2)
>>>[]
# The . character does not match a newline, so the second search finds nothing.
# Method 1: allow newlines explicitly in the pattern
comment2 = re.compile(r'/\*((?:.|\n)*?)\*/')
comment2.findall(text2)
>>>[' this is a\nmultiline comment ']
# Method 2: use the re.DOTALL flag, which makes . match newlines too
comment3 = re.compile(r'/\*(.*?)\*/', re.DOTALL)
comment3.findall(text2)
>>>[' this is a\nmultiline comment ']

Note that in both cases the captured text still contains the newline character. In practice, you can remove or replace it afterwards, for example with re.sub().
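The DOTALL behavior can be requested in several equivalent ways, and the leftover newline can be cleaned up with a second substitution. A sketch (collapsing the newline into a single space is just one possible choice):

```python
import re

text2 = '/* this is a\nmultiline comment */\n'

# Three equivalent ways to let . match a newline
p1 = re.compile(r'/\*(.*?)\*/', re.DOTALL)
p2 = re.compile(r'/\*(.*?)\*/', re.S)  # re.S is the short alias of re.DOTALL
p3 = re.compile(r'(?s)/\*(.*?)\*/')    # inline flag at the start of the pattern
matches = [p.findall(text2) for p in (p1, p2, p3)]

# Collapse each newline (and any whitespace around it) into one space
cleaned = [re.sub(r'\s*\n\s*', ' ', c) for c in p1.findall(text2)]
print(cleaned)  # [' this is a multiline comment ']
```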

4. String alignment

Requirement: you want to format strings with some sort of alignment

For basic string alignment, you can use the ljust(), rjust(), and center() methods of strings.

s = "hello world"
s.ljust(20,"*")
>>>'hello world*********'
s.rjust(20,"*")
>>>'*********hello world'
s.center(20,"*")
>>>'****hello world*****'
"""
To achieve the above operation, there are format()Function, and, format()The function is more general than the above method because it can format not only characters, but also things in other formats. use format When you want to use <,> perhapsˆ The character is followed by a specified width
"""
format(s,">20")
>>>'         hello world'
format(s,"^20")
>>>'    hello world     '
# Of course, you can also specify the fill symbol as follows
format(s,"*<20")
>>>'hello world*********'
format(s,"*^20")
>>>'****hello world*****'
# Unlike ljust(), format() also works with non-string values such as numbers
format(1.23456,"*^20")
>>>'******1.23456*******'
format(1.23456,"*^20.2")
>>>'********1.2*********'
format((1.23456*100),"^.2f")
>>>'123.46'
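The same format specification also works inside f-strings (Python 3.6+) and str.format(), which is convenient for aligning columns. A small sketch with made-up data:

```python
s = "hello world"

# An f-string accepts the same specification as format()
print(f"{s:*^20}" == format(s, "*^20"))  # True

# Align a small table: names left-aligned in 10 columns, prices right-aligned in 8
rows = [("apple", 1.5), ("banana", 0.25), ("cherry", 12.0)]
lines = [f"{name:<10}{price:>8.2f}" for name, price in rows]
for line in lines:
    print(line)
```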

5. String concatenation

Requirement: you want to combine several strings into one.
If you are simply joining a few strings, use the plus operator (+). When you need to join a sequence of strings with a separator, use the join() method.

a = 'Is Chicago'
b = 'Not Chicago?'
a + ' ' + b
>>>'Is Chicago Not Chicago?'
Equivalent to:
" ".join([a,b])
>>>'Is Chicago Not Chicago?'
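join() really pays off when there are many pieces or when they come from a generator. A sketch with made-up data; note that join() accepts only strings, so other values must be converted first:

```python
parts = ['Is', 'Chicago', 'Not', 'Chicago?']

# Repeated += builds a new string on each iteration, which can copy the data
# many times (CPython sometimes optimizes this, but it is not guaranteed)
s = ''
for p in parts:
    s += p + ' '
s = s.rstrip()

# join() builds the result in one pass, copying each piece once
t = ' '.join(parts)
print(s == t)  # True

# join() only accepts strings, so convert other values with a generator expression
data = ['ACME', 50, 91.1]
csv_line = ','.join(str(d) for d in data)
print(csv_line)  # ACME,50,91.1
```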

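Note that join() itself performs no I/O; its advantage is joining many strings, where it copies each piece only once instead of building a new intermediate string for every +. There is a separate trade-off when writing the result to a file or other I/O stream. If the two strings are small, concatenating them and issuing a single write is usually faster, because each I/O system call has a fixed overhead. If the strings are large, writing them with separate calls may be more efficient, because it avoids creating a large temporary result and copying large blocks of memory.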

That's all for this section. The next section covers numbers, dates, and times.

Programmer Think