Problem
You have a sequence of data and want to extract values from it, or shorten it, according to some criteria.
Solution
The simplest way to filter sequence elements is to use a list comprehension. For example:
mylist = [1, 4, -5, 10, -7, 2, 3, -1]
print([n for n in mylist if n > 0])  # -> [1, 4, 10, 2, 3]
print([n for n in mylist if n < 0])  # -> [-5, -7, -1]
A potential drawback of a list comprehension is that a very large input can produce a very large result, which takes up a lot of memory. If memory is a concern, you can use a generator expression to produce the filtered elements one at a time. For example:
pos = (n for n in mylist if n > 0)
print(pos)  # -> <generator object <genexpr> at 0x0000027428014AC0>
for x in pos:
    print(x)
# Output:
# 1
# 4
# 10
# 2
# 3
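To make the memory point concrete, here is a rough sketch that compares the size of a filtered list with the size of the matching generator object. The input size and the names data, as_list, and as_gen are illustrative assumptions, and sys.getsizeof() measures only the container itself, not the integers it refers to.

```python
import sys

# Hypothetical input size, chosen only for illustration.
data = range(100_000)

as_list = [n for n in data if n % 2 == 0]   # materializes every match up front
as_gen = (n for n in data if n % 2 == 0)    # produces matches on demand

# getsizeof() reports only the container itself, not the integers it
# refers to, but that is enough to show the difference in scale.
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))  # True

# A generator can be passed straight to a consuming function such as sum(),
# so the filtered values never need to exist as a list.
total = sum(as_gen)
print(total == sum(as_list))  # True
```

The generator stays small no matter how many elements pass the filter, but it can be iterated only once.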
Sometimes the filtering criteria are too complex to express easily in a list comprehension or generator expression. For example, suppose the filtering process involves exception handling or some other complicated detail. In that case, put the filtering code into its own function and use the built-in filter() function. For example:
values = ['1', '2', '-3', '4', 'N/A', '5']

def is_int(val):
    # Return True if val can be parsed as an integer, False otherwise.
    try:
        int(val)
        return True
    except ValueError:
        return False

ivals = list(filter(is_int, values))
print(ivals)  # -> ['1', '2', '-3', '4', '5']
The filter() function creates an iterator, so if you want a list, you have to convert the result with list(), as in the example.
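Because filter() returns an iterator, its result can be consumed only once. The sketch below repeats the is_int() helper so that it runs on its own; the names first_pass, second_pass, and same are illustrative.

```python
values = ['1', '2', '-3', '4', 'N/A', '5']

def is_int(val):
    """Return True if val can be parsed as an integer."""
    try:
        int(val)
        return True
    except ValueError:
        return False

ivals = filter(is_int, values)   # nothing has been tested yet
first_pass = list(ivals)         # consumes the iterator
second_pass = list(ivals)        # the iterator is already exhausted

print(first_pass)    # ['1', '2', '-3', '4', '5']
print(second_pass)   # []

# The same predicate also works inside a list comprehension.
same = [v for v in values if is_int(v)]
print(same == first_pass)  # True
```

If the filtered values are needed more than once, store them in a list or call filter() again.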
Discussion
List comprehensions and generator expressions are usually the simplest way to filter data. They can also transform the data while filtering it. For example:
import math

mylist = [1, 4, -5, 10, -7, 2, 3, -1]
print([math.sqrt(n) for n in mylist if n > 0])
# -> [1.0, 2.0, 3.1622776601683795, 1.4142135623730951, 1.7320508075688772]
A variation on filtering is to replace the values that fail the criteria with a new value instead of discarding them. For example, in a column of data, you may want not only to find the positive numbers but also to replace the non-positive numbers with a specified value. This is easily done by moving the filter condition into a conditional expression, as follows:
clip_neg = [n if n > 0 else 0 for n in mylist]
print(clip_neg)  # -> [1, 4, 0, 10, 0, 2, 3, 0]
clip_pos = [n if n < 0 else 0 for n in mylist]
print(clip_pos)  # -> [0, 0, -5, 0, -7, 0, 0, -1]
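The position of the condition decides whether elements are dropped or replaced. The following sketch puts the two forms side by side; the names kept, replaced, and capped, and the cap value of 5, are illustrative assumptions.

```python
mylist = [1, 4, -5, 10, -7, 2, 3, -1]

# Trailing `if`: a filter. Elements that fail the test are dropped.
kept = [n for n in mylist if n > 0]

# Leading `n if ... else ...`: a conditional expression. Every input element
# yields exactly one output element, so the length is preserved.
replaced = [n if n > 0 else 0 for n in mylist]

print(len(mylist), len(kept), len(replaced))  # 8 5 8

# Both forms can be combined: drop the non-positive values, then cap the
# remaining ones at 5 (an arbitrary limit chosen for illustration).
capped = [n if n <= 5 else 5 for n in mylist if n > 0]
print(capped)  # [1, 4, 5, 2, 3]
```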
Another noteworthy filtering tool is itertools.compress(), which takes an iterable and an accompanying sequence of Boolean selectors as input. It outputs the elements of the iterable whose corresponding selector is True. This is useful when you want to filter one sequence based on the values of another, related sequence. For example, suppose you have the following two columns of data:
addresses = [
    '5412 N CLARK',
    '5148 N CLARK',
    '5800 E 58TH',
    '2122 N CLARK',
    '5645 N RAVENSWOOD',
    '1060 W ADDISON',
    '4801 N BROADWAY',
    '1039 W GRANVILLE',
]
counts = [0, 3, 10, 4, 1, 7, 6, 1]
Now you want to output all the addresses whose corresponding count value is greater than 5. You can do this:
from itertools import compress

more5 = [n > 5 for n in counts]
print(more5)
# -> [False, False, True, False, False, True, True, False]
print(list(compress(addresses, more5)))
# -> ['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']
The key is to first create a sequence of Booleans that indicates which elements satisfy the condition. The compress() function then picks out the items whose corresponding selector is True.
Like filter(), compress() returns an iterator. If you need a list, use list() to convert the result.
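As a brief sketch of two related points: the selectors passed to compress() can themselves be a generator, and the same result can be obtained with a list comprehension over zip(). The names lazy, selected, paired, and short are illustrative.

```python
from itertools import compress

addresses = ['5412 N CLARK', '5148 N CLARK', '5800 E 58TH', '2122 N CLARK',
             '5645 N RAVENSWOOD', '1060 W ADDISON', '4801 N BROADWAY',
             '1039 W GRANVILLE']
counts = [0, 3, 10, 4, 1, 7, 6, 1]

# The selectors can be a generator expression, so no intermediate
# Boolean list has to be built.
lazy = compress(addresses, (n > 5 for n in counts))
selected = list(lazy)
print(selected)  # ['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']

# An equivalent comprehension that pairs the two sequences with zip().
paired = [addr for addr, n in zip(addresses, counts) if n > 5]
print(paired == selected)  # True

# compress() stops as soon as either input is exhausted.
short = list(compress('ABCDEF', [1, 0, 1]))
print(short)  # ['A', 'C']
```

The zip() form is handy when the condition needs to look at both sequences at once; compress() fits best when the Boolean selectors already exist.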