Python crawlers and data analysis: teaching videos and Python source code sharing
- Basic tutorial: Python syntax, dictionaries, tuples, lists
- Advanced tutorial: file operations, lambda expressions, recursion, yield generators
- Modules: built-in modules, open source modules, custom modules
- Crawler skills: the urllib library, XPath selectors, regular expressions
- JD.com (Jingdong) crawler: crawling Jingdong products and storing them in a sqlite3 database
- Summary of open source Python crawler projects
- Commonly used Python built-in functions
File operations
Working with a file generally takes the following steps:
- Open the file
- Operate on the file (read or write)
- Close the file
I. Opening files
    file_handle = open('file path', 'mode')
Note: Python 2 offers two ways to open a file, open(...) and file(...). In essence the former calls the latter internally, and open is the recommended one. (file(...) was removed in Python 3, where only open(...) remains.)
When opening a file you specify the file path and the mode in which to open it. The call returns a file handle, and all later operations on the file go through that handle.
The modes for opening a file are:
- r, read-only mode (the default).
- w, write-only mode. [Not readable; created if it does not exist; existing content is deleted.]
- a, append mode. [Not readable; created if it does not exist; if it exists, writes are only appended to the end.]
"+" means the file can be read and written at the same time:
- r+, read and write. [Readable; writable; appendable.]
- w+, write and read. [Existing content is deleted on open.]
- a+, same as a, but also readable.
"U" means that \r, \n and \r\n are automatically converted to \n when reading (used together with r or r+ mode):
- rU
- r+U
"b" means the file is handled as binary (for example, ISO image files sent or uploaded over FTP). In Python 2 the flag can be omitted on Linux, but on Windows it must be given for binary files:
- rb
- wb
- ab
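The effect of the most common modes can be checked with a short experiment. This is a minimal sketch written for Python 3; the file name modes_demo.txt and the temporary directory are assumptions made only for illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# "w" creates the file, or empties it if it already exists.
with open(path, "w") as f:
    f.write("first\n")

# "a" keeps the existing content and adds to the end.
with open(path, "a") as f:
    f.write("second\n")

# "r" (the default) only reads.
with open(path, "r") as f:
    after_append = f.read()

# "r+" reads and writes without emptying the file; writing starts at offset 0.
with open(path, "r+") as f:
    f.write("FIRST")

with open(path) as f:
    after_update = f.read()

# Opening with "w" again discards everything written so far.
with open(path, "w") as f:
    pass

size_after_w = os.path.getsize(path)
print(repr(after_append), repr(after_update), size_after_w)
```

Note that r+ overwrites characters in place rather than inserting them, which is why only the first word changes.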
II. File methods
    class file(object):

        def close(self):  # real signature unknown; restored from __doc__
            # Close the file
            """
            close() -> None or (perhaps) an integer.  Close the file.
            """

        def fileno(self):  # real signature unknown; restored from __doc__
            # File descriptor
            """
            fileno() -> integer "file descriptor".

            This is needed for lower-level file interfaces, such os.read().
            """
            return 0

        def flush(self):  # real signature unknown; restored from __doc__
            # Flush the file's internal buffer
            """ flush() -> None.  Flush the internal I/O buffer. """
            pass

        def isatty(self):  # real signature unknown; restored from __doc__
            # Check whether the file is connected to a tty device
            """ isatty() -> true or false.  True if the file is connected to a tty device. """
            return False

        def next(self):  # real signature unknown; restored from __doc__
            # Get the next line of data; raises an error if there is none
            """ x.next() -> the next value, or raise StopIteration """
            pass

        def read(self, size=None):  # real signature unknown; restored from __doc__
            # Read the given number of bytes
            """
            read([size]) -> read at most size bytes, returned as a string.
            """
            pass

        def readinto(self):  # real signature unknown; restored from __doc__
            # Read into a buffer; do not use, it will be dropped
            """ readinto() -> Undocumented.  Don't use this; it may go away. """
            pass

        def readline(self, size=None):  # real signature unknown; restored from __doc__
            # Read a single line of data
            """
            readline([size]) -> next line from the file, as a string.
            """
            pass

        def readlines(self, size=None):  # real signature unknown; restored from __doc__
            # Read all the data and return it as a list split on newlines
            """
            readlines([size]) -> list of strings, each a line from the file.
            """
            return []

        def seek(self, offset, whence=None):  # real signature unknown; restored from __doc__
            # Set the position of the file pointer
            """
            seek(offset[, whence]) -> None.  Move to new file position.
            """
            pass

        def tell(self):  # real signature unknown; restored from __doc__
            # Get the current position of the file pointer
            """ tell() -> current file position, an integer (may be a long integer). """
            pass

        def truncate(self, size=None):  # real signature unknown; restored from __doc__
            # Truncate the file, keeping only the data before the given position
            pass

        def write(self, p_str):  # real signature unknown; restored from __doc__
            # Write content
            """
            write(str) -> None.  Write string str to file.
            """
            pass

        def writelines(self, sequence_of_strings):  # real signature unknown; restored from __doc__
            # Write a list of strings to the file
            """
            writelines(sequence_of_strings) -> None.  Write the strings to the file.
            """
            pass

        def xreadlines(self):  # real signature unknown; restored from __doc__
            # Read the file line by line instead of all at once
            """
            xreadlines() -> returns self.
            """
            pass
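The most frequently used of these methods can be tried together on a small file. The sketch below is written for Python 3, where the same method names exist on the object returned by open(); the file name methods_demo.txt is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "methods_demo.txt")  # hypothetical file

with open(path, "w") as f:
    f.write("line 1\n")
    f.writelines(["line 2\n", "line 3\n"])  # writelines adds no newlines itself

with open(path) as f:
    first = f.readline()      # one line, including its trailing "\n"
    position = f.tell()       # offset just after the first line
    rest = f.readlines()      # the remaining lines as a list
    f.seek(0)                 # move the pointer back to the start
    head = f.read(4)          # at most 4 characters
    f.seek(position)          # jump back to the remembered offset
    second = f.readline()

print(first, rest, head, second)
```

Saving the value of tell() and passing it back to seek() is the same technique the second NReadlines generator later in this tutorial relies on.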
III. with
To avoid forgetting to close a file after opening it, you can let a context manager handle it:
    with open('log', 'r') as f:
        ...
This way, when the with block finishes executing, the file is closed and its resources are released automatically.
From Python 2.7 on, with also supports managing several files in one statement:
    with open('log1') as obj1, open('log2') as obj2:
        pass
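That the file really is closed at the end of the block, even when the block raises, can be confirmed through the closed attribute. A minimal sketch for Python 3; the paths are hypothetical and live in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log")  # hypothetical log file

with open(path, "w") as f:
    f.write("started\n")
closed_after_block = f.closed  # the block has ended, so the file is closed

# The file is closed even if the block raises an exception.
try:
    with open(path) as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed

# Two files managed by a single with statement.
copy_path = path + ".copy"
with open(path) as src, open(copy_path, "w") as dst:
    dst.write(src.read())

with open(copy_path) as check:
    copied = check.read()

print(closed_after_block, closed_after_error, repr(copied))
```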
IV. Examples of Python file operations
Custom functions
1. Background
Before learning functions, we always wrote code in the procedural (process-oriented) style: implementing features from top to bottom following the business logic, which often takes a long stretch of code for a given feature. The most common operation during development is copy and paste, that is, copying a block of code written earlier to wherever it is needed again, as follows:
    while True:
        if cpu utilization > 90%:
            # Send an email reminder
            Connect to the mail server
            Send the mail
            Close the connection

        if disk usage > 90%:
            # Send an email reminder
            Connect to the mail server
            Send the mail
            Close the connection

        if memory usage > 80%:
            # Send an email reminder
            Connect to the mail server
            Send the mail
            Close the connection
Looking at the code above, the content under each if statement can be extracted and shared, as follows:
    def send_mail(content):
        # Send an email reminder
        Connect to the mail server
        Send the mail
        Close the connection

    while True:

        if cpu utilization > 90%:
            send_mail('CPU Alarm')

        if disk usage > 90%:
            send_mail('Hard Disk Alarm')

        if memory usage > 80%:
            send_mail('Memory Alarm')
Of the two implementations above, the second is clearly better than the first in reusability and readability. In fact, this is the difference between function-based and procedural programming.
- Function-based: encapsulate a piece of functionality in a function so that it never has to be written again; just call the function.
- Object-oriented: classify and encapsulate functions to make development "faster, better and stronger..."
The most important benefit of programming with functions is improved code reusability and readability.
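The refactoring described above can be turned into something runnable. The following is only a sketch: the thresholds come from the pseudocode, but the readings are made-up numbers, and send_mail just records the message in a list instead of contacting a real mail server.

```python
# Thresholds in percent, taken from the pseudocode above.
THRESHOLDS = {"cpu": 90, "disk": 90, "memory": 80}

sent = []  # stands in for a real mailbox so the sketch runs offline


def send_mail(content):
    """Placeholder for: connect to the mail server, send the mail, close the connection."""
    sent.append(content)


def check(readings):
    """Call send_mail once for every reading that exceeds its threshold."""
    for name, limit in THRESHOLDS.items():
        if readings[name] > limit:
            send_mail(name.upper() + " alarm")


# Hypothetical readings: CPU and memory are over their limits, disk is not.
check({"cpu": 95, "disk": 40, "memory": 85})
print(sent)
```

The mail logic now lives in one place, so changing how alarms are delivered means editing a single function.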
2. Definition and use of functions
    def function_name(parameters):

        ...
        function body
        ...
A function definition consists of the following parts:
- def: the keyword that introduces a function definition
- Function name: the name of the function, by which it is later called
- Function body: the series of logical operations performed by the function, such as sending mail or finding the largest number in [11, 22, 38, 888, 2]
- Parameters: provide data to the function body
- Return value: when the function finishes executing, it can return data to the caller
Of these, parameters and the return value are the most important:
1. Return value
A function is a block of functionality. The caller learns whether it executed successfully through the return value.
2. Parameters
A function can take three kinds of parameters:
- Ordinary parameters
- Default parameters
- Dynamic parameters
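The three kinds of parameters and the use of a return value can be illustrated as follows. This sketch assumes that "dynamic parameters" refers to the *args and **kwargs forms; the function names show and largest are hypothetical.

```python
def show(name, greeting="hello", *args, **kwargs):
    """Return how each kind of argument was received."""
    # name      -> ordinary parameter, must be supplied
    # greeting  -> default parameter, optional
    # args      -> extra positional arguments, collected in a tuple
    # kwargs    -> extra keyword arguments, collected in a dict
    return name, greeting, args, kwargs


def largest(numbers):
    """Return the largest number, or None to tell the caller there was nothing to compare."""
    if not numbers:
        return None
    result = numbers[0]
    for n in numbers[1:]:
        if n > result:
            result = n
    return result


only_ordinary = show("alex")
with_default = show("alex", "hi")
with_dynamic = show("alex", "hi", 1, 2, city="Beijing")
biggest = largest([11, 22, 38, 888, 2])
nothing = largest([])
print(only_ordinary, with_default, with_dynamic, biggest, nothing)
```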
lambda expressions
When learning conditionals, we saw that a simple if/else statement can be written as a ternary expression:
    # Ordinary conditional statement
    if 1 == 1:
        name = 'wupeiqi'
    else:
        name = 'alex'

    # Ternary expression
    name = 'wupeiqi' if 1 == 1 else 'alex'
Simple functions also have a compact form of their own, the lambda expression:
    # ###################### Ordinary function ######################
    # Define the function (the usual way)
    def func(arg):
        return arg + 1

    # Call the function
    result = func(123)

    # ###################### lambda ######################
    # Define the function (lambda expression)
    my_lambda = lambda arg: arg + 1

    # Call the function
    result = my_lambda(123)
Lambda exists to express simple functions concisely.
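A common place where this conciseness pays off is passing a small function as an argument, for example as the key of sorted(). The data below is invented for illustration.

```python
# Hypothetical (name, age) records.
people = [("wupeiqi", 30), ("alex", 25), ("eric", 35)]

# The lambda picks the field to sort by; no separate def is needed.
by_age = sorted(people, key=lambda person: person[1])
names_by_age = [name for name, _ in by_age]

# The lambda from the text behaves exactly like the ordinary function.
add_one = lambda arg: arg + 1
result = add_one(123)

print(names_by_age, result)
```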
Built-in functions 2
I. map
Traverses a sequence, applies an operation to every element, and produces a new sequence.
    li = [11, 22, 33]

    new_list = map(lambda a: a + 100, li)


    li = [11, 22, 33]
    sl = [1, 2, 3]
    new_list = map(lambda a, b: a + b, li, sl)
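In Python 3, map returns a lazy iterator rather than a list, so the results have to be materialized with list() before they can be printed or indexed. A small sketch of the two calls above under that assumption:

```python
li = [11, 22, 33]
sl = [1, 2, 3]

# One sequence: the function receives one element at a time.
plus_100 = list(map(lambda a: a + 100, li))

# Two sequences: the function receives one element from each, in step.
pair_sums = list(map(lambda a, b: a + b, li, sl))

print(plus_100, pair_sums)
```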
II. filter
Filters the elements of a sequence and returns the sequence of elements that satisfy the condition.
    li = [11, 22, 33]

    new_list = filter(lambda arg: arg > 22, li)

    # If filter's first argument is None, the elements are returned unchanged
    # (any element that evaluates to false is dropped)
III. reduce
Accumulates all elements of a sequence into a single value.
    li = [11, 22, 33]

    result = reduce(lambda arg1, arg2: arg1 + arg2, li)

    # reduce, first argument: a function that must take two parameters
    # reduce, second argument: the sequence to loop over
    # reduce, third argument: the initial value
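For readers on Python 3, two details differ from the snippets above: filter returns an iterator, and reduce has moved to the functools module. The sketch below shows both functions under those assumptions, including the None and initial-value cases mentioned in the comments.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

li = [11, 22, 33]

# Keep only the elements for which the function returns True.
greater_than_22 = list(filter(lambda arg: arg > 22, li))

# With None as the function, elements that evaluate to false are dropped.
truthy_only = list(filter(None, [0, 11, "", 22, None]))

# Fold the list into one value: ((11 + 22) + 33).
total = reduce(lambda arg1, arg2: arg1 + arg2, li)

# The optional third argument is the starting value of the accumulation.
total_from_100 = reduce(lambda arg1, arg2: arg1 + arg2, li, 100)

print(greater_than_22, truthy_only, total, total_from_100)
```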
yield generators
1. The difference between range and xrange
    >>> print range(10)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> print xrange(10)
    xrange(10)
As the code above shows (Python 2), range creates all of the requested numbers in memory at once, whereas xrange does not create them immediately; each number is created only when the loop iterates to it.
A generator written with yield behaves the same way:
    def nrange(num):
        temp = -1
        while True:
            temp = temp + 1
            if temp >= num:
                return
            else:
                yield temp
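The lazy behaviour of such a generator can be observed by pulling values from it one at a time. This is a slightly condensed variant of nrange, written for Python 3; the logic is the same but the loop condition replaces the explicit return.

```python
def nrange(num):
    """Yield 0, 1, ..., num - 1, producing each value only when asked."""
    temp = 0
    while temp < num:
        yield temp
        temp += 1


gen = nrange(3)        # nothing has been computed yet
first = next(gen)      # runs the body up to the first yield
rest = list(gen)       # consumes the remaining values
exhausted = list(gen)  # a finished generator yields nothing more

print(first, rest, exhausted)
```

A generator can be iterated only once; to loop again, call nrange again.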
2. The difference between read and xreadlines in file operations
- read loads the whole file into memory
- xreadlines fetches lines only as the loop iterates
    def NReadlines():
        with open('log', 'r') as f:
            while True:
                try:
                    line = next(f)  # raises StopIteration at the end of the file
                except StopIteration:
                    return
                yield line

    for i in NReadlines():
        print(i)


    def NReadlines():
        with open('log', 'r') as f:
            seek = 0
            while True:
                f.seek(seek)           # return to the remembered position
                data = f.readline()
                if data:
                    seek = f.tell()    # remember where the next line starts
                    yield data
                else:
                    return

    for item in NReadlines():
        print(item)
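In Python 3 xreadlines no longer exists, because the file object itself is a lazy line iterator. A minimal sketch of the same idea under that assumption; the helper name read_lines and the sample file are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log")  # hypothetical log file
with open(path, "w") as f:
    f.write("a\nb\nc\n")


def read_lines(file_path):
    """Yield the lines of a file one at a time, without loading it all."""
    with open(file_path) as f:
        for line in f:            # the file object hands out one line per step
            yield line.rstrip("\n")


with open(path) as f:
    everything = f.read()         # read(): the whole file in one string

lines = read_lines(path)
first = next(lines)               # only the first line has been read so far
remaining = list(lines)

print(repr(everything), first, remaining)
```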
Decorators
A decorator is a function with a special role: it is used to wrap (decorate) a function or class, and it can add operations before and after the wrapped function executes.
    def wrapper(func):
        def result():
            print('before')
            func()
            print('after')
        return result

    @wrapper
    def foo():
        print('foo')

    import functools


    def wrapper(func):
        @functools.wraps(func)  # keep foo's name and docstring on the wrapped function
        def inner():
            print('before')
            func()
            print('after')
        return inner

    @wrapper
    def foo():
        print('foo')


    #!/usr/bin/env python
    #coding:utf-8

    def Before(request, kargs):
        print('before')

    def After(request, kargs):
        print('after')


    def Filter(before_func, after_func):
        def outer(main_func):
            def wrapper(request, kargs):

                # Each stage may short-circuit by returning a non-None value
                before_result = before_func(request, kargs)
                if before_result is not None:
                    return before_result

                main_result = main_func(request, kargs)
                if main_result is not None:
                    return main_result

                after_result = after_func(request, kargs)
                if after_result is not None:
                    return after_result

            return wrapper
        return outer

    @Filter(Before, After)
    def Index(request, kargs):
        print('index')


    if __name__ == '__main__':
        Index(1, 2)
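The wrappers above take no arguments and discard the return value. A more general sketch forwards *args and **kwargs and returns whatever the wrapped function returns. The names logged, add and calls are hypothetical, and messages are recorded in a list instead of printed so the effect is easy to inspect.

```python
import functools

calls = []  # records what the decorator did


def logged(func):
    @functools.wraps(func)  # copies __name__ and __doc__ from func to inner
    def inner(*args, **kwargs):
        calls.append("before " + func.__name__)
        result = func(*args, **kwargs)  # forward all arguments unchanged
        calls.append("after " + func.__name__)
        return result                   # hand the result back to the caller
    return inner


@logged
def add(a, b):
    """Return a + b."""
    return a + b


total = add(1, 2)
print(total, calls, add.__name__)
```

Without functools.wraps, add.__name__ would report "inner", which makes tracebacks and logs harder to read.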
Bubble sort
Requirement: sort the list [13, 22, 6, 99, 11] from smallest to largest.
Idea: compare each pair of adjacent values and move the larger one to the right, pass after pass.
    # Step 1: a single pass moves the largest value to the end
    li = [13, 22, 6, 99, 11]

    for m in range(4):  # equivalent to: for m in range(len(li) - 1)
        if li[m] > li[m+1]:
            li[m], li[m+1] = li[m+1], li[m]


    # Step 2: repeat the pass, each time over one element fewer
    li = [13, 22, 6, 99, 11]

    for m in range(4):  # equivalent to: for m in range(len(li) - 1)
        if li[m] > li[m+1]:
            li[m], li[m+1] = li[m+1], li[m]

    for m in range(3):  # equivalent to: for m in range(len(li) - 2)
        if li[m] > li[m+1]:
            li[m], li[m+1] = li[m+1], li[m]

    for m in range(2):  # equivalent to: for m in range(len(li) - 3)
        if li[m] > li[m+1]:
            li[m], li[m+1] = li[m+1], li[m]

    for m in range(1):  # equivalent to: for m in range(len(li) - 4)
        if li[m] > li[m+1]:
            li[m], li[m+1] = li[m+1], li[m]
    print(li)


    # Step 3: express the repeated passes as a nested loop
    li = [13, 22, 6, 99, 11]

    for i in range(1, len(li)):
        for m in range(len(li) - i):
            if li[m] > li[m+1]:
                li[m], li[m+1] = li[m+1], li[m]
    print(li)
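The nested loop can be wrapped in a reusable function. The sketch below adds one common refinement that is not in the original: it stops early when a full pass makes no swap, since the list is then already sorted. The function name bubble_sort is an assumption.

```python
def bubble_sort(values):
    """Return a new list with the values sorted from smallest to largest."""
    items = list(values)  # work on a copy so the caller's list is untouched
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for m in range(end):
            if items[m] > items[m + 1]:
                # Move the larger of the two neighbours to the right.
                items[m], items[m + 1] = items[m + 1], items[m]
                swapped = True
        if not swapped:
            break  # no swap in this pass: already sorted
    return items


original = [13, 22, 6, 99, 11]
result = bubble_sort(original)
print(result)
```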
Recursion
Generate the following sequence with a function:
The Fibonacci sequence is 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368.
    def func(arg1, arg2):
        if arg1 == 0:
            print(arg1)
            print(arg2)
        arg3 = arg1 + arg2
        if arg3 > 46368:  # stop after the last value listed above; without this check the recursion never ends
            return
        print(arg3)
        func(arg2, arg3)

    func(0, 1)
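Printing inside the recursion makes the result hard to reuse. An alternative sketch returns the numbers as a list and makes the stopping condition an explicit parameter; the name fib_upto and its limit argument are assumptions introduced for this example.

```python
def fib_upto(limit, a=0, b=1):
    """Return the Fibonacci numbers that do not exceed limit, recursively."""
    if a > limit:          # base case: the next number is too large
        return []
    # Recursive case: keep a, then continue with the next pair (b, a + b).
    return [a] + fib_upto(limit, b, a + b)


small = fib_upto(21)
full = fib_upto(46368)
print(small)
print(len(full), full[-1])
```

Every recursive function needs a base case like the first if; otherwise Python eventually stops it with a RecursionError.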