1. Python interpreter types
1. CPython: implemented in C (the official, most widely used interpreter)
2. Jython (formerly JPython): implemented in Java
3. IronPython: implemented for .NET (C#)
2. Structure of a Python program
A program is made up of modules.
A module contains statements, functions, classes, etc.
Statements contain expressions.
Expressions create and process data objects and return references to those objects.
3. Data types in Python
Numeric types: int, float, complex (for example, 1+2j is a complex number) and bool (True/False)
4. Python arithmetic operators and expressions
Expression: made up of numbers (operands) and operators
Purpose: makes the computer perform a calculation and return the result
Operators: + addition, - subtraction, * multiplication, / division, // floor division (divide, then round down), % remainder (modulo), ** exponentiation
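A quick check of the operators listed above. The point worth noticing is that // rounds toward negative infinity, so -7 // 2 is -4, not -3, and % follows the same rule.

```python
true_div = 7 / 2      # true division always returns a float: 3.5
floor_div = 7 // 2    # floor division rounds down: 3
neg_floor = -7 // 2   # rounds toward negative infinity: -4, not -3
remainder = 7 % 2     # remainder: 1
neg_rem = -7 % 2      # the result takes the sign of the divisor: 1
power = 2 ** 10       # exponentiation: 1024
print(true_div, floor_div, neg_floor, remainder, neg_rem, power)
```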
5. Numeric functions
1. abs(x) returns the absolute value of a number
2. round(number[, ndigits])
number: the number to round
ndigits: how many digits to keep after the decimal point (if omitted, the result is rounded to an integer)
In Python documentation, a parameter shown in [] is optional: it may be passed or omitted.
3. pow(x, y, z=None)
Returns x**y % z (x to the power y, then the remainder modulo z); if z is omitted, it returns x**y
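The three functions above in use. One detail the notes do not mention: in Python 3, round() sends an exact .5 to the nearest even integer, so round(2.5) is 2.

```python
absolute = abs(-5)                 # 5
rounded_2 = round(3.14159, 2)      # 3.14
rounded_0 = round(3.7)             # 4: without ndigits the result is an int
# Python 3 rounds an exact .5 to the nearest even integer
halves = (round(2.5), round(3.5))  # (2, 4)
power = pow(2, 10)                 # 1024, same as 2 ** 10
power_mod = pow(2, 10, 1000)       # 24, same as 2 ** 10 % 1000
print(absolute, rounded_2, rounded_0, halves, power, power_mod)
```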
6. Variables
1. Concept
A variable is an identifier associated with an object
2. Purpose
A variable binds an object and keeps a reference to it, so the object can be reused later
3. Variable naming rules
A name must start with a letter or underscore, followed by letters, underscores or digits
Python keywords cannot be used as variable names (Python 3 has about 35; the exact list is in keyword.kwlist)
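The naming rules above can be checked with the standard library: str.isidentifier() tests the letter/underscore/digit rule and keyword.iskeyword() tests for reserved words. The helper name is_valid_name is illustrative, not something from the notes.

```python
import keyword

def is_valid_name(name):
    """Illustrative helper: return True if name can be used as a variable name."""
    # isidentifier() checks the letter/underscore/digit rule;
    # iskeyword() rejects reserved words such as "for" or "class"
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["my_var", "_count", "var2", "2var", "my-var", "for"]:
    print(name, is_valid_name(name))

# Number of keywords in the running Python version
print(len(keyword.kwlist))
```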
7. Assignment statement
1. Syntax
variable_name = expression
2. Type
A variable itself has no type; its type is the type of the object it is bound to
3. Explanation
1. If the variable does not exist yet, the assignment creates it and binds it to the object
2. If the variable already exists, the assignment changes which object it is bound to
4. Notes
1. A variable can be bound to only one object at a time
2. One object can be bound to several variables
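8. Small integer object pool in Python
In CPython, integer objects from -5 to 256 are created in advance and kept in memory for the whole run; they are never released, and every use of such a value refers to the same object.
9. Large integers in the same file (e.g. a script run from PyCharm)
When a whole file is compiled at once, CPython reuses a single constant object for equal literals in the same code block, so a = 1000 and b = 1000 in one script may give a is b == True, while typing them as separate lines in the interactive shell usually gives False. This is a CPython compiler optimization, not a PyCharm feature; compare numbers with ==, not is.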
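A small sketch of why numbers should be compared with == rather than is. The is results depend on CPython's small integer cache and on constant sharing by the compiler, so they are implementation details; int() is called on strings here (an illustrative choice) so that the values are created at run time instead of being merged by the compiler.

```python
import platform

a = int("100")
b = int("100")
x = int("1000")
y = int("1000")

print(a == b, x == y)  # True True: equal values, always
# In CPython, 100 lies in the cached range -5..256, so both calls usually
# return the same object; 1000 is usually a new object each time
print(a is b, x is y)
print(platform.python_implementation())
```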
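10. Basic output function [print]
1. Purpose
Writes a series of values to the standard output device (usually the screen)
2. Syntax
print(value, ..., sep=" ", end="\n")
value: the content to output (several values can be given)
sep: the separator inserted between values (a space by default); end: the string written after the last value (a newline by default)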
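The effect of sep and end can be seen by printing into an in-memory buffer instead of the screen. The file= parameter and io.StringIO are not mentioned in the notes; they are used here only so the exact output can be inspected.

```python
import io

buf = io.StringIO()                           # in-memory text stream
print(1, 2, 3, file=buf)                      # default sep=" ", end="\n"
print(1, 2, 3, sep="-", end="!\n", file=buf)  # custom separator and ending
print("no newline", end="", file=buf)         # end="" suppresses the newline
output = buf.getvalue()
print(repr(output))
```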
11. Standard input function [input]
1. Purpose: reads a line of characters from the standard input device (the keyboard)
2. Syntax: input("prompt string")
3. Return type: str (always a string, even if digits are typed)
Comments start with #.
12. Type conversion function
1.float(): converts a string or number to a floating-point number
2. int(): converts a string or number to an integer
int(x, base=10)
x: the data to convert; base: the base (radix) in which x is written (base may only be given when x is a string)
For example, int("11", base=2) returns 3.
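The conversions above in practice. Because input() always returns a str, a typed number has to be converted before arithmetic; the string "20" below simply stands in for what a user might type.

```python
f1 = float("3.5")        # 3.5
f2 = float(2)            # 2.0
i1 = int("42")           # 42
i2 = int(3.9)            # 3: converting a float truncates toward zero
i3 = int(-3.9)           # -3
b2 = int("11", base=2)   # 3
b16 = int("ff", 16)      # 255
text = "20"              # stands in for a value typed at input("Age: ")
age = int(text) + 1      # 21
print(f1, f2, i1, i2, i3, b2, b16, age)
```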
13. Comparison operators
>, >=, <, <=, ==, != (each comparison returns True or False)
1. Statements
1. Concept
A statement is the smallest unit of execution in a Python program.
2. Explanation
Multiple statements written on one line must be separated by semicolons (;), but this is not recommended.
3. Line continuation [\]
1. Explicit line continuation
A backslash (\) at the end of a line tells the interpreter that the next line belongs to the same statement.
2. Implicit line continuation
Inside parentheses (), brackets [] or braces {}, a statement may be split across lines: the interpreter keeps reading the following lines until it finds the matching closing bracket.
s=100+300+56+2345+10
s1=100+300+56+\
    2345+10
s2=(100+300+56+
    2345+10)
print(s)
print(s1)
print(s2)
2. if statement (conditional judgment)
1. Purpose
Lets the program choose which statement(s) to execute depending on conditions
2. Syntax
if truth expression 1:
    statement block 1
elif truth expression 2:
    statement block 2
elif truth expression 3:
    statement block 3
else:
    statement block 4
3. Explanation
1. The truth expressions are evaluated from top to bottom. As soon as one of them is True, its statement block is executed and the if statement ends. If all of them are False, the statement block of the else clause is executed.
2. There can be zero, one or more elif clauses.
3. There can be at most one else clause.
# n=int(input("please enter a number:"))
# if n%2==0:
#     print(n, "is even!")
# else:
#     print(n, "is odd!")

# n=int(input("please enter a number:"))
# if n>0:
#     print(n, "is a positive number!")
# elif n<0:  # elif, so a positive number does not also reach the else branch
#     print(n, "is negative!")
# else:
#     print("n is 0!")

month=int(input("Please enter a month (1-12):"))
if 1<=month<=12:
    # pass  # placeholder
    if month<=3:
        print("spring!")
    elif month<=6:
        print("summer!")
    elif month<=9:
        print("autumn!")
    else:
        print("winter!")
else:
    print("The month you entered is incorrect, please re-enter!")
The difference between separate if statements and if/elif: with a series of separate if statements, every condition is tested and more than one block may run. With if/elif, as soon as the if condition or one of the following elif conditions is True, its block runs and the remaining elif and else clauses are skipped, which avoids redundant checks and is more efficient.
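A sketch contrasting the two forms for the same input; the function names are illustrative, not from the notes. With separate ifs both matching blocks run, while the if/elif chain stops at the first match.

```python
def classify_with_ifs(n):
    """Illustrative: separate if statements, every condition is tested."""
    results = []
    if n > 0:
        results.append("positive")
    if n > 10:
        results.append("greater than 10")
    return results

def classify_with_elif(n):
    """Illustrative: if/elif, only the first matching branch runs."""
    results = []
    if n > 10:
        results.append("greater than 10")
    elif n > 0:
        results.append("positive")
    return results

print(classify_with_ifs(50))   # ['positive', 'greater than 10']
print(classify_with_elif(50))  # ['greater than 10']
```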
Practice
Write a program that reads a student's scores in three subjects and finds the highest score and the lowest score
a=int(input("Please enter the score of the first subject:"))
b=int(input("Please enter the score of the second subject:"))
c=int(input("Please enter the score of the third subject:"))
# Method 1 (>= so that tied scores are handled correctly)
# if a>=b and a>=c:
#     print(a, "is the highest score!")
# elif b>=c:
#     print(b, "is the highest score!")
# else:
#     print(c, "is the highest score!")
# Method 2
m=a  # assume a is the highest score
if b>m:
    m=b  # b is higher than the current highest, so bind m to b
if c>m:
    m=c
print(m,"is the highest score!")
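The exercise also asks for the lowest score, which the code above does not compute. A sketch extending Method 2 to track both, wrapped in an illustrative function high_low so it can be tested without input(); the built-in max() and min() give the same answer.

```python
def high_low(a, b, c):
    """Illustrative helper: return (highest, lowest) of three scores."""
    # Same idea as Method 2: start from the first score, compare the others
    highest = a
    lowest = a
    for score in (b, c):
        if score > highest:
            highest = score
        if score < lowest:
            lowest = score
    return highest, lowest

print(high_low(78, 92, 85))               # (92, 78)
print(max(78, 92, 85), min(78, 92, 85))  # 92 78
```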
3. String [str]
1. Purpose
Used to store text
2. Representation
Text enclosed in quotation marks is a string:
'...' single quotes
"..." double quotes
'''...''' triple single quotes
"""...""" triple double quotes
3. Notes on quotes
Double quotes inside a single-quoted string do not end the string
Single quotes inside a double-quoted string do not end the string
>>> s='hello world'
>>> type(s)
<class 'str'>
>>> s
'hello world'
>>> s="hello world"
>>> s
'hello world'
>>> s='I'm a student'
  File "<stdin>", line 1
    s='I'm a student'
         ^
SyntaxError: invalid syntax
>>> s="I'm a student"
>>> s
"I'm a student"
Uses of triple quotes:
Triple-quoted strings can contain both single and double quotes
Line breaks inside a triple-quoted string are kept as \n characters
Triple quotes are commonly used for the docstrings of functions and classes
>>> s="""hello world
... my name is xx
... """
>>> s
'hello world\nmy name is xx\n'
>>> print(s)
hello world
my name is xx
4. Escape characters
Escape characters represent special characters
In a string literal, a backslash (\) followed by certain characters represents a special character
s='I\'m a student'  # \' represents a single quote
Common escape characters
Symbol | Meaning | Symbol | Meaning |
---|---|---|---|
\' | Single quote | \" | Double quote |
\n | Newline | \\ | Backslash (\) |
\r | Carriage return (moves the cursor to the start of the line) | \t | Horizontal tab (Tab) |
5. Raw strings [raw]
1. Purpose
Turns off escape sequences: every backslash \ is kept as an ordinary character
2. Syntax
r"string"
>>> s="C:\newfile\test.py"
>>> print(s)
C:
ewfile    est.py
>>> s=r"C:\newfile\test.py"
>>> print(s)
C:\newfile\test.py
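The escape table and raw strings side by side: an escape sequence counts as one character, and a raw string is just another way of writing the same backslashes as \\.

```python
s = "a\tb\n"                      # \t and \n are one character each
quote = 'I\'m a student'          # \' puts a single quote inside single quotes
path = "C:\\newfile\\test.py"     # \\ stands for one literal backslash
raw_path = r"C:\newfile\test.py"  # raw string: backslashes are kept as typed
print(len(s))                     # 4
print(quote)                      # I'm a student
print(path == raw_path)           # True
print(len("\n"), len(r"\n"))      # 1 2
```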
6. String operators
1. Operators
"+" concatenates strings
"+=" concatenates the string on the right to the original string, producing a new string, and rebinds the variable to it
"*" repeats a string (for example, "ab"*3 gives "ababab")
"*=" repeats the string and rebinds the original variable to the resulting string
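7. String comparison operators
1. Operators
">, >=, <, <=, ==, !="
2. Comparison rules
Strings are compared character by character from left to right, using each character's Unicode code point; the first pair of different characters decides the result, and if one string is a prefix of the other, the shorter string is smaller.
Unicode (the "universal code") assigns code points from 0 to 0x10FFFF; the first 65,536 (0 to 65,535) form the Basic Multilingual Plane.
3. Functions
chr() returns the character for a Unicode code point
ord() returns the Unicode code point of a character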
>>> ord("徐")
24464
>>> ord("a")
97
>>> ord("A")
65
>>> chr(24464)
'徐'
>>> chr(55555)
'\ud903'
>>> chr(46783)
'뚿'
>>> "a">"A"
True
>>> "ABC">"abc"
False
>>> "ABC" > "Abc"
False
>>> "ABC" >"ABCD"
False
8. String indexing [index]
1. Purpose
The elements of a sequence can be accessed by index
2. Syntax
string[integer_expression]
3. Explanation
Forward indexes start at 0: the first element has index 0, the second 1, and so on; the last element's index is len(string) - 1.
Reverse indexes start at -1: the last element has index -1, the second-to-last -2, and so on; the first element's index is -len(string).
A B C D E F G
0 1 2 3 4 5 6
-7 -6 -5 -4 -3 -2 -1
Practice:
Read any string and print its first and last characters.
If the length of the string is even, print an @ symbol; if it is odd, print the middle character.
len() returns the length of a sequence.
s=input("Please enter a string:")
print(s[0])
print(s[-1])
if len(s)%2==0:
    print("@")
else:
    mid_index=len(s)//2  # index of the middle character
    print(s[mid_index])
9. String slicing
1. Purpose
Extracts consecutive or evenly spaced elements from a string
2. Syntax
string[start:stop:step]
3. Parameters
1. start: the index where the slice begins (included); 0 is the first element, 1 the second, and so on. If omitted, the slice starts at the beginning (at the end when step is negative).
2. stop: the index where the slice ends; the element at stop is not included. If omitted, the slice runs to the end (to the beginning when step is negative).
3. step: how far, and in which direction, to move after taking each element. It defaults to 1, which takes every element moving right.
A positive step slices from left to right.
A negative step slices from right to left.
With a step, the slice takes the element at start, adds step to that index to get the next element, and repeats until it reaches stop.
When step is negative, the element at start must lie to the right of the element at stop; otherwise the slice is empty.
A B C D E F G
0 1 2 3 4 5 6
-7 -6 -5 -4 -3 -2 -1
>>> s="ABCDEFG"
>>> s[0:3]
'ABC'
>>> s[1:5]
'BCDE'
>>> s[1:-1]
'BCDEF'
>>> s[0:-1:2]
'ACE'
>>> s[-1:0:-2]
'GEC'
>>> s[0:-1:-2]
''
Practice:
Read a string and decide whether it is a palindrome (it reads the same forwards and backwards), for example:
ABCBA
上海自来水来自海上 ("Shanghai's tap water comes from the sea", a Chinese palindrome)
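One way to solve the palindrome exercise: a slice with step -1 and no start or stop reverses a string, so a palindrome equals its own reverse. The function name is_palindrome is illustrative, and the second test string is assumed to be the Chinese sentence behind the translated example.

```python
def is_palindrome(text):
    """Illustrative helper: True if text reads the same forwards and backwards."""
    # text[::-1] walks from the last character to the first
    return text == text[::-1]

print(is_palindrome("ABCBA"))              # True
print(is_palindrome("上海自来水来自海上"))  # True
print(is_palindrome("ABCD"))               # False
```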
4. String formatting
1. Purpose
Generates a formatted string
2. Syntax
format_string % value
format_string % (value1, value2)
fmt="Name:%s, Age:%d"
name="Xiao Ming"
age=20
print(fmt%(name,age))
Symbol | Meaning |
---|---|
%s | String placeholder |
%d | Decimal integer placeholder |
%f | Floating-point placeholder (6 decimal places by default) |
3. Modifiers (written between % and the type letter, e.g. %-+10.2f)
"-" left-justify (output is right-justified by default)
"+" show a plus sign for positive numbers
"0" pad the left side with zeros instead of spaces
Width: the minimum width of the whole output field
Precision (written as .n): how many decimal places to keep
>>> a=123
>>> a
123
>>> "%d"%123
'123'
>>> "%10d"%123
'       123'
>>> "%-10d"%123
'123       '
>>> "%-+10d"%123
'+123      '
>>> "%0+10d"%123
'+000000123'
>>> "%f"%123.456
'123.456000'
>>> "%.2f"%123.456
'123.46'
>>> "%2f"%123.456
'123.456000'
>>> "%f"%123.4567891
'123.456789'
>>> "%.7f"%123.4567891
'123.4567891'
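A common use of width, justification and precision together is lining up columns. The helper format_row and the sample rows are illustrative, not from the notes.

```python
def format_row(name, age, score):
    """Illustrative helper: one aligned line of a score table."""
    # %-10s: string left-justified in 10 columns
    # %3d:   integer right-justified in 3 columns
    # %6.2f: float 6 columns wide with 2 decimal places
    return "%-10s|%3d|%6.2f" % (name, age, score)

print(format_row("Xiao Ming", 20, 95.5))  # Xiao Ming | 20| 95.50
print(format_row("Li Lei", 9, 88.25))     # Li Lei    |  9| 88.25
print("%05.1f" % 3.14159)                 # 003.1: zero-padded to width 5
```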
5. while loop
1. Purpose
Lets the program execute one or more statements repeatedly while a condition holds
2. Syntax
initial condition
while truth expression:
    statement block 1
    change the condition
else:
    statement block 2
3. Execution flow
1. First set up the initial condition.
2. Evaluate the truth expression: is its Boolean value True or False?
3. If it is True, execute statement block 1, then return to step 2 and evaluate the truth expression again.
4. If it is False, execute statement block 2 (the else clause) and end the while statement.
i=1  # records the number of iterations
while i<=10:
    print("hello world")
    i+=1
4. Notes
1. Make sure the truth expression eventually becomes False, to avoid an infinite loop.
2. The loop condition is usually controlled by a loop variable used in the truth expression.
3. The loop variable usually has to be changed inside the loop body; this controls how many times the loop runs.
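The four steps above in one small function (the name sum_to is illustrative): an initial condition, a truth expression, a body that changes the loop variable, and an else clause that runs once when the condition turns False.

```python
def sum_to(n):
    """Illustrative helper: add 1 + 2 + ... + n with a while loop."""
    total = 0
    i = 1              # initial condition: the loop variable
    while i <= n:      # truth expression
        total += i
        i += 1         # change the loop variable so the loop can end
    else:
        # runs once, when i <= n becomes False
        print("loop finished, i =", i)
    return total

print(sum_to(100))  # 5050
```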
6. Nested while loops
A while statement is itself a compound statement, so it can be nested inside another statement (including another while loop)
1. Syntax
while truth expression 1:
    statement block 1
    while truth expression 2:
        statement block 2
    else:
        statement block 3
else:
    statement block 4
Exercise: display all integers from 1 to 20 on one line, separated by spaces,
and print 10 such lines
j=1
while j<=10:
    i=1
    while i<=20:
        print(i,end=" ")
        i+=1
    else:
        print()  # print a line break
    j+=1
7. break statement
1. Purpose
Used inside a while or for loop to end that loop immediately
2. Explanation
1. When break executes, the statements after it in the loop body are not executed.
2. break is usually combined with an if statement.
3. When break ends a loop, the loop's else clause is not executed.
4. break ends only the loop that directly contains it; in nested loops it does not exit the outer loop.
5. break can only be used inside a loop.
i=1
while i<10:
    print("At the beginning of the loop i=",i)
    if i==5:
        break
    print("At the end of the loop i=",i)
    i+=1
else:
    print("else statement executed!")
print("When the program exits i=",i)
Practice:
Read a positive integer and print whether it is a prime number
n=int(input("Please enter a positive integer:"))
if n<=1:
    print("Not prime!")
elif n==2:
    print(n,"is prime!")
else:
    i=2
    while i<n:
        if n%i==0:
            print("Not prime")
            break
        i+=1
    else:
        print("It's prime!")
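A testable variant of the prime exercise, written as an illustrative function is_prime. It swaps in one common optimization not used above: a composite n always has a divisor no larger than its square root, so the loop can stop once i * i exceeds n.

```python
def is_prime(n):
    """Illustrative helper: return True if n is a prime number."""
    if n < 2:
        return False
    i = 2
    # Checking divisors up to sqrt(n) is enough
    while i * i <= n:
        if n % i == 0:
            return False  # found a divisor: not prime
        i += 1
    return True

print([k for k in range(20) if is_prime(k)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```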
8. for loop
1. Purpose
Iterates over the elements of an iterable object
2. Syntax
for variable in iterable:
    statement block 1
else:
    statement block 2
3. Iterable objects
Objects that can supply their data one item at a time, such as strings, lists, dictionaries, tuples and range objects (an empty one simply supplies no items)
4. Explanation
1. On each pass, the variable is bound to the next element supplied by the iterable, and then statement block 1 runs.
2. When the iterable has no more elements, the else clause runs and the loop exits (the else clause is skipped if the loop ends with break).
3. The else clause can be omitted.
s="ABCDEF"
for i in s:
    print(i,end=" ")
else:
    print()
    print("Loop terminated due to end of iteration!")
Practice:
Write a program that reads a string and prints how many spaces it contains
s=input("Please enter a string:")
count=0  # counts the spaces
# Alternative with a while loop:
# i=0
# length=len(s)
# while i<length:
#     if s[i]==" ":
#         count+=1
#     i+=1
for i in s:
    if i==" ":
        count+=1
print("The number of spaces entered is %d"%count)
9. Nested for loops
for x in "ABC":
    for y in "123":
        print(x+y)
10. range function
1. Purpose
Creates an iterable object that generates a series of integers (an integer sequence generator)
range(stop): starts at 0 and adds 1 each time, stopping before stop (stop itself is not included)
range(start, stop[, step]): starts at start and moves by step each time, stopping before stop
for i in range(10):
    print(i,end=" ")
print()
for i in range(1,20,2):
    print(i,end=" ")
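Converting range objects to lists makes the rule "stop is never included" visible, including for a negative step.

```python
r1 = list(range(5))          # [0, 1, 2, 3, 4]: stop itself is excluded
r2 = list(range(1, 20, 2))   # odd numbers from 1 to 19
r3 = list(range(10, 0, -3))  # [10, 7, 4, 1]: a negative step counts down
r4 = list(range(5, 5))       # []: nothing is produced when start == stop
print(r1, r2, r3, r4)
```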
11. continue statement
1. Purpose
Used inside a loop: the statements after continue in the current iteration are skipped, and the next iteration starts
2. Explanation
1. In a while loop, continue jumps straight back to the truth expression, which is evaluated again.
2. In a for loop, continue takes the next element from the iterable, binds it to the variable and starts the next iteration.
for i in range(5):
    if i==2:
        continue
    print(i)
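Point 1 above has a practical consequence: in a while loop, the loop variable must be updated before continue, otherwise continue skips the update and the loop never ends. A small sketch collecting odd numbers:

```python
odds = []
i = 0
while i < 10:
    i += 1            # update the loop variable BEFORE continue;
                      # otherwise continue would skip it and loop forever
    if i % 2 == 0:
        continue      # skip the rest of this iteration for even numbers
    odds.append(i)
print(odds)  # [1, 3, 5, 7, 9]
```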
12. The random module
1. random.random()
Returns a random floating-point number x with 0.0 <= x < 1.0
2. random.randint(a, b)
Returns a random integer N with a <= N <= b (a is the lower limit, b the upper limit, both included)
3. random.randrange(start, stop[, step])
Returns a randomly chosen number from range(start, stop, step); stop is not included
import random
a=random.random()
print(a)
b=random.randint(1,100)
print(b)
c=random.randrange(1,100,2)  # a random odd number from 1 to 99
print(c)
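A sketch of two things that are useful when testing code that uses random: random.seed() (not mentioned in the notes) makes the sequence repeatable, and the documented ranges can be checked directly.

```python
import random

random.seed(42)   # fixing the seed makes the sequence repeatable
first = [random.randint(1, 100) for _ in range(5)]
random.seed(42)   # same seed again
second = [random.randint(1, 100) for _ in range(5)]
print(first == second)  # True: same seed, same sequence

x = random.random()               # 0.0 <= x < 1.0
n = random.randrange(1, 100, 2)   # an odd number from 1 to 99
print(0.0 <= x < 1.0, 1 <= n <= 99 and n % 2 == 1)  # True True
```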
13. List [list]
1. Purpose
A container for storing data of any type
2. Concept
1. A list is a container that can store data of any type.
2. A list is a mutable sequence.
3. The elements are independent of one another, but they are kept in a fixed order.
3. Representation
[] (elements separated by commas, e.g. [100, "hello", 3.5])
4. List operators
"+" concatenates lists into a new list, so the result is a different object (the id changes)
"+=" extends the original list in place with the list on the right; the variable still refers to the same object (the id does not change)
"*" builds a new list that repeats the elements (the id changes)
"*=" repeats the elements in place; the variable still refers to the same object (the id does not change)
>>> L=[100,200,300,400]
>>> id(L)
1883498866184
>>> L=L+[500,600]
>>> L
[100, 200, 300, 400, 500, 600]
>>> id(L)
1883498866824
>>> L=[100,200,300,400]
>>> id(L)
1883498866184
>>> L+=[500,500]
>>> L
[100, 200, 300, 400, 500, 500]
>>> id(L)
1883498866184
>>> s="abc"
>>> id(s)
1883497905432
>>> s=s+"de"
>>> s
'abcde'
>>> id(s)
1883498848184
>>> s="abc"
>>> id(s)
1883497905432
>>> s+="de"
>>> s
'abcde'
>>> id(s)
1883498848128
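The id values above differ between runs, so a more reliable way to see the difference between + and += on lists is a second variable bound to the same list: an in-place change is visible through both names, a new list is not.

```python
a = [100, 200]
b = a            # b and a refer to the same list object
a += [300]       # += extends that list in place
print(b)         # [100, 200, 300]: b sees the change
a = a + [400]    # + builds a new list and rebinds a only
print(a, b)      # [100, 200, 300, 400] [100, 200, 300]
print(a is b)    # False
```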
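5. List comparison operators
1. Operators
">, >=, <, <=, ==, !="
2. Rules
Lists are compared element by element starting from the first position; the first pair of unequal elements decides the result (strings inside them compare by Unicode code point). Elements at corresponding positions must be comparable with each other, otherwise a TypeError is raised; if one list is a prefix of the other, the shorter list is smaller.
3. in / not in operators
Test whether an object is (or is not) an element of the sequence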
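Examples of the comparison rules above, including the TypeError raised when the first differing elements are a str and an int.

```python
r1 = [1, 2, 3] < [1, 3]        # True: decided at index 1, where 2 < 3
r2 = [1, 2] < [1, 2, 0]        # True: a prefix is smaller
r3 = ["a", "B"] > ["a", "A"]   # True: "B" (66) > "A" (65)
r4 = 2 in [1, 2, 3]            # True
r5 = 5 not in [1, 2, 3]        # True
try:
    [1, "x"] < [1, 2]          # "x" vs 2: str and int cannot be ordered
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(r1, r2, r3, r4, r5, mixed_ok)
```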
6. List indexing and slicing
Lists are indexed and sliced by exactly the same rules as strings
7. Index assignment
A list is a mutable sequence, so its elements can be changed by assigning to an index
>>> L=[100,200,300]
>>> L[0]="hello"
>>> L
['hello', 200, 300]
8. Slice assignment
1. Purpose
Changes the original list in place: elements can be replaced, inserted or removed
Assigning to a slice replaces the elements that the slice selects; with a step other than 1, the right-hand side must supply exactly as many elements as the slice selects
2. Syntax
list[slice] = iterable
>>> L=[100,200,300,400,500,600]
>>> L[0:1]
[100]
>>> L[0:1]=["A"]
>>> L
['A', 200, 300, 400, 500, 600]
>>> L[0:1]=["A","B"]
>>> L
['A', 'B', 200, 300, 400, 500, 600]
>>> L[0:3]
['A', 'B', 200]
>>> L[0:5:2]
['A', 200, 400]
>>> L[0:5:2]=["hello","world","name"]
>>> L
['hello', 'B', 'world', 300, 'name', 500, 600]
>>> L[0:5:2]=["hello","world","name",100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 4 to extended slice of size 3
>>> L[0:5:2]=["hello","world","name",]
>>> L
['hello', 'B', 'world', 300, 'name', 500, 600]
>>> L[0:0]
[]
>>> L[0:0]=[1000]
>>> L
[1000, 'hello', 'B', 'world', 300, 'name', 500, 600]
The append() method
Adds one element to the end of the list, in place
L.append(obj)
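append() next to the slice-assignment forms described above. Note that appending a list adds it as a single element, and assigning an empty list to a slice deletes elements.

```python
L = [1, 2, 3]
L.append(4)          # in place: [1, 2, 3, 4]
L.append([5, 6])     # the whole list becomes ONE element
print(L)             # [1, 2, 3, 4, [5, 6]]

M = [1, 2, 3]
M[len(M):] = [4]     # slice assignment at the end: same effect as M.append(4)
M[1:3] = []          # assigning an empty list deletes the elements at indexes 1 and 2
print(M)             # [1, 4]
```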
14. Dictionary [dict]
1. Concept
1. A dictionary is a mutable container that can store data of any type.
2. Each item in a dictionary is stored as a key-value pair (a mapping from key to value).
3. Items are accessed by key, not by integer position (subscript).
4. Keys must be unique, and only immutable types can be used as keys.
5. Items cannot be accessed by position; since Python 3.7, however, a dictionary does remember the order in which keys were inserted.
2. Purpose
Makes looking up data fast: finding a value by its key takes about the same time however large the dictionary is
3. Representation
Items are enclosed in {} and separated by commas; within each item, the key and the value are separated by a colon (:)
>>> d={}
>>> d
{}
>>> type(d)
<class 'dict'>
>>> d={"name":"Xiao Ming","age":20}
>>> d
{'name': 'Xiao Ming', 'age': 20}
4. Dictionary indexing
1. Syntax
dict[key]
>>> d["name"]
'Xiao Ming'
>>> d["age"]
20
5. Basic dictionary operations
1. Add a key-value pair (the key does not exist yet)
dict[key]=value
2. Modify a key-value pair (the key already exists)
dict[key]=value
3. Delete a key-value pair
del dict[key]
6. in / not in operators
Test whether a key (not a value) exists in the dictionary
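The basic operations above on one dictionary. The last line uses dict.get(), which is not in the notes: it returns None instead of raising KeyError when the key is missing.

```python
d = {"name": "Xiao Ming", "age": 20}
d["city"] = "Hangzhou"     # key does not exist: adds a pair
d["age"] = 21              # key exists: modifies the value
del d["city"]              # deletes the pair
print(d)                   # {'name': 'Xiao Ming', 'age': 21}
print("name" in d)         # True: in tests keys
print("Xiao Ming" in d)    # False: values are not checked
print(d.get("phone"))      # None: no KeyError for a missing key
```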
15. Tuple [tuple]
1. Concept
A tuple is an immutable sequence.
A tuple is a container that can store data of any type, and its elements are kept in order.
2. Representation
Elements are enclosed in (); a tuple with a single element needs a trailing comma, e.g. (100,), to distinguish it from a single object in parentheses
>>> t=(100,200)
>>> t
(100, 200)
>>> type(t)
<class 'tuple'>
>>> t=(100)
>>> t
100
>>> type(t)
<class 'int'>
>>> t=(100,)
>>> t
(100,)
>>> type(t)
<class 'tuple'>
3. Tuple operators
"+, +=, *, *=, <, <=, >, >=, ==, !=, in, not in"
They follow the same rules as for lists (elements at corresponding positions must be comparable); because tuples are immutable, += and *= build a new tuple and rebind the variable
4. Tuple indexing and slicing
Tuples are indexed and sliced by the same rules as strings; because tuples are immutable, index assignment and slice assignment are not supported
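Immutability in practice: item assignment raises TypeError, and += produces a new tuple while any other name bound to the old tuple is unaffected.

```python
t = (100, 200)
try:
    t[0] = 1                 # tuples do not support item assignment
    assigned = True
except TypeError:
    assigned = False

u = t                        # u refers to the same tuple as t
t += (300,)                  # builds a NEW tuple and rebinds t; u is unchanged
print(assigned, t, u)        # False (100, 200, 300) (100, 200)
print(t[1:], t[-1])          # (200, 300) 300
```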
16. String parsing methods
1. split()
Splits the string at the given separator and returns a list of the pieces (with no argument, it splits on runs of whitespace)
S.split(sep)
>>> s="www.baidu.com"
>>> s.split(".")
['www', 'baidu', 'com']
2. join()
S.join(iterable)
Joins the strings in the iterable into one string, with S inserted between them
>>> L="hello"
>>> "-".join(L)
'h-e-l-l-o'
>>> " ".join(L)
'h e l l o'
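split() and join() are inverses when the same separator is used. Two details worth knowing: an explicit separator keeps empty pieces, while split() with no argument drops surrounding and repeated whitespace.

```python
s = "www.baidu.com"
parts = s.split(".")                   # ['www', 'baidu', 'com']
rejoined = ".".join(parts)             # 'www.baidu.com': join undoes split
empty_kept = "a,b,,c".split(",")       # ['a', 'b', '', 'c']: empty pieces are kept
words = "  hello   world ".split()     # ['hello', 'world']: splits on whitespace runs
date = "-".join(["2024", "01", "31"])  # '2024-01-31'
print(parts, rejoined, empty_kept, words, date)
```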
17. Functions
1. Concept
A function is a block of statements that can be executed repeatedly.
A function can be seen as a group of program statements given a name, so the name can stand in for them.
2. Purpose
A function can be called again and again, which makes code reusable.
3. Syntax
def function_name(parameter list [formal parameters]):
    statement block
4. Explanation
1. The function name must be an identifier and follows the variable naming rules.
2. A function has its own namespace. To let a function work on outside data, pass the data in through the parameter list. If no parameters are needed, the parameter list can be empty, but the body cannot be empty; use a pass statement as a placeholder.
3. The function name is a variable bound to the function object, so do not rebind it carelessly.
def say_demo():
    print("hello world")
    print("welcome to Hangzhou")
    print("xxx")

say_demo()  # function call
5. Function calls
1. Calling a function is an expression.
2. If the function has no return statement, the call returns the None object after the function finishes.
3. If the function needs to return some other object, use a return statement.
6. return statement
1. Syntax
return [expression]
2. Purpose
Used inside a function to end the current function, go back to the place where it was called, and return a reference to an object
3. Explanation
1. The expression after return may be omitted; a bare return is equivalent to return None.
2. If there is no return statement, the function returns None after its last statement, as if return None had been added at the end.
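The three return cases above side by side: an explicit return value, no return statement at all, and a bare return. The function names are illustrative.

```python
def add(a, b):
    """Illustrative: explicit return value."""
    return a + b          # ends the function and returns the sum

def greet(name):
    """Illustrative: no return statement, so the call returns None."""
    print("hello", name)

def double_if_positive(n):
    """Illustrative: a bare return is the same as return None."""
    if n < 0:
        return
    return n * 2

print(add(1, 2))                                      # 3
print(greet("Xiao Ming"))                             # prints hello Xiao Ming, then None
print(double_if_positive(-1), double_if_positive(5))  # None 10
```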
1. Ways of passing arguments to a function
1. Positional arguments
Arguments are matched to formal parameters in order, by position
def func(a,b,c):
    print("a",a)
    print("b",b)
    print("c",c)
func(100,200,300)

2. Sequence arguments
Passing a sequence with * unpacks it, and its elements are passed as positional arguments
def func(a,b,c):
    print("a",a)
    print("b",b)
    print("c",c)

L=[100,200,300]
func(*L)

3. Keyword arguments
Formal parameters and arguments are matched by name
def func(a,b,c):
    print("a",a)
    print("b",b)
    print("c",c)

func(a=100,b=200,c=300)

4. Dictionary keyword arguments
When the argument is a dictionary, ** unpacks it and its items are passed as keyword arguments (the keys must match the parameter names)
def func(a,b,c):
    print("a",a)
    print("b",b)
    print("c",c)

d={"a":100,"b":200,"c":300}
func(**d)

2. Default parameters of a function
1. Syntax
def function_name(param1=default_value, ...):
    pass
Default parameters must be filled in from right to left: if a parameter has a default value, every parameter to its right must also have one.

3. Ways to define a function's formal parameters
1. Positional parameters
Receive arguments by position
2. Star tuple parameter
Collects any extra positional arguments into a tuple
def function_name(*args):
    print("The number of arguments is",len(args))
    print("args",args)
3. Named keyword parameters
1. Syntax
def func(*, named_keyword_param):
    pass
def func(*args, named_keyword_param):
    pass
2. Purpose
Named keyword parameters must be passed as keyword arguments or by dictionary unpacking (**)
4. Double-star dictionary parameter
1. Syntax
def func(**kwargs):
    pass
2. Purpose
Collects any extra keyword arguments into a dictionary
def func(**kwargs):
    print("The number of keyword arguments is:",len(kwargs))
    print("kwargs=",kwargs)

func(a=100,b=200,c="300")
Order of function parameters from left to right: positional parameters, star tuple parameter (*args), named keyword parameters, double-star dictionary parameter (**kwargs)

4. Variable scope in functions
1. Local variables
Variables defined inside a function are called local variables (the function's formal parameters are local variables too).
Local variables can only be used inside the function.
Local variables are created when the function is called and destroyed automatically when the call ends.
2. Global variables
Variables defined outside any function, at the top level of the module (the current .py file), are called global variables.
Every function can read global variables directly, but cannot assign to them inside the function unless they are declared with a global statement.
Notes on local variables
1. The first assignment to a variable inside a function creates a local variable; later assignments change what that local variable is bound to.
2. Assignment statements inside a function do not affect global variables.
3. Local variables can only be accessed inside the function that defines them, while global variables can be accessed anywhere in the module (current file).
Be careful: CPython treats a variable that is assigned anywhere in a function (on the left side of =, += and so on) as local to that function, so reading it before that assignment raises UnboundLocalError.

5. Object-oriented programming
1. Object
A thing or instance in the real world
2. Object-oriented programming
Everything is regarded as an object, and behaviors establish the relationships between objects.
An object can have attributes [nouns].
An object can have behaviors [verbs].
3. Class [class]
Objects with the same attributes and behaviors are grouped together as one type.
A class is a tool used to describe objects.
Classes can create objects (instances) of that type.
A class defines the attributes and methods shared by every object created from it.
4. Syntax
class ClassName(inheritance list):
    """docstring"""
    instance methods
    class variables
    class methods
    static methods
5. Purpose
1. A class can create one or more objects (instances).
2. Variables and methods defined in a class are available to the instances created from it.
6. Explanation
1. The class name must be an identifier (same naming rules as variables); capitalizing the first letter is recommended.
2. A class name is essentially a variable bound to the class object.
class Car():
    """This is a car"""
    print("Auto factory creation completed!")  # runs once, when the class is defined

    def __init__(self,color):
        self.color=color

    def run(self,speed):  # self is the instance created from the class
        """Method that gives an instance driving behavior"""
        self.speed=speed  # add a speed attribute to the instance
        print("The car is driving at a speed of",self.speed)

    def set_logo(self,logo):
        self.logo=logo
        print("Brand of car:",self.logo)

car=Car("black")
car.run(100)
car.set_logo("Mercedes Benz")
7. Class instantiation (calling a class)
1. Syntax
variable = ClassName([constructor argument list])
2. Purpose
Creates an instance of the class and returns a reference to that instance
3. Explanation
Each instance has its own namespace, in which its instance variables (attributes) are created. An instance can call the methods defined in its class and access the class's class variables.
Adding an attribute: car.color = "black"
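A minimal runnable sketch tying together the class, constructor and instantiation notes. The color parameter follows the Car example; the wheels class variable and the km/h wording are illustrative additions, not from the notes. It also shows both ways of calling an instance method.

```python
class Car:
    """A car with a color and a current speed (illustrative version)."""

    wheels = 4                    # class variable: shared by all instances (illustrative)

    def __init__(self, color):
        # Constructor: called automatically by Car(...)
        self.color = color        # instance variable (attribute)
        self.speed = 0

    def run(self, speed):
        """Instance method: self is the instance the method was called on."""
        self.speed = speed
        return "The %s car is driving at %d km/h" % (self.color, self.speed)


car = Car("black")                # instantiation calls __init__
print(car.run(100))               # The black car is driving at 100 km/h
print(Car.run(car, 60))           # ClassName.method(instance, args) form
car.brand = "Mercedes-Benz"       # attributes can also be added after creation
print(car.speed, car.brand, car.wheels, Car.wheels)  # 60 Mercedes-Benz 4 4
```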
8.Class 1.grammar class Class name(): def Instance method name( self,Parameter 1, parameter 2.. . ): pass 2.effect Used to describe the behavior of an object, so that all objects of this class can have this behavior 3.explain 1.The essence of an instance method is a function, which is defined in a class 2.The first parameter of the instance method represents the instance that calls this method, which is generally used self express 3.Properties of the instance method property class 9.Call of instance method example.Instance method name (parameter list) Class name.Instance method name (instance, call parameter) 10.Class constructor (initialization function) 1.grammar class Class name (): def __init__(self,parameter list): pass 2.effect 1.init Method is a special method, called the constructor or initialization method of a class. It will be called automatically when an instance of this class is created 2.self An instance representing a class must exist when it is defined, although it is not necessary to pass in the corresponding parameters 3.If you add formal parameters like a self created class, you need to use a constructor 4.Function: add necessary resources such as attributes to the newly created object 11.inherit 1.concept 1.Inheritance: inheritance is the function of continuing the old class 2.Derivation: add new functions based on the old class 2.effect 1.With inheritance and derivation mechanism, some common functions can be added to the base class 2.Change the function of the original class without changing the parent class code to realize code sharing 12.python File operations in File is the basic unit for data storage, which is usually used for long-term data storage 1.Operation steps of file 1.Open file 2.Read write file 3.Close file 2.Operation format 1,grammar open(filename,mod,[encoding]) filename The path or name of the file mod Operation mode of file encoding File encoding format 2.explain Open a file and return the file stream object. 
If opening fails, an IOError is raised (in Python 3, IOError is an alias of OSError, e.g. FileNotFoundError)
3. Closing a file
f.close()
1-1. mode parameter settings
r: open read-only (the default)
w: open write-only; existing content is overwritten, and the file is created if it does not exist
a: open write-only; if the file already has content, new writes are appended to the end
b: open in binary mode
wb: open in binary write mode
rb: open in binary read mode
t: open in text mode (the default)
4. Reading and writing text (you must write line breaks yourself)
1. Writing text
1. f.write(string): write a string to the open file
2. f.writelines(list of strings): write multiple strings to the open file (no line breaks are added)
2. Reading text
1. f.readline(): read one line of text
2. f.readlines(): read all remaining lines into a list
3. f.read(n): read n characters (the whole file if n is omitted)
5. with statement
1. Grammar
with expression as variable:
    statements
When the with block ends, the file is closed automatically, even if an error occurred
7. Python crawler
1. Web review
url: Uniform Resource Locator
http: default port 80; https: default port 443
GET: the parameters are displayed in the URL
POST: the parameters are sent in the request body rather than in the URL, which is more secure
8. Crawler request modules
1. Classification
1. Python 2: urllib2 (urllib3 is a separate third-party package)
2. Python 3: urllib.request (standard library), requests (third-party)
2. Common methods
1. urllib.request.urlopen(url)
Function: send a request to the website and get a response
Format:
res.read()  # the response body as a byte stream
res.read().decode("utf-8")  # the response body decoded into a string
(the body can only be read once; a second res.read() returns b"")
2. req = urllib.request.Request(url, headers=headers)
Function: create a request object (for example, to send a custom User-Agent), then use it to send the request and get a response
Use process:
1. req = urllib.request.Request(url, headers=headers)  # create the request object
2. res = urllib.request.urlopen(req)  # send the request to the website
3. html = res.read().decode("utf-8")  # decode
9. URL encoding module
https://www.baidu.com/s?wd=%E8%94%A1%E5%BE%90%E5%9D%A4
1. Module name
urllib.parse: the URL encoding module
2. Encoding method urlencode
1. Grammar
urllib.parse.urlencode({dictionary})
import urllib.parse
key = {"wd": "源"}  # "源" means "source"
data = urllib.parse.urlencode(key)  # encode the data: wd=%E6%BA%90
print(data)
3. quote method: encoding
import urllib.parse
data = urllib.parse.quote("源")  # %E6%BA%90
print(data)
4. unquote method: decoding
import urllib.parse
data = urllib.parse.unquote("%E6%BA%90")  # 源
print(data)
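The file-operation steps from section 12 (open, read or write, close) and the with statement can be sketched as follows. This is a minimal example; the file name demo.txt and the temporary directory are assumptions made so that the snippet leaves nothing in the working directory.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory, so nothing is left in the working directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w": write mode; creates the file, or overwrites existing content.
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")                           # write() adds no line break itself
    f.writelines(["second line\n", "third line\n"])   # neither does writelines()

# "a": append mode; new text goes after the existing content.
with open(path, "a", encoding="utf-8") as f:
    f.write("fourth line\n")

# "r": read-only mode (the default).
with open(path, "r", encoding="utf-8") as f:
    first = f.readline()     # one line, including its "\n"
    rest = f.readlines()     # the remaining lines, as a list

with open(path, encoding="utf-8") as f:
    head = f.read(5)         # the first 5 characters

print(first, rest, head)
# Each with block closes its file on exit, so no explicit f.close() is needed.
```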
Item 1
Write a small crawler: when the program runs, enter any keyword in the terminal, query it with Baidu search, and save the source code of the result page.
1. Get the query keyword first
2. URL-encode the keyword
3. Splice together the real URL
4. Send a request to the real URL to get the response object
5. Decode the response object and save it locally
import urllib.request
import urllib.parse

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
base_url = "https://www.baidu.com/s?"  # the fixed part of the search URL
key = input("Please enter the information you want to query:")
key = urllib.parse.urlencode({"wd": key})  # URL-encode the keyword
url = base_url + key  # splice together the real URL
req = urllib.request.Request(url, headers=headers)  # create the request object
res = urllib.request.urlopen(req)  # send the request to the website
html = res.read().decode("utf-8")  # decode the response bytes
with open("Baidu.html", "w", encoding="utf-8") as f:
    f.write(html)
print("File written successfully!")
Item 2
Write a crawler program: when it runs, enter the name of a Tieba forum (post bar) and a page range, and the program crawls the source code of every page in that range (using object-oriented programming).
import urllib.request
import urllib.parse


class URL():
    def __init__(self):
        self.base_url = "https://tieba.baidu.com/f?"
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

    def get_html(self, url, filename):  # get the source code of a web page
        req = urllib.request.Request(url, headers=self.headers)
        res = urllib.request.urlopen(req)
        html = res.read().decode("utf-8")
        self.save_html(html, filename)

    def get_page(self):
        name = input("Please enter the post bar name to query:")
        begin = int(input("Please enter the start page number:"))
        end = int(input("Please enter the end page number:"))
        for i in range(begin, end + 1):  # end + 1 so that the end page is included
            key = urllib.parse.urlencode({"kw": name})  # encode the entered post bar name
            pn = (i - 1) * 50  # pn is the offset that selects the page (50 posts per page)
            url = self.base_url + key + "&pn=" + str(pn)  # splice together the real URL
            filename = "page" + str(i) + ".html"  # file name for this page
            self.get_html(url, filename)  # call the method above to get the source code

    def save_html(self, html, filename):
        with open(filename, "w", encoding="utf-8") as f:
            f.write(html)
        print("Data saved successfully")


tieba = URL()
tieba.get_page()
1. Parsing modules
1. Classification of data
1. Structured data
Features: fixed format, e.g. HTML, JSON, XML
2. Unstructured data
Pictures, videos and audio, generally stored as binary data
2. Regular expression module re
1. Metacharacters in regular expressions
Metacharacter | Effect |
---|---|
. | Matches any character except a newline (\n) |
\d | Matches any digit |
\s | Matches any whitespace character |
\S | Matches any non-whitespace character |
\n | Matches a newline character |
\w | Matches a letter, digit or underscore |
\W | Matches any character that is not a letter, digit or underscore |
* | Matches the preceding expression 0 or more times |
+ | Matches the preceding expression 1 or more times |
? | Matches the preceding expression 0 or 1 times; placed after a quantifier (e.g. *?) it makes that quantifier non-greedy |
{m} | Matches exactly m occurrences of the preceding expression |
^ | Matches the beginning of the string (of each line with re.M) |
$ | Matches the end of the string (of each line with re.M) |
() | Matches the expression inside the parentheses and also defines a group |
2. match()
Tries to match the regular expression from the beginning of the string. If it succeeds, a match object is returned; otherwise None is returned
import re

content = "Hello 123 4567 cxk_ji ni tai mei"
result = re.match(r"^Hello\s\d\d\d\s\d{4}\s\w{3}", content)
print(result.group())  # the matched text
print(result.span())  # (start, end) of the match
3. Match target
If you want to extract part of the string, enclose the part you want in parentheses. Parentheses mark the start and end of a subexpression, and each marked subexpression corresponds to a group, in order. Call group() with the index of a group to extract its result
import re

content = "Hello 123 4567 cxk_ji ni tai mei"
result = re.match(r"^Hello\s(\d\d\d\s\d{4})\s(\w{3})", content)
print(result.group(1))  # 123 4567
print(result.group(2))  # cxk
print(result.span())
4. Universal matching
There is a universal match, .*: the dot (.) matches any character except a newline, and the asterisk (*) matches the preceding character any number of times, so together they match any string
import re

content = "Hello 123 4567 cxk_ji ni tai mei"
result = re.match(r"^Hello.*$", content)
print(result.group())
print(result.span())
5. Greedy matching and non greedy matching
1. Greedy matching (.*):
Provided the whole expression still matches, match as many characters as possible
import re

content = "Hello 1234567 cxk_ji ni tai mei"
result = re.match(r"^Hello.*(\d+).*$", content)
print(result.group(1))  # greedy: .* swallows the digits, so group(1) is only "7"
print(result.span())
2. Non-greedy matching (.*?):
Provided the whole expression still matches, match as few characters as possible
import re

content = "Hello 1234567 cxk_ji ni tai mei"
result = re.match(r"^Hello.*?(\d+).*$", content)
print(result.group(1))  # non-greedy: group(1) is the full "1234567"
print(result.span())
6. Modifiers
Regular expressions can take optional flags (modifiers) that control the matching mode; they are passed as an optional argument
import re

content = """Hello 1234567
cxk_ji ni tai mei"""
result = re.match(r"^Hello.*$", content, re.S)  # re.S lets . match newlines too, so the whole text matches
print(result.group())
print(result.span())
Common modifiers
Modifier | Description |
---|---|
re.I | Makes matching case-insensitive |
re.L | Locale-aware matching (in Python 3, only allowed with bytes patterns) |
re.M | Multi-line mode: ^ and $ also match at the start and end of each line |
re.S | Makes . match any character, including newlines |
re.U | Unicode matching for \w, \d, \s, etc. (already the default for str patterns in Python 3) |
re.X | Verbose mode: whitespace and comments are allowed in the pattern to make it easier to read |
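A short sketch of the most common flags, using a made-up two-line string:

```python
import re

text = "Hello world\nhello python"   # made-up sample text

# re.I: case-insensitive, so "hello" also matches "Hello".
print(re.findall(r"hello", text, re.I))

# re.M: ^ matches at the start of every line, not only at the start of the string.
print(re.findall(r"^\w+", text, re.M))

# Without re.S, "." stops at the newline, so this fails and returns None ...
print(re.match(r"Hello.*python", text))
# ... with re.S, "." also matches "\n" and the whole text matches.
print(re.match(r"Hello.*python", text, re.S).group())

# Flags can be combined with |.
print(re.findall(r"^hello", text, re.I | re.M))
```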
7. search()
match() fails as soon as the beginning of the string does not match; search() does not have this restriction
1. Function
The entire string is scanned at match time and the first successful match is returned
2. Grammar
re.search(pattern,string,flags=0)
3. Parameter description
1. pattern: the regular expression to match
2. string: the string to be matched
3. flags: flag bits (modifiers) that control the matching mode
import re

content = """Hello 1234567
cxk_ji ni tai mei"""
result = re.search(r"cxk.*$", content, re.S)
print(result.group())  # cxk_ji ni tai mei
print(result.span())
8. findall()
To get all the text that matches a regular expression, use findall(): it searches the whole string and returns every match as a list
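The difference between search() and findall() can be seen on the string used in the examples above. This is a minimal sketch; the group pattern (\w+)_(\w+) is only an illustration.

```python
import re

content = "Hello 123 4567 cxk_ji ni tai mei"

# search() stops at the first match ...
first = re.search(r"\d+", content).group()

# ... while findall() scans the whole string and returns every match as a list.
numbers = re.findall(r"\d+", content)

# With groups in the pattern, findall() returns the groups instead
# (one tuple per match when there is more than one group).
pairs = re.findall(r"(\w+)_(\w+)", content)

print(first, numbers, pairs)
```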
sub()
Uses a regular expression to modify text: every match is replaced with a given string
import re

string = "12dskjdskj34kdslkds56lkds78"
content = re.sub(r"\d+", "", string)  # remove all digits; the second argument is the replacement string
print(content)  # dskjdskjkdslkdslkds
9. compile()
Compiles a regular expression string into a regular expression object that can be reused in later matches
import re

string1 = "2019-7-7 10:48"
string2 = "2019-7-8 11:48"
string3 = "2019-7-9 12:48"
pattern = re.compile(r"\d{2}:\d{2}")  # compile the regular expression into an object
result1 = re.sub(pattern, "", string1)  # remove the time, keep the date
result2 = re.sub(pattern, "", string2)
result3 = re.sub(pattern, "", string3)
print(result1)
print(result2)
print(result3)
3. csv module
1. Import module
import csv
2. Open csv file
with open("xx.csv","w",newline="",encoding="utf-8") as f:
newline="" must be passed; otherwise an extra blank line appears after every row (e.g. on Windows)
3. Initialize write object
writer = csv.writer(f)
4. Write data
writer.writerow([list])
import csv

with open("demo.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)  # initialize the writer object
    L = ["Xiao Ming", 20, "male"]
    L1 = ["Xiao Hong", 30, "female"]
    writer.writerow(L)  # write one row of data
    writer.writerow(L1)
4. Maoyan (Cat's Eye) movie project
Regular expression for extracting movie information:
'<div class="movie-item-info">.*?title="(.*?)".*?class="star">(.*?)</p>.*?class="releasetime">(.*?)</p>',re.S
Write a crawler program that crawls the name, starring actors and release time of every film on the ranking list and saves them to a local CSV file (which can be opened in Excel)
1. Splice url information of each page
2. Write a function to get the source code of the web page
3. Write functions that parse useful data from web page source code
4. Write a function that writes data to a local csv file
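Steps 3 and 4 can be sketched with the regular expression given above. The HTML below is a hypothetical snippet shaped like the markup the regex expects (not real Maoyan data), and the function names parse_movies and save_csv and the column headers are illustrative; fetching the pages (steps 1 and 2) is left out so the sketch runs offline.

```python
import csv
import os
import re
import tempfile

# The regular expression from above, compiled once with re.S so ".*?" can cross line breaks.
PATTERN = re.compile(
    r'<div class="movie-item-info">.*?title="(.*?)".*?class="star">(.*?)</p>'
    r'.*?class="releasetime">(.*?)</p>',
    re.S,
)

def parse_movies(html):
    """Step 3 (hypothetical helper): return [name, starring, release_time] rows."""
    return [[field.strip() for field in match] for match in PATTERN.findall(html)]

def save_csv(rows, filename):
    """Step 4 (hypothetical helper): write the rows to a CSV file; newline="" avoids blank lines."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "starring", "release_time"])
        writer.writerows(rows)

# Hypothetical HTML, not real Maoyan data.
sample = """
<div class="movie-item-info">
  <p class="name"><a href="/films/1" title="Movie A">Movie A</a></p>
  <p class="star">
        Starring: Actor 1,Actor 2
  </p>
  <p class="releasetime">Release date: 1993-01-01</p>
</div>
"""

rows = parse_movies(sample)
out = os.path.join(tempfile.mkdtemp(), "movies.csv")   # hypothetical output path
save_csv(rows, out)
print(rows)
```

strip() removes the line breaks and indentation that the lazy groups capture around the text.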
5. Module installation
python -m pip install requests
python -m pip install bs4
python -m pip install lxml
6. requests module
1. Common methods
1. res = requests.get(url, headers=headers)
Send a request to the website and get the response object
2. Attributes of the response object (res)
1. res.text: the response body decoded into a string
2. res.encoding = "utf-8": set the encoding used to decode res.text
3. res.content: the response body as a binary byte stream (required when downloading pictures, audio and video)
4. res.status_code: the HTTP response status code
5. res.url: the URL the data actually came from (after any redirects)
import requests

url = "https://www.baidu.com/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
res = requests.get(url, headers=headers)
res.encoding = "utf-8"
print(res.text)
print(res.url)  # the URL the data actually came from
print(res.status_code)  # the HTTP response code
print(res.content)  # the content as bytes
6. URL-encoding parameters (params)
params: a dictionary
res = requests.get(url, params=params, headers=headers)
requests URL-encodes the params dictionary automatically and splices it onto url as the query string
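What params= does can be approximated with urllib.parse.urlencode from the standard library. The helper build_url below is hypothetical, a rough sketch of the idea rather than the actual requests implementation:

```python
from urllib.parse import urlencode

def build_url(url, params):
    """Hypothetical helper: URL-encode the dict and append it to url as a query string."""
    if url.endswith(("?", "&")):
        separator = ""          # url already ends with a separator
    elif "?" in url:
        separator = "&"         # url already has a query string
    else:
        separator = "?"
    return url + separator + urlencode(params)

print(build_url("https://www.baidu.com/s", {"wd": "源"}))
print(build_url("https://tieba.baidu.com/f?ie=utf-8", {"kw": "python", "pn": 50}))
```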
Example: a full set of request headers copied from the browser's developer tools for www.renren.com; the Cookie header carries the browser's session:
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "Cookie": "anonymid=jxsj5fy78nj65k; depovince=ZJ; _r01_=1; jebe_key=dacc7a51-8acd-46aa-ac34-9b0dcd0bf180%7C712e620fada2a7904696ec0f971e40cd%7C1562478178258%7C1%7C1562478176757; JSESSIONID=abcU3TNteZ-K-7OYbFlVw; ick_login=0feb451a-fdc7-4f21-9739-70aa6b149596; loginfrom=null; wp_fold=0; jebe_key=dacc7a51-8acd-46aa-ac34-9b0dcd0bf180%7C20431d0d28353673afdf82da213cc1fa%7C1562487391831%7C1%7C1562487390408; t=60525e04b0d5fb0611a37d12e64779fe7; societyguester=60525e04b0d5fb0611a37d12e64779fe7; id=964833547; xnsid=347dad61; jebecookies=5da84bdc-7000-4f30-8e5c-10fcff9ae57a|||||",
    "Host": "www.renren.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
}
7. Beautiful Soup parser
1. Node selector
from bs4 import BeautifulSoup

html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "lxml")
print(soup.title)  # the first <title> node
print(soup.head)
print(soup.p.string)  # text of the first <p> node
print(type(soup.p))  # bs4.element.Tag
2. Get properties
print(soup.p.attrs["name"])
print(soup.p.attrs)
print(soup.p["name"])
3. Nested selection
print(soup.head.title.string)
4. Method selectors
1. find_all()
Queries all elements that meet the conditions; you can pass in a node name, attributes or text to filter them
from bs4 import BeautifulSoup

html = """
<div class="panel">
  <div class="panel-heading">
    <h4>Hello</h4>
  </div>
  <div class="panel-body">
    <ul class="list" id="list-1">
      <li class="element">Foo</li>
      <li class="element">Bar</li>
      <li class="element">Jay</li>
    </ul>
    <ul class="list list-small" id="list-2">
      <li class="element">Foo</li>
      <li class="element">Bar</li>
    </ul>
  </div>
</div>
"""
soup = BeautifulSoup(html, "lxml")
# print(soup.find_all(name="ul"))  # find by node name
# print(type(soup.find_all(name="ul")[1]))  # each result is a bs4.element.Tag, which can be searched again
for ul in soup.find_all(name="ul"):
    for li in ul.find_all(name="li"):
        print(li.string)
5. attrs
Besides querying by node name, you can also query by attributes
print(soup.find_all(attrs={"id": "list-1"}))
print(soup.find_all(id="list-1"))
print(soup.find_all(class_="element"))  # class_ because class is a Python keyword
print(soup.find_all("ul", class_="list list-small"))
6. text
text matches the text of nodes; you can pass a string or a regular expression (newer Beautiful Soup versions call this argument string)
import re

print(soup.find_all(text=re.compile("Hello")))
print(soup.find_all(text="Hello"))
7. find()
Returns the first element that matches
print(soup.find(name="ul"))
8. CSS selectors
To use CSS selectors, call the select() method and pass in the corresponding CSS selector
In a CSS selector, an id is prefixed with #
A class name is prefixed with .
print(soup.select(".panel .panel-heading"))
print(soup.select("ul li"))  # find the li nodes under ul
print(soup.select("#list-2 .element"))  # find nodes with class=element under the node with id list-2
print(soup.select("div > ul > li"))  # find li nodes that are direct children of a ul that is a direct child of a div
Nested selection
for ul in soup.select("ul"):
    print(ul["id"])
    print(ul.attrs["id"])
Get text
for li in soup.select("li"):
    print("get_text:", li.get_text())
    print("string:", li.string)
    print("text:", li.text)
Modules to be installed
python -m pip install jieba
python -m pip install wordcloud
python -m pip install matplotlib
python -m pip install imageio
I want to wait for the car I've been waiting for
Item 3. Crawl the short comments of a film on douban.com (change the subject ID in the URL to choose another film) and store them in a local txt file
import requests
from bs4 import BeautifulSoup


class Douban():
    def __init__(self):
        # 30171425 is the film's subject ID; change it to crawl another film
        self.baseurl = "https://movie.douban.com/subject/30171425/comments?start="
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
        self.star = 0  # offset of the first comment on the page

    def get_html(self, url):
        html = requests.get(url, headers=self.headers).text
        self.get_comment(html)

    def get_comment(self, html):
        comment_list = []  # holds the processed comments
        soup = BeautifulSoup(html, "lxml")
        comment = soup.select(".comment p")
        # comment = soup.find_all("span", class_="short")
        for i in comment:
            comment_list.append(i.text.strip() + "\n")  # one comment per line
        self.save_comment(comment_list)

    def save_comment(self, comment_list):
        # "a" (append) so that later pages do not overwrite earlier ones
        with open("comment.txt", "a", encoding="utf-8") as f:
            f.writelines(comment_list)
        print("Information storage completed!")

    def get_page(self):
        begin = int(input("Enter crawl start page"))
        end = int(input("Enter crawl end page"))
        for page in range(begin, end + 1):
            self.star = (page - 1) * 10
            url = self.baseurl + str(self.star)
            self.get_html(url)  # call the method above to get the page source


douban = Douban()
douban.get_page()
Item 4. Crawl poems from gushiwen.org and save them locally
import os

import requests
from bs4 import BeautifulSoup


class Gushi():
    def __init__(self):
        self.url = "https://so.gushiwen.org/shiwen/default_0AA2.aspx"
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}

    def get_html(self):
        html = requests.get(self.url, headers=self.headers).text
        self.get_info(html)

    def get_info(self, html):
        soup = BeautifulSoup(html, "lxml")
        title = soup.select("div > p > a > b")  # poem titles
        content = soup.find_all("div", class_="contson")  # poem bodies
        self.save_info(title, content)  # call the save function

    def save_info(self, title, content):
        files = os.path.join(os.getcwd(), "Ancient poetry")  # folder path
        if not os.path.exists(files):
            os.mkdir(files)  # create the folder
        L = []
        for i in range(len(title)):
            L.append(content[i].text)  # store the processed content in the list
            with open(os.path.join(files, "%s.txt" % title[i].text), "w", encoding="utf-8") as f:
                f.write(title[i].text + "\n")  # write the title
                for j in L[i]:
                    if j != " ":
                        f.write(j)  # write the content, skipping spaces
                    if j == "。":
                        f.write("\n")  # start a new line after each Chinese full stop


gushi = Gushi()
gushi.get_html()
DAY 3
1. jieba word segmentation
1. Segmentation modes
1. Precise mode: tries to cut the sentence as accurately as possible; suitable for text analysis
2. Full mode: scans out every word that can be formed from the sentence; very fast, but cannot resolve ambiguity
3. Search engine mode: on top of precise mode, cuts long words again to improve recall; suitable for search engine indexing
2. How segmentation works
For a long piece of text, segmentation works roughly in three steps:
1. First, regular expressions roughly split the Chinese text into sentences
2. Each sentence is turned into a directed acyclic graph (DAG) of all candidate words from the dictionary, and dynamic programming finds the most probable way to cut it
3. Finally, runs of consecutive single characters (words not in the dictionary) are segmented again with an HMM model
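Step 2 can be illustrated with a toy version of the DAG and dynamic-programming idea. This is a simplified sketch, not jieba's code: the dictionary FREQ and its frequencies are made up, unknown characters get frequency 1, and steps 1 and 3 (sentence splitting and the HMM) are omitted.

```python
import math

# Hypothetical toy dictionary: word -> frequency (jieba ships a much larger one).
FREQ = {"研究": 5, "研究生": 3, "生命": 5, "起源": 5, "生": 2, "命": 2}
TOTAL = sum(FREQ.values())

def build_dag(sentence):
    """For each start index i, list every end index j such that sentence[i:j+1] is a word.
    A single character is always allowed, even if it is not in the dictionary."""
    dag = {}
    for i in range(len(sentence)):
        ends = [i]
        for j in range(i + 1, len(sentence)):
            if sentence[i:j + 1] in FREQ:
                ends.append(j)
        dag[i] = ends
    return dag

def log_prob(word):
    # Unknown words get frequency 1 so that log() is always defined.
    return math.log(FREQ.get(word, 1)) - math.log(TOTAL)

def cut(sentence):
    """Dynamic programming from right to left: route[i] holds the best log-probability
    of segmenting sentence[i:] and the end index of the first word on that path."""
    dag = build_dag(sentence)
    n = len(sentence)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max((log_prob(sentence[i:j + 1]) + route[j + 1][0], j) for j in dag[i])
    words, i = [], 0
    while i < n:                       # follow the best path from the left
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words

print(cut("研究生命起源"))
```

With these made-up frequencies, the path 研究 / 生命 / 起源 scores higher than 研究生 / 命 / 起源, so the ambiguity is resolved in favour of the first reading.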
2. codecs
codecs.open(filename, mode, encoding) opens a file with the given encoding: reading returns Unicode text, and Unicode text written to it is encoded automatically
In Python 2 this avoided many encoding problems of the built-in open(), which works with byte strings; in Python 3, the built-in open() with an encoding argument does the same job
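A minimal sketch of codecs.open(), assuming a throwaway file in a temporary directory:

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "codecs_demo.txt")  # hypothetical file

# The encoding is given when the file is opened; text written is encoded to UTF-8 automatically.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("源代码\n")

# Reading with the same encoding gives back text (str), not bytes.
with codecs.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# In Python 3 the built-in open() with encoding= reads the same text.
with open(path, encoding="utf-8") as f:
    same = f.read()

print(text == same, type(text))
```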
3. XPath parsing module
1. Concept
XPath stands for XML Path Language, a language for finding information in XML documents (lxml also applies it to HTML)
2. Usage
from lxml import etree

parseHTML = etree.HTML(html)  # parse an HTML string
result = parseHTML.xpath("xpath expression")  # returns a list
3. Common XPath rules
Expression | Description |
---|---|
nodename | Selects all nodes with this name |
/ | Selects direct child nodes of the current node |
// | Selects descendant nodes of the current node |
. | Selects the current node |
.. | Selects the parent node of the current node |
@ | Selects attributes |
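Python's standard library has no full XPath engine, but xml.etree.ElementTree supports a subset of these rules, which is enough to try them out without installing lxml. This is a swapped-in illustration, not lxml: ElementTree paths are written relative to an element (starting with .), it only parses well-formed XML, and it does not support text(), contains() or a trailing /@attr (use .text and .get() instead). The markup below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup (ElementTree parses XML, not arbitrary HTML).
text = """
<div>
  <ul id="list">
    <li class="item1"><a href="link1.html">first</a></li>
    <li class="item2"><a href="link2.html">second</a></li>
  </ul>
</div>
"""
root = ET.fromstring(text)                         # root is the <div> element

children = root.findall("./ul/li")                 # "/": step down to direct children
links = root.findall(".//a")                       # "//": descendants at any depth
hrefs = [a.get("href") for a in links]             # attribute values (what @href gives in lxml)
second = root.find(".//li[@class='item2']/a").text # attribute predicate, then the node text
parents = root.findall(".//a/..")                  # "..": the parent of each <a>

print([li.get("class") for li in children], hrefs, second, [p.tag for p in parents])
```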
4. All nodes
from lxml import etree

html = etree.HTML(text)  # convert the data into an object that supports xpath
print(html.xpath('//*'))  # get all nodes
1. The result is a list; each item is an Element object, shown with its node name
2. You can also specify a node name to get only those nodes
from lxml import etree

html = etree.HTML(text)  # convert the data into an object that supports xpath
print(html.xpath('//li'))  # get the nodes with the given name
5. Child nodes
Child nodes are found with /, and descendant nodes with //
print(html.xpath('//li/a'))  # find the a child nodes of li
print(html.xpath('//ul//a'))  # find the a descendant nodes of ul
6. Parent node
You can query the parent of the current node with ..
print(html.xpath('//a[@href="http://domestic.firefox.sina.com/"]/../@class'))  # class attribute of the parent node
print(html.xpath('//a[@href="http://domestic.firefox.sina.com/"]/../../@id'))  # id attribute of the grandparent node
7. Attribute matching
print(html.xpath('//li[@class="link1"]'))
print(html.xpath('//a[@id="channel"]'))
8. Text acquisition
Use the text() function in an XPath expression to get the text of a node
print(html.xpath('//li[@class="link2"]/a/text()'))  # text of the a node under li
print(html.xpath('//li[@class="link2"]/a/@href'))  # href of the a node under li
9. Attribute multi value matching (contains)
Sometimes an attribute has more than one value (e.g. class="li list-1"); an exact match only works with the full value, so use contains() to match part of it
from lxml import etree

text = """
<li class="li list-1"><a href="link.html">hello world</a></li>
<li class="li list-1"><a href="link.html">hello</a></li>
"""
html = etree.HTML(text)  # convert the data into an object that supports xpath
print(html.xpath('//li[@class="li list-1"]/a/text()'))  # exact match on the full attribute value
print(html.xpath('//li[contains(@class,"li")]/a/text()'))
With contains(), the first argument is the attribute name and the second is the value: a node matches as long as its attribute contains that value
10. Select in order
Sometimes a condition matches several nodes at once, but you only want one of them; you can then select by position, with an index or a function such as last() or position() in square brackets (XPath positions start at 1)
from lxml import etree

text = """
<div>
  <ul>
    <li class="item1"><a href="link1.html">cxk song</a></li>
    <li class="item2"><a href="link2.html">cxk dance</a></li>
    <li class="item1"><a href="link3.html">cxk rap</a></li>
    <li class="item2"><a href="link4.html">cxk basketball</a></li>
  </ul>
</div>
"""
html = etree.HTML(text)  # convert the data into an object that supports xpath
print(html.xpath('//li[last()]/a/text()'))  # the last li
print(html.xpath('//li[1]/a/text()'))  # the text of the a tag in the first li tag
print(html.xpath('//li[position()<3]/a/text()'))  # the first two li tags
print(html.xpath('//li[last()-2]/a/text()'))  # the third from last
print(html.xpath('//li[@class="item1"][2]/a/text()'))  # the second li whose class is item1
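Python's standard xml.etree.ElementTree also supports a few of these positional predicates: [n], [last()] and [last()-n]. It has no position() function and counts positions among all same-named siblings, so it only approximates lxml here; the snippet reuses the markup from the example above.

```python
import xml.etree.ElementTree as ET

text = """
<div>
  <ul>
    <li class="item1"><a href="link1.html">cxk song</a></li>
    <li class="item2"><a href="link2.html">cxk dance</a></li>
    <li class="item1"><a href="link3.html">cxk rap</a></li>
    <li class="item2"><a href="link4.html">cxk basketball</a></li>
  </ul>
</div>
"""
root = ET.fromstring(text)

first = root.find(".//li[1]/a").text               # positions start at 1
last = root.find(".//li[last()]/a").text           # the last li
third_last = root.find(".//li[last()-2]/a").text   # the third from last

print(first, last, third_last)
```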
3. json module
1. Concept
Objects and arrays in JavaScript
Object: {"key": "value"}; in JSON, keys and string values must be in double quotation marks
Array: [x1, x2, x3]
2. Role of json module
Conversion between json formatted string and python data type
3. Read json
json.loads():
Function: JSON format -> Python data type
JSON | Python |
---|---|
object | dict |
array | list |
import json

json_str = """
[{"name":"Xiao Ming"},{"age":"20"},{"sex":"male"}]
"""
print(type(json_str))  # <class 'str'>
data = json.loads(json_str)  # convert the JSON text into Python data
print(data)
print(type(data))  # <class 'list'>
print(data[0]["name"])
print(data[1].get("age"))
print(data[0].get("address", "Beijing"))  # if the key does not exist, the default value is returned
4. Output json
json.dumps()
Function: Python data type -> JSON format
Python | JSON |
---|---|
dict | object |
list | array |
tuple | array |
Note:
1. json.dumps() escapes non-ASCII characters by default (ensure_ascii=True)
2. Pass ensure_ascii=False to keep them as they are
import json

# data: the Python object from the json.loads() example above
with open("data.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(data, indent=2, ensure_ascii=False))  # indent: number of spaces per indentation level
If the data contains Chinese, open the file with an explicit encoding (e.g. utf-8) and set ensure_ascii to False
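The effect of ensure_ascii can be checked directly; a small sketch with made-up data:

```python
import json

data = [{"name": "小明"}, {"age": 20}]   # made-up data containing Chinese

escaped = json.dumps(data)                     # default: non-ASCII written as \uXXXX escapes
readable = json.dumps(data, ensure_ascii=False)  # Chinese kept as-is
print(escaped)
print(readable)

# Both forms decode back to the same Python data.
print(json.loads(escaped) == json.loads(readable) == data)

# A tuple is written as a JSON array.
print(json.dumps({"t": (1, 2)}))
```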
4. Crawling web pages that load data through dynamic Ajax requests
1. Analyze the web page's request pattern: open Inspect (the developer tools), select the XHR filter, scroll the page with the mouse wheel to see how new requests appear, and find the request sent by JS
2. After finding the request, click it and open the Preview tab to check whether the returned information contains the data you want
3. If it does, open the Headers tab and look at the Request URL to find the real URL of the request, then look at its parameters; the URL's parameters are usually listed under Query String Parameters in the Headers tab
4. Splicing the common part of the real URL with the parameters gives a URL that returns more information; request it and read the JSON it returns
Get the download addresses from json["data"]["items"], then traverse them:
for i in json["data"]["items"]:
    i["item"]["video_playurl"]  # download address of each video
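The parsing half of the program below (reading json["data"]["items"] and cleaning the video names) can be tried offline. The response body here is a hypothetical payload with the same nesting, and clean_name and extract_videos are illustrative helpers, not part of any API.

```python
import json
import string

# Hypothetical response body with the same nesting as described above (not real API data).
body = '''{"data": {"items": [
  {"item": {"description": "My #1 video!", "video_playurl": "http://example.com/a.mp4"}},
  {"item": {"description": "cat & dog", "video_playurl": "http://example.com/b.mp4"}}
]}}'''

BAD_CHARS = string.punctuation + string.whitespace   # characters removed from file names

def clean_name(name, max_len=50):
    """Hypothetical helper: drop punctuation and whitespace, keep at most max_len characters."""
    return "".join(ch for ch in name if ch not in BAD_CHARS)[:max_len]

def extract_videos(text):
    """Hypothetical helper: return (filename, download_url) pairs from the JSON text."""
    payload = json.loads(text)
    return [(clean_name(v["item"]["description"]) + ".mp4", v["item"]["video_playurl"])
            for v in payload["data"]["items"]]

print(extract_videos(body))
```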
import json
import random
import string
import time

import requests


class BilibiliVideo():
    def __init__(self):
        self.url = "http://api.vc.bilibili.com/board/v1/ranking/top?"  # common part of the dynamic request's URL
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
        self.all_str = string.punctuation + string.whitespace  # all special characters and whitespace characters

    def get_json(self):
        for offset in range(1, 52, 10):  # fetch several batches by changing next_offset in a loop
            params = {"page_size": "10",
                      "next_offset": str(offset),
                      "tag": "Small video",
                      "type": "tag_general_week",
                      "platform": "pc"}
            html = requests.post(self.url, data=params, headers=self.headers).text  # returns the page data (JSON)
            html = json.loads(html)  # turn the JSON text into Python data
            self.get_video_url(html)  # call the method that extracts the video links

    def get_video_url(self, html):  # extract the download address of each small video
        for video in html["data"]["items"]:
            video_url = video["item"]["video_playurl"]  # download address of the video
            video_name = video["item"]["description"]  # name of the video
            for char in video_name:  # traverse the characters in the video name
                if char in self.all_str:  # if it is a special character,
                    video_name = video_name.replace(char, "")  # remove it
            if len(video_name) > 50:
                video_name = video_name[:50]  # keep only the first 50 characters of a long name
            filename = video_name + ".mp4"  # save the file in mp4 format
            video_content = requests.get(video_url, headers=self.headers).content  # request the video address and get the binary data
            with open(filename, "wb") as f:
                f.write(video_content)
            print("%s Download successful" % filename)
            time.sleep(random.randint(1, 5))  # pause between downloads


bilibili = BilibiliVideo()
bilibili.get_json()