Crawling a whole website [advanced crawler notes]
From crawling one page of data to crawling all of it
Let's first go over the general workflow of a static web crawler
How the data is loaded
Clicking through to the second page shows that the URL now ends with an extra ?start=25 field
This part is called the "query string". The query string is transmitted to the server as a ...
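The pagination pattern described above can be sketched with the standard library. This is only an illustration: the base URL is a placeholder, the page size of 25 is inferred from the ?start=25 seen on the second page, and the total of 100 items is an assumption made so the loop has an end.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical listing URL; the excerpt does not name the site being crawled.
BASE_URL = "https://example.com/top"
PAGE_SIZE = 25  # inferred from ?start=25 on the second page


def page_urls(base_url, total_items, page_size=PAGE_SIZE):
    """Return one URL per page, varying only the `start` query parameter."""
    urls = []
    for start in range(0, total_items, page_size):
        query = urlencode({"start": start})
        urls.append(f"{base_url}?{query}")
    return urls


# total_items=100 is an assumed value for illustration.
urls = page_urls(BASE_URL, total_items=100)
for url in urls:
    print(url)

# The query string is a set of key/value pairs that can be parsed back out.
second_page_params = parse_qs(urlsplit(urls[1]).query)
print(second_page_params)
```

Each generated URL could then be fetched and parsed in turn, which is what turns a one-page crawler into a whole-site crawler.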
Posted by themistral on Sun, 06 Mar 2022 08:20:50 +0100
No audio material for your videos? 16 lines of Python get you more than you could ever use, with very detailed steps
Preface
As the new generation of young people, shouldn't we all be able to make short videos, more or less? Haha, and what about today's self-media creators~
When making videos, don't you always need a few funny sounds? Or strange sounds? Music, and so on~
Downloading them one by one is so slow, so let's use Python to batch download tod ...
Posted by hhstables on Wed, 23 Feb 2022 15:03:01 +0100
Visualization | Analyzing nearly 5,000 tourist attractions with Python to tell you where to go for the holiday
Hello, I'm Ou K.
The May Day holiday is coming, and this year's break is generous (a five-day holiday). How do you want to spend such a long holiday?
Like this?
Or like this?
In this issue, we will briefly analyze the distribution of popular scenic spots and nationwide travel in China through the sales of ti ...
Posted by malam on Sat, 19 Feb 2022 04:14:51 +0100
Crawling dynamic data - simulating a browser (Selenium from introduction to practice)
Contents
1, Preparing the environment for browser simulation
1. Introduction to Selenium
2. Selenium installation
3. Install WebDriver
(1) Installing chromedriver
(2) Add chromedriver to the environment variables
2, Example: using Selenium to drive the browser and fetch QQ Music list data
1. Start the browser
2. Use webdriver to open qq mus ...
Posted by GremlinP1R on Thu, 17 Feb 2022 21:50:25 +0100
Python crawler 04: the requests library
Contents
1, Installation and documentation links
2, Sending a GET request
Adding headers and query parameters
Difference between response.text and response.content
3, Sending a POST request
4, Using a proxy
5, Cookies
6, Session
7, Handling untrusted SSL certificates
Although the urllib module in Python's standard library already contains most of th ...
Posted by leocon on Tue, 01 Feb 2022 16:40:48 +0100
Python crawler 21: downloader middleware in the Scrapy framework
Contents
Downloader Middleware
1, process_request(self, request, spider)
2, process_response(self, request, response, spider)
3, Random request header middleware
settings.py
middlewares.py
httpbin.py
4, IP proxy pool middleware
1. Purchasing proxies
2. Using an IP proxy pool
3. Dedicated proxy pool
Downloader Middleware
Downloader ...
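As a rough sketch of the two hooks and the random request header idea listed in the contents above: a downloader middleware is a plain class, so it can be written without importing Scrapy. The class name and the User-Agent strings are illustrative assumptions, not taken from the article.

```python
import random

# Illustrative User-Agent strings (assumed); a real project keeps its own list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:97.0) Gecko/20100101 Firefox/97.0",
]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware that randomizes the User-Agent."""

    def process_request(self, request, spider):
        # Called for each request on its way to the downloader.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Returning None lets Scrapy continue processing the request.
        return None

    def process_response(self, request, response, spider):
        # Called for each response on its way back to the spider;
        # returning it passes the response on unchanged.
        return response
```

To take effect, a class like this has to be registered in the project's DOWNLOADER_MIDDLEWARES setting, keyed by its import path (which depends on the project name) with an integer order value.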
Posted by Tjorriemorrie on Tue, 01 Feb 2022 09:16:35 +0100
Python web crawler - data storage
Data storage
This article mainly introduces two ways to store data:
Storing in files, including text files and CSV files
Storing in databases, including the MySQL relational database and the MongoDB database
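For the CSV case mentioned in the list above, a minimal sketch using the standard csv module might look like this. The rows, the field names, and the temporary output path are all made up for illustration.

```python
import csv
import os
import tempfile

# Made-up rows for illustration only.
rows = [
    {"title": "First text", "url": "https://example.com/1"},
    {"title": "Second text", "url": "https://example.com/2"},
]

# Write to a temporary directory so the example runs anywhere.
csv_path = os.path.join(tempfile.mkdtemp(), "items.csv")

# newline="" stops the csv module from emitting blank lines on Windows.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check what was stored.
with open(csv_path, newline="", encoding="utf-8") as f:
    loaded = [dict(row) for row in csv.DictReader(f)]

print(loaded)
```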
Storing to txt
title = "First text"
# w: write (create or truncate)    w+: read and write (create or truncate)
# r: read                          r+: read and write (file must exist)
# a: append                        a+: read and append
with open(r'C:\Users ...
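Since the original path is cut off, here is a self-contained sketch of the three basic file modes. It writes to a temporary file instead, and the second line of text is an invented example.

```python
import os
import tempfile

title = "First text"

# A temporary path is an assumption; the path in the excerpt is truncated.
path = os.path.join(tempfile.mkdtemp(), "title.txt")

# "w" creates the file (or truncates an existing one) and writes to it.
with open(path, "w", encoding="utf-8") as f:
    f.write(title + "\n")

# "a" appends to the end without touching the existing content.
with open(path, "a", encoding="utf-8") as f:
    f.write("Second text\n")

# "r" reads; the file must already exist.
with open(path, "r", encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

Passing encoding="utf-8" explicitly avoids surprises on systems whose default encoding is not UTF-8, which matters when the crawled text is not plain ASCII.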
Posted by WickedStylis on Sat, 29 Jan 2022 06:36:12 +0100
Simulating automatic login to Ctrip Travel with Selenium
Automating Ctrip's login is still a little troublesome. Let's look at the official website first:
Needless to say, you first have to locate the tag: find the one in the red box, follow it with click(), and you arrive at the following page:
Here, first locate the tags for the user name and password input fields, and then send_keys() can ...
Posted by kr9091 on Wed, 26 Jan 2022 12:10:16 +0100
Crawling Baidu Translate (supports both Chinese and English)
I have an introductory Python course next semester,
so I've been teaching myself over the winter vacation. After all, I can't afford to fail it, and it should be an easy credit.
On a whim recently, I decided to try crawling Baidu Translate.
After a full day of grinding, I finally got it working.
Enough talk, let's get straight to it (the environmen ...
Posted by Dark_AngeL on Tue, 25 Jan 2022 01:41:41 +0100
[Python from zero to one] 10. Crawling online encyclopedia knowledge with Selenium, in ten thousand words (an essential skill for NLP corpus construction)
Welcome to "Python from zero to one", where I will share about 200 articles in this Python series, learn and play along with you, and explore the interesting world of Python. Every article is explained through cases, code, and the author's own experience. I really want to share my nearly ten years of programming experience with you. I ho ...
Posted by qaokpl on Mon, 24 Jan 2022 09:25:56 +0100