The first time I wrote a blog, my intention was to share my knowledge with you, and also to record my own journey, with both its frustrations and its joys. If you want to see the process, read through it; if not, you can jump straight to the bottom and take the code.
The cause: my girlfriend and another friend who i ...
Posted by Venkatesh on Tue, 08 Mar 2022 20:00:11 +0100
When writing a crawler, we often need to parse a website's list pages. For example, consider the following snippet:
<title>Test relative path</title>
<h1>List of books</h1> ...
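The excerpt is cut off above, but the page title hints at the core problem: links on a list page are often relative paths that must be resolved against the page's own URL. A minimal sketch of that idea, assuming lxml for parsing and the standard library's urljoin (the page URL and HTML body here are hypothetical stand-ins):

from urllib.parse import urljoin

from lxml import html

# Hypothetical page URL and body standing in for the truncated example above.
page_url = 'https://example.com/books/index.html'
page_body = '''
<title>Test relative path</title>
<h1>List of books</h1>
<a href="../detail/1.html">Book 1</a>
<a href="../detail/2.html">Book 2</a>
'''

tree = html.fromstring(page_body)
# Relative hrefs like ../detail/1.html must be resolved against the page URL.
for href in tree.xpath('//a/@href'):
    print(urljoin(page_url, href))
# https://example.com/detail/1.html
# https://example.com/detail/2.html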
Posted by adamh91 on Tue, 08 Mar 2022 17:57:47 +0100
When writing a crawler, we may need to generate a new URL from the current one. For example, the following pseudocode:
import re

current_url = 'https://www.kingname.info/archives/page/2/'
current_page = re.search(r'/(\d+)', current_url).group(1)  # first digit run after a slash: the page number
next_page = int(current_page) + 1
next_url = re.sub(r'\d+', str(next_page), current_url)  # safe here only because the page number is the sole digit run
Posted by simanta on Tue, 08 Mar 2022 17:43:46 +0100
The json module encodes Python objects into JSON strings and decodes JSON strings back into Python objects.
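A minimal sketch of the round trip with the standard library's json module (the sample object is made up for illustration):

import json

book = {'title': 'Test relative path', 'pages': 120, 'in_stock': True}

encoded = json.dumps(book, ensure_ascii=False)   # Python object -> JSON string
decoded = json.loads(encoded)                    # JSON string -> Python object

print(encoded)          # {"title": "Test relative path", "pages": 120, "in_stock": true}
print(decoded == book)  # True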
Posted by bam2550 on Tue, 08 Mar 2022 13:50:15 +0100
With economic development and technological progress, the car has become a necessary means of transportation for many families. On top of that, a car and a house are often prerequisites for marriage, which quietly adds to the pressure on men. At times like this, we urgently need a car. The second-hand car marke ...
When writing crawlers, we often run into situations like these:
The website is complex, and we end up sending many duplicate requests. Sometimes the crawler is interrupted unexpectedly, but we haven't saved the crawling state, so running it again means crawling everything from scratch.
So how do we solve these duplicate-crawling problems? You probably think of ...
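The post's own answer is cut off above; one common approach, sketched here as an assumption rather than the author's method, is to persist a set of fingerprints of already-crawled URLs, so duplicates are skipped and an interrupted run can resume (the state file and helper names are hypothetical):

import hashlib
import os

SEEN_FILE = 'seen_urls.txt'  # hypothetical state file

def load_seen():
    if not os.path.exists(SEEN_FILE):
        return set()
    with open(SEEN_FILE) as f:
        return set(line.strip() for line in f)

def fingerprint(url):
    return hashlib.md5(url.encode('utf-8')).hexdigest()

seen = load_seen()

def crawl(url):
    fp = fingerprint(url)
    if fp in seen:
        return  # already crawled: skip the duplicate request
    # ... fetch and parse the page here ...
    seen.add(fp)
    with open(SEEN_FILE, 'a') as f:
        f.write(fp + '\n')  # persist state so a rerun resumes instead of restarting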
Posted by gavinandresen on Mon, 07 Mar 2022 09:42:37 +0100
This article is a self-study note and is for reference only. Learning course: Feixue City IT, on Bilibili.
Crawler: using a program to obtain resources from the Internet.
The robots.txt protocol: specifies which of a website's data must not be crawled. It is only a convention, however, and does not prevent malicious crawling.
General steps of a crawler:
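The steps themselves are cut off in this excerpt; assuming the usual pipeline of specify URL, send request, get response, parse, persist, here is a minimal sketch using the common third-party requests library (the URL and filename are hypothetical):

import requests

# 1. Specify the target URL (hypothetical).
url = 'https://example.com/books/'
# 2. Send the request; a User-Agent header makes the crawler look like a browser.
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
# 3. Get the response data.
page_text = response.text
# 4. Parse / extract what we need (left as a placeholder here).
# 5. Persist the result.
with open('books.html', 'w', encoding='utf-8') as f:
    f.write(page_text)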
Crawlers are a good thing. I've been wanting to use one recently, so I'll publish the little things I did before and write a few blog posts~
First of all, to be clear: a coroutine is not multithreading; in essence a coroutine still runs on a single thread. The characteristic of a coroutine, however, is that when the current coroutine e ...
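The excerpt breaks off mid-sentence; presumably it continues with the usual point that when the current coroutine hits a blocking wait, the event loop switches to another one. A minimal sketch of that behavior with the standard library's asyncio (the task names and delays are made up):

import asyncio

async def fetch(name, delay):
    print(f'{name}: start')
    # asyncio.sleep stands in for blocking I/O such as a network request;
    # while this coroutine waits, the single thread runs the other one.
    await asyncio.sleep(delay)
    print(f'{name}: done after {delay}s')

async def main():
    await asyncio.gather(fetch('task-1', 2), fetch('task-2', 1))

asyncio.run(main())  # task-2 finishes first even though everything runs on one thread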
Posted by adamh91 on Sat, 05 Mar 2022 01:05:18 +0100
Using Selenium in crawlers
I. Selenium overview
Selenium is a web automated testing tool, originally developed for automating website tests. Selenium can drive the browser directly. It supports all mainstream browsers (including PhantomJS, which has no graphical interface). It can receive instructions and let the browser ...
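The excerpt stops short; as a minimal sketch of driving a browser from a crawler, assuming Selenium 4 with Chrome (the target URL and selector are hypothetical):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes a local Chrome install
try:
    driver.get('https://example.com/books/')               # hypothetical URL
    titles = driver.find_elements(By.CSS_SELECTOR, 'h1')   # hypothetical selector
    for t in titles:
        print(t.text)
    html = driver.page_source  # the fully rendered page, including JS-generated content
finally:
    driver.quit()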
Posted by Eskimo887 on Thu, 03 Mar 2022 09:21:58 +0100