Request
A Request is created with the Request() method we write in the spider file; it submits a request for a given URL. Since we construct each Request ourselves, we control its behavior through the parameters below.
Parameters:
url = string-type URL address
callback = name of the callback function
method = string-type request method, e.g. GET or POST
headers = dictionary-type request headers, such as the browser user agent
cookies = set cookies for the request
meta = dictionary-type key-value pairs, passed directly to the callback function
encoding = set the character encoding
priority = defaults to 0; the higher the value, the higher the scheduling priority
dont_filter = defaults to False; if set to True, the request is not filtered out by the duplicate filter
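The full spider example below only uses url, headers, meta, and callback. As a complement, here is a minimal sketch showing how the remaining parameters might be combined; the SketchSpider class, the cookie value, and the parse_page callback are hypothetical placeholders, not part of the original example:

```python
import scrapy
from scrapy.http import Request


class SketchSpider(scrapy.Spider):                  # Hypothetical spider, for illustration only
    name = 'sketch'

    def start_requests(self):
        return [Request(
            url='http://www.luyin.org/',
            method='GET',                           # Request method as a string
            headers={'User-Agent': 'Mozilla/5.0'},  # Request headers, e.g. the browser user agent
            cookies={'session': 'abc123'},          # Placeholder cookie value
            meta={'page': 1},                       # Key-value pairs handed straight to the callback
            encoding='utf-8',                       # Character encoding of the request
            priority=10,                            # Higher value = scheduled earlier
            dont_filter=True,                       # True = this request bypasses the duplicate filter
            callback=self.parse_page,               # Hypothetical callback
        )]

    def parse_page(self, response):
        print(response.meta['page'])                # Read back the value passed via meta
```

The complete spider example from the tutorial follows: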
```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request, FormRequest
import re


class PachSpider(scrapy.Spider):                    # A spider must inherit from scrapy.Spider
    name = 'pach'                                   # Spider name
    allowed_domains = ['www.luyin.org']             # Domains the spider is allowed to crawl
    # start_urls = ['']                             # start_urls only suits requests that need no login, since cookies etc. cannot be set

    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}  # Browser user agent

    def start_requests(self):                       # The start_requests() method replaces start_urls
        """Request the login page first, with the cookiejar enabled so cookies are recorded, and set the callback."""
        return [Request(
            url='http://www.luyin.org/',
            headers=self.header,
            meta={'cookiejar': 1},                  # Enable cookie recording and pass cookies on to the callback
            callback=self.parse
        )]

    def parse(self, response):
        title = response.xpath('/html/head/title/text()').extract()
        print(title)
```
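A spider like this is run from the Scrapy project directory with `scrapy crawl pach`, where pach is the value of the spider's name attribute.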
Response
A Response is the response object returned by the downloader.
Response attributes:
headers returns the response headers
status returns the response status code
body returns the page content, as bytes
url returns the URL that was crawled
```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request, FormRequest
import re


class PachSpider(scrapy.Spider):                    # A spider must inherit from scrapy.Spider
    name = 'pach'                                   # Spider name
    allowed_domains = ['www.luyin.org']             # Domains the spider is allowed to crawl
    # start_urls = ['']                             # start_urls only suits requests that need no login, since cookies etc. cannot be set

    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}  # Browser user agent

    def start_requests(self):                       # The start_requests() method replaces start_urls
        """Request the login page first, with the cookiejar enabled so cookies are recorded, and set the callback."""
        return [Request(
            url='http://www.luyin.org/',
            headers=self.header,
            meta={'cookiejar': 1},                  # Enable cookie recording and pass cookies on to the callback
            callback=self.parse
        )]

    def parse(self, response):
        title = response.xpath('/html/head/title/text()').extract()
        print(title)
        print(response.headers)                     # Response headers
        print(response.status)                      # Response status code
        # print(response.body)                      # Page content, as bytes
        print(response.url)                         # The URL that was crawled
```
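Because response.body is returned as bytes, it usually has to be decoded before any string handling. A minimal sketch, assuming UTF-8 pages and a Scrapy version where response.text is available (the BodySpider class is a hypothetical placeholder):

```python
import scrapy


class BodySpider(scrapy.Spider):                    # Hypothetical spider, for illustration only
    name = 'body_demo'
    start_urls = ['http://www.luyin.org/']

    def parse(self, response):
        html = response.body.decode('utf-8')        # Manually decode the raw bytes (assumes UTF-8)
        print(html[:100])                           # First 100 characters of the page
        print(response.text[:100])                  # response.text decodes using the detected encoding
```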