Day 21 of learning Python (packet capture tools)

Posted by SilentQ-noob- on Thu, 22 Aug 2019 15:26:13 +0200

Knowledge Point Catalogue
1. What is packet capture?
2. What are the commonly used packet capture tools?
3. How to capture app traffic on a mobile phone?
4. How to capture the article information of a WeChat technical official account?
5. How to capture the relevant requests of the teacher site and implement the corresponding functions with code?
6. Getting familiar with MySQL installation and Navicat activation
7. How to deal with expired cookies while crawling?
8. How to use regular expressions to collect JSON data from web page source code

1. What is packet capture?

Packet capture means intercepting, resending, editing, and saving the data packets sent and received over a network. It is used to check network security and is also often used to intercept data.

2. What are the commonly used packet capture tools?

Fiddler is a program that runs on Windows and captures HTTP and HTTPS traffic.

Wireshark can capture both HTTP and HTTPS traffic, but it cannot decrypt HTTPS on its own (it needs the TLS session keys to do so), so by default the content of HTTPS requests is unreadable in Wireshark.

Charles is actually a proxy server. It sets itself up as the network proxy of the system (computer or browser), and then intercepts requests and responses to allow packet analysis. The software is written in Java and runs on Windows, Mac, and Linux. Before installing Charles, you need to install a Java environment.
Charles's main functions:
(1) Capture HTTP and HTTPS network packets.
(2) Resend network requests, which makes back-end debugging easier.
(3) Modify network request parameters.
(4) Intercept network requests and modify them dynamically.
(5) Simulate a slow network.
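
Because Charles works as a proxy, traffic from a Python script can also be routed through it and inspected. The following is only a minimal sketch: the helper name `proxy_settings` is made up for illustration, and the address 127.0.0.1:8888 is an assumption (check the proxy settings of your capture tool for the real port).

```python
import urllib.request


def proxy_settings(host="127.0.0.1", port=8888):
    """Build a mapping that points HTTP and HTTPS traffic at a local proxy.

    The host and port are assumptions; use the values shown in the
    capture tool's proxy settings.
    """
    address = f"http://{host}:{port}"
    return {"http": address, "https": address}


proxies = proxy_settings()

# An opener that would send every request through the proxy.
# Nothing is requested here; the opener is only constructed.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

print(proxies)
```

The same dictionary can be passed as `proxies=proxies` to `requests.get`. For HTTPS, certificate verification will fail unless the capture tool's root certificate is trusted on the machine.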

3. How to capture app traffic on a mobile phone?

Configuration for capturing mobile app traffic

# 1. Install Charles or Fiddler
# 2. Connect the phone and the computer to the same network
#    (for example, open a hotspot for the phone)
# 3. On the phone, set the proxy IP address and port for that connection
# 4. Open the app on the phone and capture the data.
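
Step 3 needs the IP address of the computer running the capture tool. As a rough sketch, the address can be looked up from Python as shown here. The function name `local_ip` is hypothetical, and on a machine with several network interfaces the result is the address of the default route, which may not be the hotspot interface.

```python
import socket


def local_ip():
    """Return the IPv4 address this computer uses for outgoing traffic.

    Connecting a UDP socket sends no packets; it only makes the OS choose
    the outgoing interface. Falls back to 127.0.0.1 if no route exists.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # 192.0.2.1 is a documentation-only address and is never contacted.
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


ip = local_ip()
print(f"On the phone, set the proxy host to {ip} and the port to the capture tool's port")
```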

Charles configuration for capturing mobile app traffic
https://blog.csdn.net/h176nhx7/article/details/79236495

Capturing HTTPS requests with Charles (by default only HTTP can be captured; HTTPS requires this configuration)
https://www.cnblogs.com/fighter007/p/9162617.html

Capturing HTTPS with Fiddler
https://www.cnblogs.com/joshua317/p/8670923.html

4. How to capture the article information of a WeChat technical official account?

import re
import requests
# The URL and headers below were captured with the packet capture tool.
url = "https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzIxODM4MjA5MA==&scene=124&uin=Mjk0OTE4MjM4Mg%3D%3D&key=b3176bc64ae00a443cf14744a77cdb6159d0fa4b4d0a3035c47bfe42179336fc33415d97d68c88dfec86758a402e817fc4d1a6eee22d84d91d46dde7f692bb69baa126d7493316067f1bd4e0d2a9bd5c&devicetype=Windows+10&version=62060833&lang=zh_CN&a8scene=7&pass_ticket=3ItgczzTtwCsxt6Cl1f0SwP%2B2rIpyevYK3FtiIZOPFtq7qqsH%2BHdiDCKc8j9PXkD&winzoom=1"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36 QBCore/3.53.1159.400 QQBrowser/9.0.2524.400 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 MicroMessenger/6.5.2.501 NetType/WIFI WindowsWechat",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.6,en;q=0.5;q=0.4",
    "Cookie": "wxuin=2949182382; devicetype=Windows10; version=62060833; lang=zh_CN; pass_ticket=3ItgczzTtwCsxt6Cl1f0SwP+2rIpyevYK3FtiIZOPFtq7qqsH+HdiDCKc8j9PXkD; wap_sid2=CK7no/4KElw4Z19OX1lRam1XdEUwTTF2Q0NOQ29kWVhJdXJNbGt0eFYteGZBdElUazhrUk5VRVlfQTJoQXhLQUFYUUpvMWwzUmQ0dzhBalUwVjdUMTRGcThvQzVVdjhEQUFBfjCTkfnqBTgNQJVO",
}

response = requests.get(url, headers=headers)
print(response.text)

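# Save response.text to wx.html, then read it back
with open("wx.html", "w", encoding="utf-8") as f:
    f.write(response.text)
with open("wx.html", encoding="utf-8") as f:
    content = f.read()

# Extract the needed data with a regular expression.
# Alternative that captures only the text inside the quotes:
# pattern = re.compile(r"var msgList = '(.*?)'")
pattern = re.compile(r"var msgList = (.*);")
result = pattern.findall(content)
print(result)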
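
The matched text is still a string, so it has to be parsed before the article titles can be read. Below is a minimal sketch, assuming the page stores `msgList` as a single-quoted JSON string with HTML-escaped quotes (`&quot;`). The sample page and the field names `list`, `app_msg_ext_info`, `title`, and `content_url` are made-up stand-ins; check them against your own wx.html.

```python
import html
import json
import re

# Made-up stand-in for the content of wx.html; the real page is much larger.
sample_page = """
<script>
var msgList = '{&quot;list&quot;:[{&quot;app_msg_ext_info&quot;:{&quot;title&quot;:&quot;Hello&quot;,&quot;content_url&quot;:&quot;http://example.com/a&quot;}}]}';
</script>
"""

# Capture only the text between the single quotes.
pattern = re.compile(r"var msgList = '(.*?)';")
match = pattern.search(sample_page)
if match is None:
    raise ValueError("msgList not found")

# Undo the HTML escaping (&quot; becomes "), then parse the JSON text.
msg_list = json.loads(html.unescape(match.group(1)))

titles = [item["app_msg_ext_info"]["title"] for item in msg_list["list"]]
print(titles)
```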

5. How to capture the relevant requests of the teacher site and implement the corresponding functions with code?

# Capture the POST request data of the user login with the packet capture tool, then simulate the login with code.
#
# 1. Use the packet capture tool to capture the POST request data submitted at login on the teacher site
#
import requests

# 2. Form data
params = """{"mobile":"18790868582","password":"7194be6e78a0ebdd3ca9d6626c2ad13b"}"""

# 3. Request header data
headers = {
    "Accept": "*/*",
    "Origin": "https://teacher.zhiyou888.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
    "Content-Type": "application/json",
    "Referer": "https://teacher.zhiyou888.com/login.html",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
}
url = "https://teacher.zhiyou888.com/api/common/user/login"

# 4. Simulate the browser login with code
response = requests.post(url, headers=headers, data=params)
print(response.text)
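
The `password` value in the captured form is 32 hexadecimal characters, which looks like an MD5 digest. That is only a guess from its shape; the site may hash differently or add a salt. Under that assumption, the request body could be built from a plain password as sketched here, where `build_login_body` and the credentials are placeholders.

```python
import hashlib
import json


def build_login_body(mobile, password):
    """Return the JSON text for the login request.

    Assumption: the site expects the hex MD5 digest of the password.
    """
    digest = hashlib.md5(password.encode("utf-8")).hexdigest()
    return json.dumps({"mobile": mobile, "password": digest})


body = build_login_body("13800000000", "123456")  # placeholder credentials
print(body)
```

The result can be sent as `data=body` together with the `Content-Type: application/json` header. `requests.post(url, json={...})` is an alternative that serializes a dictionary and sets that header itself.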

# Course highlight requests (the Cookie header carries the login token)
url = "https://teacher.zhiyou888.com/api/course-focuses"
headers = {
    "Accept": "application/json, text/plain, */*",
    "Origin": "https://teacher.zhiyou888.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36",
    "Content-Type": "application/json;charset=UTF-8",
    "Referer": "https://teacher.zhiyou888.com/",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cookie": "ZHIYOU_TEACHER=ZHIYOU:USER:TOKEN:add63aeb8628487eb7d01f3af4bdb7e0",
}

params = """{"date":"2019-08-22","classId":654,"comment":["hello world","hello"]}"""
# Add course highlights (can only be added once)
response = requests.post(url, data=params, headers=headers)
# Update course highlights
response = requests.put(url, data=params, headers=headers)
# Delete course highlights
response = requests.delete(url+"?classId=654&date=2019-08-22", headers=headers)
# Query course highlights
response = requests.get("https://teacher.zhiyou888.com/api/common/course-focuses/654?pageNo=1", headers=headers)
print(response.text)
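
The delete request builds its query string by hand. A small sketch of building it from values instead, so that other class IDs and dates can be used; `with_query` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode

base_url = "https://teacher.zhiyou888.com/api/course-focuses"


def with_query(url, **params):
    """Append URL-encoded query parameters to url."""
    return f"{url}?{urlencode(params)}"


delete_url = with_query(base_url, classId=654, date="2019-08-22")
print(delete_url)
```

With `requests`, the same effect comes from passing `params={"classId": 654, "date": "2019-08-22"}` to the request function, which builds the query string itself.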

6. How to activate Navicat Professional Edition?

See Learning Notes from August 22, 2019

7. How to deal with expired cookies while crawling?

1. The simple but not recommended way: when the cookie expires, refresh the page, get a new cookie, and update the cookie in the code.

2. Get the cookie dynamically, which is more difficult.
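
One possible way to get the cookie dynamically is to read the `Set-Cookie` headers of the login response and rebuild the `Cookie` header from them, instead of pasting a captured value into the code. This is only a sketch: `cookie_header` is a hypothetical helper, the header values are made up, and whether the site really returns its token through `Set-Cookie` is an assumption.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Turn Set-Cookie header values into one Cookie request header value."""
    pairs = []
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)
        # Attributes such as Path or HttpOnly are dropped; only name=value is kept.
        pairs.extend(f"{name}={morsel.value}" for name, morsel in jar.items())
    return "; ".join(pairs)


# Made-up Set-Cookie values standing in for a login response.
fresh = cookie_header([
    "ZHIYOU_TEACHER=ZHIYOU:USER:TOKEN:example; Path=/; HttpOnly",
    "lang=zh_CN; Path=/",
])
print(fresh)
```

With `requests`, a `requests.Session` object does this bookkeeping automatically: cookies set by one response are stored and sent with later requests made through the same session.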

8. How to use regular expressions to collect JSON data from web page source code

This works just like matching any other data with regular expressions: write a regular expression that matches the data you need.
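
A pattern such as `(.*?);` stops at the first `;`, which breaks when the JSON itself contains that character. One alternative, sketched here with a made-up page and variable name, is to use the regular expression only to find where the JSON starts and let `json.JSONDecoder.raw_decode` find where it ends.

```python
import json
import re

# Made-up page source; the JSON contains ';' and '}' inside a string value.
page = 'var a = 1; var data = {"name": "x; }", "items": [1, 2, 3]}; var b = 2;'

# Locate the assignment; the match ends exactly where the JSON value begins.
start = re.search(r"var data\s*=\s*", page)
if start is None:
    raise ValueError("variable not found")

# raw_decode parses one JSON value starting at the given index and
# returns it together with the index where the value ends.
data, end = json.JSONDecoder().raw_decode(page, start.end())
print(data["items"])
```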

Topics: Mobile network Windows JSON