Target site: Dragonfly FM (Qingting FM), a site for listening to audio novels, audiobooks, and radio broadcasts online
This hands-on Python crawler exercise targets the PC site. The mobile site may have a more convenient interface, and you are welcome to look into it and discuss. For practice, though, capturing packets on the PC site is enough.
Main points covered
- 1. Logging in with a POST request
- 2. Simple use of the HMAC-MD5 algorithm
Logging in is very simple in this example: it is a plain POST request with no encryption and no unknown parameters.
Here is the Python implementation. Note that this is a method of a class, so the snippet is incomplete and cannot be run on its own.

```python
def login(self, user_id, password):
    # Form fields sent to the login endpoint.
    data = {
        'account_type': '5',
        'device_id': 'web',
        'user_id': user_id,
        'password': password
    }
    response = self.session.post(self.login_url, data=data)
    if response.status_code == 200:
        temp = response.json()
        errorno = temp['errorno']
        errormsg = temp['errormsg']
        if errorno == 0:
            print('Login succeeded!')
            data = temp['data']
            # Keep the two values that identify the logged-in user.
            self.qingting_id = data['qingting_id']
            self.access_token = data['access_token']
        else:
            print('Login failed')
            print(errormsg)
```
After logging in successfully, we save access_token and qingting_id. Together they act as the proof of being logged in, and if the account is a member account, they also act as the proof of membership.
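As a small sketch of the same step outside the class, the two values can be pulled out of a decoded login response like this. It assumes the JSON shape read by the method above (errorno, errormsg, data.qingting_id, data.access_token); the function name extract_credentials and the sample payload are made up for illustration and are not part of the original code.

```python
def extract_credentials(payload):
    """Return (qingting_id, access_token) from a decoded login response.

    Raises ValueError when the server reports a non-zero errorno.
    """
    if payload.get("errorno") != 0:
        raise ValueError(f"login failed: {payload.get('errormsg')}")
    data = payload["data"]
    return data["qingting_id"], data["access_token"]


# Hypothetical payload, shaped like the fields the login method reads.
sample = {
    "errorno": 0,
    "errormsg": "",
    "data": {"qingting_id": "abc123", "access_token": "tok456"},
}
print(extract_credentials(sample))
```

Raising an exception instead of printing keeps the decision about what to do on failure with the caller.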
The real address of the audio is obtained by requesting a URL like this:
https://audio.qingting.fm/audiostream/redirect/294280/11604885
Here 294280 is the album id and 11604885 is the id of the current audio.
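To make the URL layout concrete, here is a minimal sketch that splits such a URL back into its two ids. The helper name split_redirect_url is hypothetical, and it assumes the path always has the form /audiostream/redirect/<album id>/<audio id> as in the example above.

```python
from urllib.parse import urlparse


def split_redirect_url(url):
    """Return (album_id, audio_id) from an /audiostream/redirect/ URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path: audiostream/redirect/<album_id>/<audio_id>
    if len(parts) != 4 or parts[:2] != ["audiostream", "redirect"]:
        raise ValueError(f"unexpected path: {url}")
    return parts[2], parts[3]


album_id, audio_id = split_redirect_url(
    "https://audio.qingting.fm/audiostream/redirect/294280/11604885"
)
print(album_id, audio_id)  # 294280 11604885
```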
The request also carries some parameters: access_token and qingting_id (both come from the response to a successful login and are empty when you are not logged in), t, which is a timestamp, and
device_id=MOBILESITE (a fixed value).
The key parameter is sign: if you send the request without it, the server returns a signature error.
You can find out which JS file generates this sign with a global search. I searched globally for
device_id
and found the function that generates the sign in a main.….js file (the file name is main followed by a long string of characters). You need to pick out the right match yourself: it is the one containing device_id: "MOBILESITE".
Searching for other keywords should lead you there just as well.
In that function, the sign is the variable u, which is derived from the variable c through a series of processing steps.
We can print u and c in the console.
From this we can see that sign is actually computed from the other parameters of the request.
At first I mistakenly thought it was plain MD5, so I was stuck for a long time (I even stepped into the function to see how it was implemented).
In fact, the code already tells you what it uses:
createHmac("md5", "fpMn12&38f_2e")
After looking up HMAC, I found that it is a standard, ready-made algorithm that can be combined with different hash functions, MD5 being one of them, and that it requires a secret key.
Everything we need is right here: the secret key for HMAC-MD5 is fpMn12&38f_2e
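To see why plain MD5 never matches, the two digests can be compared side by side with Python's standard hashlib and hmac modules. This is only a sketch: the message below reuses the example ids from earlier with empty tokens, and t=0 is a placeholder timestamp, not a value taken from a real request.

```python
import hashlib
import hmac

key = b"fpMn12&38f_2e"  # key quoted in the createHmac call
# Placeholder message: example ids from above, empty tokens, dummy timestamp.
message = (
    b"/audiostream/redirect/294280/11604885"
    b"?access_token=&device_id=MOBILESITE&qingting_id=&t=0"
)

plain_md5 = hashlib.md5(message).hexdigest()                    # no key involved
hmac_md5 = hmac.new(key, message, digestmod="md5").hexdigest()  # keyed digest

print("MD5     :", plain_md5)
print("HMAC-MD5:", hmac_md5)
```

Both results are 32 hexadecimal characters long, which is probably why the two are easy to confuse at first glance.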
Try it with an online HMAC tool: the result is the same as the console output from earlier.
In Python, this requires importing the hmac library:
```python
import hmac
import time

base_url = "https://audio.qingting.fm"
bookid = "294280"
id = "11590788"
access_token = ""
qingting_id = ""
# Timestamp in milliseconds.
timestamp = str(round(time.time()*1000))
# The signed message is the request path plus its query string, without sign.
data = f"/audiostream/redirect/{bookid}/{id}?access_token={access_token}&device_id=MOBILESITE&qingting_id={qingting_id}&t={timestamp}"
message = data.encode('utf-8')
key = "fpMn12&38f_2e".encode('utf-8')
sign = hmac.new(key, message, digestmod='MD5').hexdigest()
whole_url = base_url+data+"&sign="+sign
print(whole_url)
```
This gets us one audio file. What remains is to fetch them in bulk, and for that we only need the id of each audio.
This is the interface I requested:
info_api = f'https://i.qingting.fm/capi/channel/{self.bookid}/programs/{self.version}?curpage={str(page)}&pagesize=30&order=asc'
version can be found in the page source of the audiobook's home page. To turn pages, just change curpage.
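The paging can be sketched as follows, assuming the interface reports the total number of items and serves 30 per page as in the URL above. The helper names page_count and page_urls, and the "VERSION" string, are placeholders for illustration only.

```python
PAGE_SIZE = 30  # pagesize used in the interface above


def page_count(total, page_size=PAGE_SIZE):
    """Number of pages needed for `total` items (ceiling division)."""
    return (total + page_size - 1) // page_size


def page_urls(bookid, version, total):
    """Yield one programs URL per page, starting at curpage=1."""
    for page in range(1, page_count(total) + 1):
        yield (
            f"https://i.qingting.fm/capi/channel/{bookid}/programs/{version}"
            f"?curpage={page}&pagesize={PAGE_SIZE}&order=asc"
        )


# "VERSION" stands in for the real value found in the page source.
urls = list(page_urls("294280", "VERSION", 61))
print(len(urls))  # 3
print(urls[-1])
```

Ceiling division avoids requesting one extra, empty page when the total is an exact multiple of 30.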
```python
import requests
import re
import hmac
import time
from tqdm import tqdm
from bs4 import BeautifulSoup
import os
import json
import sys
import urllib3

# Requests below use verify=False, so silence the resulting TLS warnings.
urllib3.disable_warnings()


class QingTing():
    def __init__(self, user_id, password, bookurl, ifLogin):
        self.ifLogin = ifLogin
        self.user_id = user_id
        self.password = password
        self.session = requests.session()
        self.session.headers.update({'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'})
        self.login_url = "https://u2.qingting.fm/u2/api/v4/user/login"
        self.qingting_id = ''
        self.access_token = ''
        self.bookurl = bookurl
        # self.bookurl = 'https://www.qingting.fm/channels/257790'
        self.bookid = self.bookurl.split('/')[-1]
        self.version = ''
        self.qingtinghost = 'https://audio.qingting.fm'
        self.save_path = ''
        self.bookname = ''

    def login(self, user_id, password):
        data = {
            'account_type': '5',
            'device_id': 'web',
            'user_id': user_id,
            'password': password
        }
        response = self.session.post(self.login_url, data=data, verify=False)
        if response.status_code == 200:
            temp = response.json()
            errorno = temp['errorno']
            errormsg = temp['errormsg']
            if errorno == 0:
                print('Login succeeded!')
                data = temp['data']
                self.qingting_id = data['qingting_id']
                self.access_token = data['access_token']
            else:
                print('Login failed')
                print(errormsg)
                time.sleep(10)
                sys.exit(0)

    def __get_version(self):
        response = self.session.get(url=self.bookurl, verify=False)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'lxml')
            temp_bookname = soup.select('div.album-info-root > div.top > div.info.right > h1')[0].string
            # Replace characters that are not allowed in file names.
            replaced_pattern = r'[\\/:*?"<>|]'
            self.bookname = re.sub(replaced_pattern, ' ', temp_bookname, flags=re.M + re.S)
            if not os.path.exists(self.bookname):
                os.makedirs(self.bookname)
            # The version string is embedded in the page source.
            matched = re.search(r'"version":"(\w+)"', response.text, re.S)
            if matched:
                version = matched.group(1)
                self.version = version
                # return version

    def __get_total_page(self):
        self.__get_version()
        page = 1
        info_api = f'https://i.qingting.fm/capi/channel/{self.bookid}/programs/{self.version}?curpage={str(page)}&pagesize=30&order=asc'
        response = self.session.get(info_api, verify=False)
        if response.status_code == 200:
            temp = response.json()
            total = temp['data']['total']
            # Ceiling division: no extra page when total is a multiple of 30.
            total_page = (int(total) + 29) // 30
            return total, total_page

    def get_book_info(self):
        total, total_page = self.__get_total_page()
        print(self.bookname, '{} episodes in total'.format(total))
        for page in range(1, total_page + 1):
            info_api = f'https://i.qingting.fm/capi/channel/{self.bookid}/programs/{self.version}?curpage={str(page)}&pagesize=30&order=asc'
            response = self.session.get(info_api, verify=False)
            programs = response.json()['data']['programs']
            for program in programs:
                # print(program['id'],program['title'])
                yield program

    def get_src(self, id):
        bookid = self.bookid
        access_token = self.access_token
        qingting_id = self.qingting_id
        timestamp = str(round(time.time()*1000))
        data = f"/audiostream/redirect/{bookid}/{id}?access_token={access_token}&device_id=MOBILESITE&qingting_id={qingting_id}&t={timestamp}"
        message = data.encode('utf-8')
        key = "fpMn12&38f_2e".encode('utf-8')
        sign = hmac.new(key, message, digestmod='MD5').hexdigest()
        whole_url = self.qingtinghost + data + "&sign=" + sign
        return whole_url

    def downloadFILE(self, url, name):
        resp = self.session.get(url=url, stream=True, verify=False)
        if resp.headers['Content-Type'] == 'audio/mpeg':
            content_size = int(int(resp.headers['Content-Length']) / 1024)
            with open(name, "wb") as f:
                print("Pkg total size is:", content_size, 'k,start...')
                for data in tqdm(iterable=resp.iter_content(1024), total=content_size, unit='k', desc=name):
                    f.write(data)
            print(name, "download finished!")
        else:
            errorno = resp.json()['errorno']
            errormsg = resp.json()['errormsg']
            print('No permission to download. Please log in to an account that has purchased this audio.')
            print('errorno:', errorno, errormsg)

    def run(self):
        if self.ifLogin:
            self.login(self.user_id, self.password)
        programs = self.get_book_info()
        count = 0
        for program in programs:
            count += 1
            try:
                id = program['id']
                title = str(count).zfill(4) + ' ' + program['title'] + '.m4a'
                if not self.bookname == '':
                    title = os.path.join(self.bookname, title)
                whole_url = self.get_src(id)
                self.downloadFILE(whole_url, title)
            except Exception as e:
                # Log the failure and continue with the next audio.
                print(e)
                with open('log.txt', 'a', encoding='utf-8') as f:
                    f.write(str(count) + str(e) + '\n')


def get_config_info():
    with open('config.json', 'r', encoding='utf-8') as f:
        config = json.loads(f.read())
    return config


if __name__ == "__main__":
    # pyinstaller -F -i ico.ico QingTingFM.py
    config = get_config_info()
    bookurl = input('Please enter the URL of the album home page to download '
                    '(e.g. https://www.qingting.fm/channels/257790): ')
    isvalid = re.search(r'https://www.qingting.fm/channels/\d+', bookurl)
    if isvalid:
        # ifLogin in config.json decides whether to log in first (1) or not (0).
        q = QingTing(config["user_id"], config["password"], bookurl, 1 if config["ifLogin"] else 0)
        q.run()
    else:
        print("The home page entered is not in the correct format")
```
Configuration file (config.json)

```json
{
    "ifLogin": 1,
    "user_id": "135########",
    "password": "pwd########"
}
```
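A small check on the configuration before starting can save a confusing KeyError later. The sketch below assumes only the three keys shown above; parse_config and REQUIRED_KEYS are hypothetical names, and the sample string stands in for the contents of config.json.

```python
import json

REQUIRED_KEYS = ("ifLogin", "user_id", "password")


def parse_config(text):
    """Parse the JSON text of config.json and check the expected keys."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing: {', '.join(missing)}")
    return config


# Stand-in for the file contents; ifLogin=0 means "do not log in".
sample = '{"ifLogin": 0, "user_id": "", "password": ""}'
config = parse_config(sample)
print(bool(config["ifLogin"]))  # False
```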