While we were testing Android phones,
the marketing department asked our testing department to check compatibility with the top 1000 apps,
to make sure that these popular apps can be installed and run normally on our phones.
The marketing department provided the apk download addresses of the top 1000 apps from an app market, in an Excel file.
How can we download these apk files quickly, in batches?
Preparation stage
- The wget command, the requests module and the urllib module can all download files.
- The URLs in the Excel file do not end in .apk, so they are not direct links to the files; each one has to be redirected to the real download address before the apk can be fetched.
- wget is a stand-alone command, so it is comparatively limited and cannot be programmed around (for example, to name each file after its app or to skip files that already exist). We therefore use the requests module to download.
- The key to downloading quickly is multithreading.
- Multithreading usually relies on a queue: the jobs are put into the queue, and a fixed number of threads (for example, 10) keep taking jobs from it for as long as it contains data.
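The redirection point above can be sketched with the standard library alone. This is a minimal illustration, not the method used in the scripts that follow (those use requests): the helper names `looks_like_direct_apk` and `resolve_final_url` and the example.com URLs are hypothetical. The first helper checks whether a URL already ends in .apk; the second relies on `urlopen()` following HTTP redirects by default and on `geturl()` reporting the final address.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def looks_like_direct_apk(url):
    """Return True if the URL path itself ends in .apk (the query string is ignored)."""
    return urlsplit(url).path.lower().endswith(".apk")


def resolve_final_url(url, timeout=30):
    """Follow redirects and return the URL that was finally served.

    Needs network access, so it is defined here but not called in this sketch.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()


# Hypothetical URLs, for illustration only
print(looks_like_direct_apk("https://example.com/files/demo.apk?from=list"))  # True
print(looks_like_direct_apk("https://example.com/download?id=12345"))  # False
```

A URL for which the first check returns False is the kind that needs redirect handling before the apk can be saved.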
Python batch-script form -- the single-threaded version
Remember the essence of a batch script: statements are executed one after another, in order.
Because such a script downloads only one apk at a time, a plain loop with the requests module is all we need.
    # coding=utf-8
    import os
    import requests
    import openpyxl

    curdir = os.getcwd()  # Current working directory
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1 WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36'}

    # Create a folder to hold the downloaded apk files
    os.makedirs("downloaded_apk", exist_ok=True)

    # Read the download URLs from the Excel file row by row
    excel = openpyxl.load_workbook('Top_1000_app.xlsx')
    table = excel.active
    rows = table.max_row
    for r in range(2, rows + 1):  # Row 1 is the header row, so the data starts at row 2
        apk_name = table.cell(row=r, column=2).value  # App name (Chinese)
        apk_url = table.cell(row=r, column=3).value  # Download address
        save_path = os.path.join(curdir, "downloaded_apk", "%s.apk" % apk_name)
        if not os.path.exists(save_path):  # Skip files that were already downloaded
            print("Downloading the apk from row %s, saving to %s" % (r, save_path))
            try:
                # allow_redirects=True follows the redirect to the real apk file
                resp = requests.get(apk_url, headers=header, allow_redirects=True, timeout=720)
                if resp.status_code in (200, 206):
                    with open(save_path, "wb") as hf:
                        hf.write(resp.content)
            except Exception as e:
                print("Error, can not download %s.apk (%s)" % (apk_name, e))
        else:
            print("%s downloaded already!" % save_path)

    os.system("pause")  # Windows only: keeps the console window open after a double-click
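One weakness of using `os.path.exists(save_path)` to skip finished downloads is that a file left half-written by an interrupted run could also be skipped. A minimal sketch of one way around this is to write to a temporary file and rename it only once writing is complete. The helper name `save_atomically` and the `.part` suffix are hypothetical, and the sketch assumes the data arrives as an iterable of byte chunks (with requests, that could be `response.iter_content(chunk_size=...)` on a request made with `stream=True`).

```python
import os
import tempfile


def save_atomically(chunks, save_path):
    """Write byte chunks to save_path + '.part', then rename it to save_path."""
    part_path = save_path + ".part"
    with open(part_path, "wb") as hf:
        for chunk in chunks:
            hf.write(chunk)
    # Rename only after all data has been written, so save_path never holds partial data
    os.replace(part_path, save_path)


# Demo with in-memory chunks instead of a real HTTP response
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.apk")
save_atomically([b"abc", b"def"], demo_path)
print(os.path.getsize(demo_path))  # 6
```

With this approach, an interrupted run leaves only a `.part` file behind, and the existence check on the final name stays trustworthy.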
Python object-oriented class form -- the multithreaded download version
Multithreading is relatively hard to understand.
In general, the tasks are put into a Queue, which is first in, first out.
Then, as long as the queue (q_job) is not empty, jobs are taken from it, with 10 threads running at the same time.
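The queue pattern can be tried in isolation, without any network access. The following is a minimal sketch under stated assumptions: the function `worker`, the `results` list and the "double the number" job are hypothetical stand-ins for the real download work. It uses `get_nowait()`, which raises `queue.Empty` rather than blocking when another thread has already taken the last job.

```python
import queue
import threading

results = []
results_lock = threading.Lock()


def worker(q_job):
    """Take jobs until the queue is empty, then exit."""
    while True:
        try:
            job = q_job.get_nowait()  # Raises queue.Empty instead of blocking
        except queue.Empty:
            break
        with results_lock:
            results.append(job * 2)  # Stand-in for the real download work


q = queue.Queue()
for n in range(20):
    q.put(n)

threads = [threading.Thread(target=worker, args=(q,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # Wait until every worker has finished

print(sorted(results) == [n * 2 for n in range(20)])  # True
```

Every job is processed exactly once, no matter which of the 10 threads happens to pick it up.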
    # coding=utf-8
    import os
    import queue
    import threading
    import requests
    import openpyxl

    curdir = os.getcwd()  # Current working directory
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1 WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36'}

    # Create a folder to hold the downloaded apk files
    os.makedirs("downloaded_apk", exist_ok=True)


    def download_single_apk(apk_name, apk_url):
        '''Download a single apk file'''
        save_path = os.path.join(curdir, "downloaded_apk", "%s.apk" % apk_name)
        if not os.path.exists(save_path):  # Skip files that were already downloaded
            print("Downloading %s" % save_path)
            try:
                # allow_redirects=True follows the redirect to the real apk file
                resp = requests.get(apk_url, headers=header, allow_redirects=True, timeout=720)
                if resp.status_code in (200, 206):
                    with open(save_path, "wb") as hf:
                        hf.write(resp.content)
            except Exception as e:
                print("Error, can not download %s.apk (%s)" % (apk_name, e))
        else:
            print("%s downloaded already!" % save_path)


    # Thread class for bulk download
    class DownLoadThread(threading.Thread):
        def __init__(self, q_job):
            threading.Thread.__init__(self)
            self._q_job = q_job

        def run(self):
            while True:
                try:
                    # get_nowait() avoids blocking forever if another thread takes the last job
                    apk_name, apk_url = self._q_job.get_nowait()
                except queue.Empty:
                    break
                download_single_apk(apk_name, apk_url)  # Download function run by all 10 threads


    if __name__ == '__main__':
        # Initialize the queue (a maxsize of 0 means no size limit)
        q = queue.Queue(0)
        # Read the URLs from the Excel file row by row
        excel = openpyxl.load_workbook('Top_1000_app.xlsx')
        table = excel.active
        rows = table.max_row
        for r in range(2, rows + 1):  # Row 1 is the header row, so the data starts at row 2
            apk_name = table.cell(row=r, column=2).value  # App name (Chinese)
            apk_url = table.cell(row=r, column=3).value  # Download address
            # A queue accepts any Python object; a tuple keeps the name and URL together
            q.put((apk_name, apk_url))
        for i in range(10):  # Start 10 threads
            DownLoadThread(q).start()
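As an alternative to a hand-written thread class, the standard library's `concurrent.futures.ThreadPoolExecutor` manages the job queue and the worker threads internally. This is a different technique from the one in the script above, shown only as a sketch: `fake_download` is a hypothetical stand-in for the real download function, and the (name, URL) pairs are made-up examples.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(job):
    """Hypothetical stand-in for the real download function; does no network I/O."""
    apk_name, apk_url = job
    return "%s.apk" % apk_name


# Hypothetical (name, url) pairs, as they might be read from the Excel sheet
jobs = [("app%d" % i, "https://example.com/download?id=%d" % i) for i in range(5)]

# max_workers=10 mirrors the 10 threads used in the script
with ThreadPoolExecutor(max_workers=10) as pool:
    saved = list(pool.map(fake_download, jobs))  # Results come back in input order

print(saved)  # ['app0.apk', 'app1.apk', 'app2.apk', 'app3.apk', 'app4.apk']
```

The `with` block waits for all submitted jobs to finish before it exits, so no explicit `join()` calls are needed.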
Download the training materials for this case
The materials can be downloaded from the official website of the Selfie tutorial.
Produced by Wu Sanren; feel free to download and use them.
How to run it and the result
For example, save the above code as download_apk.py and put it on the desktop.
It is recommended to run it from the command line with python download_apk.py; alternatively, double-click the file.
The result of running it is as follows:
Selfie course (a Python course on automated testing, compiled by Wu Sanren)
Original link: https://www.zipython.com/#/detail?id=32fc6017b5e14784a862c95367967ebd