Batch-download 1000 APKs in Python (with hands-on materials)

Posted by indian98476 on Mon, 06 Apr 2020 10:43:51 +0200

While we were testing Android phones,
the marketing department asked our testing department to check compatibility with the Top 1000 apps,
to make sure that these popular apps install and run normally on our phones.
The marketing department also provided the apk download addresses of the Top 1000 apps in one application market.

How can we download these apk files in batches quickly?


Preparation stage
  1. The wget command, the requests module and the urllib module can all download files.
  2. The URLs in the Excel file do not end in .apk; each one redirects to the real file, so the redirect has to be followed before the apk can be downloaded.
  3. wget is an external command, so it is relatively limited and hard to customize from Python. The requests module follows redirects and is easy to script, so we use requests to download.
  4. The key to downloading quickly is multithreading.
  5. Multithreading is usually built on a queue: jobs are put in the queue, and a fixed number of threads (for example 10) take jobs from it until it is empty.
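
Point 2 can be checked in code before any download starts. The following is a minimal sketch, assuming only that a direct link has a path ending in .apk; the helper name is_direct_apk_link and the example.com URLs are hypothetical and do not come from the Excel file.

```python
from urllib.parse import urlparse


def is_direct_apk_link(url):
    """Return True if the URL path itself ends in .apk (hypothetical helper)."""
    path = urlparse(url).path  # drops the query string and fragment
    return path.lower().endswith(".apk")


# Hypothetical example URLs, not taken from the article's spreadsheet.
print(is_direct_apk_link("https://example.com/files/demo.apk"))     # True
print(is_direct_apk_link("https://example.com/download?id=12345"))  # False
```

A URL that fails this check is not necessarily broken; it simply has to be requested with redirects enabled so that the server can point to the real file.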

Python batch-script style -- single-threaded version

Remember the essence of a batch script: statements are executed one after another, in order.
Because this script downloads only one apk at a time, we simply call the requests module inside a loop.

# coding=utf-8

import os
import requests
import openpyxl

curdir = os.getcwd() # Get current work directory
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1 WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36'}

# Create a folder to hold the downloaded apk files (no error if it already exists)
os.makedirs("downloaded_apk", exist_ok=True)

# Read the download address url in excel line by line
excel = openpyxl.load_workbook('Top_1000_app.xlsx')  # Read the contents of excel
table = excel.active
rows = table.max_row
for r in range(2, rows + 1):  # Row 1 is the header row, so the data starts at row 2
    apk_name = table.cell(row=r, column=2).value  # Get app name (Chinese)
    apk_url = table.cell(row=r, column=3).value  # Get download address
    save_path = os.path.join(curdir, "downloaded_apk", "%s.apk" % apk_name)
    if not os.path.exists(save_path):  # Skip files that were already downloaded
        print("Downloading the apk in row %s and will save to %s" % (r, save_path))
        try:
            # Use a separate name for the response so the row index r is not overwritten
            resp = requests.get(apk_url, headers=header, allow_redirects=True, timeout=720)
            status_code = resp.status_code
            if status_code == 200 or status_code == 206:
                with open(save_path, "wb") as hf:
                    hf.write(resp.content)
            else:
                print("Error, HTTP status %s for %s.apk" % (status_code, apk_name))
        except Exception as e:  # Report the failure and continue with the next row
            print("Error, can not download %s.apk: %s" % (apk_name, e))
    else:
        print("%s downloaded already!" % save_path)

os.system("pause")  # Windows only: keeps the console window open after a double-click
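
The script above skips any file for which os.path.exists(save_path) is true, so a file left half-written by an interrupted run would look finished. Below is a minimal sketch of one way to guard against that, assuming the download arrives as an iterable of byte chunks. The function name save_chunks and the .part suffix are hypothetical and not part of the original script.

```python
import os
import tempfile


def save_chunks(chunks, save_path):
    """Write an iterable of bytes chunks to save_path via a temporary .part file."""
    part_path = save_path + ".part"  # hypothetical suffix for unfinished files
    with open(part_path, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fh.write(chunk)
    # Rename only after everything was written, so save_path is never half-written
    os.replace(part_path, save_path)
    return save_path


# Demo with in-memory chunks standing in for a real download.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_chunks([b"PK", b"", b"demo"], os.path.join(tmp_dir, "demo.apk"))
    print(os.path.getsize(path))  # 6
```

With requests, the chunks could come from a request made with stream=True, for example resp.iter_content(chunk_size=1024 * 1024), which also avoids holding a whole apk in memory. If the write fails, only the .part file is left behind and the next run downloads that apk again.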

Python object-oriented class style -- multithreaded download version

Multithreading is relatively hard to understand.
Typically, jobs are put in a Queue (first in, first out).
Then, as long as the queue is not empty, jobs are taken from q_job, with 10 threads running at the same time.
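
Stripped of the download details, the queue-plus-threads pattern can be sketched roughly as follows. This is an illustration under assumptions, not the article's script: run_jobs and fake_download are hypothetical names, and the "download" only records a number.

```python
import queue
import threading


def run_jobs(jobs, handler, num_threads=10):
    """Process every job with handler() using a fixed number of worker threads."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    def worker():
        while True:
            try:
                job = q.get_nowait()  # never blocks; raises queue.Empty when done
            except queue.Empty:
                return
            handler(job)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until all workers have finished


results = []
lock = threading.Lock()


def fake_download(job):
    """Stand-in for a real download: just record that the job was handled."""
    with lock:
        results.append(job * 2)


run_jobs(range(25), fake_download, num_threads=10)
print(sorted(results) == [n * 2 for n in range(25)])  # True
```

Because all jobs are queued before any thread starts, an empty queue reliably means the work is done, which is why a non-blocking get is enough here.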

#coding=utf-8

import os
import queue
import threading
import requests
import openpyxl

curdir = os.getcwd()  #Get current work directory
header = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1 WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36'}

# Create a folder to hold the downloaded apk files (no error if it already exists)
os.makedirs("downloaded_apk", exist_ok=True)


def download_single_apk(apk_url_str):
    '''Download a single apk file; apk_url_str has the form "name;url"'''
    apk_name, apk_url = apk_url_str.split(";", 1)  # Split on the first ";" only
    save_path = os.path.join(curdir, "downloaded_apk", "%s.apk" % apk_name)
    if not os.path.exists(save_path):  # Skip files that were already downloaded
        print("Downloading %s" % save_path)
        try:
            r = requests.get(apk_url, headers=header, allow_redirects=True, timeout=720)  # Send the download request
            status_code = r.status_code
            if status_code == 200 or status_code == 206:
                with open(save_path, "wb") as hf:
                    hf.write(r.content)
            else:
                print("Error, HTTP status %s for %s.apk" % (status_code, apk_name))
        except Exception as e:  # Report the failure and let the thread take the next job
            print("Error, can not download %s.apk: %s" % (apk_name, e))
    else:
        print("%s downloaded already!" % save_path)


# Worker thread for bulk download
class DownLoadThread(threading.Thread):
    def __init__(self, q_job):
        threading.Thread.__init__(self)
        self._q_job = q_job

    def run(self):
        while True:
            try:
                # get_nowait() avoids a hang when another thread empties the
                # queue between a qsize() check and a blocking get()
                job = self._q_job.get_nowait()
            except queue.Empty:
                break
            download_single_apk(job)  # All 10 threads run this download function


if __name__ == '__main__':
    # Initialize the job queue (a maxsize of 0 means the queue is unbounded)
    q = queue.Queue(0)
    
    # Read the url in excel line by line
    excel = openpyxl.load_workbook('Top_1000_app.xlsx')  # Read the contents of excel
    table = excel.active
    rows = table.max_row
    for r in range(2, rows + 1):  # Row 1 is the header row, so the data starts at row 2
        apk_name = table.cell(row=r, column=2).value  # Get app name (Chinese)
        apk_url = table.cell(row=r, column=3).value  # Get download address
        # A queue can hold any object; a "name;url" string is used here
        # because download_single_apk() splits its argument on ";"
        temp_str = apk_name + ";" + apk_url
        q.put(temp_str)

    for i in range(10):  # Start 10 threads
        DownLoadThread(q).start()
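
As an alternative to a hand-written thread class, the standard library's concurrent.futures.ThreadPoolExecutor manages the queue and the threads itself. This is a different technique from the one used above, shown only as a minimal sketch: download_all and fake_download are hypothetical names, the jobs are assumed to be (name, url) tuples, and the example.com URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(jobs, download_one, max_workers=10):
    """Run download_one(job) for every job on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order; list() waits for every result
        return list(pool.map(download_one, jobs))


def fake_download(job):
    """Stand-in for a real download: return the file name that would be saved."""
    name, url = job
    return "%s.apk" % name


jobs = [("app_a", "https://example.com/a"), ("app_b", "https://example.com/b")]
print(download_all(jobs, fake_download))  # ['app_a.apk', 'app_b.apk']
```

In a real run, download_one would wrap the requests call and handle its own errors, since an exception raised inside a worker is re-raised when the results are collected.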

Download the training materials for this case

Go to the official website of the Selfie tutorial to download the materials.
Produced by Wu Sanren; feel free to download and use them.

Operation mode and effect

For example, save the code above as download_apk.py and put it on the desktop.
It is recommended to run it with python download_apk.py, or you can double-click it.
The result of running it is as follows: [screenshot]


For more original articles, please visit the official website: www.zipython.com
Selfie course (a Python automated-testing course, compiled by Wu Sanren)
Original link: https://www.zipython.com/#/detail?id=32fc6017b5e14784a862c95367967ebd
