Crawling 4K ultra-clear mobile phone wallpapers with Python. Of course, the more wallpapers, the better~

Posted by jacko310592 on Mon, 10 Jan 2022 11:21:05 +0100

Preface

Everyone is familiar with phone wallpapers. Whenever we unlock our phones, we want to see a picture we love.

But after using the same wallpaper for a long time, we start wanting something fresh (unless we really love it).

Choosing a new picture always takes time, though. Sometimes we browse for ages without finding one we like, and end up exhausted.

So why not write some code that automatically downloads the type of phone wallpaper we are interested in?

Contents of this article:

Crawling 4K ultra-clear phone wallpapers with Python

Development environment

The tools for writing the code (you can't write code without them 👀):

  • Python 3.8
  • PyCharm

How do you configure the Python interpreter in PyCharm?

  1. Select File >>> Settings >>> Project >>> Python Interpreter
  2. Click the gear icon and select Add
  3. Add the path of your Python installation
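If you are unsure which interpreter PyCharm actually ended up using, you can run the small sketch below from inside PyCharm: the printed path should match the installation path you added in step 3. It uses only the standard library `sys` module; the function name `describe_interpreter` is just an illustrative, hypothetical helper.

```python
import sys


def describe_interpreter() -> str:
    """Return the path and version of the interpreter running this code (hypothetical helper)."""
    # sys.version starts with the version number, e.g. "3.8.10 (default, ...)"
    return f"{sys.executable} (Python {sys.version.split()[0]})"


print(describe_interpreter())
```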

How do you install plug-ins in PyCharm?

  1. Select File >>> Settings >>> Plugins
  2. Click Marketplace and enter the name of the plug-in you want, such as Translation or Chinese
  3. Select the matching plug-in and click Install
  4. After the installation succeeds, a prompt to restart PyCharm pops up. Click OK; the plug-in takes effect after the restart

Modules used

Only the first module needs to be installed. Without it, even correctly written code will fail with an error about the missing module. The other two modules are built into Python and need no installation.

  • requests >>> pip install requests: sends the data requests
  • re: parses the data with regular expressions
  • os: creates folders automatically

Many readers don't know how to install modules, or why installation fails, so let's go through it in detail.

  • How to install a third-party Python module:
  1. Press Win + R, type cmd and click OK, then enter the install command pip install <module name> (e.g. pip install requests) and press Enter
  2. Or open the Terminal in PyCharm and enter the install command there
  • Reasons installation may fail:
    • Failure 1: 'pip' is not recognized as an internal or external command
      Solution: add Python and its Scripts folder to the PATH environment variable

    • Failure 2: lots of red error text (Read timed out) appears
      Solution: the network connection timed out, so switch to a mirror source
      Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
      Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
      University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
      Huazhong University of Science and Technology: http://pypi.hustunique.com/
      Shandong University of Technology: http://pypi.sdutlinux.org/
      Douban: http://pypi.douban.com/simple/
      For example: pip3 install -i https://pypi.doubanio.com/simple/ <module name>

    • Failure 3: cmd says the module is already installed (or installed successfully), but it still cannot be imported in PyCharm
      Solution: you may have several Python installations (for example both Anaconda and Python); uninstall one of them
      Or you have not set up the Python interpreter in PyCharm
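Failures 1 and 3 often come down to pip belonging to a different Python than the one PyCharm runs. One way to sidestep both, sketched below as a suggestion rather than the author's method, is to call pip through the current interpreter (`python -m pip`) and to check importability with `importlib.util.find_spec`. The helper names `is_installed` and `pip_install_command` are hypothetical; the command is only built and printed here, not executed.

```python
import importlib.util
import sys
from typing import List, Optional

TSINGHUA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple"


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be imported by the interpreter running this code."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package: str, index_url: Optional[str] = None) -> List[str]:
    """Build a pip command bound to this interpreter (hypothetical helper).

    Running pip as "<this python> -m pip" avoids relying on pip being on PATH
    and guarantees the package lands in the interpreter PyCharm uses.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if index_url:
        # -i is pip's short form of --index-url, used to point at a mirror (Failure 2)
        cmd += ["-i", index_url]
    cmd.append(package)
    return cmd


if not is_installed("requests"):
    # To actually install, pass this list to subprocess.check_call(...)
    print(" ".join(pip_install_command("requests", TSINGHUA_MIRROR)))
```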

Now let's implement the crawler.

The first step:

Analyze what data we want and where we can get it (capture and analyze the requests with the browser's developer tools).

The analysis works backwards from the goal: to get the original wallpaper URL >>> we need each wallpaper's detail page URL >>> and the detail page URLs are obtained from the list page.
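To see how the chain list page >>> detail page >>> image URL maps onto the two regular expressions used in the code, here is an offline sketch run on small, made-up HTML strings. The markup and the example.com URLs are hypothetical stand-ins shaped to match the patterns; the real pages may differ, so check them in the developer tools.

```python
import re

# Hypothetical fragment of a list page: one <li> link per entry
LIST_HTML = (
    '<li><a target="_blank" href="https://example.com/menu">Menu</a></li>'
    '<li><a target="_blank" href="https://example.com/wallpaper/1.html">A</a></li>'
    '<li><a target="_blank" href="https://example.com/wallpaper/2.html">B</a></li>'
)

# Hypothetical fragment of a detail page: the full-size image with its title
DETAIL_HTML = '<img alt="x" title="Sunset city" src="https://example.com/img/1.jpg">'

# List page -> detail page URLs: (.*?) is a non-greedy capture of the href value,
# and [1:] drops the first match (in this made-up fragment, the "Menu" link)
detail_urls = re.findall('<li><a target="_blank" href="(.*?)"', LIST_HTML)[1:]
print(detail_urls)

# Detail page -> (title, image URL): with two groups, findall returns tuples
title, img_link = re.findall('<img alt=".*?" title="(.*?)" src="(.*?)">', DETAIL_HTML)[0]
print(title, img_link)
```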

The second step is to implement it in code:

1. Send a request to the wallpaper list page
2. Get the data: the response data returned by the server
3. Parse the data: extract the URLs of the wallpaper detail pages
4. Send a request to each wallpaper detail page URL
5. Get the data: the response data returned by the server
6. Parse the data: extract the image URL and the image title
7. Save the data locally
8. Crawl multiple pages: send requests following the pattern in which the list page URL changes from page to page
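Step 8 relies on the list page URLs differing only in the page number, as in https://m.bcoderss.com/tag/%e5%8a%a8%e6%bc%ab/page/{page}/ in the code below. The tag segment is the percent-encoded UTF-8 form of 动漫 ("anime"), so a small builder like the hypothetical `list_page_urls` below could generate the URLs for other tags as well, assuming the site uses the same /tag/<tag>/page/<n>/ pattern for them. `urllib.parse.quote` emits uppercase hex digits; hex digits in percent-encoding are case-insensitive, so this is the same URL as the lowercase form.

```python
from typing import Iterator
from urllib.parse import quote, unquote

BASE = "https://m.bcoderss.com"


def list_page_urls(tag: str, first: int, last: int) -> Iterator[str]:
    """Yield list page URLs for pages first..last (inclusive) of a tag.

    Hypothetical helper; assumes the /tag/<tag>/page/<n>/ pattern seen in the crawler.
    """
    encoded_tag = quote(tag)  # UTF-8 percent-encoding of the tag
    for page in range(first, last + 1):
        yield f"{BASE}/tag/{encoded_tag}/page/{page}/"


# The tag in the original URL decodes to 动漫
print(unquote("%e5%8a%a8%e6%bc%ab"))
for url in list_page_urls("动漫", 4, 6):
    print(url)
```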

Code

First, import the module for sending data requests

import requests  # pip install requests

Then import the regular expression module and the os module (both built in)

import re  # Built-in module, no installation needed
import os  # Built-in module, used to create the save folder

# Create the folder for the downloaded wallpapers (no error if it already exists)
os.makedirs('img', exist_ok=True)

# range(4, 11) crawls list pages 4 to 10; adjust as needed
for page in range(4, 11):
    print(f'=================== Crawling page {page} ===================')
    # 1. Send a request for the wallpaper list page
    url = f'https://m.bcoderss.com/tag/%e5%8a%a8%e6%bc%ab/page/{page}/'
    # The headers disguise the Python script as a normal browser.
    # If the site identifies the request as a crawler, it may return no data or different data.
    # The cookie was copied from the browser's developer tools; replace it with your own if needed.
    headers = {
        'cookie': 'UM_distinctid=17dffa7aa3a189-048d4b23bef5a4-4303066-1fa400-17dffa7aa3bbfe; Hm_lvt_ce3020881c73bb20f0830ef4ed0a61fb=1640671718; CNZZDATA1278590218=1385918032-1640661155-%7C1640667485; Hm_lpvt_ce3020881c73bb20f0830ef4ed0a61fb=1640676345',
        'origin': 'https://m.bcoderss.com',
        'referer': url,
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    }
    # A page is loaded with GET; timeout stops the script hanging on a dead connection
    response = requests.get(url=url, headers=headers, timeout=10)
    # 2. Get the response data (text) returned by the server
    # print(response.text)  # string type
    # 3. Parse the data and extract the URLs of the wallpaper detail pages.
    # re.findall returns a list of every match of the group (.*?) in response.text;
    # the slice [1:] skips the first match (list indices start at 0)
    img_url_list = re.findall('<li><a target="_blank" href="(.*?)"', response.text)[1:]
    # Go through the detail page URLs one by one
    for img_url in img_url_list:
        # 4. Send a request for the wallpaper detail page URL
        response_1 = requests.get(url=img_url, headers=headers, timeout=10)
        # 5. Get the response data (text) returned by the server
        # print(response_1.text)
        # 6. Parse the data: each match is a (title, image URL) tuple
        img_info = re.findall('<img alt=".*?" title="(.*?)" src="(.*?)">', response_1.text)
        if not img_info:
            continue  # Skip detail pages that do not match the pattern
        img_name, img_link = img_info[0]
        # Remove characters that are not allowed in Windows file names
        new_title = re.sub(r'[\\/:*?"<>|\n]', '', img_name)
        # 7. Save the data locally
        # Images are binary data, so use response.content instead of response.text
        img_content = requests.get(url=img_link, headers=headers, timeout=10).content
        with open(os.path.join('img', new_title + '.jpg'), mode='wb') as f:
            f.write(img_content)
        print(img_name, img_link)
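Two details of the saving step are worth tightening. The character class must include the backslash to cover every character Windows forbids in file names (\ / : * ? " < > |), and two wallpapers with the same title would overwrite each other. The sketch below, with hypothetical helper names `safe_filename` and `unique_path`, handles both; it is one possible approach, not the author's code.

```python
import os
import re

# Characters Windows does not allow in file names, plus newlines
INVALID_CHARS = r'[\\/:*?"<>|\n]'


def safe_filename(title: str, default: str = "untitled") -> str:
    """Strip forbidden characters and surrounding whitespace from a title (hypothetical helper)."""
    cleaned = re.sub(INVALID_CHARS, "", title).strip()
    # Fall back to a default name if nothing usable is left
    return cleaned or default


def unique_path(folder: str, name: str, ext: str = ".jpg") -> str:
    """Return folder/name+ext, appending (1), (2), ... if that file already exists (hypothetical helper)."""
    path = os.path.join(folder, name + ext)
    counter = 1
    while os.path.exists(path):
        path = os.path.join(folder, f"{name}({counter}){ext}")
        counter += 1
    return path


print(safe_filename('Girl: "Sky" / Sea?'))
```

In the saving step, the file could then be opened with open(unique_path('img', safe_filename(img_name)), mode='wb').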

Well, that's the end of this article!

If you have more suggestions or questions, leave a comment or send me a private message! Come on, let's keep working at it together

If you liked it, follow the blogger, or like, bookmark and comment on this article!!!

Topics: Python, Programming, PyCharm, crawler