Wanfu (ten thousand blessings) welcomes the cute tiger, Wanhu (ten thousand tigers) celebrates the Spring Festival in 2022, the Year of the Tiger

Posted by JParishy on Wed, 19 Jan 2022 15:31:39 +0100

Preface

2022 is the auspicious Year of the Tiger. Happy New Year! In the new year, I hope to bring you good luck through "ten thousand tigers" and "ten thousand blessings". I wish you the vitality of a tiger, smooth sailing, a good life, and wealth rolling in! Now that the greetings are done, let's get to the technology~

1, Crawling tiger pictures

With a Python crawler, we can quickly download a large number of tiger and Fu pictures from Baidu Images. Here's how to crawl them:

1. Website analysis

First, open Baidu Images and press F12 to open the developer tools, then search for "year of the tiger" and click into the picture results:

While scrolling down to load more pictures, watch the network panel of the developer tools. Each batch of pictures that loads shows up as a new request:

Select any one of them and copy its request URL:

https://image.baidu.com/search/acjson?tn=resultjson_com&logid=11365856357363949053&ipn=rj&ct=201326592&is=&fp=result&fr=&word=%E8%99%8E%E5%B9%B4&queryWord=%E8%99%8E%E5%B9%B4&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&expermode=&nojc=&isAsync=&pn=120&rn=30&gsm=78&1642056713314=

Click the request to see the query parameters it carries, and copy them:

tn: resultjson_com
logid: 11365856357363949053
ipn: rj
ct: 201326592
is: 
fp: result
fr: 
word: 虎年
queryWord: 虎年
cl: 2
lm: -1
ie: utf-8
oe: utf-8
adpicid: 
st: -1
z: 
ic: 0
hd: 
latest: 
copyright: 
s: 
se: 
tab: 
width: 
height: 
face: 0
istype: 2
qc: 
nc: 1
expermode: 
nojc: 
isAsync: 
pn: 120
rn: 30
gsm: 78
1642056713314: 

At this point we have the request URL and the parameters we need, and the analysis is complete!
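
The parameter list above does not have to be copied by hand. As a minimal sketch, the standard library can split the query string of a captured URL into a dictionary. The helper name `query_params` and the shortened sample URL are illustrative assumptions, not part of the original script:

```python
from urllib.parse import urlsplit, parse_qsl


def query_params(url):
    """Return the query-string parameters of a URL as a dict, keeping empty values."""
    query = urlsplit(url).query
    # keep_blank_values=True preserves parameters such as "is=" that carry no value
    return dict(parse_qsl(query, keep_blank_values=True))


# Shortened version of the captured request URL (only a few parameters kept)
sample_url = (
    "https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj"
    "&word=%E8%99%8E%E5%B9%B4&is=&pn=120&rn=30"
)
params = query_params(sample_url)
print(params["word"], params["pn"], params["rn"])
```

The percent-encoded `word` value is decoded back to the search keyword, and `pn` and `rn` come back as strings, which is also how they are passed to `requests` in the code that follows.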

2. Crawl code

I won't go through the coding step by step. Here is the main source code:

import requests
import os

path = r"/Users/lpc/Downloads/baidu1/"
# Create the download directory if it does not already exist
os.makedirs(path, exist_ok=True)

page = input('Please enter how many pages to crawl:')
page = int(page) + 1
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
n = 0
pn = 0
# pn is the offset of the first picture to fetch. As you scroll down, Baidu Images loads 30 pictures per request by default
for m in range(1, page):
    url = 'https://image.baidu.com/search/acjson?'

    param = {
        'tn': 'resultjson_com',
        'logid': '7680290037940858296',
        'ipn': 'rj',
        'ct': '201326592',
        'is': '',
        'fp': 'result',
        'queryWord': 'Year of the tiger',
        'cl': '2',
        'lm': '-1',
        'ie': 'utf-8',
        'oe': 'utf-8',
        'adpicid': '',
        'st': '-1',
        'z': '',
        'ic': '0',
        'hd': '1',
        'latest': '',
        'copyright': '',
        'word': 'Year of the tiger',
        's': '',
        'se': '',
        'tab': '',
        'width': '',
        'height': '',
        'face': '0',
        'istype': '2',
        'qc': '',
        'nc': '1',
        'fr': '',
        'expermode': '',
        'nojc': '',
        'acjsonfr': 'click',
        'pn': pn,  # Which picture to start with
        'rn': '30',
        'gsm': '3c',
        '1635752428843': '',
    }
    page_text = requests.get(url=url, headers=header, params=param)
    page_text.encoding = 'utf-8'
    page_text = page_text.json()
    print(page_text)
    # Take out the list of dictionaries that hold the picture links
    info_list = page_text['data']
    # The last dictionary in the list is empty, so keep only entries that have a thumbnail URL
    img_path_list = []
    for i in info_list:
        if i.get('thumbURL'):
            img_path_list.append(i['thumbURL'])
    # Then take out all the picture addresses and download them
    # n will be the name of the picture
    for img_path in img_path_list:
        img_data = requests.get(url=img_path, headers=header).content
        img_path = path + str(n) + '.jpg'
        with open(img_path, 'wb') as fp:
            fp.write(img_data)
        n = n + 1

    pn += 30  # Move on to the next batch of 30 pictures
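
The paging logic in the loop is driven by `pn` (the offset of the first picture) and `rn` (the number of pictures per request). A minimal sketch of that arithmetic and of the link extraction, assuming offsets advance in steps of `rn` and that entries without a `thumbURL` should be skipped. The helper names and the sample payload are hypothetical:

```python
def page_offsets(pages, page_size=30):
    """Return the pn offset to request for each page, starting from 0."""
    return [page * page_size for page in range(pages)]


def thumb_urls(payload):
    """Collect thumbnail URLs from a decoded response, skipping empty entries."""
    return [
        item["thumbURL"]
        for item in payload.get("data", [])
        if item.get("thumbURL")
    ]


# Hypothetical payload shaped like the JSON printed by the script above
sample_payload = {
    "data": [
        {"thumbURL": "https://example.com/a.jpg"},
        {"thumbURL": "https://example.com/b.jpg"},
        {},  # the trailing empty dictionary mentioned in the comments
    ]
}
print(page_offsets(3))
print(thumb_urls(sample_payload))
```

Filtering on `thumbURL` instead of deleting the last element also copes with a response whose `data` list is empty.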

The approach above works, but analysing the request and editing the parameters for every new search is tedious. Here is another script that only needs a keyword, such as "year of the tiger", entered at run time:

# -*- coding:utf-8 -*-
import requests
import re
import time
import os
import random
import urllib.parse
from PIL import Image  # Pillow, used below to resize and crop the pictures

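imgDir = r"/Volumes/DBA/python/img/"
os.makedirs(imgDir, exist_ok=True)  # Make sure the output directory exists
# Rotate between several sets of request headers to reduce the chance of being blocked by anti-crawling checks
# Chrome, Firefox, Edge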
headers = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    },
    {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive'
    },
    {
        "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19041',
        'Accept-Language': 'zh-CN',
        'Connection': 'keep-alive'
    }
]

picList = []  # Collected thumbnail URLs

keyword = input("Please enter a keyword to search for:")
kw = urllib.parse.quote(keyword)  # URL-encode the keyword


# Append the thumbnail URLs from result page n (30 per page) to picList
def getPicList(kw, n):
    global picList
    weburl = r"https://image.baidu.com/search/acjson?tn=resultjson_com&logid=11601692320226504094&ipn=rj&ct=201326592&is=&fp=result&queryWord={kw}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&word={kw}&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn={n}&rn=30&gsm=1e&1611751343367=".format(
        kw=kw, n=n * 30)
    req = requests.get(url=weburl, headers=random.choice(headers))
    req.encoding = req.apparent_encoding  # Prevent garbled Chinese characters
    webJSON = req.text
    imgurlReg = '"thumbURL":"(.*?)"'  # Regular expression that captures each thumbnail URL
    picList = picList + re.findall(imgurlReg, webJSON, re.DOTALL | re.I)


for i in range(150):  # Request up to 150 pages. If there are fewer results than that, picList simply stops growing
    getPicList(kw, i)

for item in picList:
    # Every picture is saved with a .jpg suffix and a timestamp as its file name
    hz = ".jpg"
    picName = str(int(time.time() * 1000))  # Millisecond timestamp
    # Request picture
    imgReq = requests.get(url=item, headers=random.choice(headers))
    # Save picture
    with open(imgDir + picName + hz, "wb") as f:
        f.write(imgReq.content)
    # Open the saved picture with the Image module
    im = Image.open(imgDir + picName + hz)
    bili = im.width / im.height  # Width-to-height ratio, used to scale the picture proportionally
    newIm = None
    # Resize the picture so that its shorter side is 50 pixels
    if bili >= 1:
        newIm = im.resize((round(bili * 50), 50))
    else:
        newIm = im.resize((50, round(50 * im.height / im.width)))
    # Crop the top-left 50 * 50 region of the picture
    clip = newIm.crop((0, 0, 50, 50))
    clip.convert("RGB").save(imgDir + picName + hz)  # Overwrite the file with the cropped picture
    print(picName + hz + " Processing completed")

To demonstrate this method: after running the script, enter "year of the tiger", press Enter, and wait for the download to complete:

That covers the source code for crawling Baidu Images. After crawling "Fu" and "tiger" pictures separately, we can start Wantu imaging!

2, Wantu imaging

Wantu imaging (composing one large picture out of thousands of small ones) is very simple. I covered it in an earlier article, "Python crawls cat pictures in batches to realize thousand image imaging", which you can refer to!
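
For readers without that article at hand, here is a rough sketch of the matching step that such a mosaic commonly relies on. It is an assumption about the general technique, and the referenced article may do it differently. Each small picture is reduced to its average colour, and every cell of the target picture is assigned the tile whose average colour is nearest. Colours are plain (r, g, b) tuples so the sketch runs without any image library. All function names and sample values are hypothetical:

```python
def average_color(pixels):
    """Mean (r, g, b) of a non-empty list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def nearest_tile(color, tile_colors):
    """Index of the tile whose average colour is closest (squared Euclidean distance)."""
    return min(
        range(len(tile_colors)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(color, tile_colors[i])),
    )


def mosaic_layout(cell_colors, tile_colors):
    """Map a 2D grid of target cell colours to a 2D grid of tile indices."""
    return [[nearest_tile(c, tile_colors) for c in row] for row in cell_colors]


# Hypothetical data: three tiles (reddish, greenish, bluish) and a 2 x 2 target grid
tiles = [average_color([(250, 10, 10), (230, 30, 20)]), (20, 200, 40), (10, 20, 220)]
target = [
    [(255, 0, 0), (0, 0, 255)],
    [(0, 255, 0), (200, 50, 50)],
]
layout = mosaic_layout(target, tiles)
print(layout)
```

In a full program, the 50 * 50 thumbnails saved by the crawler would supply the tile colours, and the resulting index grid would decide which thumbnail to paste at each position of the output picture.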

Results

Here are the finished pictures we assembled:

3, Writing the Fu character with SQL

The SQL is as follows:

create table LuciferFu(fu_line varchar2(128));
insert into LuciferFu values('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('2222222222222222222222222222222222222222222222222222222222222222222222222/[[    .\2222222222222222222222222222222222'),
('2222222222222222222222222222222222222222222222222222222222222222222/[.              ,2222222222222222222222222222222'),
('222222222222222222222222/2[222222222222222222222222222222222222`                      \22222222222222222222222222222'),
('222222222222222222222222^                ,\2222222222222222[`                          =2222222222222222222222222222'),
('2222222222222222222222222                   \22222222/2[                               \2222222222222222222222222222'),
('2222222222222222222222222                     22222\                                  ]22222222222222222222222222222'),
('2222222222222222222222222^                     \2222                             .]/22222222222222222222222222222222'),
('22222222222222222222222222                      2222`                 .,,]]/2222222222222222222222222222222222222222'),
('222222222222222222222222222                     \2222]  ,/2/,/222222222222222222222222222222222222222222222222222222'),
('2222222222222222222222222222                    .22222222`/222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222.                  ,\2222\`/222222222222222222[.       ..[22222222222222222222222222222'),
('2222222222222222222222222222`                   =222^  2222222222222[`                 ,2222222222222222222222222222'),
('22222222222222222222222222^   ]2\]`              =2`   2222222/`                        2222222222222222222222222222'),
('222222222222222222222222/   /22222222\`      ,]222`    ,2[`                            =2222222222222222222222222222'),
('2222222222222222222222/   /2222222222222222222222^                                 ,22222222222222222222222222222222'),
('222222222222222222222^ ,/2222222[`   ,[[[\2222222.                .]/22^          /222222222222222222222222222222222'),
('2222222222222222222/ ,/222222/`            ,22222          ,]]]222222`           =2222222222222222222222222222222222'),
('222222222222222222`,2222222/[               /222^        ,22222222/`             [2222222222222222222222222222222222'),
('2222222222222222^ /22222222^               /222/          222222[                   ,2222222222222222222222222222222'),
('22222222222222/ ,222222222`                 22^           ,22/                      /2222222222222222222222222222222'),
('2222222222222/./2222222`                  ,22^                                    ]/22222222222222222222222222222222'),
('222222222222./222222[                    /22`    =^                            ,222222222222222222222222222222222222'),
('2222222222/,22222[                      ,,/     .2^                     ,]/22222222222222222222222222222222222222222'),
('22222222/./22[.                          `      22^              .]22222222222222222222/[[[[[\2222222222222222222222'),
('22222/`                                        222^           2222222222222222[[.                [\22222222222222222'),
('22222                    /2                   22222.          =22222222[`                           ,222222222222222'),
('2222^                  /22/                  =22222\           =2[`                                   ,2222222222222'),
('2222\              ,]22222                   2222222`                                                  .222222222222'),
('222222\]]]]`]]\/222222222`                 ,222222222                     ]/222\/2222\2/]`              ,22222222222'),
('222222222222222222222222`                 /[22222222/                     2222222222222222\             /22222222222'),
('22222222222222222222222^               ,`    ,22222^           .         =2/`,`[,[[\222222\             \22222222222'),
('2222222222222222222222^              ,/2^      ,222].     .]]222^        .             ,222             222222222222'),
('222222222222222222222/.             /222^       ,2222222222222222                       =2^            .222222222222'),
('222222222222222222222.              22222.       =222222222222/`                       =22.            ,222222222222'),
('22222222222222222222^             .222222.       .22222222/[                         ,`=22             =222222222222'),
('2222222222222222222^..  .. .      .\22222^        ,2222[.                           ]2222^.           .2222222222222'),
('22222222222222222/.....          ..2222222`. .   . ,^          .            .   ]\/222222.            =2222222222222'),
('22222222222222222........... . ....22222222... . .. ,` .. .    ... .. .    .. .222222222^       ...  .22222222222222'),
('2222222222222222`................ .\2222222^ ... ............,/22^...... ../222222222222.............=22222222222222'),
('22222222222222022`..................,\2222222`.........,\]]22222`........../222222222222^.............222222222222222'),
('2222222222222222^...................222222222..........=2222/.,`.....................\2.............=222222222222222'),
('222222222222222222.]22222..........,2222222222..........2/]222......................................2222222222222222'),
('2222222222222222222222222..........,2222222222^..........`.........................................,2222222222222222'),
('2222222222222222222222222^.........=22222222222^...................................................22222222222222222'),
('2222222222222222222222222^.........=222222222222\.................................................222222222222222222'),
('2222222222222222222222222^.*........=222222222222\...................,]]]/22222\...............,22222222222222222222'),
('2222222222222222222222222^...*....../2222222222222\.......]/22222222222222222222\...*..*.....,2222222222222222222222'),
('22222222222222222222222222......**]/2222222222222222\**,]2222222222222222222222222`.......,2222222222222222222222222'),
('22222222222222222222222222\/2\22222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'),
('22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222');

Result:

select * from LuciferFu;
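
To try the idea without a database server, the same create, insert, and select steps can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite has no `varchar2` type, so `varchar` is used instead, and the three-line pattern is a hypothetical stand-in for the full Fu art above:

```python
import sqlite3

# Hypothetical three-line pattern standing in for the full Fu art
art_lines = [
    "2222222",
    "22/ \\22",
    "2222222",
]

conn = sqlite3.connect(":memory:")
conn.execute("create table LuciferFu(fu_line varchar(128))")
# Parameter binding avoids having to escape quotes or backslashes in the art
conn.executemany("insert into LuciferFu values (?)", [(line,) for line in art_lines])

# Read the rows back in insertion order and print them as one picture
rows = [row[0] for row in conn.execute("select fu_line from LuciferFu order by rowid")]
print("\n".join(rows))
conn.close()
```

Ordering by `rowid` makes the row order explicit. A plain `select *` usually returns rows in insertion order for a small table like this, but SQL does not guarantee it.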

Interested readers can try it for themselves and have fun in the Year of the Tiger!

Topics: crawler Network Protocol http