Problems with custom image file names when crawling images with Scrapy

Problem: only one picture is downloaded. Its file name is the name of the last picture, but its content is not the content of the last picture. When the file paths are printed, the information of the last picture appears multiple times.

Relevant information from the corresponding files:

Items file related content:

Spider file content:

Settings file content:

Relevant contents of the pipelines file:

At first, I wrote it this way: I took the last segment of the image URL as the file name. There was no big problem.

But I don't like how those file names look, so I plan to use the images' original titles as file names.

The final approach is to pass only the needed field of the item, not the entire item object.

Items file related content:

# Define here the models for your scraped items
# See documentation in:

import scrapy

class FabiaoqingItem(scrapy.Item):
    # define the fields for your item here like:
    images = scrapy.Field()
    image_urls = scrapy.Field()
    # pass

Spider file content:

import scrapy
from first.items import FabiaoqingItem

class FabiaoqingSpider(scrapy.Spider):
    name = 'fabiaoqing'
    # allowed_domains = ['']
    start_urls = ['']

    def parse(self, response):
        # Create the item object
        item = FabiaoqingItem()
        # Select every emoticon entry on the page
        imgs = response.css('.tagbqppdiv')
        for img in imgs:
            # Title of the emoticon
            title = img.css('a::attr(title)').extract()
            # Image URL
            src = img.css('a img::attr(data-original)').extract()
            # Join each extracted list into a single string
            title = ''.join(title)
            src = ''.join(src)
            # print(title, src)
            # Store the extracted values in the item object
            item['images'] = title
            item['image_urls'] = src
            # Hand the item over to the pipeline
            yield item

# Important: during the run, no error was reported and no result was produced. I found this warning:
# WARNING: Disabled FabiaoqingPipeline: ImagesPipeline requires installing Pillow 4.0.0 or later
# This means that Pillow 4.0.0 or later is not installed on the system. Install it manually: pip install Pillow
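
Because the pipeline is disabled with only a warning, it can help to check for Pillow before starting the crawl. The following is a minimal sketch, not part of the original project; the helper name `pillow_installed` is hypothetical, and it only relies on the fact that Pillow is imported under the package name `PIL`.

```python
import importlib.util


def pillow_installed() -> bool:
    """Return True if the PIL package (provided by Pillow) can be found."""
    return importlib.util.find_spec("PIL") is not None


if not pillow_installed():
    print("Pillow is missing; install it with: pip install Pillow")
```

This check only tells you whether the package is present, not whether its version is 4.0.0 or later.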

Settings file content:

# Scrapy settings for first project
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:

# Console log output level
import random

# Directory where picture files are saved
IMAGES_STORE = './images'
# Limit log output level

BOT_NAME = 'first'

SPIDER_MODULES = ['first.spiders']
NEWSPIDER_MODULE = 'first.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
# USER_AGENT = 'first (+'
# User agent list (the variable name USER_AGENT_LIST is an assumption)
USER_AGENT_LIST = [
    # Edge
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62',
    # Firefox
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0',
    # Chrome
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'
]
# Random user agent
USER_AGENT = random.choice(USER_AGENT_LIST)

# Obey robots.txt rules
# Compliance with robots protocol

# Configure maximum concurrent requests performed by Scrapy (default: 16)

# Configure a delay for requests for the same website (default: 0)
# See
# See also autothrottle settings and docs
# The download delay setting will honor only one of:

# Disable cookies (enabled by default)

# Disable Telnet Console (enabled by default)

# Override the default request headers:
# DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
# }

# Enable or disable spider middlewares
# See
# SPIDER_MIDDLEWARES = {
#    'first.middlewares.FirstSpiderMiddleware': 543,
# }

# Enable or disable downloader middlewares
# See
# DOWNLOADER_MIDDLEWARES = {
#    'first.middlewares.FirstDownloaderMiddleware': 543,
# }

# Enable or disable extensions
# See
# EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
# }

# Configure item pipelines
# See
# Project pipelines
ITEM_PIPELINES = {
    # The lower the value, the higher the priority
    # 'first.pipelines.FirstPipeline': 300,
    # 'first.pipelines.ToMysqlPipeline': 301,
    'first.pipelines.FabiaoqingPipeline': 302,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See
# The initial download delay
# The maximum download delay to be set in case of high latencies
# The average number of requests Scrapy should be sending in parallel to
# each remote server
# Enable showing throttling stats for every response received:

# Enable and configure HTTP caching (disabled by default)
# See
# HTTPCACHE_DIR = 'httpcache'
# HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

Relevant contents of the pipelines file:

At first, I wrote it this way: I took the last segment of the image URL as the file name. There was no big problem.

import scrapy
from scrapy.pipelines.images import ImagesPipeline

class FabiaoqingPipeline(ImagesPipeline):
    """Pipeline class that downloads emoticon images for the fabiaoqing spider."""

    def get_media_requests(self, item, info):
        """Create a download request for each emoticon image."""
        return [scrapy.Request(url=item['image_urls'])]

    def file_path(self, request, response=None, info=None, *, item=None):
        """Return the path (relative to IMAGES_STORE) where the image is saved."""

        # Take the last segment of the download URL as the file name
        img_name = request.url.split('/')[-1]
        return img_name

    def item_completed(self, results, item, info):
        """Pass the item on to the next pipeline class."""
        return item
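
Taking the last URL segment with `split('/')` works for plain image URLs. As a hedged sketch, assuming some URLs might carry a query string or fragment (the post does not say that they do), the standard library can strip those parts first. The helper name `name_from_url` and the example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def name_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlsplit(url).path          # drops '?query' and '#fragment'
    return posixpath.basename(path)    # text after the final '/'


print(name_from_url("https://example.com/large/abc123.jpg"))
print(name_from_url("https://example.com/large/abc123.gif?size=big"))
```

Inside `file_path`, such a helper could be called with `request.url` in place of the `split('/')` expression.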

Print results:

Open the folder and have a look:

45 files in total, so no problem there.

But I don't like how these file names look, so I plan to use the images' original titles as file names.

Let's delete the previously downloaded pictures first.

The key change is to pass the item along in the request's meta:

return [scrapy.Request(url=item['image_urls'], meta={'item': item})]

import re
import scrapy
from scrapy.pipelines.images import ImagesPipeline

class FabiaoqingPipeline(ImagesPipeline):
    """Pipeline class that downloads emoticon images for the fabiaoqing spider."""

    def get_media_requests(self, item, info):
        """Create a download request for each emoticon image."""
        # return [scrapy.Request(url=item['image_urls'])]
        # Pass the item in meta so file_path can build a custom file name
        return [scrapy.Request(url=item['image_urls'], meta={'item': item})]

    def file_path(self, request, response=None, info=None, *, item=None):
        """Return the path (relative to IMAGES_STORE) where the image is saved."""

        # Take the last segment of the download URL as the file name
        # img_name = request.url.split('/')[-1]
        # print(img_name)
        # return img_name

        # Custom file name
        item = request.meta['item']     # Get the item object from meta
        img_name = item['images']
        # Some titles contain characters that are not allowed in file names, so strip them
        img_name = re.sub(r'[\\/:*?"<>|\n]', '', img_name)
        # Some images are gif and some are jpg, so the extension cannot be hard-coded
        img_type = request.url.split('.')[-1]
        img_path = img_name + '.' + img_type
        return img_path

    def item_completed(self, results, item, info):
        """Pass the item on to the next pipeline class."""
        return item

Print results:
I'll fucking kill you.jpg
 Your father is here.gif
 Call me baby, you won't be alone in the new year.jpg
2022 Must be rich(Panda head expression pack).gif
 It's 2022.gif
 Call me wife for the new year, and you won't be alone.jpg
 Your father is here. Your father is here.gif
 See you next time when you have insomnia.jpg
 Although I roll, I'm still a vegetable.jpg
 Patrick Star Q elasticのAss, run.gif
 You want to break my defense.jpg
 Knock on your head.gif
 Why not? No. 1 in Internet surfing.jpg
 Very yellow vegetable dog.jpg
 Finger line!(Specify row).jpg
 Silent egg(Sweat expression bag).jpg
 I really want to fit in with you, but I have to work overtime.gif
 Panda head expression pack of the last hero League mobile game.gif
 congratulations!You've been visited by a trapped cat!From now on, no matter what you're doing, you'll be sleepy!.jpg
 Cheers to our speechless.gif
 You're so cute, I must knock, so I'm cute.jpg
 Baby mouth a happy s la(Kulomi expression pack).jpg
 Wife, open the door. It's me. I'm back. Open the door? Didn't drink? Guilty Xiugou knocks on the door.gif
 Dripping sweat little yellow face Christmas clothes Christmas hat expression bag.gif
2022 Must be able to love panda head expression bag.jpg
 Your eyes are full of you.jpg
 Fart attack.gif
 Young master and young lady get up.gif
3d Refer to human expression pack warning once.gif
 Tickle your ass dynamic picture expression pack.gif
 Da baa color.jpg
 The person pointed out did three anal lifting exercises, right now.jpg
 Cat probe observation GIF Animation expression pack.gif
 Look, I'm angry again.jpg
 Wordless sweat expression pack.gif
Ji Bad guy bad guy big bad guy.gif
 No one likes me.jpg
 The leaping legs jumped up GIF Dynamic graph.gif
1)60 There's something in that(Wechat chat box expression pack).gif
 Berry thing(Fruit homophonic expression pack).jpg
 Poggy crying emoticon pack.gif
 Hug charging.gif
 Sad frog gives you a punch.gif
 From time to time, the gang came to visit white prostitutes(Group chat expression pack).jpg
 Mom! Big mouth cat expression bag.jpg

As you can see, something very strange happens. The first 44 names print normally, but the name of the 45th picture is repeated many times. More importantly, only 2 pictures were downloaded. Let's have a look:

The file name is the name of the last picture, and the file name does not match the content.

The final approach is to pass only the needed field of the item in meta, not the entire item object:

return [scrapy.Request(url=item['image_urls'], meta={'images': item['images']})]

import re
import scrapy
from scrapy.pipelines.images import ImagesPipeline

class FabiaoqingPipeline(ImagesPipeline):
    """Pipeline class that downloads emoticon images for the fabiaoqing spider."""

    def get_media_requests(self, item, info):
        """Create a download request for each emoticon image."""
        # return [scrapy.Request(url=item['image_urls'])]
        # Pass data in meta so file_path can build a custom file name
        # return [scrapy.Request(url=item['image_urls'], meta={'item': item})]
        return [scrapy.Request(url=item['image_urls'], meta={'images': item['images']})]

    def file_path(self, request, response=None, info=None, *, item=None):
        """Return the path (relative to IMAGES_STORE) where the image is saved."""

        # Take the last segment of the download URL as the file name
        # img_name = request.url.split('/')[-1]
        # print(img_name)
        # return img_name

        # # Custom file name
        # item = request.meta['item']     # Get the item object from meta
        # img_name = item['images']
        # # Some titles contain characters that are not allowed in file names, so strip them
        # img_name = re.sub(r'[\\/:*?"<>|\n]', '', img_name)
        # # Some images are gif and some are jpg, so the extension cannot be hard-coded
        # img_type = request.url.split('.')[-1]
        # img_path = img_name + '.' + img_type
        # print(img_path)
        # return img_path

        # This version works, strangely enough
        img_name = request.meta['images']
        # Some titles contain characters that are not allowed in file names, so strip them
        img_name = re.sub(r'[\\/:*?"<>|\n]', '', img_name)
        # Some images are gif and some are jpg, so the extension cannot be hard-coded
        img_type = request.url.split('.')[-1]
        img_path = img_name + '.' + img_type
        return img_path

    def item_completed(self, results, item, info):
        """Pass the item on to the next pipeline class."""
        return item
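
The file-name logic in `file_path` (strip forbidden characters from the title, then append the extension taken from the URL) can be tried outside Scrapy. The sketch below is only an illustration under assumptions: the helper name `build_image_path`, the `default_ext` fallback for URLs without an extension, and the example URLs are all hypothetical and not part of the original project.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Characters that Windows does not allow in file names, plus line breaks.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\r\n]')


def build_image_path(title: str, url: str, default_ext: str = ".jpg") -> str:
    """Combine a cleaned title with the extension taken from the image URL."""
    clean_title = _INVALID_CHARS.sub("", title).strip()
    # Only the URL path is inspected, so any '?query' part is ignored.
    ext = posixpath.splitext(urlsplit(url).path)[1] or default_ext
    return clean_title + ext


print(build_image_path("What? No/way", "https://example.com/a/b.gif"))
print(build_image_path("hello", "https://example.com/img"))
```

Two different titles can still collapse to the same cleaned name, in which case the later download would overwrite the earlier one.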

Print results:
I'll fucking kill you.jpg
 Your father is here.gif
 Call me baby, you won't be alone in the new year.jpg
2022 Must be rich(Panda head expression pack).gif
 It's 2022.gif
 Call me wife for the new year, and you won't be alone.jpg
 Your father is here. Your father is here.gif
 See you next time when you have insomnia.jpg
 Although I roll, I'm still a vegetable.jpg
 Patrick Star Q elasticのAss, run.gif
 You want to break my defense.jpg
 Knock on your head.gif
 Why not? No. 1 in Internet surfing.jpg
 Very yellow vegetable dog.jpg
 Finger line!(Specify row).jpg
 Silent egg(Sweat expression bag).jpg
 I really want to fit in with you, but I have to work overtime.gif
 Panda head expression pack of the last hero League mobile game.gif
 congratulations!You've been visited by a trapped cat!From now on, no matter what you're doing, you'll be sleepy!.jpg
 Cheers to our speechless.gif
 You're so cute, I must knock, so I'm cute.jpg
 Baby mouth a happy s la(Kulomi expression pack).jpg
 Wife, open the door. It's me. I'm back. Open the door? Didn't drink? Guilty Xiugou knocks on the door.gif
 Dripping sweat little yellow face Christmas clothes Christmas hat expression bag.gif
2022 Must be able to love panda head expression bag.jpg
 Your eyes are full of you.jpg
 Fart attack.gif
 Young master and young lady get up.gif
3d Refer to human expression pack warning once.gif
 Tickle your ass dynamic picture expression pack.gif
 Da baa color.jpg
 The person pointed out did three anal lifting exercises, right now.jpg
 Cat probe observation GIF Animation expression pack.gif
 Look, I'm angry again.jpg
 Wordless sweat expression pack.gif
Ji Bad guy bad guy big bad guy.gif
 No one likes me.jpg
 The leaping legs jumped up GIF Dynamic graph.gif
1)60 There's something in that(Wechat chat box expression pack).gif
 Berry thing(Fruit homophonic expression pack).jpg
 Poggy crying emoticon pack.gif
 Hug charging.gif
 Sad frog gives you a punch.gif
 From time to time, the gang came to visit white prostitutes(Group chat expression pack).jpg
 Mom! Big mouth cat expression bag.jpg

This printed output is even stranger: basically every name is printed more than once.

But what's stranger still is that the result is fine. Let's have a look:

There are 45 files in total, so the quantity is correct. The file names are custom, using the original titles, and the content is correct too. Names and contents match one to one, which is very puzzling!

The printed output clearly looks wrong from top to bottom, yet the result is fine.

In a sense, the problem has been solved, but it has not been completely solved.

I hope an expert can clear up my doubts. Thank you very much!
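
One likely explanation, offered as an assumption rather than a confirmed diagnosis: the spider creates a single `FabiaoqingItem` before the loop and yields that same object for every picture. Putting the whole item into `meta` stores a reference to it, so when `file_path` runs again after a download finishes, the item already holds the last picture's title. Putting `item['images']` into `meta` stores the string itself at request time, so each request keeps its own title. `file_path` also appears to be called more than once per request (before the download, and again when the response arrives), which would account for the repeated lines in the printed output. The sketch below is plain Python, not Scrapy, and is a simplified model; `parse_shared`, `parse_fresh`, `run`, and the sample rows are hypothetical.

```python
def parse_shared(rows):
    """Mimic the spider above: one item object is created and reused."""
    item = {}
    for title, url in rows:
        item["images"] = title
        item["image_urls"] = url
        yield item                      # the same dict every time


def parse_fresh(rows):
    """Create a new item for every row instead."""
    for title, url in rows:
        yield {"images": title, "image_urls": url}


def run(parse, rows, keep_whole_item):
    """Queue one 'request' per item, then name the files after parsing ends."""
    queued = []
    early_names = []
    for item in parse(rows):
        # First naming call: the item still holds this row's values.
        early_names.append(item["images"])
        meta = {"item": item} if keep_whole_item else {"images": item["images"]}
        queued.append(meta)
    # Second naming call: happens later, once the downloads have completed.
    late_names = [
        m["item"]["images"] if keep_whole_item else m["images"] for m in queued
    ]
    return early_names, late_names


rows = [("cat", "u1.jpg"), ("dog", "u2.gif"), ("frog", "u3.jpg")]
print(run(parse_shared, rows, keep_whole_item=True))
print(run(parse_shared, rows, keep_whole_item=False))
print(run(parse_fresh, rows, keep_whole_item=True))
```

In the first call every late name comes out as the last title, so all paths would collapse to that title plus `.jpg` or `.gif`, which would be consistent with only two files being saved. If this is indeed the cause, moving `item = FabiaoqingItem()` inside the `for` loop of `parse` should make both pipeline versions behave correctly.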

