I reluctantly spent more than 20 minutes meeting my sister's demands, but

Posted by Mchl on Sat, 29 Jan 2022 02:34:11 +0100

1, Narration (beginning of story)

The other day, my sister quietly asked me to sneak out and see a movie with her. Emmm... no big deal. Let's go. Hey, hey ~

Go??

I regretted going. We found a park and sat there watching the movie on a mobile phone.
Feeding the mosquitoes, sweating, watching a boring movie, and the worst part: she wouldn't let me sit idle. I had to fan her to keep the mosquitoes away (so I came out just to be lonely???)
After the movie, she also treated me to a six yuan bowl of malatang (spicy hot pot), and that was the first step in selling myself

After dinner, we talked:

Sister: (smirking at me for a long time) Little brother, is it hard?
Me: Ah???? Hard disk???

Sister: Is the stool hard?
Me: It's OK. What do you want me to do?? I knew you were up to something from that smirk

Sister: Hey, what a way to talk! I treated you to a movie. How nice am I, right? As for me, there is just one little thing
Sister: Recently my teacher gave me a small assignment: analyze the box office rankings of recent years. You know how to write crawlers, so scrape some data for me

Me: No way. For that you want me to do such a big project for you? I won't
Sister: OK, then I'll ask the sports committee member of your class to help me. He asked me out to see a movie yesterday
Me: Shut up, leave the sports committee member alone and let me do it!

2, This is where the hard labor begins

So I started my 20 minutes of selling my labor, and the painful road began here. Looking for data on box office takings, the obvious source is Maoyan (Cat's Eye) Movies

So I found this website: https://piaofang.maoyan.com/mdb/rank
The general page is as follows:

-Page analysis

My analysis showed that this page loads its data dynamically, and capturing the network requests quickly turned up the data for each year:

-Detail page URL analysis

https://piaofang.maoyan.com/mdb/rank/query?type=0&id=0 This is a standalone box office ranking page
But I didn't crawl it, since it's not very useful here. The per-year pages follow this pattern:

https://piaofang.maoyan.com/mdb/rank/query?type=0&id=2021
https://piaofang.maoyan.com/mdb/rank/query?type=0&id=2020
... ...
https://piaofang.maoyan.com/mdb/rank/query?type=0&id=n
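
Since only the id parameter changes from year to year, the list of URLs can be generated instead of typed out. Here is a minimal sketch of that idea; the names BASE_URL and build_year_urls are my own for illustration and not part of the site, and the 2011 to 2021 range is simply the one the crawler further down uses.

```python
# URL template taken from the pattern above; {} is replaced by the year.
BASE_URL = "https://piaofang.maoyan.com/mdb/rank/query?type=0&id={}"


def build_year_urls(first_year, last_year):
    """Return one ranking URL per year, from first_year to last_year inclusive."""
    return [BASE_URL.format(year) for year in range(first_year, last_year + 1)]


urls = build_year_urls(2011, 2021)
print(len(urls))   # number of yearly pages to request
print(urls[0])     # first URL in the list
print(urls[-1])    # last URL in the list
```

Whether every year in that range actually returns data is something to check against the site itself.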

-Modules used

    requests, csv, time, random, pandas, matplotlib

-Key content

Dynamically loaded data, capturing the requests, storing the results in a csv file, data analysis...
Why data analysis as well? Read on

The code is as follows:

import requests
import time
import random
import csv


class PiaofangSpider:
    def __init__(self):
        self.url = 'https://piaofang.maoyan.com/mdb/rank/query?type=0&id={}'
        self.f = open('piaofang.csv', 'w', encoding='utf8', newline='')
        self.writer = csv.writer(self.f)
        # Write header row
        data_list = ('Movie title', 'Release time', 'box office', 'Average fare', 'Average number of people on site')
        self.writer.writerow(data_list)

    def get_html(self, url):
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36',
        }
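        # The endpoint returns JSON; fail fast on timeouts and HTTP errors
        response = requests.get(url=url, headers=headers, timeout=10)
        response.raise_for_status()
        html = response.json()
        self.parse_html(html)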

    def parse_html(self, html):
        # Extract dynamically loaded data
        result = html['data']['list']

        for res in result:
            item = {}
            item['movieName'] = res['movieName']
            item['releaseInfo'] = res['releaseInfo']
            item['boxDesc'] = res['boxDesc']
            item['avgViewBoxDesc'] = res['avgViewBoxDesc']
            item['avgShowViewDesc'] = res['avgShowViewDesc']
            print(item)

            self.writer.writerow(item.values())

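    def run(self):
        try:
            # One request per year, 2011 through 2021
            for i in range(2011, 2022):
                url_html = self.url.format(i)
                self.get_html(url=url_html)
                # Pause 1 or 2 seconds between requests
                time.sleep(random.randint(1, 2))
        finally:
            # Close the csv file even if a request or parse step fails
            self.f.close()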


if __name__ == '__main__':
    spider = PiaofangSpider()
    spider.run()
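
One thing the crawler above takes for granted is that every response really contains data -> list with all five fields. A slightly more defensive version of the parsing step could look like the sketch below. The function name extract_rows and the sample payload are hypothetical (the values are made up purely to show the shape); only the key names come from the code above.

```python
# Field names as used by the crawler above, in CSV column order.
FIELDS = ("movieName", "releaseInfo", "boxDesc", "avgViewBoxDesc", "avgShowViewDesc")


def extract_rows(payload):
    """Turn one decoded JSON response into a list of CSV rows.

    Returns an empty list when the expected data -> list structure is absent,
    and fills missing fields with an empty string.
    """
    data = payload.get("data") if isinstance(payload, dict) else None
    movies = data.get("list") if isinstance(data, dict) else None
    if not isinstance(movies, list):
        return []
    return [[movie.get(field, "") for field in FIELDS] for movie in movies]


# Hypothetical payload with made-up values, only to illustrate the structure.
sample = {
    "data": {
        "list": [
            {"movieName": "Film A", "releaseInfo": "2021-01-01", "boxDesc": "100",
             "avgViewBoxDesc": "40", "avgShowViewDesc": "10"},
            {"movieName": "Film B"},  # some fields missing
        ]
    }
}
rows = extract_rows(sample)
for row in rows:
    print(row)
```

Each returned row can be passed straight to writer.writerow, and an empty result is a hint that the request was answered with something other than the ranking data.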

I also took another look afterwards. This page is rather magical: the content can also be scraped from the plain HTML in the usual way, using XPath or regular expressions to extract the data. I only found that out after I had finished crawling... and after spending a long time analyzing the captured requests

Finally, she told me to analyze the data for her as well, otherwise she would make me cough up the malatang and would never go out with me again
This is bullying. I'm a seven foot man. How could I bow my head for such a small request? I definitely won't

Data analysis chart...

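import pandas as pd
from matplotlib import pyplot as plt

# Use the SimHei font so Chinese text in the charts renders correctly,
# and keep the minus sign displayable with that font
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Jupyter notebook magic: draw the charts inline (remove outside a notebook)
%matplotlib inline
# Read the csv file written by the crawler
data = pd.read_csv('piaofang.csv')
data.head(10)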

# Add a year column taken from the first four characters of the release time
data['year'] = data['Release time'].apply(lambda x: int(x[0:4]))
data
# Top ten films at the box office
data.sort_values(by=['box office'],ascending=False).head(10).plot.bar(x='Movie title',y='box office',title='Top 10 at the box office')

# Number of films released each year
fig=plt.figure(dpi=120)
groupby_year = data.groupby('year').size()
groupby_year.plot(title = 'Number of films released each year')
plt.show()

# Total annual box office
fig=plt.figure(dpi=120)
sum_money = data.groupby('year')['box office'].sum()
sum_money.plot.bar(title = 'Total annual box office')
plt.show()
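
The yearly totals can also be cross-checked without pandas. The sketch below does the same group-by-year sum with only the standard library. The function name yearly_totals is my own, the sample rows are made up, and it assumes the 'box office' column holds plain numbers; if the real file contains unit suffixes, they would need to be stripped first.

```python
import csv
import io
from collections import defaultdict


def yearly_totals(csv_file):
    """Sum the 'box office' column per release year.

    Assumes 'Release time' starts with a four-digit year and that
    'box office' is plain numeric text.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        year = int(row["Release time"][:4])
        totals[year] += float(row["box office"])
    return dict(totals)


# Made-up sample rows standing in for piaofang.csv.
sample_csv = io.StringIO(
    "Movie title,Release time,box office\n"
    "Film A,2020-01-25,100.5\n"
    "Film B,2020-07-01,50\n"
    "Film C,2021-02-12,200\n"
)
totals = yearly_totals(sample_csv)
for year in sorted(totals):
    print(year, totals[year])
```

With the real file, the call would be yearly_totals(open('piaofang.csv', encoding='utf8', newline='')), and the result should match the bar chart above.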

3, I blew up on the spot

The final heavy blow:

It turns out she had taken this on as a paid order behind my back. I lost out badly: from beginning to end I was helping her make money and got nothing for myself. It's a huge loss~
Maybe I'm destined to be a wage worker all my life~
