1. Narration (the start of the story)
The other day my sister quietly asked me out to see a movie. Emmm... why not, let's go. Hey hey~
Go??
I regretted it as soon as we got there: she picked a park, and we sat there watching a movie on a mobile phone.
Feeding mosquitoes, sweating, watching a boring movie, and worst of all, she wouldn't let me sit idle: I had to fan her to keep the mosquitoes off. (So what exactly did I come out for???)
After the movie she treated me to a six-yuan bowl of malatang (spicy hot pot), and that was the first step in selling myself.
After dinner, we talked:
Sister: (after staring at me with a creepy grin for a long time) Little brother, is it hard?
Me: Huh???? Hard disk???
Sister: Is the stool hard?
Me: It's just fine. What do you want from me?? I could tell from that creepy smile of yours.
Sister: Oh, how can you talk like that? I treated you to a movie. Aren't I nice? Anyway, there are a few little things...
Sister: My teacher recently gave me a small assignment: analyze the box office rankings of recent years. You know how to write crawlers, so scrape some data down for me.
Me: No way. You just want me to do a huge project for you. I won't.
Sister: Fine, then I'll ask the sports committee member from your class to get it for me. He asked me out to a movie just yesterday.
Me: Shut up! Leave the sports committee member out of this. I'll do it!
2. This is where the hard labor begins
So began twenty minutes of selling myself, and the painful road of hard labor started here. For box office data, the obvious source is Maoyan (Cat's Eye) Movies.
So I found this website: https://piaofang.maoyan.com/mdb/rank
The page looks roughly like this:
- Page analysis
Analyzing it, I found that the page loads its data dynamically, and the data for each year turned up right away:
- Detail page URL analysis
https://piaofang.maoyan.com/mdb/rank/query?type=0&id=0 This is a separate page for the overall box office ranking.
But I didn't crawl this one, because it isn't very useful here.
https://piaofang.maoyan.com/mdb/rank/query?type=0&id=2021
https://piaofang.maoyan.com/mdb/rank/query?type=0&id=2020
... ...
https://piaofang.maoyan.com/mdb/rank/query?type=0&id=n
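The year pages differ only in the `id` parameter, so the list of URLs can be generated instead of typed out. The following is a minimal sketch that assumes `id=<year>` selects that year's ranking, as the URLs above suggest. `BASE_URL` and `year_urls` are hypothetical names chosen for illustration.

```python
# Build the per-year ranking URLs following the pattern listed above.
# Assumption: id=<year> selects that year's ranking (id=0 is the overall page).
BASE_URL = 'https://piaofang.maoyan.com/mdb/rank/query?type=0&id={}'


def year_urls(first_year, last_year):
    """Return one query URL per year, including both end years."""
    return [BASE_URL.format(year) for year in range(first_year, last_year + 1)]


urls = year_urls(2011, 2021)
print(len(urls))  # 11
print(urls[0])
```

The spider below uses the same idea: it loops over `range(2011, 2022)` and formats the year into the URL template.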
- Modules used
requests,csv,pandas,matplotlib
- Key content
Dynamic loading, data capture, saving to a CSV file, data analysis...
Why data analysis? Read on.
The code is as follows:
import csv
import random
import time

import requests


class PiaofangSpider:
    def __init__(self):
        # Year-specific ranking endpoint; {} is replaced by the year
        self.url = 'https://piaofang.maoyan.com/mdb/rank/query?type=0&id={}'
        self.f = open('piaofang.csv', 'w', encoding='utf8', newline='')
        self.writer = csv.writer(self.f)
        # Write header row
        data_list = ('Movie title', 'Release time', 'box office', 'Average fare', 'Average number of people on site')
        self.writer.writerow(data_list)

    def get_html(self, url):
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36',
        }
        # The endpoint returns JSON, so decode it directly; the timeout stops a stalled request from hanging forever
        html = requests.get(url=url, headers=headers, timeout=10).json()
        self.parse_html(html)

    def parse_html(self, html):
        # Extract dynamically loaded data
        result = html['data']['list']
        for res in result:
            item = {}
            item['movieName'] = res['movieName']
            item['releaseInfo'] = res['releaseInfo']
            item['boxDesc'] = res['boxDesc']
            item['avgViewBoxDesc'] = res['avgViewBoxDesc']
            item['avgShowViewDesc'] = res['avgShowViewDesc']
            print(item)
            # Column order matches the header row
            self.writer.writerow(item.values())

    def run(self):
        try:
            for i in range(2011, 2022):
                url_html = self.url.format(i)
                self.get_html(url=url_html)
                # Pause 1-2 seconds between requests to go easy on the server
                time.sleep(random.randint(1, 2))
        finally:
            # Close the CSV file even if a request fails
            self.f.close()


if __name__ == '__main__':
    spider = PiaofangSpider()
    spider.run()
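Before crawling for real, you can check the parsing and CSV-writing step offline by feeding it a dictionary shaped like `html['data']['list']`. This sketch assumes the JSON contains the five fields the spider reads. `SAMPLE_RESPONSE`, its values, and `rows_from_response` are hypothetical and made up for illustration. They are not real Maoyan data.

```python
import csv
import io

# Hypothetical sample shaped like html['data']['list'] in the spider above;
# the values are invented for illustration, not real Maoyan data.
SAMPLE_RESPONSE = {
    'data': {
        'list': [
            {'movieName': 'Movie A', 'releaseInfo': '2021-02-12',
             'boxDesc': '100.0', 'avgViewBoxDesc': '50', 'avgShowViewDesc': '20'},
            {'movieName': 'Movie B', 'releaseInfo': '2021-07-30',
             'boxDesc': '80.5', 'avgViewBoxDesc': '45', 'avgShowViewDesc': '18'},
        ]
    }
}

# Same field order and header as the spider
FIELDS = ('movieName', 'releaseInfo', 'boxDesc', 'avgViewBoxDesc', 'avgShowViewDesc')
HEADER = ('Movie title', 'Release time', 'box office', 'Average fare',
          'Average number of people on site')


def rows_from_response(payload):
    """Yield one CSV row per movie, in the same column order as HEADER."""
    for movie in payload['data']['list']:
        yield [movie[field] for field in FIELDS]


# Write to an in-memory buffer instead of piaofang.csv
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerows(rows_from_response(SAMPLE_RESPONSE))
print(buffer.getvalue())
```

Listing the fields explicitly in `FIELDS` keeps the CSV column order tied to the header. This avoids relying on the insertion order of the `item` dict.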
I also noticed something odd about this page: the same content is present in the regular HTML, so you could extract it with XPath or regular expressions instead.
I only found that out after I had finished crawling... and after spending ages analyzing the captured packets.
In the end she said she also wanted me to analyze the data for her. Otherwise she would make me cough up the malatang, and she would never take me out again.
That's bullying. I'm a proud seven-foot man; how could I bow my head over such a small request? Absolutely not.
The data analysis and charts...
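import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Use a font that can render Chinese characters (SimHei) and keep minus signs displayable
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Jupyter-only magic; remove this line when running as a plain script
%matplotlib inline

# Read the file written by the spider
data = pd.read_csv('piaofang.csv')
data.head(10)

# Take the year from the first four characters of the release time
data['year'] = data['Release time'].apply(lambda x: int(x[0:4]))
data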
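`int(x[0:4])` raises an error if a row has no release time, because pandas reads the empty cell as NaN. A more forgiving variant takes the first four characters with the `.str` accessor and converts them with `pd.to_numeric(..., errors='coerce')`, so unusable rows become missing values. This is a sketch under the assumption that release times start with a four-digit year. The small DataFrame is hypothetical and stands in for `piaofang.csv`.

```python
import pandas as pd

# Hypothetical rows standing in for piaofang.csv; the last one has no release time.
data = pd.DataFrame({
    'Movie title': ['Movie A', 'Movie B', 'Movie C'],
    'Release time': ['2019-07-26', '2020-01-25', None],
})

# Slice the first four characters, then convert; anything that isn't a number
# becomes NaN instead of raising. 'Int64' is pandas' nullable integer type.
data['year'] = pd.to_numeric(data['Release time'].str[:4], errors='coerce').astype('Int64')
print(data)
```

Rows whose year comes out missing are skipped by `groupby('year')` by default, so they drop out of the per-year charts rather than crashing the notebook.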
# Top ten films at the box office
data.sort_values(by=['box office'], ascending=False).head(10).plot.bar(
    x='Movie title', y='box office', title='Top 10 at the box office')

# Number of films released each year
fig = plt.figure(dpi=120)
groupby_year = data.groupby('year').size()
groupby_year.plot(title='Number of films released each year')
plt.show()

# Total annual box office
fig = plt.figure(dpi=120)
sum_money = data.groupby('year')['box office'].sum()
sum_money.plot.bar(title='Total annual box office')
plt.show()
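The two yearly charts rest on `groupby('year').size()` and `groupby('year')['box office'].sum()`. A quick way to sanity-check those numbers is to recompute them without pandas, using the standard library's `Counter` and `defaultdict`. This sketch assumes the box office values are already numeric. If pandas reads that column as text, `sort_values` orders it alphabetically rather than by size, so the numbers are worth checking. `records` is hypothetical sample data.

```python
from collections import Counter, defaultdict

# Hypothetical (year, box office) pairs standing in for the 'year' and
# 'box office' columns; assumes box office is already a number.
records = [(2019, 120.0), (2019, 80.0), (2020, 50.0), (2021, 200.0)]

# Films per year, like data.groupby('year').size()
films_per_year = Counter(year for year, _ in records)

# Total box office per year, like data.groupby('year')['box office'].sum()
box_office_per_year = defaultdict(float)
for year, box in records:
    box_office_per_year[year] += box

for year in sorted(films_per_year):
    print(year, films_per_year[year], box_office_per_year[year])
```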
3. I exploded on the spot
The final heavy blow:
It turns out she had been taking paid orders behind my back. I'm completely lost for words. I helped her make money from beginning to end and got nothing myself. What a huge loss~
Maybe I'm destined to be a laborer all my life~