Crawler analysis of China's top 100 3x3 Rubik's Cube speedsolving and blindfolded results

Posted by poscribes on Fri, 31 Dec 2021 12:00:07 +0100

1, Topic background

The Rubik's Cube was invented in 1974 by Ernő Rubik, a professor of architecture in Budapest, Hungary. The WCA (World Cube Association) holds Rubik's Cube competitions all over the world every year. In November 1991, China's first domestic Rubik's Cube competition was held in Guangzhou. In January 2004, Chinese competitors took part in a World Cube Association competition for the first time. In May 2009, a Chinese competitor broke a world record in an official competition for the first time. In June 2014, Cubing China (粗饼) was officially launched, and it has played a key role in the development of cubing in China. Chinese competitors now post excellent results in the 3x3 Rubik's Cube. This course project crawls and analyzes the top 100 Chinese 3x3 single results for speedsolving and blindfolded solving listed on the Cubing China website.

 

2, Thematic web crawler design

1. Topic crawler name

Cubing China (粗饼) · Chinese Rubik's Cube competition website: crawler analysis of China's top 100 3x3 single results for speedsolving and blindfolded solving

2. Content and data feature analysis of topic web crawler

The rank, name, region, result, competition, date and year of China's top 100 3x3 single results for speedsolving and blindfolded solving are analyzed.

Data source: https://cubing.com/ (data as of December 27, 2021)
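
The rankings pages on this site are selected through query parameters (region, event, gender, type), as the blindfolded URL used later by the crawler shows. A minimal sketch of building such URLs with the standard library follows; the helper name rankings_url and its defaults are hypothetical, and it is only an assumption that the site accepts the event code 333 for speedsolving in the same way as 333bf.

```python
from urllib.parse import urlencode

BASE_URL = "https://cubing.com/results/rankings"


def rankings_url(event, region="China", gender="all", result_type="single"):
    """Build a rankings URL from its query parameters (hypothetical helper)."""
    query = urlencode(
        {"region": region, "event": event, "gender": gender, "type": result_type}
    )
    return f"{BASE_URL}?{query}"


# 333bf is the event code that appears in the crawler's blindfolded URL.
blind_url = rankings_url("333bf")
print(blind_url)
```

Building the URL from a dictionary keeps the parameters readable and makes it easy to request another event or result type later.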

3. Overview of thematic web crawler design scheme

(1) Implementation approach

By analyzing the Cubing China pages, we use the Requests library and the BeautifulSoup library to crawl the rank, name, region, result, competition and date of China's top 100 3x3 single results for speedsolving and blindfolded solving, save them as csv files, and finally analyze them visually.

(2) Technical difficulties

Some of the data was difficult to crawl, and the visual analysis with Matplotlib and pyecharts also posed difficulties.

3, Analysis of the structure of the target pages

1. Structure and feature analysis of the target pages

Screenshots of the pages to be crawled:

Top 100 3x3 single results, speedsolving (the screenshot shows only part of the list)

Top 100 3x3 single results, blindfolded (the screenshot shows only part of the list)

The data we need to crawl is shown in the screenshots.

2. HTML page parsing

 

Inspecting the page elements shows that the data we need to crawl sits in a table inside a div tag under the body tag (the same holds for the 3x3 blindfolded single rankings).

3. Node (tag) search and traversal methods

The table is located with find(), its rows are collected with find_all(), and a for loop traverses the rows.
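
The search and traversal just described can be tried on a small inline snippet before touching the real site. This is only a sketch: the markup and the two players are made up, the real table has more columns, and the built-in html.parser backend is used here instead of lxml so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the real rankings table has more columns and rows.
SAMPLE_HTML = """
<div>
  <table class="table">
    <tr><th>Rank</th><th>Name</th><th>Result</th></tr>
    <tr><td>1</td><td>Player A</td><td>4.50</td></tr>
    <tr><td>2</td><td>Player B</td><td>4.75</td></tr>
  </table>
</div>
"""


def parse_rows(html):
    """Return one tuple of cell texts per data row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    # Skip the header row, then walk the remaining <tr> nodes with a for loop.
    for tr in table.find_all("tr")[1:]:
        cells = tr.find_all("td")
        rows.append(tuple(td.get_text(strip=True) for td in cells))
    return rows


rows = parse_rows(SAMPLE_HTML)
print(rows)
```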

4, Web crawler programming

1. Data crawling and acquisition

 

# Import the libraries we need
import csv

import requests
from bs4 import BeautifulSoup

# Request header sent with every request
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
CSV_HEADER = ('ranking', 'full name', 'region', 'achievement', 'match', 'date')


def get_html(url):
    """Return the HTML text of the page at url."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    response.encoding = 'utf-8-sig'
    return response.text


def get_soup(html):
    """Return a BeautifulSoup object for the HTML text."""
    return BeautifulSoup(html, 'lxml')


def get_rank_list(soup):
    """Parse the rankings table and return a list of player tuples."""
    table = soup.find('table', {'class': 'table table-bordered table-condensed table-hover table-boxed'})
    rank_list = []
    # Skip the header row; cells 1 to 6 hold rank, name, region, result, competition and date
    for row in table.find_all('tr')[1:]:
        cells = row.find_all('td')
        rank_list.append(tuple(cell.get_text(strip=True) for cell in cells[1:7]))
    return rank_list


def writer_file(rank_list, filename):
    """Write the header and the ranking rows to a csv file."""
    # 'w' mode overwrites the file, so rerunning the crawler does not duplicate rows
    with open(filename, 'w', encoding='utf-8-sig', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(CSV_HEADER)
        writer.writerows(rank_list)


def main():
    # (start URL, output file, description) for each ranking page
    pages = [
        ('https://cubing.com/results/rankings',
         '333_single_china.csv', '3x3 speedsolve single'),
        ('https://cubing.com/results/rankings?region=China&event=333bf&gender=all&type=single',
         '333bf_single_china.csv', '3x3 blindfolded single'),
    ]
    for start_url, filename, label in pages:
        soup = get_soup(get_html(start_url))
        rank_list = get_rank_list(soup)
        writer_file(rank_list, filename)
        print(f'The {label} ranking data has been crawled and saved!')


if __name__ == "__main__":
    main()

 

Operation results:

 

 

The saved csv files are as follows
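
One detail worth checking when saving the csv files is the file mode. Opening a file in append mode ('a') adds another header and another copy of the rows every time the script is rerun, while write mode ('w') starts the file afresh. The sketch below compares the two in a temporary directory; the rows and the helper names save and count_lines are made up for illustration.

```python
import csv
import os
import tempfile

HEADER = ("ranking", "full name", "achievement")
ROWS = [("1", "Player A", "4.50"), ("2", "Player B", "4.75")]  # made-up rows


def save(path, mode):
    """Write the header and rows to path using the given file mode."""
    with open(path, mode, encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(ROWS)


def count_lines(path):
    """Count the csv records stored in path."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return sum(1 for _ in csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    appended = os.path.join(tmp, "appended.csv")
    overwritten = os.path.join(tmp, "overwritten.csv")
    for _ in range(2):  # simulate running the crawler twice
        save(appended, "a")
        save(overwritten, "w")
    appended_count = count_lines(appended)
    overwritten_count = count_lines(overwritten)

print(appended_count, overwritten_count)
```

After two runs the appended file holds twice as many records as the overwritten one, which is the kind of duplication the later df.duplicated() check would flag.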

2. Clean and process the data
# Import the top 100 3x3 speedsolve single data
import pandas as pd
df = pd.read_csv('C:\\Users\\Lenovo\\Desktop\\Chinese third-order cube data\\333_single_china.csv')
df

Operation results:

# Import the top 100 3x3 blindfolded single data
import pandas as pd
df2 = pd.read_csv('C:\\Users\\Lenovo\\Desktop\\Chinese third-order cube data\\333bf_single_china.csv')
df2

Operation results:

 

 

# Delete the columns we do not need from the 3x3 speedsolve single data
df.drop(columns=['match', 'region'], inplace=True)
df

Operation results:

# Delete the columns we do not need from the 3x3 blindfolded single data
df2.drop(columns=['match', 'region'], inplace=True)
df2

Operation results:

# Check for null values: each row of the output is a True/False pattern per column (True = null)
# with its count, so a single all-False row means there are no null values
df.isnull().value_counts()

Operation results:

df2.isnull().value_counts()

Operation results:

# Check for duplicate rows: the result is True for each row that repeats an earlier row, otherwise False
df.duplicated()

Operation results:

df2.duplicated()

Operation results:
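
One more cleaning step may be needed before plotting. Results under a minute are plain numbers such as 17.20, but it is an assumption here that slower results, if any appear in the blindfolded list, are written in a minutes form such as 1:02.34, which pandas would read as text. A minimal sketch of a converter follows; the function name result_to_seconds and the sample values are made up.

```python
def result_to_seconds(text):
    """Convert a result such as '17.20' or '1:02.34' to seconds."""
    # rpartition leaves minutes empty when the text contains no colon.
    minutes, _, seconds = text.strip().rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


samples = ["4.50", "59.99", "1:02.34"]  # made-up results
converted = [result_to_seconds(s) for s in samples]
print(converted)
```

With pandas, such a function could be applied with df2['achievement'].astype(str).map(result_to_seconds) so that the column becomes numeric before it is plotted.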

 

 

 3. Data analysis and visualization

# Use Matplotlib to draw a bar chart of China's top 100 3x3 speedsolve singles as of December 27, 2021
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 7), dpi=600)
# Make Chinese characters and the minus sign display correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Read the csv file with the top 100 3x3 speedsolve singles
df = pd.read_csv("C:\\Users\\Lenovo\\Desktop\\Chinese third-order cube data\\333_single_china.csv")
# Names on the x axis, results on the y axis
plt.bar(df['full name'], df['achievement'])
# Set the title
plt.title("China's top 100 3x3 speedsolve singles as of December 27, 2021 (unit: s)")
# Set the x-axis label
plt.xlabel('Competitor name')
# Set the y-axis label
plt.ylabel('Result (s)')
# Set the tick label font size to 7
plt.tick_params(labelsize=7)
# Rotate the x-axis tick labels by 90 degrees
plt.xticks(rotation=90)
# Data persistence (save before plt.show())
plt.savefig("By December 27, 2021, China's top 100 third-order Rubik's Cube.jpg")
# Display the chart
plt.show()

 

Operation results:

# Use pyecharts to draw a bar chart of China's top 100 3x3 blindfolded singles as of December 27, 2021
# Import libraries
from pyecharts import options as opts
from pyecharts.charts import Bar

bar = (
    Bar()
    .add_xaxis(df2["full name"].tolist())
    .add_yaxis("achievement", df2["achievement"].tolist())
    .set_global_opts(
        # Chart title settings
        title_opts=opts.TitleOpts(
            title="China's top 100 3x3 blindfolded singles as of December 27, 2021",
            subtitle="(Only part is shown in the figure)",
            pos_left="center",
            pos_top="7%"),
        # 'shadow': shadow axis pointer
        tooltip_opts=opts.TooltipOpts(
            is_show=True,
            trigger="axis",
            axis_pointer_type="shadow"
            ),
        # x-axis settings: show every label, rotated by 90 degrees
        xaxis_opts=opts.AxisOpts(name="full name", axislabel_opts={"interval": 0, "rotate": 90}),
        # y-axis settings
        yaxis_opts=opts.AxisOpts(
            name="achievement", min_=0,
            type_="value", axislabel_opts=opts.LabelOpts(formatter="{value} s")),
        # Data zoom settings: initially show the first 10% of the bars
        datazoom_opts=opts.DataZoomOpts(range_start=0, range_end=10),
    )
)
bar.render_notebook()

 

Operation results:

# Use Matplotlib to draw a scatter chart of the dates on which China's top 100 3x3 speedsolve singles were set, as of December 27, 2021
import pandas as pd
import matplotlib.pyplot as plt
# The crawler does not write a 'Year of creation' column, so derive it from 'date' when it is missing
# (assumes the dates can be parsed by pd.to_datetime)
if 'Year of creation' not in df.columns:
    df['Year of creation'] = pd.to_datetime(df['date']).dt.year
plt.figure(figsize=(15, 6.5), dpi=600)
plt.title("Dates on which China's top 100 3x3 speedsolve singles were set, as of December 27, 2021", size=25, color="black")
# Draw the scatter chart
plt.scatter(df['date'], df['Year of creation'])
plt.xticks(rotation=90)
# Data persistence (save before plt.show())
plt.savefig("As of December 27, 2021, China's top 100 creation time scatter chart of third-order magic cube.jpg")
# Display the scatter chart
plt.show()

Operation results:

# Use Matplotlib to draw a scatter chart of the dates on which China's top 100 3x3 blindfolded singles were set, as of December 27, 2021
import pandas as pd
import matplotlib.pyplot as plt
# The crawler does not write a 'Year of creation' column, so derive it from 'date' when it is missing
# (assumes the dates can be parsed by pd.to_datetime)
if 'Year of creation' not in df2.columns:
    df2['Year of creation'] = pd.to_datetime(df2['date']).dt.year
plt.figure(figsize=(15, 6.5), dpi=600)
plt.title("Dates on which China's top 100 3x3 blindfolded singles were set, as of December 27, 2021", size=25, color="black")
# Draw the scatter chart
plt.scatter(df2['date'], df2['Year of creation'])
plt.xticks(rotation=90)
# Data persistence (save before plt.show())
plt.savefig("As of December 27, 2021, the first 100 creation time scatter of China's third-order magic cube blind twist.jpg")
# Display the scatter chart
plt.show()

Operation results:

# Use Matplotlib to draw a pie chart of the years in which China's top 100 3x3 speedsolve singles were set, as of December 27, 2021
# Import libraries
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# The crawler does not write a 'Year of creation' column, so derive it from 'date' when it is missing
# (assumes the dates can be parsed by pd.to_datetime)
if 'Year of creation' not in df.columns:
    df['Year of creation'] = pd.to_datetime(df['date']).dt.year
# Set the figure size
plt.figure(figsize=(15, 7), dpi=80)
# Count the results set in each year; the counts determine the size of each wedge
years = [2015, 2017, 2018, 2019, 2020, 2021]
counts = [int((df["Year of creation"] == year).sum()) for year in years]
# Font settings so that Chinese text is not garbled
matplotlib.rc("font", family='Microsoft YaHei', weight="bold", size=12)
labels = [str(year) for year in years]
explode = [0.1] * len(years)
colors = ['mistyrose', 'salmon', 'tomato', 'darksalmon', 'mistyrose', 'orangered']
# Set the pie chart parameters
plt.pie(counts, explode=explode, labels=labels,
        colors=colors, autopct='%.2f%%',
        pctdistance=0.8, labeldistance=1.1,
        startangle=180, radius=1.2,
        counterclock=False,
        wedgeprops={'linewidth': 1.5, 'edgecolor': 'white'},
        textprops={'fontsize': 10, 'color': 'black'})
# Set the title
plt.title("Years in which China's top 100 3x3 speedsolve singles were set")
# Data persistence (save before plt.show())
plt.savefig("Pie chart of year distribution of top 100 achievements in China's third-order magic cube.jpg")
# Display the chart
plt.show()

 

Operation results:
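
The pie chart code lists its years by hand, so a year missing from the list would be silently left out. As a hedged alternative, the years and their counts can be derived from the date column itself. The sketch below uses only the standard library; the dates are made up, the YYYY-MM-DD format is an assumption about how the site prints dates, and the helper name year_counts is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up dates in the assumed YYYY-MM-DD format.
dates = ["2019-08-25", "2021-05-29", "2021-10-03", "2018-03-11"]


def year_counts(date_strings, fmt="%Y-%m-%d"):
    """Return (years, counts), sorted by year, for the given date strings."""
    counter = Counter(datetime.strptime(d, fmt).year for d in date_strings)
    years = sorted(counter)
    return years, [counter[y] for y in years]


years, counts = year_counts(dates)
print(years, counts)
```

The two lists can be passed to plt.pie as the labels and the wedge sizes, so the chart follows whatever years actually occur in the data.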

 

 

 

 

 

 

# Use Matplotlib to draw a pie chart of the years in which China's top 100 3x3 blindfolded singles were set, as of December 27, 2021
# Import libraries
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# The crawler does not write a 'Year of creation' column, so derive it from 'date' when it is missing
# (assumes the dates can be parsed by pd.to_datetime)
if 'Year of creation' not in df2.columns:
    df2['Year of creation'] = pd.to_datetime(df2['date']).dt.year
# Set the figure size
plt.figure(figsize=(15, 7), dpi=80)
# Count the results set in each year; the counts determine the size of each wedge
years = [2009, 2010, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2021]
counts = [int((df2["Year of creation"] == year).sum()) for year in years]
# Font settings so that Chinese text is not garbled
matplotlib.rc("font", family='Microsoft YaHei', weight="bold", size=12)
# Pie chart labels, offsets and colors
labels = [str(year) for year in years]
explode = [0.1] * len(years)
colors = ['plum', 'violet', 'purple', 'darkmagenta', 'm', 'thistle', 'magenta', 'orchid', 'mediumvioletred', 'deeppink', 'hotpink']
# Set the pie chart parameters
plt.pie(counts, explode=explode, labels=labels,
        colors=colors, autopct='%.2f%%',
        pctdistance=0.8, labeldistance=1.1,
        startangle=180, radius=1.2,
        counterclock=False,
        wedgeprops={'linewidth': 1.5, 'edgecolor': 'white'},
        textprops={'fontsize': 10, 'color': 'black'})
# Set the title
plt.title("Years in which China's top 100 3x3 blindfolded singles were set")
# Data persistence (save before plt.show())
plt.savefig("Pie chart of year distribution of top 100 achievements of China's third-order blind screw speed.jpg")
# Display the chart
plt.show()

 

Operation results:

 

 

 4. Data persistence

# Data persistence: each call saves the current figure and must run before the matching plt.show()
plt.savefig("By December 27, 2021, China's top 100 third-order Rubik's Cube.jpg")
plt.savefig("As of December 27, 2021, China's top 100 creation time scatter chart of third-order magic cube.jpg")
plt.savefig("As of December 27, 2021, the first 100 creation time scatter of China's third-order magic cube blind twist.jpg")
plt.savefig("Pie chart of year distribution of top 100 achievements in China's third-order magic cube.jpg")
plt.savefig("Pie chart of year distribution of top 100 achievements of China's third-order blind screw speed.jpg")

Operation results:

 

 5. Attach the complete code

# Import the libraries we need
import csv

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import requests
from bs4 import BeautifulSoup
from pyecharts import options as opts
from pyecharts.charts import Bar

# Request header sent with every request
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
CSV_HEADER = ('ranking', 'full name', 'region', 'achievement', 'match', 'date')
# Folder from which the csv files are read for the analysis
DATA_DIR = 'C:\\Users\\Lenovo\\Desktop\\Chinese third-order cube data\\'


def get_html(url):
    """Return the HTML text of the page at url."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    response.encoding = 'utf-8-sig'
    return response.text


def get_soup(html):
    """Return a BeautifulSoup object for the HTML text."""
    return BeautifulSoup(html, 'lxml')


def get_rank_list(soup):
    """Parse the rankings table and return a list of player tuples."""
    table = soup.find('table', {'class': 'table table-bordered table-condensed table-hover table-boxed'})
    rank_list = []
    # Skip the header row; cells 1 to 6 hold rank, name, region, result, competition and date
    for row in table.find_all('tr')[1:]:
        cells = row.find_all('td')
        rank_list.append(tuple(cell.get_text(strip=True) for cell in cells[1:7]))
    return rank_list


def writer_file(rank_list, filename):
    """Write the header and the ranking rows to a csv file."""
    # 'w' mode overwrites the file, so rerunning the crawler does not duplicate rows
    with open(filename, 'w', encoding='utf-8-sig', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(CSV_HEADER)
        writer.writerows(rank_list)


def main():
    # (start URL, output file, description) for each ranking page
    pages = [
        ('https://cubing.com/results/rankings',
         '333_single_china.csv', '3x3 speedsolve single'),
        ('https://cubing.com/results/rankings?region=China&event=333bf&gender=all&type=single',
         '333bf_single_china.csv', '3x3 blindfolded single'),
    ]
    for start_url, filename, label in pages:
        soup = get_soup(get_html(start_url))
        rank_list = get_rank_list(soup)
        writer_file(rank_list, filename)
        print(f'The {label} ranking data has been crawled and saved!')


if __name__ == "__main__":
    main()

# Make Chinese characters and the minus sign display correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Read the csv files with the top 100 speedsolve and blindfolded singles
df = pd.read_csv(DATA_DIR + '333_single_china.csv')
df2 = pd.read_csv(DATA_DIR + '333bf_single_china.csv')

# The crawler does not write a 'Year of creation' column, so derive it from 'date' when it is missing
# (assumes the dates can be parsed by pd.to_datetime)
for data in (df, df2):
    if 'Year of creation' not in data.columns:
        data['Year of creation'] = pd.to_datetime(data['date']).dt.year

# Use Matplotlib to draw a bar chart of China's top 100 3x3 speedsolve singles as of December 27, 2021
plt.figure(figsize=(15, 7), dpi=600)
# Names on the x axis, results on the y axis
plt.bar(df['full name'], df['achievement'])
plt.title("China's top 100 3x3 speedsolve singles as of December 27, 2021 (unit: s)")
plt.xlabel('Competitor name')
plt.ylabel('Result (s)')
# Set the tick label font size to 7 and rotate the x-axis tick labels by 90 degrees
plt.tick_params(labelsize=7)
plt.xticks(rotation=90)
# Data persistence (save before plt.show())
plt.savefig("By December 27, 2021, China's top 100 third-order Rubik's Cube.jpg")
plt.show()

# Use pyecharts to draw a bar chart of China's top 100 3x3 blindfolded singles as of December 27, 2021
bar = (
    Bar()
    .add_xaxis(df2["full name"].tolist())
    .add_yaxis("achievement", df2["achievement"].tolist())
    .set_global_opts(
        # Chart title settings
        title_opts=opts.TitleOpts(
            title="China's top 100 3x3 blindfolded singles as of December 27, 2021",
            subtitle="(Only part is shown in the figure)",
            pos_left="center",
            pos_top="7%"),
        # 'shadow': shadow axis pointer
        tooltip_opts=opts.TooltipOpts(
            is_show=True,
            trigger="axis",
            axis_pointer_type="shadow"
            ),
        # x-axis settings: show every label, rotated by 90 degrees
        xaxis_opts=opts.AxisOpts(name="full name", axislabel_opts={"interval": 0, "rotate": 90}),
        # y-axis settings
        yaxis_opts=opts.AxisOpts(
            name="achievement", min_=0,
            type_="value", axislabel_opts=opts.LabelOpts(formatter="{value} s")),
        # Data zoom settings: initially show the first 10% of the bars
        datazoom_opts=opts.DataZoomOpts(range_start=0, range_end=10),
    )
)
bar.render_notebook()


def draw_date_scatter(data, title, filename):
    """Draw and save a scatter chart of result date against year."""
    plt.figure(figsize=(15, 6.5), dpi=600)
    plt.title(title, size=25, color="black")
    plt.scatter(data['date'], data['Year of creation'])
    plt.xticks(rotation=90)
    # Data persistence (save before plt.show())
    plt.savefig(filename)
    plt.show()


def draw_year_pie(data, years, colors, title, filename):
    """Draw and save a pie chart of how many results were set in each year."""
    plt.figure(figsize=(15, 7), dpi=80)
    # Count the results set in each year; the counts determine the size of each wedge
    counts = [int((data["Year of creation"] == year).sum()) for year in years]
    # Font settings so that Chinese text is not garbled
    matplotlib.rc("font", family='Microsoft YaHei', weight="bold", size=12)
    plt.pie(counts, explode=[0.1] * len(years), labels=[str(year) for year in years],
            colors=colors, autopct='%.2f%%',
            pctdistance=0.8, labeldistance=1.1,
            startangle=180, radius=1.2,
            counterclock=False,
            wedgeprops={'linewidth': 1.5, 'edgecolor': 'white'},
            textprops={'fontsize': 10, 'color': 'black'})
    plt.title(title)
    # Data persistence (save before plt.show())
    plt.savefig(filename)
    plt.show()


# Scatter charts of the dates on which the top 100 results were set
draw_date_scatter(
    df,
    "Dates on which China's top 100 3x3 speedsolve singles were set, as of December 27, 2021",
    "As of December 27, 2021, China's top 100 creation time scatter chart of third-order magic cube.jpg")
draw_date_scatter(
    df2,
    "Dates on which China's top 100 3x3 blindfolded singles were set, as of December 27, 2021",
    "As of December 27, 2021, the first 100 creation time scatter of China's third-order magic cube blind twist.jpg")

# Pie charts of the years in which the top 100 results were set
draw_year_pie(
    df,
    [2015, 2017, 2018, 2019, 2020, 2021],
    ['mistyrose', 'salmon', 'tomato', 'darksalmon', 'mistyrose', 'orangered'],
    "Years in which China's top 100 3x3 speedsolve singles were set",
    "Pie chart of year distribution of top 100 achievements in China's third-order magic cube.jpg")
draw_year_pie(
    df2,
    [2009, 2010, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2021],
    ['plum', 'violet', 'purple', 'darkmagenta', 'm', 'thistle', 'magenta', 'orchid', 'mediumvioletred', 'deeppink', 'hotpink'],
    "Years in which China's top 100 3x3 blindfolded singles were set",
    "Pie chart of year distribution of top 100 achievements of China's third-order blind screw speed.jpg")

5, Summary

1. What conclusions can be drawn from the analysis and visualization of the data? Was the expected goal achieved?

Conclusion: (1) In recent years, China's 3x3 speedsolving and blindfolded results have improved year by year, which indicates that the competitors are getting stronger every year;

(2) Compared with speedsolving, the gaps between ranks in the blindfolded top 100 are larger and the dates on which the results were set are more scattered, so China's blindfolded results still have room to improve;

(3) Overall, cubing in China is developing steadily and well.

The expected goal was achieved.

2. What was gained in the course of completing this design? What could be improved?

Gains: During the design process I learned a lot about Python crawlers and about Python's powerful and convenient visualization libraries. I also ran into difficulties in both the crawling and the visualization parts, which exposed how unpracticed my Python still is and that I had not studied hard enough; they were solved after many attempts.

Suggestions for improvement: I still need to keep working hard, and to learn more, try more and think more.