Preface
Recently, the new year's Party of station B has swept over all major video websites due to its unique creativity, which has brought great positive impact to the company. At the same time, the stock price has also skyrocketed. Presumably, everyone regrets not buying the shares of station B earlier:
However, today we are not going to discuss the new year's Party of station B, but the core resource of station B: "amazing" grandma owners. The inspiration of this article comes from a problem on the hot list:
Data acquisition
The question above has 859 answers, and they are the source of the data in this article. Many answers include a link that contains the uploader's space ID, as shown in the following figure:
We can crawl the uploaders' space IDs from these answers. Since not every answer contains such an ID, we also extract text set in bold to pick up additional uploader names as a supplement:
The answer above is a typical case: it mentions the schoolchild who received birthday wishes from Cook and was very popular a while ago. Part of the code for extracting the data is as follows:
import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Start crawling data
driver = webdriver.Chrome()
driver.maximize_window()
url = 'https://www.zhihu.com/question/291506148'
# Load the question in the current tab so the scrolling below acts on it
driver.get(url)

# Scroll to the bottom repeatedly so that more answers are loaded
for i in range(1000):
    time.sleep(1)
    driver.execute_script('document.documentElement.scrollTop = 10000000')
    print(i)

# Collate data: join the HTML of every answer and extract the space links
all_html = [k.get_property('innerHTML')
            for k in driver.find_elements(By.CLASS_NAME, 'AnswerItem')]
all_text = ''.join(all_html)
pat = r'/space\.bilibili\.com/\d+'
spaces = list(set(re.findall(pat, all_text)))
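The code for the bold-text supplement is not shown in the article. As a rough sketch of both extraction steps using only the standard library, the example below runs on a made-up HTML fragment. The sample markup, the assumption that bold text appears in `<b>` or `<strong>` tags, and the helper names `extract_space_ids` and `extract_bold_names` are all illustrative, not the author's actual code.

```python
import re
from html.parser import HTMLParser

# Hypothetical answer HTML, made up for illustration only
sample_html = (
    '<p>I recommend <b>Uploader A</b>: '
    '<a href="https://space.bilibili.com/12345">space</a></p>'
    '<p>Also <strong>Uploader B</strong>, and again '
    '<a href="//space.bilibili.com/12345/video">videos</a></p>'
)


def extract_space_ids(html):
    """Return the de-duplicated numeric IDs found in space links, sorted."""
    ids = set(re.findall(r'space\.bilibili\.com/(\d+)', html))
    return sorted(ids, key=int)


class BoldTextParser(HTMLParser):
    """Collect the text found inside <b> and <strong> tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of bold tags currently open
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in ('b', 'strong'):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ('b', 'strong') and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.names.append(data.strip())


def extract_bold_names(html):
    """Return every non-empty piece of bold text in the HTML."""
    parser = BoldTextParser()
    parser.feed(html)
    return parser.names


space_ids = extract_space_ids(sample_html)
bold_names = extract_bold_names(sample_html)
print(space_ids)   # ['12345']
print(bold_names)  # ['Uploader A', 'Uploader B']
```

Bold text is a noisy signal, since answers also use it for emphasis, so names collected this way would still need manual checking before being matched to accounts.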
We now have the IDs of these "amazing" uploaders. The next step is to crawl their Bilibili personal spaces for more detailed information:
Above is the personal space of Handmade Geng (手工耿), Bilibili's well-known "scientist". From it we can get his number of fans, his main video type (I always assumed it would be technology and did not expect Bilibili to file him under life) and the play, bullet-comment and comment counts of all his videos, whose averages serve as the basis for the later ranking. Part of the code is as follows:
import json
import time

import pandas as pd
import requests

# `cookie` and `header` are assumed to be defined earlier
# (login cookies and request headers for requests.get)
columns = ['name', 'fans', 'face', 'main_type',
           'total_video', 'total_play', 'total_comment']
rows = []
for i in range(len(spaces)):
    try:
        time.sleep(1)
        space_id = spaces[i].replace('/space.bilibili.com/', '')
        # Profile card: name, fan count, avatar URL and number of videos
        url = 'https://api.bilibili.com/x/web-interface/card?mid={}&jsonp=jsonp&article=true'.format(space_id)
        html = requests.get(url=url, cookies=cookie, headers=header).content
        data = json.loads(html.decode('utf-8'))['data']
        this_name = data['card']['name']
        this_fans = data['card']['fans']
        this_face = data['card']['face']
        this_video = int(data['archive_count'])
        # The video list is paged, 30 videos per page
        total_page = int((this_video - 1) / 30) + 1
        video_list = []
        for j in range(total_page):
            url = 'https://api.bilibili.com/x/space/arc/search?mid={}&ps=30&tid=0&pn={}&keyword=&order=click&jsonp=jsonp'.format(space_id, j + 1)
            html = requests.get(url=url, cookies=cookie, headers=header).content
            data = json.loads(html.decode('utf-8'))
            if j == 0:
                # Per-category video counts only need to be read once
                type_list = data['data']['list']['tlist']
            video_list += data['data']['list']['vlist']
        # Main type = the category with the most videos
        type_count = {t['name']: int(t['count']) for t in type_list.values()}
        this_type = max(type_count, key=type_count.get)
        # '--' marks a missing count and is skipped
        this_play = sum(v['play'] for v in video_list if v['play'] != '--')
        this_comment = sum(v['comment'] for v in video_list if v['comment'] != '--')
        rows.append({'name': this_name, 'fans': this_fans, 'face': this_face,
                     'main_type': this_type, 'total_video': this_video,
                     'total_play': this_play, 'total_comment': this_comment})
        print('success:' + str(i))
    except Exception:
        # Report the index of the uploader that failed, then move on
        print('fail:' + str(i))
        continue
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
upstat = pd.DataFrame(rows, columns=columns)
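The crawl stores totals, while the ranking later relies on per-video averages. A minimal sketch of that conversion is shown below. The record is invented for illustration, and `per_video_average` is a hypothetical helper name, not something from the original code.

```python
def per_video_average(total, video_count):
    """Average per video, or 0.0 when the uploader has no videos."""
    return total / video_count if video_count else 0.0


# Hypothetical record with the same field names as the crawl result
record = {
    'name': 'Example uploader',
    'total_video': 40,
    'total_play': 200000,
    'total_comment': 1200,
}

avg_play = per_video_average(record['total_play'], record['total_video'])
avg_comment = per_video_average(record['total_comment'], record['total_video'])
print(avg_play, avg_comment)  # 5000.0 30.0
```

Guarding against a zero video count matters here because an account with no public videos would otherwise raise a `ZeroDivisionError`.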
In the end we obtained information on more than 200 "amazing" Bilibili uploaders. An overview of the data is as follows:
General overview
With the data in hand, let's first look at the distribution of the main video types of these "amazing" uploaders:
Because Bilibili's life category is all-inclusive, Handmade Geng and Li Ziqi both end up in it, which is baffling when you think about it. As a result, life is the type with the most uploaders. The technology and digital categories also take a large share, which supports the view that Bilibili is an excellent learning website. If you are interested, see another article: Do you believe that you can learn programming on Bilibili?
The remaining videos, including games and film and television, can be collectively called entertainment. From here on, video types are grouped into technology, life and entertainment, to find the most "amazing" uploader in each group.
Before the official ranking, we first use Python to stitch the avatars of these uploaders into the picture below, so you can see at a glance how many of them you recognize:
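The grouping described above can be sketched as a simple lookup. The category names in the table are illustrative English stand-ins (the crawled `main_type` values would be whatever names the API returns), and `GROUP_OF_TYPE` and `to_group` are hypothetical names. Anything not listed falls back to entertainment, following the article's "everything else" rule.

```python
# Hypothetical mapping from a category name to one of the three groups.
# The keys are illustrative; real category names come from the crawled data.
GROUP_OF_TYPE = {
    'Technology': 'technology',
    'Digital': 'technology',
    'Life': 'life',
    'Games': 'entertainment',
    'Film and television': 'entertainment',
}


def to_group(main_type, default='entertainment'):
    """Map a main video type to technology, life or entertainment."""
    return GROUP_OF_TYPE.get(main_type, default)


print(to_group('Digital'))  # technology
print(to_group('Life'))     # life
print(to_group('Music'))    # entertainment (not listed, uses the default)
```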
This part of the code is as follows:
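import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

for i in range(upstat.shape[0]):
    loc = 'D:/Reptile/Be frightened by heaven and earth/' + upstat['name'][i] + '.jpg'
    # Download the avatar first if needed (requires `from urllib import request`):
    # request.urlretrieve(upstat['face'][i], loc)
    img = mpimg.imread(loc)[:, :, 0:3]  # keep only the RGB channels
    img = cv2.resize(img, (500, 500), interpolation=cv2.INTER_CUBIC)
    if i % 20 == 0:
        # Start a new row of 20 avatars
        row_img = img
    elif i == 19:
        # The first row is complete: it becomes the initial picture
        row_img = np.hstack((row_img, img))
        all_img = row_img
    elif i % 20 == 19:
        # A later row is complete: stack it below the picture so far
        row_img = np.hstack((row_img, img))
        all_img = np.vstack((all_img, row_img))
    else:
        row_img = np.hstack((row_img, img))

# Note: avatars left over in an incomplete last row are not drawn
plt.axis('off')
plt.margins(0, 0)
plt.imshow(all_img)
plt.savefig('Head portrait.png', dpi=1000)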
Comprehensive ranking
Next comes the bold part: ranking these uploaders. Combining the number of fans with the average numbers of bullet comments, plays and comments per video gives a comprehensive index. We hereby declare that this ranking is for entertainment only; if anyone insists on studying it seriously, AWSL.
First, let's look at the top 10 uploaders:
Wizard Finance (巫师财经), which I was only recently recommended, made the list. I suggest you take a look: he really does explain complex financial knowledge in a down-to-earth way. Two famous uploaders, Huanong Brothers and Jing Hanqing, are also on the list. Now let's look at places 11-20:
Xu Dasao, Li Ziqi and Handmade Geng all appear in this part of the list. I hope someone can arrange a collaboration among them someday. I have the whole process figured out: Handmade Geng supplies Li Ziqi with post-modern tools; Li Ziqi uses Geng's contraptions to make the hottest chili in the world; Xu Dasao eats it; and finally Handmade Geng uses his own forehead-flicking gadget to relieve Xu Dasao's discomfort from the chili.
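The article does not say how the four metrics are combined. One plausible way, shown here purely as an assumption, is to min-max normalize each metric to the range 0 to 1 and take an equal-weight mean. The sample data, the field names `avg_play`, `avg_danmaku` and `avg_comment`, and the functions `min_max` and `composite_scores` are all hypothetical.

```python
def min_max(values):
    """Scale values linearly to [0, 1]; all zeros if every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(rows, metrics):
    """Equal-weight mean of the min-max normalized metrics (an assumption)."""
    columns = {m: min_max([row[m] for row in rows]) for m in metrics}
    return [
        sum(columns[m][i] for m in metrics) / len(metrics)
        for i in range(len(rows))
    ]


# Made-up data for three uploaders
uploaders = [
    {'name': 'A', 'fans': 1000, 'avg_play': 500, 'avg_danmaku': 20, 'avg_comment': 10},
    {'name': 'B', 'fans': 3000, 'avg_play': 100, 'avg_danmaku': 10, 'avg_comment': 30},
    {'name': 'C', 'fans': 2000, 'avg_play': 300, 'avg_danmaku': 30, 'avg_comment': 20},
]
metrics = ['fans', 'avg_play', 'avg_danmaku', 'avg_comment']

scores = composite_scores(uploaders, metrics)
ranking = sorted(
    zip((u['name'] for u in uploaders), scores),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)  # [('C', 0.625), ('B', 0.5), ('A', 0.375)]
```

Min-max scaling is sensitive to outliers, so a single uploader with an enormous fan count compresses everyone else's score on that metric. Rank-based scoring or a log transform would be reasonable alternatives.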
Classification ranking
After the comprehensive ranking, all uploaders are ranked separately within technology, life and entertainment. The top 10 of each group are:
With the classified rankings you can pick what to watch according to your own preferences. I believe that after watching, your imagination will expand dramatically. Before long you can try posting videos on Bilibili yourself and become a famous (yeah, right) uploader with a double-digit fan count.
To close the article, here is Handmade Geng's most-played video on Bilibili. It reflects the theme of this article, "amazing", very well. I hope you can try it yourself, and if you manage to use it, write down how it felt. You are welcome to share it with us.
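A per-group top-N selection like the one described above can be sketched as follows. The rows, the `group` and `score` fields and the function `top_n_by_group` are hypothetical; the sketch only assumes that each uploader already has a group label and a numeric score.

```python
from collections import defaultdict


def top_n_by_group(rows, n=10):
    """Return the n highest-scoring rows of each group, best first."""
    groups = defaultdict(list)
    for row in rows:
        groups[row['group']].append(row)
    return {
        group: sorted(members, key=lambda r: r['score'], reverse=True)[:n]
        for group, members in groups.items()
    }


# Made-up rows: each uploader has a group label and a composite score
rows = [
    {'name': 'A', 'group': 'life', 'score': 0.9},
    {'name': 'B', 'group': 'technology', 'score': 0.4},
    {'name': 'C', 'group': 'life', 'score': 0.7},
    {'name': 'D', 'group': 'entertainment', 'score': 0.8},
    {'name': 'E', 'group': 'life', 'score': 0.95},
]

top2 = top_n_by_group(rows, n=2)
for group, members in top2.items():
    print(group, [m['name'] for m in members])
# life ['E', 'A']
# technology ['B']
# entertainment ['D']
```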