Preface
Author: pk Ge
The other day Trump was tweeting about violence, but for the "king of blunders" this is already routine. Running the country through Twitter has become his new way of governing. I happened to follow his Twitter account, and since then my phone has not stopped ringing with push notifications; my screen is full of Trump. In a single day he can push as many as 200 tweets to me.
Sometimes what Trump does in reality flatly contradicts the remarks and opinions he posts on Twitter. He is a perpetual motion machine of self-contradiction, which is quite something to watch.
I collected all of Trump's tweets from January 2017 to May 31, 2020 into an English Excel document. To make Chinese word segmentation and reading easier, I translated it into Chinese.
Read the data and segment the words
First, read the Excel document directly, take the tweet text from one specific column, and return it as a list.
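import pandas as pd

# Read the first column of the Excel sheet and return its values as a list
def excel_one_line_to_list():
    # header=1: the second row of the sheet holds the column names
    # usecols=[0]: only the first column (the tweet text) is loaded
    data = pd.read_excel('/Users/brucepk/Documents/trump_20200530_Chinese Translation.xlsx', header=1, usecols=[0], names=None)
    df_li = data.values.tolist()
    result = []
    for s_li in df_li:
        result.append(s_li[0])
    return result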
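The header and usecols arguments are easy to get wrong, so here is a minimal sketch of what they do. It assumes that pd.read_csv on an in-memory string behaves like pd.read_excel for these two arguments (both take a 0-indexed header row and a list of column positions); the sample rows are made up for illustration and are not from the real spreadsheet.

```python
import io

import pandas as pd

# Made-up sample: row 0 is a title row, row 1 holds the real column names
sample = (
    "Trump tweets,export\n"
    "tweet,likes\n"
    "first tweet,10\n"
    "second tweet,20\n"
)

# header=1 -> use row 1 as the header, so the title row above it is skipped
# usecols=[0] -> keep only the first column
data = pd.read_csv(io.StringIO(sample), header=1, usecols=[0])

# Flatten the single column into a plain Python list
tweets = data.iloc[:, 0].tolist()
print(tweets)
```

If the spreadsheet has no title row above the column names, header=0 would be the value to use instead.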
Next, jieba, a Chinese word segmentation library, is used to split the text into words.
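import re
import jieba
import numpy as np
from PIL import Image

if __name__ == '__main__':
    result = str(excel_one_line_to_list())
    # Strip Latin letters, digits, spaces and ASCII punctuation (raw string avoids invalid-escape warnings)
    results = re.sub(r"[A-Za-z0-9\[\`\~\!\@\#\$\^\&\*\(\)\=\|\{\}\'\:\;\'\,\[\]\.\<\>\/\?\~\. \@\#\\\&\*\%]", "", result)
    # Drop two very common filler words. Note: Latin letters are already gone at this point,
    # so these stop words only match if they are written in Chinese.
    results = results.replace('thank you', '').replace('today', '')
    # Segment the whole string at once; looping over a str would hand jieba one character at a time
    text = ' '.join(jieba.cut(results, cut_all=False))
    print('text:', text)
    backgroud_Image = np.array(Image.open('trumppic2.png'))  # Mask image that gives the word cloud its shape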
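The cleaning step can be tried on its own without the spreadsheet or jieba. The following is a minimal sketch using only the standard library; the two sample strings are made up, and the character class is built from string.punctuation, which is slightly broader than the hand-written pattern used in this article.

```python
import re
import string

# Made-up sample standing in for str(excel_one_line_to_list())
raw = str(["感谢 Ohio! 2020年", "今天, 股市创新高."])

# Character class: ASCII letters, digits, whitespace and all ASCII punctuation
ascii_noise = re.compile("[A-Za-z0-9\\s" + re.escape(string.punctuation) + "]")
cleaned = ascii_noise.sub("", raw)
print(cleaned)  # 感谢年今天股市创新高

# A str iterates one character at a time, which is why the segmenter
# should receive the whole cleaned string rather than each element of a loop
print(list(cleaned[:2]))  # ['感', '谢']
```

Chinese punctuation is not ASCII, so it would survive this pattern; it has to be added to the character class explicitly if it should be removed as well.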
Configure the word cloud parameters
The WordCloud library counts the segmented words and renders them as a word cloud. The more often a word appears, the larger its font.
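To make the "more frequent means larger" idea concrete, here is a small standard-library sketch. The linear scaling in font_sizes is a simplifying assumption for illustration only; it is not the layout algorithm WordCloud actually uses, and the function name and sample text are hypothetical.

```python
from collections import Counter


def font_sizes(segmented_text, max_font_size=100, min_font_size=10):
    """Map each space-separated word to a font size proportional to its count."""
    counts = Counter(segmented_text.split())
    if not counts:
        return {}
    top = max(counts.values())
    # Linear scaling relative to the most frequent word (illustrative assumption)
    return {
        word: max(min_font_size, round(max_font_size * count / top))
        for word, count in counts.items()
    }


sample_text = "中国 美国 中国 经济 中国 美国"
sizes = font_sizes(sample_text)
print(sizes)  # {'中国': 100, '美国': 67, '经济': 33}

# The most frequent words first, similar to printing the top 50 words
top_words = Counter(sample_text.split()).most_common(2)
print(top_words)  # [('中国', 3), ('美国', 2)]
```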
from wordcloud import WordCloud

wc = WordCloud(scale=32,
               background_color='white',
               mask=backgroud_Image,
               font_path='/System/Library/Fonts/Supplemental/Songti.ttc',
               max_words=1000,
               max_font_size=100,
               random_state=42,
               mode='RGB')
wc.generate_from_text(text)
The WordCloud parameters used above are as follows:
- scale: controls the sharpness of the image (as far as I know, the maximum is 64). The value of 32 that I set is already very high definition. The higher the value, the longer it takes to generate the image.
- background_color: the background color of the word cloud
- mask: the image that defines the shape of the word cloud; pure white areas of the image are left empty
- font_path: the path to a font file on your computer. For Chinese text, it must be a font that contains Chinese characters
- max_words: the maximum number of words displayed in the word cloud
- max_font_size: the font size of the largest (most frequent) word
- random_state: the random seed; fixing it makes the generated layout and colors reproducible
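Since font_path differs from machine to machine, a small helper can pick the first font file that actually exists. This is only a sketch: the function name is hypothetical, the macOS path is the one used in this article, and the Windows and Linux paths are assumptions about typical installations that may not match your system.

```python
from pathlib import Path

# Candidate fonts with Chinese glyphs; only the first path comes from this article,
# the other two are assumed typical locations and may not exist on your machine
CANDIDATE_FONTS = [
    "/System/Library/Fonts/Supplemental/Songti.ttc",   # macOS (used in this article)
    "C:/Windows/Fonts/simhei.ttf",                     # Windows (assumption)
    "/usr/share/fonts/truetype/wqy/wqy-microhei.ttc",  # Linux (assumption)
]


def first_existing_font(candidates):
    """Return the first candidate path that points to an existing file, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


font_path = first_existing_font(CANDIDATE_FONTS)
if font_path is None:
    print("No candidate font found; set font_path by hand.")
else:
    print("Using font:", font_path)
```

The returned path can then be passed as the font_path argument when the word cloud is configured.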
For the mask image I used a picture of Trump's face in profile.
Once the parameters are set, the word cloud can be generated.
from wordcloud import ImageColorGenerator
import matplotlib.pyplot as plt

process_word = wc.process_text(text)  # Dictionary of word -> count
sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
print(sort[:50])  # Print out the top 50 words
img_colors = ImageColorGenerator(backgroud_Image)  # Take the colors from the mask image
wc.recolor(color_func=img_colors)
plt.imshow(wc)
plt.axis('off')
wc.to_file("trump20.jpg")  # Save the word cloud to the current working directory
print('Successfully generated word cloud!')
Trump's word cloud
This time I took Trump's tweets and turned them into a word cloud to see what those 200 tweets a day are all about.
Full code
import re

import jieba
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

# Read the first column of the Excel sheet and return its values as a list
def excel_one_line_to_list():
    data = pd.read_excel('/Users/brucepk/Documents/trump_20200530_Chinese Translation.xlsx', header=1, usecols=[0], names=None)
    df_li = data.values.tolist()
    result = []
    for s_li in df_li:
        result.append(s_li[0])
    return result

if __name__ == '__main__':
    result = str(excel_one_line_to_list())
    # Strip Latin letters, digits, spaces and ASCII punctuation
    results = re.sub(r"[A-Za-z0-9\[\`\~\!\@\#\$\^\&\*\(\)\=\|\{\}\'\:\;\'\,\[\]\.\<\>\/\?\~\. \@\#\\\&\*\%]", "", result)
    # Latin letters are already gone here, so these stop words only match if written in Chinese
    results = results.replace('thank you', '').replace('today', '')
    # Segment the whole string at once; looping over a str would yield single characters
    text = ' '.join(jieba.cut(results, cut_all=False))
    print('text:', text)
    backgroud_Image = np.array(Image.open('trumppic2.png'))  # Mask image that gives the word cloud its shape
    # Set the word cloud parameters. font_path must be changed to a font path on your own computer
    wc = WordCloud(scale=32,
                   background_color='white',
                   mask=backgroud_Image,
                   font_path='/System/Library/Fonts/Supplemental/Songti.ttc',
                   max_words=1000,
                   max_font_size=100,
                   random_state=42,
                   mode='RGB')
    wc.generate_from_text(text)
    process_word = wc.process_text(text)  # Dictionary of word -> count
    sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
    print(sort[:50])  # Print out the top 50 words
    img_colors = ImageColorGenerator(backgroud_Image)  # Take the colors from the mask image
    wc.recolor(color_func=img_colors)
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file("trump20.jpg")  # Save the word cloud to the current working directory
    print('Successfully generated word cloud!')