A slick Python trick: Trump's mouth, a lying ghost!

Posted by dmikester1 on Wed, 17 Jun 2020 09:34:49 +0200

Preface

The text and images in this article come from the Internet and are shared for learning and exchange only, not for any commercial purpose. Copyright belongs to the original author. If you have any concerns, please contact us and we will handle them promptly.

Author: pk Ge


A few days ago, Trump was tweeting about violence again, but for this "king of backfires" that is just routine. Running the country from Twitter has become his new way of governing. I happened to follow his account, and every time I looked at my phone the Twitter notifications kept going off, the whole screen full of Trump. In a single day as many as 200 tweets could be pushed to me.

Sometimes Trump states a view on Twitter and then, in real life, flatly denies his own tweets. He is the ultimate self-contradiction perpetual-motion machine, which is truly surreal.

I collected all of Trump's tweets from January 2017 to May 31, 2020. The data was an English Excel spreadsheet; to make Chinese word segmentation and reading easier, I translated it into Chinese.

Read the data and segment words

First, read the Excel file directly, take the tweet text from one specific column, and return it as a list.

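import pandas as pd

# Read the tweet column of the Excel sheet and return it as a list
def excel_one_line_to_list():
    # header=1: the second row holds the column names; usecols=[0]: keep only the first column
    data = pd.read_excel('/Users/brucepk/Documents/trump_20200530_Chinese Translation.xlsx', header=1, usecols=[0], names=None)
    df_li = data.values.tolist()  # one single-element list per row
    result = []
    for s_li in df_li:
        result.append(s_li[0])  # unwrap each row to its cell value
    return result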
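The pandas call above needs an Excel reader engine installed. If you would rather avoid that, one alternative (not what the author did) is to export the sheet to CSV and read the first column with the standard csv module. The sketch below assumes the same layout that header=1 implies in pandas: the first row is skipped, the second row holds the column names, and the tweets start on the third row. The function name first_column_from_csv and the sample data are hypothetical.

```python
import csv
import io


def first_column_from_csv(fileobj, header_row=1):
    """Hypothetical helper: return first-column values below the header row.

    header_row=1 mirrors pandas' header=1: row 0 is skipped and row 1 is the header.
    """
    values = []
    for index, row in enumerate(csv.reader(fileobj)):
        if index <= header_row:
            continue  # rows above the header, and the header itself
        if row:  # skip completely empty lines
            values.append(row[0])
    return values


# Usage with an in-memory sample; for a real file use
# open(path, newline='', encoding='utf-8') instead of io.StringIO.
sample = "exported sheet\ntweet,date\n美国,2020-05-30\n中国,2020-05-31\n"
print(first_column_from_csv(io.StringIO(sample)))
```

Like the pandas version, this returns a plain list of strings, so it could feed the same cleaning and segmentation steps.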

Next, use jieba, a Chinese word segmentation library, to split the text into words.

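import re
import jieba
import numpy as np
from PIL import Image

if __name__ == '__main__':
    result = str(excel_one_line_to_list())
    # Strip ASCII letters, digits, spaces and common punctuation, then drop the
    # filler words 谢谢 ("thank you") and 今天 ("today")
    results = re.sub(r"[A-Za-z0-9\[\`\~\!\@\#\$\^\&\*\(\)\=\|\{\}\'\:\;\'\,\[\]\.\<\>\/\?\~\. \@\#\\\&\*\%]", "", result).replace('谢谢', '').replace('今天', '')
    # Segment the whole string at once; looping over a str would pass jieba one character at a time
    text = ' '.join(jieba.cut(results, cut_all=False))
    print('text:', text)
    backgroud_Image = np.array(Image.open('trumppic2.png'))   # Load the mask image that shapes the word cloud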
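The cleaning step can be tried on its own without jieba. This standard-library sketch applies the same character class as the code above and then drops the filler words 谢谢 and 今天; the function name clean and the sample tweets are made up for illustration. Whatever it returns should go to jieba.cut as one string, because looping over a str hands jieba a single character at a time.

```python
import re

# Same character class as the article: ASCII letters, digits, the space
# character and a set of ASCII punctuation marks
NOISE = re.compile(r"[A-Za-z0-9\[\`\~\!\@\#\$\^\&\*\(\)\=\|\{\}\'\:\;\'\,\[\]\.\<\>\/\?\~\. \@\#\\\&\*\%]")


def clean(tweets, filler_words=("谢谢", "今天")):
    """Join the tweets, strip ASCII noise, then remove the filler words."""
    text = NOISE.sub("", " ".join(tweets))
    for word in filler_words:
        text = text.replace(word, "")
    return text


sample = ["谢谢大家! https://t.co/abc", "今天 美国 2020"]
print(clean(sample))  # 大家美国
```

Because the filler words are removed after the ASCII noise, links and numbers inside a tweet cannot split them apart and hide them from the replacement.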

Configure the word cloud parameters

The segmented words are counted and rendered with the WordCloud library as a word cloud: the more often a word appears, the larger its font.

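    # Word cloud settings; font_path must point to a font with Chinese glyphs on your machine
    wc = WordCloud(scale=32, background_color='white', mask=backgroud_Image,
                   font_path='/System/Library/Fonts/Supplemental/Songti.ttc',
                   max_words=1000, max_font_size=100, random_state=42, mode='RGB')
    wc.generate_from_text(text)  # tokenize the text, count word frequencies and lay out the cloud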
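To make "more occurrences, larger font" concrete, here is a deliberately simplified sketch: count the space-separated words with collections.Counter and scale each font size linearly against the most frequent word, which gets max_font_size. This is an assumption for illustration, not WordCloud's actual sizing, which also applies its relative_scaling setting and shrinks fonts when words no longer fit. The function name font_sizes is hypothetical.

```python
from collections import Counter


def font_sizes(segmented_text, max_font_size=100):
    """Simplified model: font size proportional to word count, capped at max_font_size."""
    counts = Counter(segmented_text.split())
    top = max(counts.values())
    # The most frequent word gets max_font_size; every other word scales linearly
    return {word: max(1, round(max_font_size * n / top)) for word, n in counts.items()}


print(font_sizes("美国 中国 美国 媒体 美国 中国"))  # {'美国': 100, '中国': 67, '媒体': 33}
```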

The WordCloud parameters used above are:

  • scale: multiplies the resolution of the rendered image relative to the canvas (default 1). The 32 used here already gives a very high-resolution image; the higher the value, the longer rendering takes.
  • background_color: background color of the word cloud
  • mask: the image whose shape the word cloud fills
  • font_path: path to a font file on your computer. Chinese text needs a font that contains Chinese glyphs, so give the full path to such a font
  • max_words: maximum number of words shown in the word cloud
  • max_font_size: font size of the most frequent word
  • random_state: seed for the random number generator, so word placement and colors come out the same on every run
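The cost of a large scale is easy to estimate. Assuming, as in the wordcloud releases I know of, that with a mask the canvas takes the mask image's width and height and the saved image is that canvas multiplied by scale, the pixel count grows with the square of scale. The 500 x 400 mask size below is a made-up example; the real size of trumppic2.png is not given.

```python
def output_size(mask_width, mask_height, scale):
    """Pixel size of the saved image, assuming canvas = mask size times scale."""
    return int(mask_width * scale), int(mask_height * scale)


for scale in (1, 8, 32):
    width, height = output_size(500, 400, scale)
    print(f"scale={scale}: {width} x {height} px, {width * height / 1e6:.1f} megapixels")
```

Under that assumption, scale=32 turns a 500 x 400 mask into a 16000 x 12800 image of about 205 megapixels, 1024 times the pixels of scale=1, which is why generation slows down so much.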

For the mask, I used a side-profile picture of Trump.
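The side-face picture works as a mask because, according to the wordcloud documentation, pure white pixels are treated as masked out and words are placed only on the remaining pixels, so a portrait on a pure white background produces a face-shaped cloud. The sketch below checks that rule on a tiny hand-made image given as nested lists. For RGB(A) pixels it treats a pixel as white only when its first three channels are all 255, which is my understanding of how the library handles color masks; is_white and drawable_cells are hypothetical helpers, not part of the library.

```python
def is_white(pixel):
    """Grayscale pixels are ints; RGB(A) pixels are tuples checked on their first three channels."""
    if isinstance(pixel, int):
        return pixel == 255
    return all(channel == 255 for channel in pixel[:3])


def drawable_cells(mask):
    """Hypothetical helper: (row, col) positions where words may be drawn."""
    return [
        (row_index, col_index)
        for row_index, row in enumerate(mask)
        for col_index, pixel in enumerate(row)
        if not is_white(pixel)
    ]


# Toy 2 x 3 image mixing grayscale and RGB pixels just to exercise both branches
toy_mask = [
    [255, 0, 255],
    [(255, 255, 255), (40, 40, 40), (250, 255, 255)],
]
print(drawable_cells(toy_mask))  # [(0, 1), (1, 1), (1, 2)]
```

The near-white (250, 255, 255) pixel still counts as drawable, so a background that is only almost white (for example from JPEG compression noise) can leave stray words around the silhouette.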

With the parameters set, the word cloud can be generated:

    process_word = wc.process_text(text)  # dict mapping each word to its count
    sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
    print(sort[:50])   # Print the 50 most frequent words
    img_colors = ImageColorGenerator(backgroud_Image)  # take word colors from the mask image
    wc.recolor(color_func=img_colors)
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file("trump20.jpg")  # Save to trump20.jpg in the current working directory
    print('Successfully generated word cloud!')
    plt.show()  # display the figure; a plain script shows nothing without it
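Two parts of this step can be reproduced in plain Python. The sorted(..., key=lambda e: e[1], reverse=True)[:50] line produces the same ranking as collections.Counter.most_common(50), and ImageColorGenerator is documented as coloring each word with the mean color of the image rectangle the word occupies. The sketch below shows both on toy data; top_words and mean_color are hypothetical helpers, and the integer averaging is my simplification rather than the library's code.

```python
from collections import Counter


def top_words(counts, n=50):
    """Rank (word, count) pairs by count, highest first, as the article's sorted() call does."""
    return sorted(counts.items(), key=lambda e: e[1], reverse=True)[:n]


def mean_color(image, left, top, right, bottom):
    """Average RGB over image rows top..bottom-1 and columns left..right-1.

    image is a list of rows of (r, g, b) tuples; integer division keeps the channels as ints.
    """
    pixels = [pixel for row in image[top:bottom] for pixel in row[left:right]]
    return tuple(sum(channel) // len(pixels) for channel in zip(*pixels))


counts = Counter({"美国": 5, "中国": 3, "媒体": 1})
print(top_words(counts, 2))      # [('美国', 5), ('中国', 3)]
print(counts.most_common(2))     # same ranking

toy_image = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (0, 0, 255)],
]
print(mean_color(toy_image, 0, 0, 2, 2))  # (127, 63, 127)
```

Averaging over the word's whole rectangle is why words that straddle two regions of the portrait come out in a blended shade.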

The word cloud of Trump's tweets

This time I turned Trump's tweets into a word cloud to see what those 200 tweets a day are really about.

Full code

import pandas as pd
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
import jieba
import re
import numpy as np
from PIL import Image

# Read Excel table information and return results
def excel_one_line_to_list():
    data = pd.read_excel('/Users/brucepk/Documents/trump_20200530_Chinese Translation.xlsx', header=1, usecols=[0], names=None)
    df_li = data.values.tolist()
    result = []
    for s_li in df_li:
        result.append(s_li[0])
    return result


if __name__ == '__main__':
    result = str(excel_one_line_to_list())
    # Strip ASCII letters, digits, spaces and common punctuation, then drop the
    # filler words 谢谢 ("thank you") and 今天 ("today")
    results = re.sub(r"[A-Za-z0-9\[\`\~\!\@\#\$\^\&\*\(\)\=\|\{\}\'\:\;\'\,\[\]\.\<\>\/\?\~\. \@\#\\\&\*\%]", "", result).replace('谢谢', '').replace('今天', '')
    # Segment the whole string at once; looping over a str would pass jieba one character at a time
    text = ' '.join(jieba.cut(results, cut_all=False))
    print('text:', text)
    backgroud_Image = np.array(Image.open('trumppic2.png'))   # Load the mask image that shapes the word cloud

    # Word cloud settings; change font_path to a font on your computer that contains Chinese glyphs
    wc = WordCloud(scale=32, background_color='white', mask=backgroud_Image,
                   font_path='/System/Library/Fonts/Supplemental/Songti.ttc',
                   max_words=1000, max_font_size=100, random_state=42, mode='RGB')
    wc.generate_from_text(text)  # tokenize the text, count word frequencies and lay out the cloud

    process_word = wc.process_text(text)  # dict mapping each word to its count
    sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
    print(sort[:50])   # Print the 50 most frequent words
    img_colors = ImageColorGenerator(backgroud_Image)  # take word colors from the mask image
    wc.recolor(color_func=img_colors)
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file("trump20.jpg")  # Save to trump20.jpg in the current working directory
    print('Successfully generated word cloud!')
    plt.show()  # display the figure; a plain script shows nothing without it
