Using Python to make a beautiful girl word cloud dance video

Posted by patrick99e99 on Tue, 01 Mar 2022 03:50:27 +0100

Recently, many friends learning Python have been looking for a fun hands-on project to practice with.

Let's share a simple Python project for beginners today.

This article takes a Bilibili "Ride the Wind and Waves" dance video as an example and uses you-get to download it.

Python is then used to crawl the video's barrage (danmaku, or bullet comments) from Bilibili, OpenCV splits the video into frames, Baidu AI segments the portrait in each frame, and moviepy assembles the word cloud dance video and adds the audio.

Import module

Download required modules

We need quite a few modules, so we can use the os.system() method to install them automatically. Installation may occasionally fail, especially for opencv-python; just run it again a few times.

import os

# Modules used later in this article (jieba, wordcloud and baidu-aip are needed as well)
libs = {"lxml", "requests", "pandas", "numpy", "you-get", "opencv-python",
        "fake_useragent", "matplotlib", "moviepy", "jieba", "wordcloud", "baidu-aip"}
for lib in libs:
    # os.system returns 0 when the pip command succeeds
    if os.system(f"pip3 install -i https://pypi.doubanio.com/simple/ {lib}") == 0:
        print(lib + " installed successfully")
    else:
        print(lib + " installation failed")

Import module

Here, import the required modules first

import os
import re
import cv2
import jieba
import requests
import moviepy
import pandas as pd
import numpy as np
from PIL import Image
from lxml import etree
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from fake_useragent import UserAgent

Video processing

Download Video

Download the dance video from Bilibili (station B).

You can use you-get to download the video. First install it:

pip install you-get

Find the link of the video and use the following command to list the available streams:

you-get -i https://www.bilibili.com/video/BV11C4y1h7nX

The stream marked DEFAULT is the default quality; run the same command again without -i to download it.
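
If you prefer to drive the download from a Python script instead of the command line, here is a minimal sketch using os.system (the output directory is an assumption):

import os

# Download the DEFAULT-quality stream into ./video (same idea as the pip installs above)
os.system("you-get -o ./video https://www.bilibili.com/video/BV11C4y1h7nX")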


Downloaded video.

Video segmentation

Using OpenCV, the video is split into individual frames. This article extracts 800 frames to make the word clouds.

OpenCV reads a video file (or a camera) through the VideoCapture class.

Code display

# -*- coding:utf-8 -*-
import cv2
import os

os.makedirs("./pictures", exist_ok=True)   # Make sure the output folder exists
cap = cv2.VideoCapture(r"Priceless sister~Let me ride the wind and waves~~~.flv")
num = 1
while 1:
    # Read the video frame by frame and save the frames to the local folder in order
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(f"./pictures/img_{num}.jpg", frame)
        num += 1   # Increment the counter so each frame gets its own file
    else:
        break
cap.release()   # Release resources

Result display

Portrait segmentation

Create application

Use the Baidu AI open platform to create a portrait (body) segmentation application.


Python SDK reference documentation

Follow the reference documentation to call the portrait segmentation SDK.

Reference documents:

https://cloud.baidu.com/doc/BODY/s/Rk3cpyo93?_=5011917520845


Code display

# -*- coding:utf-8 -*-
import cv2
import base64
import numpy as np
import os
from aip import AipBodyAnalysis
import time
import random

APP_ID = '******'
API_KEY = '*******************'
SECRET_KEY = '********************'

client = AipBodyAnalysis(APP_ID, API_KEY, SECRET_KEY)
# Save the path after image segmentation
path = './mask_img/'

# os.listdir lists the file names of the frames saved earlier
img_files = os.listdir('./pictures')
print(img_files)
for num in range(1, len(img_files) + 1):
    # Construct picture paths in order
    img = f'./pictures/img_{num}.jpg'
    img1 = cv2.imread(img)
    height, width, _ = img1.shape
    # print(height, width)
    # Reading pictures in binary mode
    with open(img, 'rb') as fp:
        img_info = fp.read()

    # Call portrait segmentation; the returned labelmap marks the portrait region
    seg_res = client.bodySeg(img_info)
    labelmap = base64.b64decode(seg_res['labelmap'])
    nparr = np.frombuffer(labelmap, np.uint8)
    labelimg = cv2.imdecode(nparr, 1)
    labelimg = cv2.resize(labelimg, (width, height), interpolation=cv2.INTER_NEAREST)
    new_img = np.where(labelimg == 1, 255, labelimg)
    mask_name = path + 'mask_{}.png'.format(num)
    # Save the split portrait
    cv2.imwrite(mask_name, new_img)
    print(f'======== Image {num} segmentation completed ========')

Result display

Barrage crawling

For technical reasons, we switched to the following video for barrage crawling, ha ha ha.

https://www.bilibili.com/video/BV1jZ4y1K78N

Web page analysis

Open the developer tools (F12) and find the pagelist request; it takes the bvid from the original url and returns the video's cid (a small sketch of this step follows the list below).

Observe the historical barrage requests

  • Clear the network panel and expand the barrage list

  • The date list shown by default is limited (only 2021 here); clicking another date triggers the history request, which we can then observe.
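
The cid (used later as the oid parameter) can also be fetched programmatically from the pagelist request found above. Below is a minimal sketch assuming the public pagelist endpoint and this video's BV id; check the request in your own developer tools to confirm the response structure.

import requests

def get_cid(bvid):
    # Hypothetical helper: ask the pagelist API for the cid of the video's first part
    url = "https://api.bilibili.com/x/player/pagelist"
    resp = requests.get(url, params={"bvid": bvid}, timeout=10)
    return resp.json()["data"][0]["cid"]

print(get_cid("BV1jZ4y1K78N"))   # Should print the oid used in the crawler below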

Crawling barrage

Construct time series

The video was released on August 9, 2020. This article crawls its historical barrage data from August 8, 2020 to September 8, 2020, so we first construct the corresponding date series:

import pandas as pd
a = pd.date_range("2020-08-08","2020-09-08")
print(a) 
DatetimeIndex(['2020-08-08', '2020-08-09', '2020-08-10', '2020-08-11',
               '2020-08-12', '2020-08-13', '2020-08-14', '2020-08-15',
               '2020-08-16', '2020-08-17', '2020-08-18', '2020-08-19',
               '2020-08-20', '2020-08-21', '2020-08-22', '2020-08-23',
               '2020-08-24', '2020-08-25', '2020-08-26', '2020-08-27',
               '2020-08-28', '2020-08-29', '2020-08-30', '2020-08-31',
               '2020-09-01', '2020-09-02', '2020-09-03', '2020-09-04',
               '2020-09-05', '2020-09-06', '2020-09-07', '2020-09-08'],
              dtype='datetime64[ns]', freq='D')

Crawling data

Add your own cookie and set the oid to the cid found above.

import requests
import pandas as pd
import re
import datetime
from fake_useragent import UserAgent
from concurrent.futures import ThreadPoolExecutor

ua = UserAgent()
start_time = datetime.datetime.now()
df = []   # All crawled barrages are collected here

def Grab_barrage(date):
    headers = {
        "origin": "https://www.bilibili.com",
        "referer": "https://www.bilibili.com/video/BV1jZ4y1K78N?from=search&seid=1084505810439035065",
        "cookie": "",              # Fill in your own logged-in cookie here
        "user-agent": ua.random,   # random is a property, not a method
    }
    params = {
        'type': 1,
        'oid': "222413092",        # cid of the video found above
        'date': date
    }
    r = requests.get(url, params=params, headers=headers)
    r.encoding = 'utf-8'
    comments = re.findall('<d p=".*?">(.*?)</d>', r.text)
    for comment in comments:
        df.append(comment)

def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(Grab_barrage, date_list)
    # Save all barrages to Excel once the threads have finished
    pd.DataFrame(df, columns=["barrage"]).to_excel("danmu.xlsx", index=False)
    """Calculate the time required"""
    delta = (datetime.datetime.now() - start_time).total_seconds()
    print(f'Time: {delta}s')

if __name__ == '__main__':
    # Target url
    url = "https://api.bilibili.com/x/v2/dm/history"
    start, end = '20200808', '20200908'
    date_list = [x for x in pd.date_range(start, end).strftime('%Y-%m-%d')]
    main()

Result display

Generate word cloud

Mechanical compression and deduplication of comment content

In a comment, some people mistype or pad out the word count by repeating a character or phrase many times, so before word segmentation we need to apply "mechanical compression and deduplication".

import pandas as pd

def func(s):
    # Collapse consecutive repeats of any substring (length 1 up to half the text)
    for i in range(1, int(len(s) / 2) + 1):
        for j in range(len(s)):
            if s[j:j + i] == s[j + i:j + 2 * i]:
                k = j + i
                while s[k:k + i] == s[k + i:k + 2 * i] and k < len(s):
                    k = k + i
                s = s[:j] + s[k:]
    return s

# Apply the compression to every crawled barrage and save the cleaned text
data = pd.read_excel("danmu.xlsx")
data["barrage"] = data["barrage"].astype(str).apply(func)
data["barrage"].to_csv("barrages.txt", index=False, header=False)

Add stop words and custom phrases

import pandas as pd
from wordcloud import WordCloud
import jieba
import matplotlib.pyplot as plt

jieba.load_userdict("./Word cloud picture//add.txt")
with open('./Word cloud picture//stoplist.txt', 'r', encoding='utf-8') as f:
    stopWords = f.read()

Generate word cloud

from wordcloud import WordCloud
import collections
import jieba
import re
import os
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
# Read data
with open('barrages.txt') as f:
    data = f.read()
jieba.load_userdict("./Word cloud picture//add.txt")
# Text preprocessing removes some useless characters and only extracts Chinese characters
new_data = re.findall('[\u4e00-\u9fa5]+', data, re.S)
new_data = "/".join(new_data)

# Text segmentation
seg_list_exact = jieba.cut(new_data, cut_all=True)

result_list = []
with open('./Word cloud picture/stoplist.txt', encoding='utf-8') as f:
    con = f.read().split('\n')
    stop_words = set()
    for i in con:
        stop_words.add(i)

for word in seg_list_exact:
    # Set stop words and remove individual words
    if word not in stop_words and len(word) > 1:
        result_list.append(word)

# Statistics of word frequency after filtering
word_counts = collections.Counter(result_list)
path = './wordcloud/'


img_files = os.listdir('./mask_img')
print(img_files)
for num in range(1, len(img_files) + 1):
    img = fr'.\mask_img\mask_{num}.png'
    # Get mask picture
    mask_ = 255 - np.array(Image.open(img))
    # Draw word cloud
    plt.figure(figsize=(8, 5), dpi=200)
    my_cloud = WordCloud(
        background_color='black',  # Set the background color to black by default
        mask=mask_,      # Custom mask
        mode='RGBA',
        max_words=500,
        font_path='simhei.ttf',   # Set font to display Chinese
    ).generate_from_frequencies(word_counts)

    # Display the generated word cloud picture
    plt.imshow(my_cloud)
    # Hide the axes
    plt.axis('off')
    word_cloud_name = path + 'wordcloud_{}.png'.format(num)
    my_cloud.to_file(word_cloud_name)    # Save the word cloud picture
    plt.close()                          # Close the figure so hundreds of figures don't pile up
    print(f'======== Word cloud {num} generated ========')

Composite video

As described in the official documentation, MoviePy is a Python library for video editing: cutting, concatenation, title insertion, video compositing (i.e. non-linear editing), video processing and custom effects. In short, it lets you work with video, image and audio files easily and flexibly.
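
As a small taste of the library (and an alternative to the OpenCV approach below), moviepy's ImageSequenceClip can assemble the word cloud frames directly. This is only a sketch: it assumes the images generated above live in ./wordcloud/ and all share the same size.

import os
import moviepy.editor as mpy

# Collect the word cloud frames in order, skipping any that are missing
frames = [f'./wordcloud/wordcloud_{i}.png' for i in range(88, 888)
          if os.path.exists(f'./wordcloud/wordcloud_{i}.png')]
clip = mpy.ImageSequenceClip(frames, fps=30)
clip.write_videofile('result_moviepy.mp4')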

Picture synthesis

import cv2
import os

# Save path of output video
video_dir = 'result.mp4'
# Frame rate
fps = 30
# Picture size
img_size = (1920, 1080)

fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')  # opencv3.0 mp4 will have a warning, but it can be played
videoWriter = cv2.VideoWriter(video_dir, fourcc, fps, img_size)
img_files = os.listdir('.//wordcloud')

for i in range(88, 888):
    img_path = './wordcloud/wordcloud_{}.png'.format(i)
    frame = cv2.imread(img_path)
    if frame is None:          # Skip frames that failed to load
        continue
    frame = cv2.resize(frame, img_size)   # Resize each picture to the video size set above
    videoWriter.write(frame)      # Write the frame into the video
    print(f'======== Frame {i} written to video ========')

videoWriter.release()   # Release resources

Result display:

Add audio

import moviepy.editor as mpy

# Read word cloud video
my_clip = mpy.VideoFileClip('result.mp4')
# Clip the first 25 seconds of the background music
audio_background = mpy.AudioFileClip('song.mp3').subclip(0,25)
audio_background.write_audiofile('song1.mp3')
# Insert audio into video
final_clip = my_clip.set_audio(audio_background)
# Save the final word cloud dance video with the background music!
final_clip.write_videofile('final_video.mp4')


Topics: Python OpenCV Programmer