Douyin (TikTok) billboard data crawler example

Posted by murtuza.hasan.13 on Mon, 22 Nov 2021 21:30:23 +0100

The Douyin (TikTok) creator platform publishes a series of billboards (ranking lists), such as the hot search list, popular video list, entertainment star list, music list and so on.

Web link: https://creator.douyin.com/billboard/home (the data is visible after logging in)

List interface

| Interface name | Method | Link |
| --- | --- | --- |
| Hot search list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=1 |
| Rising hot topics list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=9 |
| Today's top videos | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=4 |
| Entertainment star list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=2 |
| Sports hot list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=3 |
| Live list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=10 |
| Hot song list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=5 |
| Music rising list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=6 |
| Original music list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=7 |
| Anime (ACG) list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=61 |
| Funny list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=86 |
| Travel list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=91 |
| Drama list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=81 |
| Food list | GET | https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=71 |
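Since the endpoints listed above differ only in the billboard_type query parameter, the URLs can be generated from a small lookup table. The sketch below is one way to do that. The dictionary keys (hot_search, hot_video and so on) and the function name billboard_url are names chosen here for illustration, not identifiers used by the platform.

```python
from urllib.parse import urlencode

BASE_URL = "https://creator.douyin.com/aweme/v1/creator/data/billboard/"

# billboard_type values taken from the list above; the keys are
# illustrative names, not official identifiers.
BILLBOARD_TYPES = {
    "hot_search": 1,
    "entertainment_star": 2,
    "sports": 3,
    "hot_video": 4,
    "hot_song": 5,
    "music_rising": 6,
    "original_music": 7,
    "hot_rising": 9,
    "live": 10,
    "anime": 61,
    "food": 71,
    "drama": 81,
    "funny": 86,
    "travel": 91,
}


def billboard_url(name):
    """Return the billboard endpoint URL for one of the names above."""
    query = urlencode({"billboard_type": BILLBOARD_TYPES[name]})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    for name in BILLBOARD_TYPES:
        print(name, billboard_url(name))
```

Keeping the numbers in one dictionary means a new list only needs one extra entry rather than another hard-coded URL.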

The interfaces above cannot be accessed directly: each request must include a Referer header. The data for today's top videos is used as the example below.

Today's top videos

With the data interface and request method for today's top videos known, you can call it directly with requests. The code is very simple, and requesting any other interface only requires changing the url.

import requests

# Billboard interface for today's top videos (billboard_type=4)
hot_video_url = 'https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=4'

# The referer header is required; the interface cannot be accessed without it
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
    "referer": "https://creator.douyin.com/billboard/hot_aweme",
}

response = requests.get(url=hot_video_url, headers=headers).json()

print(response)
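To request other lists without editing the url by hand, the call can be wrapped in a function that takes billboard_type as a parameter. This is a minimal sketch rather than the original code: fetch_billboard is a hypothetical helper, and it accepts any object with a requests-style get method (for example requests.Session()), so it can be exercised without network access.

```python
BASE_URL = "https://creator.douyin.com/aweme/v1/creator/data/billboard/"

HEADERS = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
    "referer": "https://creator.douyin.com/billboard/hot_aweme",
}


def fetch_billboard(session, billboard_type, timeout=10):
    """Fetch one billboard and return the decoded JSON.

    `session` is anything with a requests-style `get` method,
    e.g. `requests.Session()` or the `requests` module itself.
    """
    response = session.get(
        BASE_URL,
        params={"billboard_type": billboard_type},  # becomes ?billboard_type=N
        headers=HEADERS,
        timeout=timeout,
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

With requests installed, fetch_billboard(requests.Session(), 4) would request today's top videos. Whether the server accepts the request still depends on the headers it expects at the time.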

Return data example:

Get video details

The returned data contains no detailed information about each video. It only includes the author's name (author), the cover image (img_url), the share page link (link), the ranking (rank), the title (title) and the popularity value (value).
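These fields can be collected into a small record type for later processing. The sketch below assumes that each entry of billboard_data is a dict with the keys author, img_url, link, rank, title and value; the key names author and img_url are inferred from the description above, and the types are not verified. BillboardEntry and parse_billboard are hypothetical names, and the sample payload is made up for illustration, not real API output.

```python
from dataclasses import dataclass


@dataclass
class BillboardEntry:
    rank: int       # assumed to be numeric; not verified
    title: str
    author: str
    link: str
    img_url: str
    value: int      # popularity value, assumed to be numeric


def parse_billboard(payload):
    """Convert a billboard response dict into a list of BillboardEntry."""
    entries = []
    for item in payload.get("billboard_data", []):
        entries.append(
            BillboardEntry(
                rank=item["rank"],
                title=item["title"],
                author=item.get("author", ""),    # tolerate a missing key
                link=item["link"],
                img_url=item.get("img_url", ""),  # tolerate a missing key
                value=item["value"],
            )
        )
    return entries


if __name__ == "__main__":
    # Made-up sample shaped like the fields described above.
    sample = {
        "billboard_data": [
            {
                "rank": 1,
                "title": "example title",
                "author": "example author",
                "link": "https://www.iesdouyin.com/share/video/6844023242781412622/",
                "img_url": "https://example.com/cover.jpg",
                "value": 12345,
            }
        ]
    }
    for entry in parse_billboard(sample):
        print(entry.rank, entry.title, entry.value)
```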

To get more information about a video, such as its likes, shares, comments or author details, you need to request another interface.

Take one share link as an example: https://www.iesdouyin.com/share/video/6844023242781412622/?region=CN&mid=6844023258854345479&u_code=0&titleType=title

Capturing the page's network requests reveals the data interface: https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids=6844023242781412622

(This interface can be accessed directly. It takes a single parameter, item_ids, whose value is the same as the id that follows video/ in the share link.)
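Extracting the id from a share link and building the detail URL can be sketched as follows. The function names extract_item_id and detail_url are hypothetical; the URL pattern and the sample link are the ones shown above.

```python
import re

DETAIL_API = "https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/"


def extract_item_id(share_url):
    """Return the numeric video id from a share link, or None if absent."""
    match = re.search(r"/share/video/(\d+)", share_url)
    return match.group(1) if match else None


def detail_url(share_url):
    """Build the detail-interface URL for the video behind a share link."""
    item_id = extract_item_id(share_url)
    if item_id is None:
        raise ValueError(f"no video id found in {share_url!r}")
    return f"{DETAIL_API}?item_ids={item_id}"


if __name__ == "__main__":
    link = (
        "https://www.iesdouyin.com/share/video/6844023242781412622/"
        "?region=CN&mid=6844023258854345479&u_code=0&titleType=title"
    )
    print(detail_url(link))
```

Returning None from extract_item_id and raising only in detail_url lets a crawler skip malformed links without wrapping every lookup in a try block.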

The returned data are as follows:

Code example

First fetch the list of today's top videos, then extract the item_ids for each video from its share link, and finally request the detailed video data with that id.

# -*- coding: utf-8 -*-

import re

import requests

hot_video_url = 'https://creator.douyin.com/aweme/v1/creator/data/billboard/?billboard_type=4'

headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
    "referer": "https://creator.douyin.com/billboard/hot_aweme",
}

# Step 1: fetch the list of today's top videos
response = requests.get(url=hot_video_url, headers=headers).json()

for video in response['billboard_data']:
    link = video['link']            # Share page link
    title = video['title']          # Video title
    rank = video['rank']            # Current ranking
    hot_value = video['value']      # Current popularity value

    # Step 2: the id needed by the detail interface follows "video/" in the share link
    item_ids = re.findall(r'video/(.*?)/', link)[0]

    # Step 3: request the detail data for that id
    video_detail_url = 'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids={}'.format(item_ids)

    detail = requests.get(video_detail_url, headers=headers).json()

    print(detail['item_list'][0]['share_url'])

    break   # Only the first video is processed in this example

Update after re-checking on 2020/09/08:

Some interfaces now require cookies in the request; without them the response is {'status_msg': 'operation without permission'}.

You can copy the entire cookie into the headers, or copy only the sid_guard entry.

headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
    "referer": "https://creator.douyin.com/billboard/hot_aweme",
    "cookie": "sid_guard=(this is a demonstration)"
}
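If you copied the whole cookie string from the browser but only want to send sid_guard, the entry can be picked out with a few lines of string handling. This is a sketch under stated assumptions: sid_guard_cookie and build_headers are hypothetical helpers, and the cookie string in the example is a placeholder, not a real credential.

```python
BASE_HEADERS = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
    "referer": "https://creator.douyin.com/billboard/hot_aweme",
}


def sid_guard_cookie(cookie_header):
    """Return 'sid_guard=<value>' from a full Cookie header string.

    Raises KeyError if the string has no sid_guard entry.
    """
    for part in cookie_header.split(";"):
        # partition keeps any '=' characters inside the value intact
        name, _, value = part.strip().partition("=")
        if name == "sid_guard":
            return f"sid_guard={value}"
    raise KeyError("sid_guard not found in cookie string")


def build_headers(cookie_header):
    """Return a copy of BASE_HEADERS with only the sid_guard cookie added."""
    headers = dict(BASE_HEADERS)
    headers["cookie"] = sid_guard_cookie(cookie_header)
    return headers


if __name__ == "__main__":
    # Placeholder cookie string; paste your own from the browser instead.
    raw_cookie = "foo=1; sid_guard=PLACEHOLDER; bar=2"
    print(build_headers(raw_cookie)["cookie"])
```

Sending only sid_guard keeps the request headers short and avoids passing along unrelated cookies from the browser session.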