[Teach you how to download stock data from the Tushare library and save it to disk]

Posted by bobleny on Wed, 02 Mar 2022 16:31:57 +0100

Preface

I previously posted an article about using web-crawler techniques to obtain stock data from NetEase Finance, and the response was quite strong. However, that work was only a snapshot of my early days learning web scraping and quantitative investing. What I want to share with you now is far more substantial. It is basically my final version of a stock data download script, and a foundation for downloading data with Tushare.

1, What is Tushare?

Anyone doing quantitative work in China has more or less heard of the Tushare library, and most experts use Tushare when sharing their work. As a relatively affordable financial database, it is very fairly priced. Compared with Wind, the price is very competitive.

Let's see which features Tushare lists on PyPI:

easy to use as most of the data returned are pandas DataFrame objects
can be easily saved as csv, excel or json files
can be inserted into MySQL or Mongodb
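
Since the data arrives as pandas DataFrame objects, writing it to disk takes one call. Here is a minimal sketch of saving to CSV and reading it back. The sample rows, the temporary directory and the file name are all made up for illustration, so swap in your own data directory and naming scheme.

```python
import os
import tempfile

import pandas as pd

# Made-up sample rows standing in for data returned by Tushare
df = pd.DataFrame({
    'ts_code': ['000001.SZ', '600000.SH'],
    'trade_date': ['20220301', '20220301'],
    'close': [15.82, 8.41],
})

# Hypothetical output location; replace with your own data directory
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, 'daily_20220301.csv')

# index=False leaves out the row index, which carries no information here
df.to_csv(csv_path, index=False)

# Reading trade_date as str keeps the YYYYMMDD values as text
restored = pd.read_csv(csv_path, dtype={'trade_date': str})
```

Without the dtype hint, pandas would parse trade_date as an integer column, which is easy to overlook when you later compare it against date strings.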

Tushare's target users:

financial market analyst of China
learners of financial data analysis with pandas/NumPy
people who are interested in China financial data

All right, enough small talk. Let's get down to business.

2, Code

First of all, a few notes:

The code I'm sharing today can't be run in Jupyter, but you can debug it in Jupyter. I hope you understand.
If you use the code, remember to help me promote it. Ha, hey.
Tushare costs money. I pay the 500 yuan annual fee. It is said that the 200 yuan tier already offers rich functionality. To tell you the truth, if you really want to get into quantitative research, I think 200 yuan is a good place to start. What can 200 yuan do? Please browse it yourself. The 200 yuan tier gives you 2000 points. From a rough look, that basically covers all functions.

1. Import and storage

The code is as follows (example):

import os
import time
from datetime import datetime as dt
from datetime import timedelta
import threading
import pandas as pd
import numpy as np
import requests

import tushare as ts
token = 'Put your Tushare token here'
ts.set_token(token)
pro = ts.pro_api(token)

2. Logic of trading day

as_of_today = dt.now().strftime('%Y%m%d')
def last_trading_day():
    # Fetch every open trading day from 2020-01-01 up to today
    data = pro.query('trade_cal',
                     start_date='20200101',
                     end_date=as_of_today,
                     is_open='1')
    # Sort ascending so that [-1] is always the most recent trading day
    trading_dates_list = sorted(data['cal_date'].tolist())
    d0 = dt.now()

    if as_of_today in trading_dates_list:
        if d0.hour >= 16:
            # Today is a trading day and its data should already be published
            return as_of_today
        else:
            # Today's data is not ready yet: use the previous trading day
            return trading_dates_list[-2]
    else:
        # Today is not a trading day: use the most recent trading day
        return trading_dates_list[-1]

I first saw this code here, and then adapted it myself. The logic is: "If today is in the trading calendar and the current time is after 16:00, return today's date; if it is before 16:00, return the previous trading day; if today is not in the trading calendar, return the last date in the list." Why 16:00? Generally speaking, the day's data is updated about one hour after the market closes, so by then you can get that day's data.
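
To check the rule without calling the API, the same decision can be written as a pure function. This is only a sketch: pick_trading_day and its cutoff_hour parameter are names invented for this example, the calendar is made up, and it assumes the calendar holds at least two trading days.

```python
from datetime import datetime


def pick_trading_day(trading_dates, now, cutoff_hour=16):
    """Return the latest trading day whose data should already be published.

    trading_dates: 'YYYYMMDD' strings of open trading days.
    now: the current datetime.
    """
    today = now.strftime('%Y%m%d')
    # Keep only dates up to today, ascending, so dates[-1] is the most recent
    dates = sorted(d for d in trading_dates if d <= today)
    if today in dates:
        if now.hour >= cutoff_hour:
            return today      # after the cutoff: today's data is available
        return dates[-2]      # before the cutoff: previous trading day
    return dates[-1]          # not a trading day: most recent trading day


# Made-up calendar: Monday 2022-02-28 through Friday 2022-03-04
calendar = ['20220228', '20220301', '20220302', '20220303', '20220304']

after_close = pick_trading_day(calendar, datetime(2022, 3, 2, 17, 0))
before_close = pick_trading_day(calendar, datetime(2022, 3, 2, 10, 0))
on_saturday = pick_trading_day(calendar, datetime(2022, 3, 5, 12, 0))
```

Passing the calendar and the clock in as arguments makes all three branches easy to exercise with fixed inputs.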

3. First, pull the daily basic indicators for individual stocks

data = pro.daily_basic(trade_date=last_trading_day())

The data returned is as follows:

name             type   description
ts_code          str    TS stock code
trade_date       str    Trade date
close            float  Closing price of the day
turnover_rate    float  Turnover rate (%)
turnover_rate_f  float  Turnover rate (free-float shares)
volume_ratio     float  Volume ratio
pe               float  P/E ratio (total market value / net profit; empty for loss-making companies)
pe_ttm           float  P/E ratio (TTM; empty for loss-making companies)
pb               float  P/B ratio (total market value / net assets)
ps               float  P/S ratio
ps_ttm           float  P/S ratio (TTM)
dv_ratio         float  Dividend yield (%)
dv_ttm           float  Dividend yield (TTM) (%)
total_share      float  Total share capital (10,000 shares)
float_share      float  Circulating share capital (10,000 shares)
free_share       float  Free-float share capital (10,000 shares)
total_mv         float  Total market value (10,000 yuan)
circ_mv          float  Circulating market value (10,000 yuan)

I doubt you want to download and update all the data for more than 4000 stocks every day, right? Once we have this basic fundamental information, we can do a little processing and screen for more promising candidate stocks.

x1 = data.close < 200  # The closing price is less than 200 yuan
x2 = data.pe < 100  # The price earnings ratio is less than 100 times
x3 = data.pb < 10  # The price to book ratio is less than 10 times
x4 = data.turnover_rate > 1  # Turnover rate greater than 1%
x = x1 & x2 & x3 & x4
stock_list_1 = data[x].ts_code.values.tolist()

Filtering the candidates this way cuts down our workload and scope right from the start.
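
One detail of this kind of screen is worth seeing on a small example: a comparison against an empty (NaN) value evaluates to False, so loss-making companies with an empty pe drop out automatically. The rows below are made up and only mimic the shape of the daily_basic output.

```python
import numpy as np
import pandas as pd

# Made-up rows in the shape of the daily_basic output
sample = pd.DataFrame({
    'ts_code': ['000001.SZ', '000002.SZ', '600000.SH', '600519.SH'],
    'close': [15.8, 19.6, 8.4, 1800.0],
    'pe': [9.5, np.nan, 4.8, 45.0],       # NaN marks a loss-making company
    'pb': [0.9, 1.1, 0.5, 12.0],
    'turnover_rate': [1.6, 2.3, 0.4, 1.2],
})

mask = (
    (sample['close'] < 200)
    & (sample['pe'] < 100)                # NaN < 100 is False
    & (sample['pb'] < 10)
    & (sample['turnover_rate'] > 1)
)
picked = sample.loc[mask, 'ts_code'].tolist()
```

Only the first row passes every condition: the second has an empty pe, the third has too little turnover, and the fourth is too expensive.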

3. Next, let's build the second stock pool:

data2 = pro.query('stock_basic')

The data returned is as follows:

name         type  default display  description
ts_code      str   Y                TS code
symbol       str   Y                Stock code
name         str   Y                Stock name
area         str   Y                Region
industry     str   Y                Industry
fullname     str   N                Full name of the stock
enname       str   N                Full English name
cnspell      str   N                Pinyin abbreviation
market       str   Y                Market type (Main Board / ChiNext / STAR Market / CDR)
exchange     str   N                Exchange code
curr_type    str   N                Trading currency
list_status  str   N                Listing status: L listed, D delisted, P listing suspended
list_date    str   Y                Listing date
delist_date  str   N                Delisting date
is_hs        str   N                Whether it is a Shanghai-Shenzhen-Hong Kong Stock Connect target: N no, H Shanghai Stock Connect, S Shenzhen Stock Connect

I generally prefer stocks that have been listed for at least one year. I'm not confident about recently listed ("sub-new") stocks, so I don't touch them. How do I screen for that?

# Step back one year with timedelta; subtracting from a YYYYMMDD integer does not yield a valid date
one_year_ago = (dt.now() - timedelta(days=365)).strftime('%Y%m%d')
data2 = data2[data2['list_date'] < one_year_ago]
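
Date arithmetic on YYYYMMDD values is easy to get wrong, because subtracting from the integer does not step through the calendar. A small sketch with a fixed "today" and made-up listing dates shows the difference. The codes 'A', 'B' and 'C' are placeholders.

```python
from datetime import datetime, timedelta

import pandas as pd

today = datetime(2022, 3, 2)  # fixed date so the example is reproducible

# Integer arithmetic on YYYYMMDD: 20220302 - 360 = 20219942, not a real date
naive_cutoff = int(today.strftime('%Y%m%d')) - 360

# timedelta steps through the calendar and lands on 2021-03-02
cutoff = (today - timedelta(days=365)).strftime('%Y%m%d')

# Made-up listing dates
listed = pd.DataFrame({
    'ts_code': ['A', 'B', 'C'],
    'list_date': ['20100105', '20210601', '20211215'],
})

# YYYYMMDD strings sort in date order, so plain string comparison works
seasoned = listed[listed['list_date'] < cutoff]['ts_code'].tolist()
naive = listed[listed['list_date'].astype(int) < naive_cutoff]['ts_code'].tolist()
```

With the integer cutoff, every stock listed before the start of the current year passes, including one listed less than three months earlier. The timedelta cutoff keeps only the stock that has really been listed for a year.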

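I'm not a fan of *ST stocks either. No special reason: if one gets delisted outright someday, you're toast.

data2 = data2[~data2['name'].str.startswith('*')]

There are also some industries I don't understand well, or that simply aren't my cup of tea:

# The labels must match the values in the industry column exactly
# (Tushare typically returns industry names in Chinese)
data2 = data2[~data2['industry'].isin(['bank', 'Insurance', 'real estate', 'Regional real estate'])]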
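
Both exclusions come down to inverting a boolean mask. In pandas the operator for that is ~, since unary minus on boolean data is rejected by NumPy and may fail depending on your pandas version. Here is a minimal sketch on made-up rows. The names and industry labels are invented and are not real Tushare values.

```python
import pandas as pd

# Made-up rows; real names and industry labels come from stock_basic
pool = pd.DataFrame({
    'ts_code': ['000001.SZ', '000002.SZ', '000003.SZ', '000004.SZ'],
    'name': ['Alpha Bank', '*ST Beta', 'Gamma Tech', 'Delta Foods'],
    'industry': ['bank', 'software', 'software', 'food'],
})

is_star_st = pool['name'].str.startswith('*')
is_excluded_industry = pool['industry'].isin(['bank', 'Insurance'])

# | combines the two exclusion rules, ~ flips the combined mask
kept = pool[~(is_star_st | is_excluded_industry)]
kept_codes = kept['ts_code'].tolist()
```

Combining the rules into one mask filters the table in a single pass, and the named masks make each rule easy to inspect on its own.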

Then we pack the codes into a list:

stock_list_2 = data2.ts_code.values.tolist()

If you have your own list and want to add it, load it yourself.

4. Merge the lists

We have two lists above, right? Suppose we also have two lists of personal picks (stock_list_3 and stock_list_4). We then combine the four lists into one and remove duplicates, to avoid repeated downloads and wasted resources.

stock_list = [value for value in stock_list_1 if value in stock_list_2] + stock_list_3 + stock_list_4
stock_list = list(dict.fromkeys(stock_list))
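
The merge can be tried out on a few made-up codes. This is a sketch only: merge_pools is a helper name invented for this example, and it uses a set for the membership test, which matters little here but saves time on pools with thousands of codes.

```python
def merge_pools(screened_a, screened_b, *extra_lists):
    """Intersect two screened pools, append extra lists, drop duplicates."""
    b_set = set(screened_b)  # set lookup instead of scanning a list each time
    merged = [code for code in screened_a if code in b_set]
    for extra in extra_lists:
        merged.extend(extra)
    # dict keys are unique and keep insertion order
    return list(dict.fromkeys(merged))


# Made-up codes for illustration
pool_1 = ['000001.SZ', '000002.SZ', '600000.SH']
pool_2 = ['600000.SH', '000001.SZ', '300750.SZ']
watch_3 = ['600519.SH', '000001.SZ']
watch_4 = ['600519.SH']

result = merge_pools(pool_1, pool_2, watch_3, watch_4)
```

Note that only the two screened pools are intersected. The personal lists are appended as they are, so a code in them ends up in the result even if it failed the screens.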

Summary

To sum up, we now have a pool of stock codes from our initial screening.
In the next article, I'll talk about how to encapsulate the parts that build the download.
Please give me a like!
You can leave a comment, or reach me on WeChat or by private message.
