I previously posted an article about using web scraping to pull stock data from NetEase Finance, and the response was quite strong. That piece, however, was only a snapshot of my early days learning scraping and quantitative analysis. What I want to share now is far more practical: it is essentially my final version of a stock data download script, and the foundation for downloading data with Tushare.
1, What is Tushare?
Almost everyone doing quantitative work in China has at least heard of Tushare, and many experienced practitioners use it when sharing their own work. As a financial database it is priced fairly, and compared with Wind it is very competitive.
Here is how Tushare describes its features on PyPI:
- easy to use as most of the data returned are pandas DataFrame objects
- can be easily saved as csv, excel or json files
- can be inserted into MySQL or Mongodb
Tushare's target users:
- financial market analyst of China
- learners of financial data analysis with pandas/NumPy
- people who are interested in China financial data
All right, enough background. Let's get down to business.
First, a few notes:
- The code I share today can't be run in Jupyter, but you can debug it in Jupyter. I hope you understand.
- If you use the code, please help me spread the word.
- Tushare costs money. I pay an annual fee of 500 yuan. Reportedly the 200-yuan tier already offers a rich set of features. Honestly, if you really want to get into quantitative research, I think starting with the 200-yuan tier is a good choice. What does 200 yuan get you? Please look that up yourself. The 200-yuan tier corresponds to 2000 points; from a rough look, it covers basically all functions.
1. Import and storage
The code is as follows (example):
import os
import time
from datetime import datetime as dt
from datetime import timedelta
import threading

import pandas as pd
import numpy as np
import requests
import tushare as ts

token = 'Put your Tushare token here'
ts.set_token(token)
pro = ts.pro_api(token)
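If you would rather not paste the token into the script, one option is to read it from an environment variable. This is only a minimal sketch: the helper `load_tushare_token` and the variable name `TUSHARE_TOKEN` are my own hypothetical choices, not something Tushare defines.

```python
import os


def load_tushare_token(env_var="TUSHARE_TOKEN"):
    """Return the token stored in an environment variable.

    TUSHARE_TOKEN is a hypothetical naming convention for this sketch,
    not a name that Tushare itself looks for.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Tushare token")
    return token
```

The returned string can then be passed to `ts.set_token(...)` and `ts.pro_api(...)` exactly as in the code above.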
2. Logic of trading day
as_of_today = dt.now().strftime('%Y%m%d')

def last_trading_day():
    # Open trading days from 2020-01-01 through today
    data = pro.query('trade_cal', start_date='20200101', end_date=as_of_today, is_open='1')
    # Sort ascending so that [-1] is always the most recent trading day
    trading_dates = sorted(data['cal_date'].tolist())
    if as_of_today in trading_dates:
        if dt.now().hour >= 16:
            # Today's data should already be published
            return as_of_today
        # Before 16:00: fall back to the previous trading day
        return trading_dates[-2]
    # Today is not a trading day: use the most recent one in the list
    return trading_dates[-1]
I first saw the original version of this code elsewhere and then adapted it myself. The logic is: if today is in the trading calendar and it is 16:00 or later, return today's date; if it is before 16:00, return the previous trading day; if today is not in the trading calendar, return the last date in the list. Why 16:00? Generally the database is updated about an hour after the market closes, so by then the day's data is available.
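To check this rule without calling the API, the same decision can be written as a pure function that takes the calendar and the current time as arguments. This is a sketch under my own assumptions: `pick_trading_day` and `cutoff_hour` are hypothetical names, and the calendar below is a hand-written sample, not data from Tushare.

```python
from datetime import datetime


def pick_trading_day(trading_dates, now, cutoff_hour=16):
    """Return the latest trading day whose data should already be published.

    trading_dates: 'YYYYMMDD' strings for open days, in any order.
    now: the datetime to evaluate (passed in so the function is testable).
    """
    today = now.strftime('%Y%m%d')
    # Keep only dates up to today, ascending, so [-1] is the most recent one.
    past = sorted(d for d in trading_dates if d <= today)
    if not past:
        raise ValueError("no trading day on or before " + today)
    if past[-1] == today and now.hour < cutoff_hour:
        # Today is a trading day but its data is not out yet: step back one day.
        if len(past) < 2:
            raise ValueError("no earlier trading day available")
        return past[-2]
    return past[-1]


calendar = ['20240104', '20240105', '20240108']  # hand-written sample calendar
print(pick_trading_day(calendar, datetime(2024, 1, 8, 17, 0)))  # trading day, after the cutoff
print(pick_trading_day(calendar, datetime(2024, 1, 8, 10, 0)))  # trading day, before the cutoff
print(pick_trading_day(calendar, datetime(2024, 1, 6, 10, 0)))  # Saturday, not a trading day
```

Passing `now` in, instead of calling `datetime.now()` inside the function, is what makes each of the three branches easy to try by hand.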
3. First, pull the daily basic indicators for individual stocks
data = pro.daily_basic(trade_date=last_trading_day())
The data returned is as follows:
| Name | Type | Description |
| --- | --- | --- |
| ts_code | str | TS stock code |
| trade_date | str | Trade date |
| close | float | Closing price of the day |
| turnover_rate | float | Turnover rate (%) |
| turnover_rate_f | float | Turnover rate (freely tradable shares) |
| volume_ratio | float | Volume ratio |
| pe | float | P/E ratio (total market value / net profit; empty for loss-making companies) |
| pe_ttm | float | P/E ratio (TTM; empty for loss-making companies) |
| pb | float | P/B ratio (total market value / net assets) |
| ps | float | P/S ratio |
| ps_ttm | float | P/S ratio (TTM) |
| dv_ratio | float | Dividend yield (%) |
| dv_ttm | float | Dividend yield (TTM) (%) |
| total_share | float | Total share capital (10,000 shares) |
| float_share | float | Circulating share capital (10,000 shares) |
| free_share | float | Freely tradable share capital (10,000 shares) |
| total_mv | float | Total market value (10,000 yuan) |
| circ_mv | float | Circulating market value (10,000 yuan) |
I doubt you want to download and update all the data for more than 4000 stocks every day, right? With this basic fundamental information in hand, we can screen for a more promising set of candidate stocks.
x1 = data.close < 200          # Closing price below 200 yuan
x2 = data.pe < 100             # P/E ratio below 100
x3 = data.pb < 10              # P/B ratio below 10
x4 = data.turnover_rate > 1    # Turnover rate above 1%
x = x1 & x2 & x3 & x4
stock_list_1 = data[x].ts_code.values.tolist()
Filtering candidates this way narrows the scope and cuts our workload up front.
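To see what this kind of screen does, here is a small offline sketch on made-up rows shaped like a slice of the `daily_basic` output. The codes and numbers are invented for illustration. One detail worth noticing: a missing `pe` (a loss-making company) compares as False, so those rows drop out without any extra handling.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like a small slice of daily_basic output.
sample = pd.DataFrame({
    'ts_code': ['000001.SZ', '600519.SH', '300999.SZ', '688001.SH'],
    'close': [10.5, 1700.0, 35.2, 48.0],
    'pe': [5.2, 30.1, np.nan, 45.0],      # NaN stands for a loss-making company
    'pb': [0.6, 9.0, 3.1, 4.2],
    'turnover_rate': [1.4, 0.3, 2.5, 0.8],
})

mask = (
    (sample['close'] < 200)
    & (sample['pe'] < 100)                # NaN < 100 is False, so loss-makers drop out
    & (sample['pb'] < 10)
    & (sample['turnover_rate'] > 1)
)
candidates = sample.loc[mask, 'ts_code'].tolist()
print(candidates)  # only the first row passes all four conditions
```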
3. Next, let's build the second stock pool:
data2 = pro.query('stock_basic')
The data returned is as follows:
| Name | Type | Shown by default | Description |
| --- | --- | --- | --- |
| ts_code | str | Y | TS code |
| symbol | str | Y | Stock code |
| name | str | Y | Stock name |
| area | str | Y | Region |
| industry | str | Y | Industry |
| fullname | str | N | Full name of the stock |
| enname | str | N | Full English name |
| cnspell | str | N | Pinyin abbreviation |
| market | str | Y | Market type (Main Board / ChiNext / STAR Market / CDR) |
| exchange | str | N | Exchange code |
| curr_type | str | N | Trading currency |
| list_status | str | N | Listing status: L listed, D delisted, P listing suspended |
| list_date | str | Y | Listing date |
| delist_date | str | N | Delisting date |
| is_hs | str | N | Stock Connect target: N no, H Shanghai Connect, S Shenzhen Connect |
I generally prefer stocks that have been listed for at least one year. I'm not comfortable with recently listed shares, so I don't touch them. How do I screen for that?
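# A YYYYMMDD integer cannot be shifted by subtracting days (20240115 - 360 is not a date),
# so compute the cutoff with timedelta and compare the zero-padded strings directly.
one_year_ago = (dt.now() - timedelta(days=365)).strftime('%Y%m%d')
data2 = data2[data2['list_date'] < one_year_ago]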
I mostly stay away from *ST stocks. Nothing fancy here: if one gets delisted outright someday, you're toast.
# Use ~ to negate a boolean Series; unary minus is not supported on booleans
data2 = data2[~data2['name'].str.startswith('*')]
There are also some industries I don't understand well, or that simply aren't my cup of tea:
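# NOTE: these industry names appear to have been translated; check them against
# the actual values in data2['industry'].unique() before relying on this filter.
data2 = data2[~data2['industry'].isin(['bank', 'Insurance', 'real estate', 'Regional real estate'])]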
Then we pack the result into a list:
stock_list_2 = data2.ts_code.values.tolist()
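The three screens above (listing age, *ST names, excluded industries) can also be tried offline in one place. The sketch below uses a hypothetical helper `filter_basic` and made-up rows, takes "today" as an argument so the result is reproducible, and uses a calendar-based cutoff and `~` for negation; treat it as an illustration, not as the exact code of this article.

```python
from datetime import datetime, timedelta

import pandas as pd


def filter_basic(df, today, min_days_listed=365, excluded_industries=()):
    """Keep seasoned, non-*ST stocks outside the excluded industries."""
    cutoff = (today - timedelta(days=min_days_listed)).strftime('%Y%m%d')
    listed_long_enough = df['list_date'] < cutoff       # zero-padded strings sort like dates
    not_star_st = ~df['name'].str.startswith('*')       # ~ negates a boolean Series
    allowed_industry = ~df['industry'].isin(list(excluded_industries))
    return df[listed_long_enough & not_star_st & allowed_industry]


# Made-up rows shaped like stock_basic output; names and industries are invented.
sample = pd.DataFrame({
    'ts_code': ['000001.SZ', '000002.SZ', '000003.SZ', '000004.SZ'],
    'name': ['Alpha', '*ST Beta', 'Gamma', 'Delta'],
    'industry': ['Software', 'Software', 'Bank', 'Software'],
    'list_date': ['20100105', '20120301', '20150610', '20231201'],
})
kept = filter_basic(sample, datetime(2024, 1, 15), excluded_industries=['Bank'])
print(kept['ts_code'].tolist())  # only the first row survives all three screens
```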
If you have your own list and want to add it, load it yourself at this point.
4. Merge List
We have two lists from the steps above, right? Suppose we also have two private lists of our own. We then combine the four lists into one and remove duplicates, to avoid repeated downloads and wasted resources.
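# stock_list_3 and stock_list_4 are your own private lists, loaded beforehand
stock_list = [value for value in stock_list_1 if value in stock_list_2] + stock_list_3 + stock_list_4
# dict.fromkeys drops duplicates while keeping the original order
stock_list = list(dict.fromkeys(stock_list))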
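Here is a self-contained sketch of the same merge on made-up codes, so you can see the order of the result. The helper `merge_pools` is a hypothetical name of mine, and the two private lists are invented. The only change from the one-liner above is that membership is checked against a set, which avoids rescanning the second list for every code.

```python
def merge_pools(pool_a, pool_b, *extra_pools):
    """Intersect pool_a with pool_b, append extra pools, drop duplicates in order."""
    in_b = set(pool_b)                        # set lookup avoids rescanning pool_b
    merged = [code for code in pool_a if code in in_b]
    for pool in extra_pools:
        merged.extend(pool)
    return list(dict.fromkeys(merged))        # dict keys keep first-seen order


stock_list_1 = ['000001.SZ', '600000.SH', '300750.SZ']
stock_list_2 = ['300750.SZ', '000001.SZ', '002594.SZ']
stock_list_3 = ['600519.SH', '000001.SZ']     # hypothetical private list
stock_list_4 = ['601318.SH']                  # hypothetical private list
stock_list = merge_pools(stock_list_1, stock_list_2, stock_list_3, stock_list_4)
print(stock_list)
```

Codes from the first two lists are kept only if they appear in both, while the private lists are added unconditionally.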
To sum up, we now have a pool of stock codes from our initial screening.
In the next article, I'll talk about how to encapsulate the download logic.
If you found this useful, please give it a like!
Feel free to leave a comment, add me on WeChat, or send me a private message.