Introduction to the multi-factor screening stage
1. Tasks in the filtering phase
- Single-factor validity analysis
  Narrow about 500 candidate factors down to about 100 that actually contribute to returns.
- Multi-factor correlation analysis
  Find the highly correlated factors and delete some of them.
- Multi-factor synthesis
  With so many features, how do we find the stock return that corresponds to the factors?
- Relationship between factors and returns: the factors are the feature values and the return is the target value.
2. Process of mining factors
1. From hundreds of factors, identify those that are effective in explaining the rate of return
- Filter within each category of factors, keeping the N effective factors in each category
  - Categories include quality, valuation, growth, and others
  - Strict screening: for example, 20 effective factors
  - Loose screening: for example, 50 effective factors
- Run a correlation analysis among the selected single factors and combine the strongly correlated ones
Finally, a small set of effective, weakly correlated factors remains, generally about 10.
Preliminary screening ==> N factors ==> final selection ==> n factors
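The correlation step can be sketched as a greedy filter: walk through the factors in order and keep one only if its absolute correlation with every factor already kept stays below a threshold. The factor names, the sample values, and the 0.7 cutoff are illustrative assumptions, not values from these notes. The notes speak of combining strongly correlated factors, whereas this sketch simply keeps one representative of each correlated group.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def drop_correlated(factors, threshold=0.7):
    """Keep factors (in dict order) whose |correlation| with every kept factor is below threshold."""
    kept = []
    for name, values in factors.items():
        if all(abs(pearson(values, factors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical cross-section: one value per stock for each factor.
factors = {
    "pe_ratio": [10.0, 12.0, 8.0, 30.0, 15.0],
    "pb_ratio": [1.0, 1.3, 0.8, 3.1, 1.4],     # moves almost in step with pe_ratio
    "roe":      [0.12, 0.05, 0.20, 0.08, 0.02],
}
selected = drop_correlated(factors, threshold=0.7)
print(selected)  # ['pe_ratio', 'roe']
```

Because the filter is order-dependent, the factors one trusts most should come first in the dictionary.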
Objective of multi-factor validity analysis
1. Several problems in validity analysis
- IC analysis of a factor
  Judge how strongly the factor correlates with returns.
- Return analysis of a factor
  Determine the direction in which the factor selects stocks.
Direction of a factor
- Ascending: the smaller the factor value, the better, such as the P/E ratio
- Descending: the larger the factor value, the better, such as profit
- Neutral: the direction is uncertain, such as turnover
- Obtain two tables and filter on them
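To make the three factor directions concrete, the sketch below orders a handful of stocks by a factor according to its direction. The stock codes, the factor values, and the `rank_stocks` helper are hypothetical; treating a neutral factor as "no ranking" is one possible convention, not something these notes prescribe.

```python
def rank_stocks(factor_values, direction):
    """Order stock codes from most to least preferred for a given factor direction.

    direction: "ascending"  -> smaller factor value is better (e.g. P/E ratio)
               "descending" -> larger factor value is better (e.g. profit)
               "neutral"    -> direction unknown, so no ranking is produced
    """
    if direction == "neutral":
        return None
    if direction not in ("ascending", "descending"):
        raise ValueError(f"unknown direction: {direction}")
    return sorted(factor_values, key=factor_values.get,
                  reverse=(direction == "descending"))

pe = {"A": 25.0, "B": 8.0, "C": 14.0}      # hypothetical P/E ratios
profit = {"A": 3.2, "B": 0.9, "C": 5.1}    # hypothetical profits

print(rank_stocks(pe, "ascending"))        # ['B', 'C', 'A']
print(rank_stocks(profit, "descending"))   # ['C', 'A', 'B']
```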
Single factor validity analysis – factor IC analysis
Factor IC analysis determines the correlation between factor and yield.
- IC mean: the average of the factor's IC values
- IC std: the standard deviation of the factor's IC values
- IC > 0.02: the proportion of periods in which the factor's IC is greater than 0.02
- IR: information ratio
  - IR = IC mean / IC std (the volatility of IC)
- Screen with IR > 0.3; this threshold can be adjusted
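The statistics listed above can be computed from a series of per-period IC values, as in the minimal sketch below. The IC values are made up for illustration, the helper name `summarize_ic` is hypothetical, and the use of the sample standard deviation is an assumption (the notes do not say which variant is meant).

```python
from statistics import mean, stdev

def summarize_ic(ic_series, ic_threshold=0.02):
    """Return IC mean, IC std, share of periods with IC above the threshold, and IR."""
    ic_mean = mean(ic_series)
    ic_std = stdev(ic_series)  # sample standard deviation (assumption)
    share = sum(ic > ic_threshold for ic in ic_series) / len(ic_series)
    return {
        "ic_mean": ic_mean,
        "ic_std": ic_std,
        "ic_gt_threshold": share,
        "ir": ic_mean / ic_std,
    }

# Hypothetical daily IC values for one factor.
daily_ic = [0.05, -0.01, 0.03, 0.04, 0.00, 0.06, 0.02, 0.01]
stats = summarize_ic(daily_ic)
passes = stats["ir"] > 0.3  # screening rule from the notes; the cutoff is adjustable
print(stats, passes)
```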
1. Definition of information coefficient
Correlation between factor data and stock return
The IC of a period refers to the cross-sectional correlation coefficient between the factor exposure value of that period and the actual return value of the stock in the next period.
1.1 factor exposure value
It refers to the value of the factor itself
With a one-day cycle: the factor exposure value is taken on 20180103 (this period) and the stock return on 20180104 (next period).
Cycles of one week or one month work the same way.
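The pairing of periods can be sketched as follows: the factor observed in one period is matched with the return realized in the next. The list of trading dates is hypothetical, and the helper name `ic_periods` is made up for illustration.

```python
def ic_periods(trading_dates):
    """Pair each factor date with the following date, on which the return is measured."""
    return list(zip(trading_dates, trading_dates[1:]))

trading_dates = ["20180103", "20180104", "20180105"]  # hypothetical daily cycle
for factor_date, return_date in ic_periods(trading_dates):
    print(f"factor exposure on {factor_date} -> stock return on {return_date}")
```

The last date has no following period, so it yields no IC value.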
1.2 calculation method
- Spearman correlation coefficient (Rank IC)
- If X increases, Y tends to increase. The Spearman correlation coefficient is positive.
- Value range: [-1, 1]
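The Spearman coefficient is the Pearson correlation applied to the ranks of the two series, which is why it only reflects whether Y tends to rise with X. The sketch below spells that out in plain Python; the factor and return values are made up, and the helper names are hypothetical. In practice a library routine such as `scipy.stats.spearmanr` would be used instead.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def rank_ic(factor, next_return):
    """Spearman (rank) correlation between factor values and next-period returns."""
    return pearson(average_ranks(factor), average_ranks(next_return))

factor = [1.2, 0.4, 2.5, 0.9, 1.8]                  # hypothetical factor exposures
next_return = [0.010, -0.020, 0.030, 0.000, 0.005]  # hypothetical next-period returns
print(rank_ic(factor, next_return))                 # approximately 0.9
```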
1.3 information coefficient API
import scipy.stats as st

st.spearmanr(fund['pe_ratio'], fund['return'])  # fund['return'] is the stock return
2. How to calculate the rate of return
2.1 return interval
- By interval length
- Daily rate of return
- Monthly rate of return
- Annual rate of return
2.2 calculation formula
Return for a period = (closing price of this period - closing price of the previous period) / closing price of the previous period
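As a small worked example of the formula, the sketch below turns a list of consecutive closing prices into per-period returns. The prices and the helper name `period_returns` are hypothetical; the same function applies whether the periods are days, months, or years.

```python
def period_returns(closes):
    """Simple return of each period relative to the previous period's close."""
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]

closes = [10.0, 10.5, 9.45]   # hypothetical closes for three consecutive periods
returns = period_returns(closes)
print(returns)                # roughly [0.05, -0.10]
```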
3. Case: IC analysis of single factor one day
Suppose we want the factor IC value (the correlation coefficient with returns) for January 3, 2017.
3.1 analysis
The calculation takes three steps:
1. Take the factor exposure value on January 3, 2017
2. Compute the stock return on January 4, 2017: (close on the 4th - close on the 3rd) / close on the 3rd
3. Calculate the correlation coefficient between the two
3.2 code
import scipy.stats as st

# 1. Factor exposure value on January 3, 2017
# Get all stock codes (A shares)
stocks = all_instruments('CS').order_book_id
# Get the factor values
fund = get_factor(stocks, factor=['basic_earnings_per_share'],
                  start_date='20170103', end_date='20170103')
# Drop the date level of the index
fund = fund.reset_index(1, drop=True)
# Drop rows with NaN data
fund = fund.dropna()
# The cross-sectional data now has one row per stock
stocks = fund.index.values
print(len(stocks))
print(fund['basic_earnings_per_share'][:10])

# 2. Stock return on January 4, 2017:
#    (close on the 4th - close on the 3rd) / close on the 3rd
price_now = get_price(stocks, start_date='20170103', end_date='20170103', fields='close')
price_next = get_price(stocks, start_date='20170104', end_date='20170104', fields='close')
price_now = price_now.reset_index(1, drop=True)
price_next = price_next.reset_index(1, drop=True)

def get_mask(and_list, stock_list):
    """Return a boolean mask marking which codes in stock_list are in and_list."""
    members = set(and_list)
    return [stock in members for stock in stock_list]

# Lengths of the current close, the next-period close, and the factor data
print(len(price_now), len(price_next), len(fund['basic_earnings_per_share']))

# Intersection of the stock codes that have a price in both periods
next_stock_list = price_next.index.values
now_stock_list = price_now.index.values
and_stock_list = list(set(next_stock_list) & set(now_stock_list))
print('Code intersection:', len(and_stock_list))

# Keep only the stocks in the intersection, for both closes and the factor
price_now = price_now[get_mask(and_stock_list, now_stock_list)]
price_next = price_next[get_mask(and_stock_list, next_stock_list)]
print(len(price_now), len(price_next))

fund = fund['basic_earnings_per_share']
fund = fund[get_mask(and_stock_list, stocks)]
print(len(fund))

# Calculate the rate of return on January 4
stock_return = (price_next.iloc[:, 0] - price_now.iloc[:, 0]) / price_now.iloc[:, 0]
# Put the returns in the same stock order as the factor, since spearmanr pairs values by position
stock_return = stock_return.reindex(fund.index)
print(len(stock_return), len(fund))

# 3. Calculate correlation coefficient
print(stock_return[:10])
print(fund[:10])
st.spearmanr(fund, stock_return)