Single-factor validity analysis in quantitative trading on RiceQuant -- factor IC analysis

Posted by trehy2006 on Fri, 15 Oct 2021 01:31:55 +0200

Introduction to the multi-factor screening stage

1. Tasks in the screening phase

  • Single-factor validity analysis

    Narrow the pool from about 500 factors to about 100 factors that genuinely contribute to returns

  • Multi-factor correlation analysis

    Find the highly correlated factors and drop some of them

  • Multi-factor synthesis

With so many features, how do we relate a factor to the stock returns it corresponds to?

Relationship between factors and returns

  • Link between factor and return: the factor is the feature value, the return is the target value

2. Process of mining factors

1. From hundreds of factors, identify those that are effective in explaining the rate of return

  • Filter within each category of factors, keeping N effective factors per category

    • Categories include quality, valuation, growth and others
    • Strict screening: for example, 20 effective factors
    • Loose screening: for example, 50 effective factors
  • Run a correlation analysis among the selected single factors and combine the factors that are strongly correlated

    The result is a small set of effective, weakly correlated factors, generally about 10

Initial screening ==> N factors ==> refined selection ==> n factors
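
The correlation step in this process can be sketched roughly as follows. This is only an illustration on made-up data: the helper `drop_correlated`, the 0.8 threshold and the column names are assumptions, and the sketch drops one factor from each strongly correlated pair rather than combining them.

```python
import numpy as np
import pandas as pd

def drop_correlated(factors: pd.DataFrame, threshold: float = 0.8) -> list:
    """Greedily keep a factor only if its absolute Spearman correlation
    with every factor kept so far is at most `threshold` (hypothetical helper)."""
    corr = factors.corr(method="spearman").abs()
    kept = []
    for name in factors.columns:
        if all(corr.loc[name, other] <= threshold for other in kept):
            kept.append(name)
    return kept

# Made-up cross-section of 200 stocks and three factors
rng = np.random.default_rng(0)
base = rng.normal(size=200)
factors = pd.DataFrame({
    "pe_ratio": base,
    "pb_ratio": base + rng.normal(scale=0.05, size=200),  # nearly a copy of pe_ratio
    "turnover": rng.normal(size=200),                      # unrelated to the others
})

kept = drop_correlated(factors, threshold=0.8)
print(kept)
```

Because the loop is greedy, the result depends on column order: the first factor of a correlated group is the one that survives.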

Objective of multivariate validity analysis

1. Key questions in validity analysis

  • IC analysis of factors

    Judge how strongly the factor correlates with returns

  • Return analysis of factors

    Determine the direction in which the factor selects stocks

Direction of factor

  • Ascending factor: the smaller the factor value, the better, such as the P/E ratio

  • Descending factor: the larger the factor value, the better, such as profit

  • Neutral factor: the direction is uncertain, such as turnover

  • Get two tables to filter on

Single factor validity analysis – factor IC analysis

Factor IC analysis measures the correlation between a factor and returns.

  • IC mean: the average of the factor's IC values
  • IC std: the standard deviation of the factor's IC values
  • IC > 0.02: the proportion of IC values greater than 0.02
  • IR: information ratio
    • IR = IC mean / IC volatility (std)
    • Screen with IR > 0.3; the threshold can be adjusted
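
As a rough illustration, the four indicators can be computed from a list of per-period IC values like this. The function name `ic_summary`, the sample values and the use of the sample standard deviation (`ddof=1`) are assumptions, not something the original specifies.

```python
import numpy as np

def ic_summary(ic_values, threshold=0.02):
    """Summarise a sequence of per-period IC values (hypothetical helper)."""
    ic = np.asarray(ic_values, dtype=float)
    ic_mean = ic.mean()
    ic_std = ic.std(ddof=1)  # assumption: sample standard deviation
    return {
        "ic_mean": ic_mean,
        "ic_std": ic_std,
        "ic_gt_ratio": float((ic > threshold).mean()),  # share of periods with IC > threshold
        "ir": ic_mean / ic_std,                         # IR = IC mean / IC std
    }

daily_ic = [0.05, 0.01, 0.04, -0.02, 0.03]  # made-up IC values for five periods
summary = ic_summary(daily_ic)
print(summary)
print("passes IR > 0.3 screen:", summary["ir"] > 0.3)
```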

1. Definition of information coefficient

The information coefficient (IC) is the correlation between factor data and stock returns.

The IC of a period refers to the cross-sectional correlation coefficient between the factor exposure value of that period and the actual return value of the stock in the next period.

1.1 Factor exposure value

The factor exposure is simply the value of the factor itself.

With a one-day cycle: the factor exposure is taken on 20180103 (this period) and the stock return on 20180104 (next period).

One-week and one-month cycles work the same way.
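
The pairing of this period's exposure with next period's return can be sketched on a small made-up panel. The function `ic_series`, the stock labels and all numbers are illustrative assumptions; rows are dates and columns are stocks.

```python
import pandas as pd
from scipy import stats

def ic_series(factor: pd.DataFrame, close: pd.DataFrame) -> pd.Series:
    """Per-date IC: Spearman correlation between the factor on date t
    and the return from t to t+1 (hypothetical helper)."""
    next_ret = close.shift(-1) / close - 1.0  # next-period return, stored on date t
    ics = {}
    for date in factor.index[:-1]:  # the last date has no next period
        pair = pd.concat(
            [factor.loc[date], next_ret.loc[date]], axis=1, keys=["factor", "next_ret"]
        ).dropna()
        rho, _ = stats.spearmanr(pair["factor"], pair["next_ret"])
        ics[date] = rho
    return pd.Series(ics, name="ic")

dates = pd.to_datetime(["2018-01-03", "2018-01-04", "2018-01-05"])
stocks = ["A", "B", "C", "D"]  # made-up stock labels
factor = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]] * 3, index=dates, columns=stocks)
close = pd.DataFrame(
    [[10.0, 10.0, 10.0, 10.0],
     [10.1, 10.2, 10.3, 10.4],   # higher factor, higher return on the 4th
     [10.4, 10.3, 10.2, 10.1]],  # higher factor, lower return on the 5th
    index=dates, columns=stocks,
)

ic = ic_series(factor, close)
print(ic)
```

For weekly or monthly cycles the same idea applies after resampling the prices to the chosen period.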

1.2 Calculation method

  • Spearman correlation coefficient (Rank IC)
    • If Y tends to increase as X increases, the Spearman correlation coefficient is positive.
    • Range of values: [-1, 1]

1.3 Information coefficient API

import scipy.stats as st
st.spearmanr(fund['pe_ratio'], fund['return']) # stock return
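
To make the "rank" in Rank IC concrete, here is a small sketch on made-up numbers showing that `spearmanr` gives the same value as a Pearson correlation computed on the ranks. The variable names and data are assumptions for illustration only.

```python
import pandas as pd
from scipy import stats

# Made-up cross-section: five stocks, one factor value and one return each
pe_ratio = pd.Series([8.0, 12.0, 15.0, 30.0, 45.0])
ret = pd.Series([0.03, 0.02, 0.025, -0.01, -0.02])

# spearmanr returns the correlation coefficient and a p-value
rho, p_value = stats.spearmanr(pe_ratio, ret)

# The same coefficient obtained as a Pearson correlation of the ranks
rho_from_ranks, _ = stats.pearsonr(pe_ratio.rank(), ret.rank())

print(rho, rho_from_ranks)
```

Here the coefficient is negative: stocks with a lower P/E ratio tended to have higher returns, which matches the "ascending" direction described for P/E.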

2. How to compute the rate of return

2.1 Return periods
  • By interval length
    • Daily rate of return
    • Monthly rate of return
    • Annual rate of return
2.2 Calculation formula

Return for a period = (closing price of this period - closing price of the previous period) / closing price of the previous period
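
A minimal sketch of this formula with pandas, using made-up closing prices for a single stock (the dates and prices are illustrative assumptions):

```python
import pandas as pd

close = pd.Series(
    [10.0, 10.5, 10.29],
    index=pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"]),
)

prev_close = close.shift(1)              # closing price of the previous period
ret = (close - prev_close) / prev_close  # return of each period

print(ret)
```

The first period has no previous close, so its return is NaN. pandas also provides `Series.pct_change()` for the same calculation.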

3. Case: one-day IC analysis of a single factor

3.1 Analysis

To calculate the factor IC for January 3, 2017 (its correlation coefficient with returns), we need:

1. The factor exposure values on January 3, 2017

2. The stock returns on January 4, 2017: (close on the 4th - close on the 3rd) / close on the 3rd

3. The correlation coefficient between the two

3.2 Code

Factor description document

import scipy.stats as st
# 1. Factor exposure values on January 3, 2017
# Get the codes of all stocks (A shares)
stocks = all_instruments('CS').order_book_id

# Get the factor data
fund = get_factor(stocks, factor=['basic_earnings_per_share'], start_date='20170103', end_date='20170103')
# Drop the date level of the index
fund = fund.reset_index(1, drop=True)
# Drop rows with NaN values
fund = fund.dropna()
# fund = fund.sort_index(axis=0)
# Stock codes of the resulting cross-section
stocks = fund.index.values
# stock_list = fund.index.values
print(stocks, len(stocks))
# 2. Stock returns on January 4, 2017: (close on the 4th - close on the 3rd) / close on the 3rd
# Get the closing prices needed to compute each stock's return
price_now = get_price(stocks, start_date='20170103', end_date='20170103', fields='close')
price_next = get_price(stocks, start_date='20170104', end_date='20170104', fields='close')
price_now = price_now.reset_index(1, drop=True)
price_next = price_next.reset_index(1, drop=True)

# Build a boolean mask over stock_list: True where the stock is also in and_list
def get_mask(and_list, stock_list):
    and_set = set(and_list)  # a set keeps the membership test fast
    masks = []
    for stock in stock_list:
        masks.append(stock in and_set)
    return masks

# Print the lengths of the current closing prices, the next-period closing prices and the factor data
print(len(price_now), len(price_next), len(fund['basic_earnings_per_share']))

# Get stock code list
next_stock_list = price_next.index.values
now_stock_list = price_now.index.values
# print(next_stock_list)

# Intersection of current and next stock codes
next_stock_set = set(next_stock_list)
now_stock_set = set(now_stock_list)
and_stock_list = list(next_stock_set & now_stock_set)
print('Code intersection:', len(and_stock_list))

# Filter current closing price
masks = get_mask(and_stock_list, now_stock_list)
price_now = price_now[masks]

# Filter the closing price of the next period
masks = get_mask(and_stock_list, next_stock_list)
price_next = price_next[masks]

# Print the data lengths again after filtering
print(len(price_now), len(price_next))

# Filter the factor data with the same intersection
fund = fund['basic_earnings_per_share']
masks = get_mask(and_stock_list, stocks)
fund = fund[masks]

# 2. Calculate the rate of return on January 4
stock_rice = (price_next.iloc[:, 0] - price_now.iloc[:, 0]) / price_now.iloc[:, 0]
print(len(stock_rice), len(fund))
# 3. Calculate the Spearman correlation coefficient (the IC)
st.spearmanr(fund, stock_rice)
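
Note that `st.spearmanr` pairs values by position, so the mask-based filtering above relies on the factor data and both price tables listing stocks in the same order. A hedged alternative sketch lets pandas align everything by stock code instead. It uses made-up Series rather than the RiceQuant API, and the codes and numbers are purely illustrative.

```python
import pandas as pd
from scipy import stats

# Made-up data indexed by stock code, deliberately in different orders
factor = pd.Series(
    {"000001.XSHE": 1.2, "000002.XSHE": 0.4, "600000.XSHG": 0.9, "600519.XSHG": 3.1},
    name="factor",
)
close_now = pd.Series({"600519.XSHG": 100.0, "000001.XSHE": 10.0, "600000.XSHG": 20.0})
close_next = pd.Series(
    {"000001.XSHE": 10.2, "600519.XSHG": 105.0, "600000.XSHG": 20.2, "000002.XSHE": 30.0}
)

# Series arithmetic aligns on the index; stocks missing a price become NaN
next_ret = ((close_next - close_now) / close_now).rename("next_ret")

# Join factor and return on stock code and keep only complete rows
data = pd.concat([factor, next_ret], axis=1).dropna()

ic, p_value = stats.spearmanr(data["factor"], data["next_ret"])
print(len(data), ic)
```

With this approach no explicit intersection or mask is needed: the stock without a current closing price simply drops out in `dropna()`.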