Preface:
The most common advice for speeding up BackTrader backtests is to rewrite the underlying code with numpy or other numerical libraries. That kind of optimization is hard for beginners and far from friendly. This article instead looks at simple ways to speed up a slow backtest that trades many stocks. To keep testing time manageable, the backtest uses 100 stocks. After optimization, the strategy's execution time dropped by about 38% (62 s -> 38.4 s).
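Strategy description:
Stocks that closed at the daily limit-up the previous day enter the candidate pool, excluding "one-word" limit-ups (stocks that traded at the limit price all day, so their low equals their close).
The next day, buy a candidate between 10:00 and 11:00 if it is up more than 4% on the day.
At 14:30, sell any held position that is not at the daily limit-up.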
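The candidate-pool rule is a simple predicate on the previous day's daily bar. The following is a minimal sketch that mirrors the check in the strategy's notify_timer() (daily change above 9.9% and close above the low); the `DailyBar` type and the helper names are hypothetical, and the 9.9% threshold assumes a 10% limit board.

```python
from dataclasses import dataclass


@dataclass
class DailyBar:
    """Hypothetical daily bar: one stock, one trading day."""
    code: str
    low: float
    close: float
    pct_chg: float  # daily change in percent, e.g. 10.0 for +10%


def is_candidate(bar: DailyBar, limit_pct: float = 9.9) -> bool:
    """Closed at the limit-up, but was not a one-word limit-up.

    A one-word limit-up trades at the limit price all day, so its low equals
    its close; requiring close > low excludes it.
    """
    return bar.pct_chg > limit_pct and bar.close > bar.low


def candidate_pool(bars):
    """Codes of all stocks that qualify for the next day's buy window."""
    return [bar.code for bar in bars if is_candidate(bar)]


if __name__ == "__main__":
    yesterday = [
        DailyBar("000001", low=9.80, close=11.00, pct_chg=10.0),   # dipped, closed at limit
        DailyBar("000002", low=11.00, close=11.00, pct_chg=10.0),  # one-word limit-up
        DailyBar("000003", low=9.90, close=10.50, pct_chg=5.0),    # ordinary gain
    ]
    print(candidate_pool(yesterday))  # ['000001']
```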
V1 strategy and running time:
V1 code design:
Orders are placed on 5-minute data, while daily data is used to build the candidate pool and to compute each stock's gain. A timer rebuilds the candidate pool only once a day, at 15:00; next() then decides whether to buy or sell based on the time of day and the gain. The strategy code is as follows:
```python
from datetime import time, timedelta

import backtrader as bt
import pandas as pd


class MyStrategy(bt.Strategy):
    params = dict(
        when=bt.timer.SESSION_START,
        end=bt.timer.SESSION_END,
        timer=True,
        cheat=False,
        offset=timedelta(),
        repeat=timedelta(),
        weekdays=[],
        period=3,
    )

    def log(self, txt, dt=None):
        '''Logging function for this strategy'''
        dt = dt or self.datas[0].datetime.datetime(0)
        print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        self.order = None
        # Call notify_timer once a day at 15:00 to rebuild the candidate pool
        self.add_timer(
            when=time(15, 0),
            offset=self.p.offset,
            repeat=self.p.repeat,
            weekdays=self.p.weekdays,
        )
        # One row per stock: code, index of its 5-minute feed, index of its daily feed ('<code>_day')
        s_m = []
        for i, d in enumerate(self.datas):
            if not d._name.endswith('_day'):
                s_m.append([d._name, i, None])
        self.st_df = pd.DataFrame(data=s_m, columns=['code', 'min', 'day'])
        for i, d in enumerate(self.datas):
            if d._name.endswith('_day'):
                n = d._name.split('_')[0]
                self.st_df.loc[self.st_df.code == n, 'day'] = i
        self.zt_list = []    # candidate pool (row labels of st_df)
        self.last_hold = []  # positions held before today
        self.new_hold = []   # positions bought today
        self.zt_num = 0

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - nothing to do
            return

        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        idx = self.st_df.loc[self.st_df.code == order.data._name].index.values[0]
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log('BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price, order.executed.value, order.executed.comm))
                self.new_hold.append(idx)
                if idx in self.zt_list:
                    self.zt_list.remove(idx)
            else:  # Sell
                self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price, order.executed.value, order.executed.comm))
                self.last_hold.remove(idx)
        elif order.status in [order.Canceled, order.Expired, order.Margin, order.Rejected]:
            self.log('Order Canceled/Expired/Margin/Rejected')
            # An unfilled buy was never added to new_hold, so guard the removal
            if idx in self.new_hold:
                self.new_hold.remove(idx)

        # Write down: no pending order
        self.order = None

    def next(self):
        t = self.datetime.time(0)
        # 1. Buy window: 9:40 to 14:30, at most 10 positions in total
        len_for_new = 10 - len(self.last_hold) - len(self.new_hold)
        if len(self.zt_list) > 0 and len_for_new > 0:
            if time(9, 40) <= t <= time(14, 30):
                for i in self.zt_list:
                    if i in self.last_hold:
                        continue
                    d = self.datas[self.st_df.loc[i, 'min']]
                    if len_for_new <= 0:
                        break
                    # Close of the latest daily bar (yesterday's, during the session)
                    last_close = self.datas[self.st_df.loc[i, 'day']].close[0]
                    if 1.045 * last_close < d.close[0] < 1.09 * last_close:
                        len_for_new -= 1
                        targetvalue = 0.1 * self.broker.getvalue()
                        size = targetvalue / (last_close * 1.09) // 100 * 100
                        self.buy(data=d, size=size, price=last_close * 1.09,
                                 exectype=bt.Order.Limit,
                                 valid=self.datetime.datetime(0) + timedelta(minutes=5))
        # 2. Sell at the 14:35 bar every day
        if len(self.last_hold) > 0:
            if t == time(14, 35):
                for i in self.last_hold:
                    m = self.datas[self.st_df.loc[i, 'min']]
                    d = self.datas[self.st_df.loc[i, 'day']]
                    # At 14:30 the latest daily bar is still yesterday's
                    if m.close[0] < d.high_limit[0]:
                        print('sell close a position', m._name, self.getposition(m).size)
                        self.close(data=m)

    def notify_timer(self, timer, when, *args, **kwargs):
        # 1. Merge today's fills into the holdings
        self.last_hold += self.new_hold
        self.new_hold = []
        # 2. Pre-select the candidate pool: limit-up close, but not a one-word limit-up
        self.zt_list = []
        for i, row in self.st_df.iterrows():
            d = self.datas[row['day']]
            if d.close[0] > d.low[0] and d.pctChg[0] > 9.9:
                self.log('limit-up %s %s' % (d.close[0], d._name))
                self.zt_list.append(i)
        # 3. Drop stocks already held
        self.zt_list = list(set(self.zt_list) - set(self.last_hold))
        self.zt_num += len(self.zt_list)
        # print('average daily limit-ups', self.zt_num / len(self.data0))
```
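The entry check and order sizing in next() are plain arithmetic and can be checked outside backtrader. A minimal sketch with hypothetical helper names (`buy_signal`, `order_size`); it reproduces V1's thresholds (more than 4.5% and less than 9% above yesterday's close), the 10% portfolio weight, the limit price of 1.09 times yesterday's close, and the rounding down to 100-share lots.

```python
def buy_signal(last_close: float, price: float) -> bool:
    """V1 entry check: up more than 4.5% but less than 9% versus yesterday's close."""
    return 1.045 * last_close < price < 1.09 * last_close


def order_size(portfolio_value: float, limit_price: float,
               weight: float = 0.1, lot: int = 100) -> int:
    """Shares for a `weight` slice of the portfolio, rounded down to whole lots."""
    target_value = weight * portfolio_value
    return int(target_value / limit_price // lot * lot)


if __name__ == "__main__":
    last_close = 10.0
    print(buy_signal(last_close, 10.6))                # True: +6%
    print(order_size(1_000_000, 1.09 * last_close))    # 9100 shares at a 10.90 limit
```

Rounding down rather than to the nearest lot keeps each order within its 10% budget, even at the limit price.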
Running time:
Total time: 72 seconds
| Read CSV (s) | cerebro.adddata (s) | Execution (s) |
|---|---|---|
| 5.8 | 4 | 62 |
Most of the time is spent between the end of cerebro.adddata and the end of execution, so the faster data-reading method described in [3] does not apply here. According to [2], Observers and Analyzers can account for up to half of the execution time. After removing them, the total run time was 71 seconds, barely an improvement, probably because the Observers and Analyzers used in this example are fairly simple.
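The three columns in the timing tables can be measured by wrapping each phase in a timer. The article does not show its own timing code, so this is a minimal sketch using only `time.perf_counter`; the `stage` and `report` helpers are hypothetical, and the backtrader calls appear only as comments next to stand-in work.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name, timings):
    """Store the wall-clock seconds spent inside the with-block in timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def report(timings):
    """Render the timings as a one-row table, one column per stage (seconds)."""
    names = list(timings)
    return "\n".join([
        "| " + " | ".join(names) + " |",
        "|" + "---|" * len(names),
        "| " + " | ".join("%.1f" % timings[n] for n in names) + " |",
    ])


if __name__ == "__main__":
    timings = {}
    with stage("Read CSV (s)", timings):
        rows = [i * 0.5 for i in range(100_000)]  # stand-in for reading the CSV files
    with stage("cerebro.adddata (s)", timings):
        pass  # cerebro.adddata(feed) for every feed would go here
    with stage("Execution (s)", timings):
        total = sum(rows)  # stand-in for cerebro.run()
    print("Total time: %.1f s" % sum(timings.values()))
    print(report(timings))
```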
V2 strategy and running time:
V2 code improvements:
To improve efficiency, the idea is to do as little checking as possible inside next() by moving it outside cerebro: the signals are precomputed and attached directly to the 5-minute data, so the daily data no longer has to be passed in. The code is as follows:
```python
from datetime import time, timedelta

import backtrader as bt


class PandasDataExtendInd(bt.feeds.PandasData):
    # Extra lines precomputed outside cerebro and attached to the 5-minute data:
    # ind = candidate-pool flag, high_limit = limit-up price,
    # buy_ind / sell_ind = buy / sell signals
    lines = ('ind', 'high_limit', 'buy_ind', 'sell_ind',)
    # -1: let PandasData find each column by its name
    params = (('ind', -1), ('high_limit', -1), ('buy_ind', -1), ('sell_ind', -1),)


class MyStrategy(bt.Strategy):
    params = dict(
        when=bt.timer.SESSION_START,
        end=bt.timer.SESSION_END,
        timer=True,
        cheat=False,
        offset=timedelta(),
        repeat=timedelta(),
        weekdays=[],
        period=3,
    )

    def log(self, txt, dt=None):
        '''Logging function for this strategy'''
        dt = dt or self.datas[0].datetime.datetime(0)
        print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        self.order = None
        self.add_timer(
            when=time(15, 0),
            offset=self.p.offset,
            repeat=self.p.repeat,
            weekdays=self.p.weekdays,
        )
        # Feed name -> position in self.datas. backtrader overloads == on line
        # objects, so list.index()/remove() on feed objects can match the wrong feed.
        self.data_idx = {d._name: i for i, d in enumerate(self.datas)}
        self.zt_list = []    # candidate pool (indices into self.datas)
        self.last_hold = []  # positions held before today (indices)
        self.new_hold = []   # positions bought today (indices)
        self.zt_num = 0

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - nothing to do
            return

        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        idx = self.data_idx[order.data._name]
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log('BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price, order.executed.value, order.executed.comm))
                self.new_hold.append(idx)
                if idx in self.zt_list:
                    self.zt_list.remove(idx)
            else:  # Sell
                self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price, order.executed.value, order.executed.comm))
                self.last_hold.remove(idx)
        elif order.status in [order.Canceled, order.Expired, order.Margin, order.Rejected]:
            self.log('Order Canceled/Expired/Margin/Rejected')
            # An unfilled buy was never added to new_hold, so guard the removal
            if idx in self.new_hold:
                self.new_hold.remove(idx)

        # Write down: no pending order
        self.order = None

    def next(self):
        t = self.datetime.time(0)
        # 1. Buy window: 9:40 to 14:30, at most 10 positions in total
        len_for_new = 10 - len(self.last_hold) - len(self.new_hold)
        if len(self.zt_list) > 0 and len_for_new > 0:
            if time(9, 40) <= t <= time(14, 30):
                for i in self.zt_list:
                    if i in self.last_hold:
                        continue
                    d = self.datas[i]
                    if len_for_new <= 0:
                        break
                    if d.buy_ind[0]:  # precomputed buy signal
                        len_for_new -= 1
                        price = d.high_limit[0] * 0.99
                        targetvalue = 0.1 * self.broker.getvalue()
                        size = targetvalue / price // 100 * 100
                        self.buy(data=d, size=size, price=price,
                                 exectype=bt.Order.Limit,
                                 valid=self.datetime.datetime(0) + timedelta(minutes=5))
        # 2. Sell at the 14:35 bar every day
        if len(self.last_hold) > 0:
            if t == time(14, 35):
                for i in self.last_hold:
                    m = self.datas[i]
                    if m.sell_ind[0]:  # precomputed sell signal
                        print('sell close a position', m._name, self.getposition(m).size)
                        self.close(data=m)

    def notify_timer(self, timer, when, *args, **kwargs):
        # 1. Merge today's fills into the holdings
        self.last_hold += self.new_hold
        self.new_hold = []
        # 2. Rebuild the candidate pool from the precomputed flag
        self.zt_list = [i for i, d in enumerate(self.datas) if d.ind[0]]
        # 3. Drop stocks already held
        self.zt_list = list(set(self.zt_list) - set(self.last_hold))
        self.zt_num += len(self.zt_list)
        # print('average daily limit-ups', self.zt_num / len(self.data0))
```
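The V2 strategy reads `ind`, `high_limit`, `buy_ind` and `sell_ind` from the 5-minute feed, but the article does not show how those columns were computed. A minimal sketch, assuming they encode the same rules V1 evaluated inside next() (buy when a bar is 4.5% to 9% above the previous close; sell flag when the close is below the limit-up price) and that `ind` is a per-day candidate flag computed beforehand. The function name `attach_signals` and the dict-based bar format are hypothetical. Because `ind` comes from the day's own close, it is only safe to read at the 15:00 bar, which is where notify_timer() reads it.

```python
from datetime import date, datetime


def attach_signals(bars, daily):
    """Return copies of 5-minute bars with V2's extra columns filled in.

    bars:  list of dicts with 'datetime' (datetime) and 'close' (float)
    daily: {trading date: {'prev_close': float, 'high_limit': float, 'ind': int}}
    Assumed rules (V1's checks, moved out of next()):
      buy_ind  = 1 if the bar is 4.5%-9% above the previous close
      sell_ind = 1 if the bar closes below the limit-up price
    """
    out = []
    for bar in bars:
        info = daily[bar["datetime"].date()]
        prev_close, close = info["prev_close"], bar["close"]
        out.append({
            **bar,
            # candidate flag from the day's daily bar; only safe to read at 15:00
            "ind": info["ind"],
            "high_limit": info["high_limit"],
            "buy_ind": int(1.045 * prev_close < close < 1.09 * prev_close),
            "sell_ind": int(close < info["high_limit"]),
        })
    return out


if __name__ == "__main__":
    day = date(2021, 1, 5)
    daily = {day: {"prev_close": 10.0, "high_limit": 11.0, "ind": 0}}
    bars = [
        {"datetime": datetime(2021, 1, 5, 9, 40), "close": 10.6},
        {"datetime": datetime(2021, 1, 5, 14, 35), "close": 11.0},
    ]
    for row in attach_signals(bars, daily):
        print(row["datetime"].time(), row["buy_ind"], row["sell_ind"])
```

In practice the same columns would more likely be computed with pandas on the full 5-minute DataFrame and then loaded through a PandasData subclass such as PandasDataExtendInd.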
Running time:
Total time: 103 seconds
| Read CSV (s) | cerebro.adddata (s) | Execution (s) |
|---|---|---|
| 5.8 | 4.5 | 93 |
This "optimization" clearly backfired. The savings from doing fewer comparisons in next() and from no longer passing in daily data are far outweighed by the cost of the more complex 5-minute data. Printing detailed timings shows that next() is first called about 80 seconds into the run, and nearly 70 seconds of that is cerebro initialization after the data has been added.
V3 final optimization
Optimization ideas:
A detailed analysis of the code shows that the most time-consuming part is data preloading:
```python
# cerebro.py -> runstrategies()
for data in self.datas:
    data.preload()

# feed.py -> preload()
def preload(self):
    while self.load():
        pass

    self._last()
    self.home()
```
preload() itself is hard to optimize, but runstrategies() can be sped up by parallel execution, implemented with multiprocessing.Pool, the same mechanism cerebro itself already uses.
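The article does not show the modified runstrategies(), so the following is only a minimal sketch of the general pattern: independent per-feed loading work handed to a `multiprocessing.Pool`. `preload_feed` is a hypothetical stand-in for `data.preload()` that parses CSV text; real backtrader feed objects carry state and may not be picklable, so applying this inside cerebro needs more care. The `if __name__ == "__main__":` guard matters on Windows (the test machine here), where worker processes are started with spawn and re-import the main module.

```python
import csv
import io
from multiprocessing import Pool


def preload_feed(csv_text):
    """Hypothetical stand-in for data.preload(): parse every bar of one feed up front."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["datetime"], float(row["close"])) for row in reader]


def preload_all(csv_texts, processes=None, pool_cls=Pool):
    """Preload all feeds in parallel; pool.map returns results in input order."""
    with pool_cls(processes=processes) as pool:
        return pool.map(preload_feed, csv_texts)


if __name__ == "__main__":
    # Required on Windows: spawned workers re-import this module.
    feeds = [
        "datetime,close\n2021-01-04 09:35,10.00\n2021-01-04 09:40,10.20\n",
        "datetime,close\n2021-01-04 09:35,5.00\n2021-01-04 09:40,5.10\n",
    ]
    for bars in preload_all(feeds, processes=2):
        print(bars)
```

Worker processes sidestep the GIL, which is why a process pool rather than a thread pool helps with CPU-bound loading. `pool_cls` is a parameter only so the same code can be exercised with `multiprocessing.pool.ThreadPool`, which has the same interface.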
Running time:
Total time: 49 seconds
| Read CSV (s) | cerebro.adddata (s) | Execution (s) |
|---|---|---|
| 5.9 | 4.3 | 38.4 |
Data reading and loading times stay essentially unchanged, while execution time drops substantially, from 62 s in V1 to 38.4 s.
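The percentages follow directly from the V1 and V3 execution times:

$$
\frac{62 - 38.4}{62} = \frac{23.6}{62} \approx 38\%, \qquad \frac{62}{38.4} \approx 1.61
$$

That is, execution time falls by about 38%, which corresponds to running roughly 1.6 times as fast.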
Conclusion:
Parallel execution with multiprocessing can greatly speed up strategy backtests (here, execution time fell from 62 s to 38.4 s), and the required code changes are fairly small.
Test machine:
Intel Core i7-10510U, 2.30 GHz, 4 cores / 8 threads
15 GB RAM
Windows 10
References:
[1] https://zhuanlan.zhihu.com/p/345815425