BackTrader: speed optimization for multi-stock strategy backtests

Posted by MickeyAsh on Wed, 29 Dec 2021 10:32:26 +0100

Preface:

When it comes to speeding up BackTrader backtests, the most common advice is to rewrite the engine on top of numpy or other numerical libraries, but that kind of optimization is anything but beginner-friendly. This article therefore focuses on simple ways to speed up slow backtests that involve many stocks. To keep the tests manageable, 100 stocks are used for the backtest. After optimization, the execution time of the strategy drops by about 38% (62 s -> 38.4 s).

Strategy description:

Stocks that hit the daily limit on the previous day, but not as a one-word (locked-all-day) limit-up, enter the candidate pool.

The next day, buy between 10:00 and 11:00 if the gain is greater than 4%.

At 14:30, sell any held stock that has not reached its daily limit.

V1 strategy and running time:

V1 code design idea:

Trade on 5-minute data and use daily data for the candidate-pool and gain checks. A timer fires at 15:00 every day to refresh the candidate pool; next() then decides whether to buy or sell based on the time of day and the current gain. (A sketch of how the data feeds might be added is shown after the strategy code.) The strategy code is as follows:

import backtrader as bt
import pandas as pd
from datetime import time, timedelta


class MyStrategy(bt.Strategy):
    params = dict(
        when=bt.timer.SESSION_START,
        end=bt.timer.SESSION_END,
        timer=True,
        cheat=False,
        offset=timedelta(),
        repeat=timedelta(),
        weekdays=[],
        period=3,
    )

    def log(self, txt, dt=None):
        ''' Logging function for this strategy'''
        dt = dt or self.datas[0].datetime.datetime(0)
        print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        self.order = None

        self.add_timer(
                when=time(15, 0),
                offset=self.p.offset,
                repeat=self.p.repeat,
                weekdays=self.p.weekdays,
        )

        s_m = []
        for i, d in enumerate(self.datas):
            if not d._name.endswith('_day'):
                s_m.append([d._name, i, None])
        self.st_df = pd.DataFrame(data=s_m, columns=['code', 'min', 'day'])
        for i, d in enumerate(self.datas):
            if d._name.endswith('_day'):
                n = d._name.split('_')[0]
                self.st_df.loc[self.st_df.code == n, 'day'] = i
        #         self.stock_names.append(d._name)
        # self.min_stocks = self.datas[:int(len(self.datas)/2)]
        # self.day_stocks = self.datas[-int(len(self.datas)/2):]

        self.zt_list = []
        self.last_hold = []
        self.new_hold = []
        self.zt_num = 0

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - Nothing to do
            return

        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        idx = self.st_df.loc[self.st_df.code==order.data._name].index.values[0]
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(
                    'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                    (order.executed.price,
                     order.executed.value,
                     order.executed.comm))

                self.new_hold.append(idx)
                self.zt_list.remove(idx)
            else:  # Sell
                self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price,
                          order.executed.value,
                          order.executed.comm))
                self.last_hold.remove(idx)
        elif order.status in [order.Canceled, order.Expired, order.Margin, order.Rejected]:
            self.log('Order Canceled/Expired/Margin/Rejected')
            self.new_hold.remove(idx)

        # Write down: no pending order
        self.order = None

    def next(self):
        t = self.datetime.time(0)
        # 1. Buying window (the code checks 09:40 - 14:30)
        len_for_new = 10 - len(self.last_hold) - len(self.new_hold)
        if len(self.zt_list) > 0 and len_for_new > 0:
            if t >= time(9,40) and t <= time(14,30):
                for i in self.zt_list:
                    if i in self.last_hold:
                        continue
                    d = self.datas[self.st_df.loc[i, 'min']]
                    if len_for_new <= 0:
                        break
                    last_close = self.datas[self.st_df.loc[i, 'day']].close[0]
                    if 1.045 * last_close < d.close[0] < 1.09 * last_close:
                        len_for_new -= 1
                        targetvalue = 0.1 * self.broker.getvalue()
                        size = targetvalue/(last_close*1.09)//100*100
                        self.buy(data=d, size=size, price=last_close*1.09, exectype=bt.Order.Limit,
                                 valid=self.datetime.datetime(0)+timedelta(minutes=5))

        #2. Sell at 14:35 every day
        if len(self.last_hold) > 0:
            if t == time(14, 35):
                for i in self.last_hold:
                    m = self.datas[self.st_df.loc[i, 'min']]
                    d = self.datas[self.st_df.loc[i, 'day']]
                    if m.close[0] < d.high_limit[0]:  # at 14:35 the latest daily bar is still yesterday's
                        print('sell close a position', m._name, self.getposition(m).size)
                        self.close(data=m)

    
    def notify_timer(self, timer, when, *args, **kwargs):
        # 2. Consolidated buying and selling results
        self.last_hold += self.new_hold
        self.new_hold = []
        # 1. Pre select the stock pool according to the trading limit
        self.zt_list = []
        for i, row in self.st_df.iterrows():
            d = self.datas[row['day']]
            if d.close[0] > d.low[0] and d.pctChg[0] > 9.9:
                self.log('zhangting ' + str(d.close[0]) + d._name)
                self.zt_list.append(i)
        # 3. Delete purchased
        self.zt_list = list(set(self.zt_list)-set(self.last_hold))
        self.zt_num += len(self.zt_list)
        #print('average daily limit ', self.zt_num/len(self.data0))
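
The post does not show how the data are fed in, but the strategy above assumes that each stock is added twice: a 5-minute feed, and a daily feed whose name ends in '_day' and which carries the extra pctChg and high_limit lines. A minimal sketch of that setup (file paths, column names and the stock list are assumptions, not from the original post):

import backtrader as bt
import pandas as pd

# Daily feed with the two extra columns the strategy reads
class PandasDataDay(bt.feeds.PandasData):
    lines = ('pctChg', 'high_limit',)
    params = (('pctChg', -1), ('high_limit', -1),)  # -1: map by column name

cerebro = bt.Cerebro()
codes = ['sh.600000', 'sh.600004']  # hypothetical stock codes

for code in codes:
    df_5m = pd.read_csv(f'{code}_5min.csv', index_col=0, parse_dates=True)
    df_day = pd.read_csv(f'{code}_day.csv', index_col=0, parse_dates=True)

    cerebro.adddata(bt.feeds.PandasData(dataname=df_5m,
                                        timeframe=bt.TimeFrame.Minutes,
                                        compression=5), name=code)
    cerebro.adddata(PandasDataDay(dataname=df_day,
                                  timeframe=bt.TimeFrame.Days), name=code + '_day')

cerebro.addstrategy(MyStrategy)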

Run time:

Total time: 72 seconds

Read CSV       cerebro.adddata    Execution
5.8 s          4 s                62 s
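
How these three numbers were measured is not shown in the post; presumably something like the following wall-clock instrumentation around each stage (the helper names here are placeholders):

from time import perf_counter

t0 = perf_counter()
dfs = load_all_csvs()        # placeholder: read the 100 CSV files
t1 = perf_counter()
add_feeds(cerebro, dfs)      # placeholder: one cerebro.adddata() per feed
t2 = perf_counter()
cerebro.run()
t3 = perf_counter()
print('Read CSV: %.1f s, adddata: %.1f s, execution: %.1f s'
      % (t1 - t0, t2 - t1, t3 - t2))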

The timings show that most of the time is spent between the end of adddata and the end of cerebro execution, so the faster data-reading approach mentioned in [3] does not help here. According to the suggestion in [2], Observers and Analyzers can account for up to half of the execution time; after removing them, however, the total run time was still 71 seconds, with no significant improvement, probably because the Observers and Analyzers used in this example are quite simple.
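
For reference, the standard observers can be switched off when creating Cerebro (the post does not show exactly how they were removed):

# Create Cerebro without the default observers (Broker, Trades, BuySell);
# analyzer overhead disappears by simply not calling addanalyzer.
cerebro = bt.Cerebro(stdstats=False)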

V2 strategy and running time:

V2 code improvement ideas:

To improve efficiency, the idea is to do as little work as possible inside next(): the signals are pre-computed outside cerebro and attached directly to the 5-minute data as extra lines, so the daily data no longer needs to be fed in at all. (A sketch of that pre-computation step is shown after the strategy code.) The code is as follows:

import backtrader as bt
from datetime import time, timedelta


class PandasDataExtendInd(bt.feeds.PandasData):
    # Extra lines carrying the pre-computed signals
    lines = ('ind', 'high_limit', 'buy_ind', 'sell_ind',)
    params = (('ind', -1), ('high_limit', -1), ('buy_ind', -1), ('sell_ind', -1),)  # map the columns by name


class MyStrategy(bt.Strategy):
    params = dict(
        when=bt.timer.SESSION_START,
        end=bt.timer.SESSION_END,
        timer=True,
        cheat=False,
        offset=timedelta(),
        repeat=timedelta(),
        weekdays=[],
        period=3,
    )

    def log(self, txt, dt=None):
        ''' Logging function for this strategy'''
        dt = dt or self.datas[0].datetime.datetime(0)
        print('%s, %s' % (dt.isoformat(), txt))

    def __init__(self):
        self.order = None

        self.add_timer(
                when=time(15, 0),
                offset=self.p.offset,
                repeat=self.p.repeat,
                weekdays=self.p.weekdays,
        )

        self.zt_list = []
        self.last_hold = []
        self.new_hold = []
        self.zt_num = 0

    def notify_order(self, order):
        if order.status in [order.Submitted, order.Accepted]:
            # Buy/Sell order submitted/accepted to/by broker - Nothing to do
            return

        # Check if an order has been completed
        # Attention: broker could reject order if not enough cash
        #idx = self.st_df.loc[self.st_df.code==order.data._name].index.values[0]
        if order.status in [order.Completed]:
            if order.isbuy():
                self.log(
                    'BUY EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                    (order.executed.price,
                     order.executed.value,
                     order.executed.comm))

                self.new_hold.append(order.data)
                self.zt_list.remove(self.datas.index(order.data))
            else:  # Sell
                self.log('SELL EXECUTED, Price: %.2f, Cost: %.2f, Comm %.2f' %
                         (order.executed.price,
                          order.executed.value,
                          order.executed.comm))
                self.last_hold.remove(order.data)
        elif order.status in [order.Canceled, order.Expired, order.Margin, order.Rejected]:
            self.log('Order Canceled/Expired/Margin/Rejected')
            self.new_hold.remove(order.data)

        # Write down: no pending order
        self.order = None

    def next(self):
        t = self.datetime.time(0)
        # 1. Buying window (the code checks 09:40 - 14:30)
        len_for_new = 10 - len(self.last_hold) - len(self.new_hold)
        if len(self.zt_list) > 0 and len_for_new > 0:
            if t >= time(9,40) and t <= time(14,30):
                for i in self.zt_list:
                    if self.datas[i] in self.last_hold:  # last_hold stores data feeds in this version
                        continue
                    d = self.datas[i]
                    if len_for_new <= 0:
                        break
                    if d.buy_ind[0]:
                        len_for_new -= 1
                        targetvalue = 0.1 * self.broker.getvalue()
                        size = targetvalue/(d.high_limit[0]*0.99)//100*100
                        self.buy(data=d, size=size, price=d.high_limit[0]*0.99, exectype=bt.Order.Limit,
                                 valid=self.datetime.datetime(0)+timedelta(minutes=5))

        #2. Sell at 14:35 every day
        if len(self.last_hold) > 0:
            if t == time(14, 35):
                for m in self.last_hold:
                    if m.sell_ind[0]:  # at 14:35 the latest daily bar is still yesterday's
                        print('sell close a position', m._name, self.getposition(m).size)
                        self.close(data=m)

    
    def notify_timer(self, timer, when, *args, **kwargs):
        # 2. Consolidated buying and selling results
        self.last_hold += self.new_hold
        self.new_hold = []
        # 1. Pre select the stock pool according to the trading limit
        self.zt_list = []
        for i, d in enumerate(self.datas):
            if d.ind[0]:
                self.zt_list.append(i)
        # 3. Delete purchased
        self.zt_list = [i for i in self.zt_list if self.datas[i] not in self.last_hold]
        self.zt_num += len(self.zt_list)
        #print('average daily limit ', self.zt_num/len(self.data0))
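
The pre-computation step itself is not shown in the post. A minimal sketch of how the ind / buy_ind / sell_ind / high_limit columns might be derived on each stock's 5-minute dataframe, reusing the thresholds from the V1 logic (everything else, including the column names, is an assumption):

import pandas as pd

def attach_signals(df_5m, df_day):
    # Daily candidate flag: hit the 10% limit but not as a one-word board
    df_day = df_day.copy()
    df_day['ind'] = ((df_day['close'] > df_day['low']) &
                     (df_day['pctChg'] > 9.9)).astype(int)

    # Map every 5-minute bar to its trading date
    day_key = pd.Series(pd.to_datetime(df_5m.index.date), index=df_5m.index)

    # Same-day flag, read by the 15:00 timer to build the next day's pool
    df_5m['ind'] = day_key.map(df_day['ind']).fillna(0).values

    # Previous day's close, used for the intraday buy / sell checks
    prev_close = day_key.map(df_day['close'].shift(1)).values
    df_5m['high_limit'] = prev_close * 1.1   # assumed 10% limit price
    df_5m['buy_ind'] = ((df_5m['close'] > prev_close * 1.045) &
                        (df_5m['close'] < prev_close * 1.09)).astype(int)
    df_5m['sell_ind'] = (df_5m['close'] < df_5m['high_limit']).astype(int)
    return df_5m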

Run time:

Total time: 103 seconds

Read CSV       cerebro.adddata    Execution
5.8 s          4.5 s              93 s

This change turns out to be a significant regression: the savings from fewer comparisons in next() and from not feeding the daily data are far outweighed by the cost of the additional signal lines attached to the 5-minute feeds. Printing detailed timings shows that next() is entered for the first time only after about 80 seconds; in other words, cerebro spends nearly 70 seconds of that middle stage on initialization.
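
One way to confirm where that initialization time goes (not shown in the post) is to profile the run:

# Profile cerebro.run() and sort by cumulative time; the V3 analysis below
# points at the data preloading as the hot spot.
import cProfile
cProfile.run('cerebro.run()', sort='cumtime')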

V3 final optimization:

Optimization idea:

A closer look at the code shows where most of the time goes:

# cerebro.py -> runstrategies()
for data in self.datas:
    data.preload()

# feed.py -> preload()
def preload(self):
    while self.load():
        pass
    self._last()
    self.home()

preload() itself is not easy to optimize, but the preloading loop in runstrategies() can be parallelized, using the same multiprocessing Pool mechanism that cerebro itself already relies on.
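
The modified code is not included in the post; the following is only a rough sketch of the idea, replacing the serial per-feed loop inside runstrategies() with a process pool. The real loop does extra bookkeeping (lookahead extension, filters, live feeds) that is omitted here, and the feeds must be picklable so the worker processes can send the preloaded buffers back:

import multiprocessing

def _preload_one(data):
    # The same per-feed steps the serial loop in runstrategies() performs
    data.reset()
    data._start()
    data.preload()
    return data

# Inside runstrategies(), instead of
#     for data in self.datas:
#         data.reset()
#         data._start()
#         data.preload()
# do something along these lines:
def preload_all(self):
    with multiprocessing.Pool(processes=4) as pool:
        self.datas = pool.map(_preload_one, self.datas)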

Run time:

Total time: 49 seconds

Read CSV       cerebro.adddata    Execution
5.9 s          4.3 s              38.4 s

The time spent reading and adding the data is unchanged, while the execution stage is much faster.

Conclusion:

Parallelizing the preload (here via a multiprocessing Pool) greatly improves backtest speed, and the modification required is small.

Computer parameters:

Intel i7-10510U, 2.30 GHz, 4 cores / 8 threads

15 GB of memory

Windows 10

References:

[1] https://zhuanlan.zhihu.com/p/345815425

[2] https://www.zhihu.com/question/440467223

[3] https://community.backtrader.com/topic/2263/which-line-code-function-consume-more-time-when-doing-a-backtest/13

Topics: Python backtrader