Deploying a PyTorch model on a C++ platform: the process and the pitfalls I hit

Posted by beboo002 on Thu, 10 Feb 2022 20:35:57 +0100

Reading guide

This article explains how to deploy a PyTorch model to a C++ platform. It is divided into four parts, in order: converting the model, saving the serialized model, loading the serialized PyTorch model in C++, and executing the Script Module.

Recently, for work, I needed to deploy a PyTorch model to a C++ platform. The basic process mainly follows the tutorial on the official website. I ran into many pitfalls along the way, and I am recording them here.

1. Model conversion

libtorch does not depend on Python. A model trained in Python must be converted to a script module before libtorch can load it and run inference. The official website provides two methods for this step:

Method 1: Tracing

This method is relatively simple. You only need to give the model a set of inputs and run one forward pass through the network; torch.jit.trace records the operations along that path and saves them. For example:

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet18()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

The disadvantage is that if the model contains control flow, such as if/else statements, one set of inputs can only traverse one branch. In that case the model cannot be recorded completely.
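To make the limitation concrete, here is a minimal sketch. The function `branchy` and its two branches are made up for illustration and are not from the original model. It traces with an input that takes the first branch, then calls the traced function with an input that should take the second one.

```python
import warnings
import torch

def branchy(x):
    # Data-dependent control flow: the branch taken depends on the input values.
    if x.sum() > 0:
        return x * 2
    return x + 10

positive = torch.ones(3)    # takes the "x * 2" branch
negative = -torch.ones(3)   # eager Python takes the "x + 10" branch

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # tracing emits a TracerWarning for the `if`
    traced = torch.jit.trace(branchy, positive)

eager_out = branchy(negative)   # computed by the "x + 10" branch
traced_out = traced(negative)   # still "x * 2": the trace froze the first branch

print(eager_out, traced_out)
```

The two results differ, which is why a model with data-dependent branches usually needs scripting for at least the branching part.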

Method 2: Scripting

Write the model directly in TorchScript, annotate it accordingly, and compile the module with torch.jit.script to convert it into a ScriptModule. For example:

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
          output = self.weight.mv(input)
        else:
          output = self.weight + input
        return output

my_module = MyModule(10,20)
sm = torch.jit.script(my_module)

The forward method is compiled by default, and the methods called from forward are also compiled, in the order in which they are called.

If you want to compile a method other than forward that is not called by forward, add @torch.jit.export.

If you do not want a method to be compiled, you can use @torch.jit.ignore or @torch.jit.unused:

# Same behavior as pre-PyTorch 1.2
@torch.jit.script
def some_fn():
    return 2

# Marks a function as ignored, if nothing
# ever calls it then this has no effect
@torch.jit.ignore
def some_fn2():
    return 2

# As with ignore, if nothing calls it then it has no effect.
# If it is called in script it is replaced with an exception.
@torch.jit.unused
def some_fn3():
  import pdb; pdb.set_trace()
  return 4

# Doesn't do anything, this function is already
# the main entry point
@torch.jit.export
def some_fn4():
    return 2
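As a rough illustration of how these decorators behave on a module, the sketch below uses a hypothetical `Demo` class (its name, its methods, and the `use_python_path` flag are assumptions, not part of the original model). `double` is compiled even though forward never calls it, and `python_path` is skipped by the compiler.

```python
import torch

class Demo(torch.nn.Module):
    def __init__(self, use_python_path: bool):
        super().__init__()
        self.use_python_path = use_python_path

    @torch.jit.unused
    def python_path(self, x):
        # Not compiled; the numpy round trip would not be scriptable.
        return torch.from_numpy(x.numpy()) + 10

    @torch.jit.export
    def double(self, x):
        # Compiled even though forward() never calls it.
        return x * 2

    def forward(self, x):
        if self.use_python_path:
            return self.python_path(x)
        else:
            return x + 10

sm = torch.jit.script(Demo(use_python_path=False))
out = sm(torch.ones(2))             # goes through the scriptable branch
doubled = sm.double(torch.ones(2))  # exported method is callable on the ScriptModule
```

With `use_python_path=True` the module would still compile, but calling it would raise, because the unused method is replaced by an exception in the compiled graph.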

There are many pitfalls in this step. The main causes fall into the following two categories:

1. Unsupported operations

The operations supported by TorchScript are a subset of Python. Most operations used in torch have a corresponding implementation, but some awkward ones are not supported. See the official documentation for the detailed list; here are some that I ran into with the version I used:

1) A variable number of parameters or return values is not supported, for example:

def __init__(self, **kwargs):


if output_flag == 0:
    return reshape_logits
else:
    loss = self.loss(reshape_logits, term_mask, labels_id)
    return reshape_logits, loss
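One possible way around the varying return value (an assumption on my part, not necessarily the fix used in the original project) is to always return the same tuple shape and make the second element Optional. The `Head` module and its squared-error loss below are placeholders for illustration only.

```python
from typing import Optional, Tuple
import torch

class Head(torch.nn.Module):
    def forward(self, logits: torch.Tensor,
                labels: Optional[torch.Tensor] = None
                ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
        # The return type is always (Tensor, Optional[Tensor]),
        # so TorchScript sees a single, fixed signature.
        loss: Optional[torch.Tensor] = None
        if labels is not None:
            loss = ((logits - labels) ** 2).mean()  # placeholder loss
        return logits, loss

head = torch.jit.script(Head())
logits_only, no_loss = head(torch.ones(2))
logits, loss = head(torch.ones(2), torch.zeros(2))
```

The caller then checks whether the second element is None instead of relying on the number of returned values.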

2) Various iteration operations


eg1. List comprehensions

layers = [int(a) for a in layers]

Error: torch.jit.frontend.UnsupportedNodeError: ListComp aren't supported

It can be changed to:

for k in range(len(layers)):
    layers[k] = int(layers[k])

eg2. Calling __next__() on an iterator

seq_iter = enumerate(scores)
_, inivalues = seq_iter.__next__()

eg3. The built-in next()

line = next(infile)
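A small sketch of the kind of rewrite that scripts cleanly: an explicit typed loop instead of a list comprehension, and direct indexing instead of pulling the first element out of an iterator. The function names `to_ints` and `first_row` are hypothetical.

```python
from typing import List
import torch

@torch.jit.script
def to_ints(values: List[float]) -> List[int]:
    # Explicit loop with an annotated empty list.
    out: List[int] = []
    for v in values:
        out.append(int(v))
    return out

@torch.jit.script
def first_row(scores: torch.Tensor) -> torch.Tensor:
    # Index directly instead of calling next() / __next__() on an iterator.
    return scores[0]

ints = to_ints([1.0, 2.5, 3.9])
row = first_row(torch.arange(6).reshape(2, 3))
```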

3) Unsupported statements

eg1. continue is not supported

torch.jit.frontend.UnsupportedNodeError: continue statements aren't supported

eg2. try/except is not supported

torch.jit.frontend.UnsupportedNodeError: try blocks aren't supported

eg3. The with statement is not supported

4) Other common ops / modules

eg1. torch.autograd.Variable

Solution: use torch.ones / torch.randn etc., combined with .float() / .long() and so on.

eg2. torch.Tensor/torch.LongTensor etc.

Solution: same as above.

eg3. The requires_grad parameter is only supported in torch.tensor; torch.ones / torch.zeros etc. do not accept it.

eg4. tensor.numpy()

eg5. tensor.bool()

Solution: replace tensor.bool() with tensor > 0

eg6. self.seg_emb(seg_fea_ids).to(embeds.device)

Solution: call .cuda() explicitly wherever data needs to be moved to the GPU.

In short: avoid libraries other than native Python and PyTorch, such as numpy, whenever you can, and use PyTorch's own APIs as much as possible.
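The replacements listed above can be sketched in one scripted function. The name `build_inputs` and its arguments are made up for illustration; the point is only that factory functions plus .long() / .float(), a comparison instead of .bool(), and an explicit .cuda() all compile.

```python
from typing import Tuple
import torch

@torch.jit.script
def build_inputs(batch_size: int, seq_len: int,
                 use_cuda: bool) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    # Factory function + .long() instead of torch.LongTensor(...)
    ids = torch.ones([batch_size, seq_len]).long()
    # Factory function + .float() instead of torch.autograd.Variable(...)
    noise = torch.randn([batch_size, seq_len]).float()
    # Comparison instead of ids.bool()
    mask = ids > 0
    if use_cuda:
        # Explicit transfer instead of .to(other.device)
        ids = ids.cuda()
        noise = noise.cuda()
        mask = mask.cuda()
    return ids, noise, mask

ids, noise, mask = build_inputs(2, 4, False)
```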

2. Specifying data types

1) Attributes: the types of most members can be inferred from their values, but empty lists / dictionaries must be annotated in advance.

from typing import Dict

class MyModule(torch.nn.Module):
    my_dict: Dict[str, int]

    def __init__(self):
        super(MyModule, self).__init__()
        # This type cannot be inferred and must be specified
        self.my_dict = {}

        # The attribute type here is inferred to be `int`
        self.my_int = 20

    def forward(self):
        pass

m = torch.jit.script(MyModule())
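To show the annotated attribute actually being used, here is a small sketch with a hypothetical `Vocab` module (the class, its `table` attribute, and the `add` method are assumptions for illustration). Because the empty dict is annotated as Dict[str, int], compiled methods can read and write it.

```python
from typing import Dict
import torch

class Vocab(torch.nn.Module):
    # The empty dict below gives no hint about its element types,
    # so the annotation is required.
    table: Dict[str, int]

    def __init__(self):
        super().__init__()
        self.table = {}

    @torch.jit.export
    def add(self, word: str, idx: int) -> None:
        self.table[word] = idx

    def forward(self, word: str) -> int:
        return self.table[word]

vocab = torch.jit.script(Vocab())
vocab.add("hello", 3)
looked_up = vocab("hello")
```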

2) Constants: use the Final keyword

try:
    from typing_extensions import Final
except ImportError:
    # If you don't have `typing_extensions` installed, you can use a
    # polyfill from `torch.jit`.
    from torch.jit import Final

class MyModule(torch.nn.Module):

    my_constant: Final[int]

    def __init__(self):
        super(MyModule, self).__init__()
        self.my_constant = 2

    def forward(self):
        pass

m = torch.jit.script(MyModule())

3) Variables: the default type is Tensor and a variable's type cannot change, so non-Tensor types must be annotated explicitly:

def forward(self, batch_size:int, seq_len:int, use_cuda:bool):
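A quick sketch of why the annotations matter. The two functions below are hypothetical and differ only in their annotations: without them, TorchScript assumes every argument is a Tensor and rejects a plain Python int at call time.

```python
import torch

def scale_untyped(x, factor):
    # No annotations: both arguments are assumed to be Tensors.
    return x * factor

def scale_typed(x: torch.Tensor, factor: int) -> torch.Tensor:
    return x * factor

untyped = torch.jit.script(scale_untyped)
typed = torch.jit.script(scale_typed)

typed_out = typed(torch.ones(2), 3)

try:
    untyped(torch.ones(2), 3)
    untyped_accepts_int = True
except Exception:
    # The int is rejected because `factor` was inferred to be a Tensor.
    untyped_accepts_int = False
```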

Method 3: Mixing Tracing and Scripting

One approach is to call scripted code from a traced model, which suits cases where only a small part of the model needs control flow.

import torch

@torch.jit.script
def foo(x, y):
    if x.max() > y.max():
        r = x
    else:
        r = y
    return r

def bar(x, y, z):
    return foo(x, y) + z

traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3)))
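As a sanity check of this mixed approach, the sketch below repeats the scripted `foo` and traced `bar` in full and calls the traced function with inputs chosen to hit each branch. The test tensors are my own; if the control flow inside `foo` had been frozen by the trace, one of the two calls would return the wrong operand.

```python
import torch

@torch.jit.script
def foo(x, y):
    # Scripted, so both branches survive inside the traced caller.
    if x.max() > y.max():
        r = x
    else:
        r = y
    return r

def bar(x, y, z):
    return foo(x, y) + z

traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3)))

small, big, zero = torch.zeros(3), torch.ones(3), torch.zeros(3)
picks_x = traced_bar(big, small, zero)   # x has the larger max
picks_y = traced_bar(small, big, zero)   # y has the larger max
```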

The other approach is to use tracing to generate submodules inside a script module. For layers that use Python features script modules do not support, you can wrap those layers and record their flow with trace, leaving the other layers unmodified. For example:

import torch
import torchvision

class MyScriptModule(torch.nn.Module):
    def __init__(self):
        super(MyScriptModule, self).__init__()
        self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68])
                                        .resize_(1, 3, 1, 1))
        self.resnet = torch.jit.trace(torchvision.models.resnet18(),
                                      torch.rand(1, 3, 224, 224))

    def forward(self, input):
        return self.resnet(input - self.means)

my_script_module = torch.jit.script(MyScriptModule())

2. Saving the serialized model

If you have made it past all the pitfalls of the previous step, saving the model is very simple: just call save and pass a file name. Note that if you train the model on GPU and want to run inference on CPU, you must convert it before saving, and remember to call model.eval(). For example:

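# Convert to CPU (and switch to eval mode) before tracing and saving.
cpu_model = gpu_model.cpu().eval()
sample_input_cpu = sample_input_gpu.cpu()
traced_cpu = torch.jit.trace(cpu_model, sample_input_cpu)
torch.jit.save(traced_cpu, "cpu.pth")

# Module.cpu() moves the module in place, so move it back before tracing on GPU.
traced_gpu = torch.jit.trace(gpu_model.cuda(), sample_input_gpu)
torch.jit.save(traced_gpu, "gpu.pth")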
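Before moving on to C++, it can be worth checking the saved file from Python first. The following is a minimal sketch using a small stand-in model (torch.nn.Linear is an assumption here, used only to keep the example short): it saves the traced module, loads it back on CPU, and compares outputs.

```python
import os
import tempfile
import torch

model = torch.nn.Linear(4, 2).eval()   # stand-in for the real model
example = torch.rand(1, 4)
traced = torch.jit.trace(model, example)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cpu.pth")
    torch.jit.save(traced, path)
    # map_location forces the tensors onto CPU regardless of where they were saved.
    reloaded = torch.jit.load(path, map_location="cpu")

same = torch.allclose(model(example), reloaded(example))
print(same)
```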

3. Loading the trained model in C++

To load a serialized PyTorch model in C++, you must rely on the PyTorch C++ API (also known as libtorch). Installing libtorch is very simple: download the corresponding version from the official PyTorch website and unzip it. You will get a folder with the following structure:

libtorch/
  bin/
  include/
  lib/
  share/

Then you can build the application. A simple example directory structure is as follows:

example-app/
  CMakeLists.txt
  example-app.cpp

Example code for example-app.cpp and CMakeLists.txt is as follows:

example-app.cpp:

#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(example-app)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 14)

At this point, you can run the following commands from the example-app/ folder:

mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
cmake --build . --config Release

Here /path/to/libtorch is the path of the unzipped libtorch folder. If this step succeeds, you will see the build reach 100%. Next, run the executable produced by the build, and you will see the output "ok". Congratulations!

4. Executing the Script Module

Finally, the last step! You only need to construct the input, pass it to the model, and execute forward to get the output. A simple example follows:

// Create a vector of inputs.
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));

// Execute the model and turn its output into a tensor.
at::Tensor output = module.forward(inputs).toTensor();
std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

The first two lines create a vector of torch::jit::IValue and add a single input. The input tensor is created with torch::ones(), the C++ API equivalent of torch.ones. Then the forward method of script::Module is run, and the returned IValue is converted to a tensor by calling toTensor(). The C++ API is fairly friendly to torch operations: a corresponding implementation of nearly every method can be found under torch:: or by appending an underscore (_), for example:

torch::tensor(input_list[j]).to(at::kLong).resize_({batch, 128}).clone()
// torch::tensor corresponds to PyTorch's torch.tensor; at::kLong corresponds to torch.int64; resize_ corresponds to resize

Finally, check that the C++ output is consistent with PyTorch's, and you're done~
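One simple way to do this check (a sketch, assuming you copy the values printed by the C++ program back into Python by hand) is to compare them against the Python output with a tolerance rather than exact equality. The helper name `outputs_match`, the tolerance, and the sample numbers are all hypothetical.

```python
import torch

def outputs_match(py_out: torch.Tensor, cpp_values, atol: float = 1e-5) -> bool:
    # cpp_values: flat list of numbers copied from the C++ program's printout.
    cpp_out = torch.tensor(cpp_values, dtype=py_out.dtype).reshape(py_out.shape)
    return torch.allclose(py_out, cpp_out, atol=atol)

# Placeholder numbers, not real model outputs.
py_out = torch.tensor([[0.1, -0.2, 0.3]])
matches = outputs_match(py_out, [0.1, -0.2, 0.3])
differs = outputs_match(py_out, [0.1, -0.2, 0.4])
```

Small floating-point differences between the two runtimes are expected, so the tolerance may need adjusting for a real model.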

I fell into countless pitfalls and pulled out countless hairs along the way. Much of this I worked out by trial and error, so if there are mistakes, please correct me!

Reference material

PyTorch C++ API - PyTorch master documentation

Torch Script - PyTorch master documentation