mmdetection source notes: interpreting cascade_rcnn.py, which builds the network model (middle)

Posted by fleabay on Sun, 18 Aug 2019 11:30:36 +0200

Introduction:

The cascade_rcnn.py file is in the models/detectors folder. This interpretation of cascade_rcnn.py is based on the data in the configuration file configs/cascade_rcnn_r50_fpn_1x.py.

The models/detectors/cascade_rcnn.py file
The main contents are as follows:

  • __init__(): the constructor of the module; it declares and builds each component of the detector.
  • init_weights(): initializes the weights of Cascade R-CNN's components, starting with the backbone; it is called from __init__().
  • extract_feat(): extracts image features; essentially the forward() computation of the backbone and the neck.
  • forward_train(): connects the layers, which is exactly forward propagation during training. When model(x) is executed on an nn.Module subclass, the underlying machinery automatically calls forward() to compute the result.
  • simple_test(): called by the forward propagation of the test process; the call chain runs from the primitive nn.Module down through the forward() of the parent BaseDetector.
  • aug_test(): test with augmentations.
  • show_result(): visualizes the detection results.

There are seven parts (the methods above); this article mainly discusses the code of the first five. __init__() and forward_train() are the two most important parts of the module class and the most critical parts of defining the network.
Customizing a model means inheriting nn.Module, declaring each layer in the __init__() constructor, and connecting the layers in forward(), which is exactly the process of forward propagation.
Note: the remaining three parts will be explained as the blogger continues reading the code.

First of all, before reading this article it helps to review the previous one, which explains the process of creating a model. Taking detectors as an example, mmdetection instantiates a Registry class named DETECTORS; its module_dict attribute maps each registered detector class's name to the class itself. From that article you can learn how mmdetection registers and creates models.
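
To make the registry mechanism concrete, here is a minimal sketch of the pattern (simplified for illustration; mmdetection's real Registry adds validation and more, and FakeDetector below is a hypothetical stand-in, not a real class):

class Registry:
    def __init__(self, name):
        self.name = name
        self.module_dict = {}                 # class name -> class

    def register_module(self, cls):
        # Used as a decorator: records the class under its own name.
        self.module_dict[cls.__name__] = cls
        return cls

def build_from_cfg(cfg, registry):
    # cfg is a dict from the config file; 'type' selects the registered class,
    # and the remaining keys become constructor arguments.
    args = dict(cfg)
    obj_type = args.pop('type')
    obj_cls = registry.module_dict[obj_type]
    return obj_cls(**args)

DETECTORS = Registry('detector')

@DETECTORS.register_module
class FakeDetector:                           # hypothetical stand-in for CascadeRCNN
    def __init__(self, num_stages):
        self.num_stages = num_stages

# The config dict dict(type='FakeDetector', num_stages=3) becomes an instance:
model = build_from_cfg(dict(type='FakeDetector', num_stages=3), DETECTORS)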

Next, learn about torch.nn.Module (readers with a PyTorch background can skip this; the blogger started reading mmdetection without any PyTorch background and, upon first seeing the forward() function, searched several folders looking for where it is called). As we will see later, forward() is the forward computation of a custom layer, and it is executed automatically when the module instance is called on an input. (The blogger recommends first reading a tutorial on nn.Module.)
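
For instance, a minimal custom module (plain PyTorch, unrelated to mmdetection) behaves like this:

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2)      # declare the layers in __init__()

    def forward(self, x):
        return self.fc(x)              # connect the layers in forward()

net = TinyNet()
out = net(torch.randn(1, 4))           # calling net(...) runs forward() automatically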

__init__()

@DETECTORS.register_module 
#The decorator registers CascadeRCNN (class name -> class) in the DETECTORS registry; build_from_cfg() later looks it up and instantiates the detector.
class CascadeRCNN(BaseDetector, RPNTestMixin):
                                           # The parameters are from cascade_rcnn_r50_fpn_1x.py
    def __init__(self,
                 num_stages,               # 3
                 backbone,                 # ResNet
                 neck=None,                # FPN
                 shared_head=None,         
                 rpn_head=None,            # RPNHead
                 bbox_roi_extractor=None,  # SingleRoIExtractor
                 bbox_head=None,           # SharedFCBBoxHead * 3 (three stages)
                 mask_roi_extractor=None,  
                 mask_head=None,
                 train_cfg=None,           # assigner : MaxIoUAssigner ;  sampler : RandomSampler 
                 test_cfg=None,            # skip
                 pretrained=None):         # modelzoo://resnet50
        assert bbox_roi_extractor is not None
        assert bbox_head is not None
        super(CascadeRCNN, self).__init__()

        self.num_stages = num_stages
        self.backbone = builder.build_backbone(backbone)  # build the backbone via the registry
        
        #Similarly, a module-class model is created for each component (backbone, neck, bbox_head, etc.).
        if neck is not None:
            self.neck = builder.build_neck(neck)
        if rpn_head is not None:
            self.rpn_head = builder.build_head(rpn_head)
        if shared_head is not None:
            self.shared_head = builder.build_shared_head(shared_head)
        if bbox_head is not None:
            self.bbox_roi_extractor = nn.ModuleList()      
            #ModuleList() can be indexed like a list, [module 1, module 2, module 3...]
            #type='SingleRoIExtractor'  
            
            self.bbox_head = nn.ModuleList()
            #SharedFCBBoxHead * 3: in the config, bbox_head is a list of three dicts; the type is the same, but the other fields differ per stage.
            
            if not isinstance(bbox_roi_extractor, list):
                bbox_roi_extractor = [
                    bbox_roi_extractor for _ in range(num_stages)  
                    # Cascade R-CNN: 1 RPN stage + 3 R-CNN stages; the extractor is replicated once per stage
                ]
            if not isinstance(bbox_head, list): # bbox_head is already a list in the config, so this is skipped
                bbox_head = [bbox_head for _ in range(num_stages)]
            assert len(bbox_roi_extractor) == len(bbox_head) == self.num_stages
            
            for roi_extractor, head in zip(bbox_roi_extractor, bbox_head):
                self.bbox_roi_extractor.append(
                    builder.build_roi_extractor(roi_extractor))  # build bbox_roi_extractor
                self.bbox_head.append(builder.build_head(head))  # build bbox_head

        if mask_head is not None:   # The cascade_rcnn_r50_fpn_1x.py config has no mask branch, but the mask heads are built the same way and for the same purpose.
            self.mask_head = nn.ModuleList()
            if not isinstance(mask_head, list):
                mask_head = [mask_head for _ in range(num_stages)]
            assert len(mask_head) == self.num_stages
            
            for head in mask_head:
                self.mask_head.append(builder.build_head(head)) # build mask_head
                
            if mask_roi_extractor is not None:                  # The config also has no mask_roi_extractor (None), so execution jumps to the else branch below.
            #This branching is similar to the build() method in builder.py: it is essentially model building, just handling a list of dicts versus a single dict separately.
                self.share_roi_extractor = False
                self.mask_roi_extractor = nn.ModuleList()
                if not isinstance(mask_roi_extractor, list):
                    mask_roi_extractor = [
                        mask_roi_extractor for _ in range(num_stages)
                    ]
                assert len(mask_roi_extractor) == self.num_stages
                for roi_extractor in mask_roi_extractor:
                    self.mask_roi_extractor.append(
                        builder.build_roi_extractor(roi_extractor)) # build mask_roi_extractor
            else:
                self.share_roi_extractor = True                     # share_roi_extractor = True
                self.mask_roi_extractor = self.bbox_roi_extractor   # mask_roi_extractor = bbox_roi_extractor 

        self.train_cfg = train_cfg                                  # train_cfg dictionary
        self.test_cfg = test_cfg                                    # test_cfg dictionary
        
        # In other words, each dict in the config file is mapped to a module, and its data is saved as attributes of that module. All these module classes are subclasses of torch.nn.Module.

        self.init_weights(pretrained=pretrained)                        # Initialize the detector's weights.
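
As a side note on nn.ModuleList, which holds the per-stage extractors and heads above, here is a quick generic illustration (hypothetical layers, not mmdetection code):

import torch.nn as nn

heads = nn.ModuleList([nn.Linear(8, 4) for _ in range(3)])  # one sub-module per stage
first = heads[0]          # indexable like a list
for head in heads:        # iterable like a list
    print(head)
# Unlike a plain Python list, nn.ModuleList registers its sub-modules,
# so their parameters are visible to the parent module's optimizer and .to(device).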

init_weights()

# Initialization Weight Procedure
    def init_weights(self, pretrained=None):                            # pretrained= modelzoo://resnet50
        super(CascadeRCNN, self).init_weights(pretrained)
        self.backbone.init_weights(pretrained=pretrained)               # backbone.init_weights()
        if self.with_neck:
            if isinstance(self.neck, nn.Sequential):                    # if the neck is wrapped in nn.Sequential, init each sub-module
                for m in self.neck:
                    m.init_weights()                                    # neck.init_weights()
            else:
                self.neck.init_weights()
        if self.with_rpn:                                               # true
            self.rpn_head.init_weights()                                # rpn_head.init_weights() 
        if self.with_shared_head:
            self.shared_head.init_weights(pretrained=pretrained)        # shared_head.init_weights()
        for i in range(self.num_stages):
            if self.with_bbox:
                self.bbox_roi_extractor[i].init_weights()
                self.bbox_head[i].init_weights()
            if self.with_mask:
                if not self.share_roi_extractor:
                    self.mask_roi_extractor[i].init_weights()
                self.mask_head[i].init_weights()
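
The with_neck / with_rpn / with_bbox / with_mask flags used above are properties defined on the parent BaseDetector; they roughly check whether the corresponding component was built, along these lines (paraphrased, not the exact source):

    @property
    def with_neck(self):
        # True if this detector was constructed with a neck component
        return hasattr(self, 'neck') and self.neck is not None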

extract_feat()

    def extract_feat(self, img):
        x = self.backbone(img)  # feature extraction: the forward computation of the backbone
        if self.with_neck:      # if there is a neck (FPN), the extracted features are processed further
            x = self.neck(x)
        return x                # return the features (a tuple of levels when an FPN is used)

As we said above, when an instance of a module class is called on an input, the forward() method is executed automatically and the result is computed.
So why does calling the instance trigger forward()? Calling an instance invokes the __call__ method, and nn.Module's __call__ in turn calls forward().

In Python, the special magic method __call__(self, *args) allows instances of a class to behave like functions: you can call them, pass them as arguments to other functions, and so on. This is a very powerful feature that makes Python programming more comfortable and sweet.
Essentially, it means that x() is the same as x.__call__(). Note that __call__ takes a variable number of parameters, so you can define it like any other function, with however many parameters you want.
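
A tiny demonstration of __call__ in plain Python:

class Adder:
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n

add5 = Adder(5)
print(add5(3))            # 8 -- the instance is called like a function
print(add5.__call__(3))   # 8 -- exactly equivalent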

forward_train()

In this py file, there is no forward() method.

On the internet it is often said that when inheriting nn.Module, the forward() method must be implemented, so why isn't it implemented here? Looking at the parent class BaseDetector, forward() is implemented there. The subclass CascadeRCNN inherits BaseDetector, which in turn inherits nn.Module, so it is fine for forward() to live in BaseDetector: when a CascadeRCNN instance is called, the parent's forward() runs (the subclass does not override it), and that forward() calls forward_train(), an abstract method of BaseDetector that CascadeRCNN implements. Therefore forward_train() can be understood as the forward-propagation computation of CascadeRCNN during training.
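
The dispatch in BaseDetector looks roughly like this (paraphrased from mmdetection v1.x; consult models/detectors/base.py for the exact code):

    def forward(self, img, img_meta, return_loss=True, **kwargs):
        if return_loss:
            return self.forward_train(img, img_meta, **kwargs)  # training path
        else:
            return self.forward_test(img, img_meta, **kwargs)   # testing path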

Detection ideas:

General idea: input -> backbone -> neck -> head -> classification (cls) and regression (pred)

With the above pipeline in mind, let's look at the execution flow of forward():

  • First, the image is input and its features are extracted; the function used here is extract_feat(). It covers two parts, backbone + neck, computing the forward pass of the backbone and of the FPN. That is, self.backbone(img) and self.neck(x) are called.
  • Next, candidate boxes are extracted, which is implemented with rpn_head(x). RPNHead lives in models/anchor_heads/rpn_head.py and is a subclass of AnchorHead in models/anchor_heads/anchor_head.py; since the RPN's goal is to obtain candidate boxes, the get_bboxes() function from AnchorHead is used here as well.
  • Are the extracted boxes then sent straight to training? No: the previous step's RPN output a pile of candidate boxes, but these must be divided into positive and negative samples before training, which is the assigner's job. After the proposals are split into positives and negatives, the sampler samples them, and the sampling result is what gets trained on. The main calls are bbox_assigner.assign() and bbox_sampler.sample(). (A simplified sketch of IoU-based assignment follows this list.)
  • Even after that, the boxes cannot be fed directly into the bbox head. First, RoI pooling maps boxes of different sizes to a fixed size; the roi_layers use RoIAlign (the config file specifies exactly which RoI operation is used), and the pooled result can then be sent to the bbox head. The function called is bbox_roi_extractor().
  • The bbox head part is similar to the RPN part before it: it mainly classifies each box and refines its coordinates. The difference is that the RPN classified into two categories (foreground and background), while here the bbox head classifies into N+1 (actual categories + background). The call is bbox_head().
  • The mask_head part is not exercised here, since the config configs/cascade_rcnn_r50_fpn_1x.py has no mask branch, but it is processed the same way as the bbox head. (bbox_head outputs bbox_cls + bbox_pred; mask_head outputs mask_pred.)
  • Most important of all is the loss computation; there has been a loss term ever since the RPN stage.
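
As referenced in the assigner step above, here is a heavily simplified sketch of the MaxIoUAssigner idea in plain PyTorch (illustrative only; the real MaxIoUAssigner also handles ignored boxes, low-quality matches, and separately configurable thresholds):

import torch

def bbox_iou(boxes1, boxes2):
    # boxes: (N, 4) as (x1, y1, x2, y2); returns an (N, M) IoU matrix
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

def simple_assign(proposals, gt_bboxes, pos_thr=0.5):
    # Assign each proposal to the gt box with the highest IoU (assumes at least one gt).
    # Returns the matched gt index (>= 0) for positives and -1 for negatives.
    ious = bbox_iou(proposals, gt_bboxes)          # (num_proposals, num_gts)
    max_iou, argmax = ious.max(dim=1)
    assigned = torch.full((proposals.size(0),), -1, dtype=torch.long)
    pos = max_iou >= pos_thr
    assigned[pos] = argmax[pos]
    return assigned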

The above is the general flow of the forward pass below, which involves many function operations. We will not go into detail here; the blogger will later spend time reading each part of the code carefully and then revise the incorrect parts of this article. The code for forward_train() is as follows:

# Here, connecting the layers is what is actually called forward propagation (the forward computation of the training process).
    # forward_train() implements the parent class's abstract method and is invoked from the parent's forward().
    def forward_train(self,
                      img,
                      img_meta,
                      gt_bboxes,
                      gt_labels,
                      gt_bboxes_ignore=None,
                      gt_masks=None,
                      proposals=None):
                      
        #Extracting features, including backbone + neck, calculates forward backbone propagation and FPN
        x = self.extract_feat(img)               # forward() executing extract_feat() 
        
        # Loss has been part of the pipeline since the RPN stage.
        #Start calculating loss, including rpn_loss, bbox_loss, mask_loss
        losses = dict()
        
        #rpn outputs a bunch of candidate boxes
        if self.with_rpn:
            rpn_outs = self.rpn_head(x)                         # x is the extracted feature; it is fed into rpn_head(), which outputs bbox predictions.
            
            # Tuples can be added directly with +, which simply concatenates their elements.
            rpn_loss_inputs = rpn_outs + (gt_bboxes, img_meta,  #Input for calculating rpn_loss
                                          self.train_cfg.rpn)
            rpn_losses = self.rpn_head.loss(                    #rpn_head.loss() calculates loss 
                *rpn_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
            losses.update(rpn_losses)                           # dict.update() merges the rpn losses into the losses dict

            proposal_cfg = self.train_cfg.get('rpn_proposal',   # proposal_cfg is a  dict.
                                              self.test_cfg.rpn)
                                              
            proposal_inputs = rpn_outs + (img_meta, proposal_cfg) #Feed the boxes output by the RPN and the related parameters into the proposal step
            proposal_list = self.rpn_head.get_bboxes(*proposal_inputs) #Get the regressed candidate boxes
        else:
            # Specify proposals directly
            proposal_list = proposals  

#In the previous step, the RPN output a pile of candidate boxes; before training on them, they must be divided into positive and negative samples. The assigner does this.

        for i in range(self.num_stages):    # num_stages = 3: Cascade R-CNN is 1 RPN stage + 3 R-CNN stages, hence three iterations
            self.current_stage = i                     # the i-th R-CNN detection stage
            rcnn_train_cfg = self.train_cfg.rcnn[i]    # the rcnn parameters differ between stages
            lw = self.train_cfg.stage_loss_weights[i]  # stage_loss_weights = [1, 0.5, 0.25]


            # assign gts and sample proposals: the candidate boxes are divided into positive and negative samples with assign(), then sampled with sample().
            sampling_results = []                      
            if self.with_bbox or self.with_mask:       # if include bbox or mask  -> true
                bbox_assigner = build_assigner(rcnn_train_cfg.assigner)  # build assigner -> MaxIoUAssigner
                bbox_sampler = build_sampler(                            # build_sampler  -> RandomSampler
                    rcnn_train_cfg.sampler, context=self)
                    
                num_imgs = img.size(0)                 # img.size(0) is the batch size, i.e. the number of images
                if gt_bboxes_ignore is None:
                    gt_bboxes_ignore = [None for _ in range(num_imgs)]  # a list of num_imgs None values

            # start assign and sample (implemented in max_iou_assigner.py and random_sampler.py)
                for j in range(num_imgs):
                    assign_result = bbox_assigner.assign(               #bbox_assigner.assign()
                        proposal_list[j], gt_bboxes[j], gt_bboxes_ignore[j],
                        gt_labels[j])
                    #Sample positive and negative bboxes.
                    sampling_result = bbox_sampler.sample(              #bbox_sampler.sample()  
                        assign_result,
                        proposal_list[j],
                        gt_bboxes[j],
                        gt_labels[j],
                        feats=[lvl_feat[j][None] for lvl_feat in x])
                    sampling_results.append(sampling_result) # the sampling results (a list of sampled proposal bboxes)

            # RoI pooling process
            # bbox head forward and loss
            bbox_roi_extractor = self.bbox_roi_extractor[i]  # the stage-i bbox_roi_extractor
            bbox_head = self.bbox_head[i]

            rois = bbox2roi([res.bboxes for res in sampling_results])
            # bbox2roi() turns the per-image proposal lists into rois of the form (batch_idx, x1, y1, x2, y2); see the sketch after this code block
            
            
            bbox_feats = bbox_roi_extractor(x[:bbox_roi_extractor.num_inputs],  # the features from extract_feat(); only the first num_inputs FPN levels are used
                                            rois)
            if self.with_shared_head:                         # False for this config
                bbox_feats = self.shared_head(bbox_feats)
                
            cls_score, bbox_pred = bbox_head(bbox_feats)      #bbox_head() produces the classification score and the box regression pred

            bbox_targets = bbox_head.get_target(sampling_results, gt_bboxes,
                                                gt_labels, rcnn_train_cfg) #get_target() builds the training targets from the sampling results and the ground truth
                                                
            loss_bbox = bbox_head.loss(cls_score, bbox_pred, *bbox_targets) #Computing bbox_loss
            for name, value in loss_bbox.items():
                losses['s{}.{}'.format(i, name)] = (
                    value * lw if 'loss' in name else value)   # lw (loss weight) = [1, 0.5, 0.25][i]

#Similarly, the mask part mirrors the bbox part, only with different parameters: RoI pooling, then the head -> mask_pred (and a mask_loss as well)
            # mask head forward and loss   
            if self.with_mask:
                if not self.share_roi_extractor:               # mask_roi_extractor was None in the config, so share_roi_extractor = True and this branch is skipped
                    mask_roi_extractor = self.mask_roi_extractor[i]
                    pos_rois = bbox2roi(                       # bbox2roi(res.pos_bboxes)
                        [res.pos_bboxes for res in sampling_results])# only the positive samples in sampling_results
                        
                    mask_feats = mask_roi_extractor(
                        x[:mask_roi_extractor.num_inputs], pos_rois)
                    if self.with_shared_head:
                        mask_feats = self.shared_head(mask_feats)
                else:
                    # reuse positive bbox feats
                    pos_inds = []
                    device = bbox_feats.device            # reuse the device of the bbox features
                    for res in sampling_results:
                        pos_inds.append(
                            torch.ones(                   # torch.ones() returns a tensor of all 1s
                                res.pos_bboxes.shape[0],  # one entry per positive bbox
                                device=device,
                                dtype=torch.uint8))
                        pos_inds.append(
                            torch.zeros(                  # torch.zeros() marks the negatives with 0
                                res.neg_bboxes.shape[0],  # one entry per negative bbox
                                device=device,
                                dtype=torch.uint8))
                    pos_inds = torch.cat(pos_inds)        # concatenate into a single 0/1 mask over all sampled boxes
                    mask_feats = bbox_feats[pos_inds]     # indexing with the mask keeps the rows marked 1 (positive samples)
                                                          # and drops the rows marked 0 (negatives), so only positive-sample features reach the mask head
                                                          
                mask_head = self.mask_head[i]
                mask_pred = mask_head(mask_feats)         # mask_head() makes the prediction -> mask_pred
                mask_targets = mask_head.get_target(sampling_results, gt_masks,
                                                    rcnn_train_cfg)
                pos_labels = torch.cat(
                    [res.pos_gt_labels for res in sampling_results])
                loss_mask = mask_head.loss(mask_pred, mask_targets, pos_labels)
                for name, value in loss_mask.items():
                    losses['s{}.{}'.format(i, name)] = (
                        value * lw if 'loss' in name else value)

            # refine bboxes
            if i < self.num_stages - 1: # num_stages = 3, so refinement runs for stages i = 0 and 1, not for the last stage
                pos_is_gts = [res.pos_is_gt for res in sampling_results]
                roi_labels = bbox_targets[0]  # bbox_targets is a tuple; element 0 holds the labels
                with torch.no_grad():         # no gradient is needed here; no backpropagation
                    proposal_list = bbox_head.refine_bboxes(       # refine_bboxes() uses this stage's predictions to produce refined proposals for the next stage (to be read in detail later)
                        rois, roi_labels, bbox_pred, pos_is_gts, img_meta)
        # End of for loop
        return losses                # end of forward_train()
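
As flagged at the bbox2roi() call above, here is a sketch of what that helper does: it concatenates the per-image bbox lists into one tensor, prepending each image's batch index as column 0 so the RoI extractor knows which image each box belongs to (simplified; the real helper lives in mmdet.core's bbox transforms):

import torch

def bbox2roi_sketch(bbox_list):
    # bbox_list: a list of (n_i, 4) tensors, one per image in the batch.
    # Returns a (sum n_i, 5) tensor of rois: (batch_idx, x1, y1, x2, y2).
    rois_list = []
    for img_id, bboxes in enumerate(bbox_list):
        img_inds = bboxes.new_full((bboxes.size(0), 1), img_id)
        rois_list.append(torch.cat([img_inds, bboxes[:, :4]], dim=-1))
    return torch.cat(rois_list, dim=0)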

Three functions remain that are not explained here; the blogger will refine them after reading the later content. This article consists of the blogger's notes, made according to his own understanding after reading the mmdetection code. If there are any mistakes, please point them out so we can learn from each other and make progress together.

Next, see the blogger's following articles:

  • mmdetection source note (2): code interpretation of forward() of each module in the process of cascade_rcnn.py modeling (2) (to be completed)
  • mmdetection source note (3): interpretation of datasets/builder.py for creating data set models (to be completed)
  • mmdetection source note (4): interpretation of train_detector() of training model (to be completed)
  • mmdetection source note (5): interpretation of the test() part of the test (to be completed)

Topics: network, Python, attribute, programming