Using a server to train and test YOLOv3

Server usage

Remote Connection

Use XShell to establish an SSH connection to the server, and configure the username and password.

Environment Configuration

Create your own Python environment. For convenience, clone the base environment directly:

conda create --name yzh-env --clone base

Switch to your environment and install jupyter notebook

conda activate yzh-env

conda install jupyter notebook

Start jupyter notebook

jupyter notebook

Since the server has no browser, map the server port to a local port with XShell's tunneling feature, then open the notebook in the browser on your own computer.

To change the default working directory that Jupyter Notebook opens in, first generate the configuration file

jupyter notebook --generate-config

Then open /.jupyter/ and find the following text

## The directory to use for notebooks and kernels.

#c.NotebookApp.notebook_dir = ''

Change it to

## The directory to use for notebooks and kernels.

c.NotebookApp.notebook_dir = '/data3/yanpengxiang/shixun'

Install PyTorch

# The -c pytorch suffix shown on the official website is not required. If it is added, conda downloads from the official channel instead of a domestic mirror.

conda install pytorch torchvision cudatoolkit=9.0

Running effect

Tool use

**Tmux** can run multiple terminal sessions inside one terminal window. It also lets sessions keep running in the background, so you can attach to them or detach from them on demand, which is very useful for long training runs.

Create a session with the default name, which is a number


Create a new session named "ccc"

tmux new -s ccc

View all sessions created

tmux ls

Attach to an existing session

tmux a -t aaa

Detach from the session without closing it

ctrl+b d

Exit and close the session


Close Session

tmux kill-session -t bbb

YOLO Network

Download an open-source implementation from GitHub. The author trains on the COCO dataset, so download the COCO dataset as well.

git clone

bash yolov3/data/




python3 --cfg cfg/yolov3.cfg --weights weights/yolov3.weights

Test mAP --weights weights/yolov3.weights

Transfer Learning

Use darknet/scripts/ to convert the VOC dataset into the format darknet needs for training. It generates a label file for each image; the labels are saved in the /labels folder and the images in the /JPEGImages folder.

  • Each image has one label file (not needed if the image contains no objects)

  • Each row represents an object

  • Each row is formatted as class x_center y_center width height

  • Box coordinates must be normalized to the range 0-1

  • Class index starts at 0
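The label format above can be sketched in a few lines. This is a minimal illustration, not the conversion script itself: the function names `voc_box_to_yolo` and `format_label_row` and the example image size and box are hypothetical, and the real script may apply small pixel offsets that are omitted here.

```python
def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Convert a pixel-space corner box to normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def format_label_row(class_index, box):
    """Format one object as a label row: class x_center y_center width height."""
    return "%d %.6f %.6f %.6f %.6f" % ((class_index,) + tuple(box))


# Hypothetical example: a 500x375 image containing one object of class 11.
box = voc_box_to_yolo(500, 375, 100, 75, 300, 225)
row = format_label_row(11, box)
print(row)
```

Every value after the class index falls between 0 and 1, which is the property the training code relies on.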

Create the file data/VOC/voc.names and write the 20 VOC categories to it, one class per line.

Create a file, data/VOC/, that specifies the number of categories, the locations of the training and test sets, etc.

Modify the network structure by creating a new yolov3-voc.cfg. Each YOLO layer has 255 outputs: 85 outputs per candidate box (4 box coordinates + 1 object confidence + 80 class confidences) times three candidate boxes. VOC has only 20 classes, so change the number of filters in the convolutional layer before each YOLO layer to 75, that is 3*(4+1+20), and set classes in each YOLO layer to 20. The printed network structure shows that only the gradients of the three YOLO layers are active.
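The filter count follows directly from the arithmetic in the paragraph above. A small sketch, where the helper name `yolo_filters` is hypothetical:

```python
def yolo_filters(num_classes, boxes_per_cell=3):
    """Outputs per cell: each box has 4 coordinates, 1 objectness score, and one score per class."""
    return boxes_per_cell * (4 + 1 + num_classes)


coco_filters = yolo_filters(80)  # COCO has 80 classes
voc_filters = yolo_filters(20)   # VOC has 20 classes
print(coco_filters, voc_filters)
```

The same helper gives the value to use for any other dataset, given its number of classes.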

Download the official pretrained weights weights/ to start transfer learning

python3 --transfer --cfg cfg/yolov3-voc.cfg --data data/VOC/

Final result


python3 --weights weights/ --cfg cfg/yolov3-voc.cfg --data-cfg data/VOC/

Code analysis

Network structure

yolov3.cfg defines the network structure of YOLOv3, which consists of several blocks. There are five types of layers in YOLO.

The first block type is Net, which is not a layer but simply describes the input dimensions of the network and the values of some hyperparameters.

# Testing
# Training
saturation = 1.5
exposure = 1.5

max_batches = 500200

The second type is convolutional, which defines the parameters of a convolution layer: the number of filters, kernel size, stride, padding, and activation function


The third type is shortcut, a ResNet-style skip connection, where from=-3 means that the output of this layer is the sum of the output of the previous layer and the output of the layer three positions back
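The index arithmetic of a shortcut can be illustrated without any framework. In this sketch feature maps are flattened into plain lists and the function name `shortcut` is hypothetical; a real implementation would add tensors of identical shape.

```python
def shortcut(outputs, from_offset=-3):
    """Sum the previous layer's output with the output `from_offset` layers back.

    `outputs` holds the outputs of all layers before the shortcut layer,
    so the shortcut layer itself would sit at index len(outputs).
    """
    current_index = len(outputs)
    previous = outputs[current_index - 1]
    skipped = outputs[current_index + from_offset]
    return [a + b for a, b in zip(previous, skipped)]


# Hypothetical flattened feature maps for layers 0..3; the shortcut is layer 4.
outputs = [[0.0, 0.0], [1.0, 2.0], [5.0, 5.0], [10.0, 20.0]]
result = shortcut(outputs, from_offset=-3)
print(result)
```

Here the shortcut at index 4 adds layer 3 (the previous layer) to layer 1 (three positions back).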


The fourth type is upsample, the upsampling layer, where stride is the upsampling factor and bilinear upsampling is used


The fifth type is the route layer. If the layers parameter has one value, the output of that layer is copied directly; if it has two values, the outputs of both layers are concatenated along the depth (channel) dimension

layers = -4

layers = -1, 61
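A short sketch of how the two route forms could be resolved. The helper names are hypothetical, and the layer positions (83 and 86) and channel counts below are illustrative assumptions rather than values read from the cfg file.

```python
def resolve_layer_index(ref, current_index):
    """Negative references are relative to the current layer; others are absolute."""
    return current_index + ref if ref < 0 else ref


def route_output_channels(layers, current_index, channels_by_layer):
    """Concatenating along depth means the channel counts of the routed layers add up."""
    return sum(channels_by_layer[resolve_layer_index(ref, current_index)] for ref in layers)


# layers = -4 at an assumed position 83 simply copies layer 79.
single_source = resolve_layer_index(-4, 83)

# layers = -1, 61 at an assumed position 86 concatenates layers 85 and 61.
channels_by_layer = {85: 256, 61: 512}  # assumed channel counts
concat_channels = route_output_channels([-1, 61], 86, channels_by_layer)
print(single_source, concat_channels)
```

The spatial size of the routed layers must match for the concatenation to be valid, which is why the upsample layer precedes the two-value route.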

The last type is YOLO, the detection layer of the network. The anchors entry lists nine anchors, but only the anchors selected by mask are used in a given layer. Since YOLO generates only three boxes per cell but predicts at three different scales, there are three YOLO layers in total, each with different anchor sizes.

mask = 3,4,5
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
ignore_thresh = .7
truth_thresh = 1
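To make the role of mask concrete, the following sketch parses the anchors line shown above and picks the entries selected by mask. The function names are hypothetical and this is not the repository's parser.

```python
def parse_anchors(anchor_str):
    """Turn 'w1,h1, w2,h2, ...' into a list of (width, height) pairs."""
    values = [int(v) for v in anchor_str.split(",")]
    return list(zip(values[0::2], values[1::2]))


def select_anchors(anchor_str, mask_str):
    """Keep only the anchors whose indices appear in the mask."""
    anchors = parse_anchors(anchor_str)
    mask = [int(i) for i in mask_str.split(",")]
    return [anchors[i] for i in mask]


anchor_str = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"
selected = select_anchors(anchor_str, "3,4,5")
print(selected)
```

With mask = 3,4,5 this layer uses the three medium-sized anchors; the other two YOLO layers would select the remaining indices.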

The cfg file is parsed and assembled into a PyTorch model of YOLOv3. The overall structure is as follows: the input size is 416x416, the three predicted feature maps have sizes 52, 26, and 13, and each prediction has 3*(4+1+80)=255 output channels.
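The feature-map sizes can be reproduced with a few divisions. This sketch assumes the three scales correspond to downsampling strides of 8, 16, and 32, which is not stated in the text above; the function name is hypothetical.

```python
def prediction_counts(img_size=416, strides=(8, 16, 32), boxes_per_cell=3):
    """Return the grid size and number of predicted boxes at each scale."""
    grids = [img_size // s for s in strides]       # cells per side at each scale
    boxes = [g * g * boxes_per_cell for g in grids]  # boxes predicted at each scale
    return grids, boxes


grids, boxes = prediction_counts()
total_boxes = sum(boxes)
print(grids, boxes, total_boxes)
```

Under these assumptions a single 416x416 image yields 10647 candidate boxes before confidence filtering and NMS.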

Training code

The training code sets the path of the training set, then creates the model and sets the learning rate and optimizer.

    # Initialize model
    model = Darknet(cfg, img_size).to(device)

    # Optimizer
    lr0 = 0.001  # initial learning rate
    optimizer = torch.optim.SGD(model.parameters(), lr=lr0, momentum=0.9, weight_decay=0.0005)
    # Set scheduler (reduce lr at epochs 218, 245, i.e. batches 400k, 450k)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[218, 245], gamma=0.1, last_epoch=start_epoch - 1)
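The effect of the step schedule can be checked by hand. The sketch below mirrors the documented behavior of a multi-step schedule (multiply the learning rate by gamma each time a milestone epoch is reached) in plain Python; `multistep_lr` is a hypothetical helper, not a PyTorch function.

```python
def multistep_lr(epoch, lr0=0.001, milestones=(218, 245), gamma=0.1):
    """Learning rate in effect at `epoch`: lr0 scaled by gamma once per milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return lr0 * gamma ** drops


for epoch in (0, 217, 218, 245):
    print(epoch, multistep_lr(epoch))
```

With the values used above, the rate stays at 0.001 until epoch 218, drops to roughly 0.0001, and drops again to roughly 0.00001 at epoch 245.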

Load the model pretrained on ImageNet. YOLOv3 continues training from the parameters pretrained on ImageNet (file name: darknet53.conv.74, size 76 MB).

cutoff = load_darknet_weights(model, weights + 'darknet53.conv.74')

Loading datasets

# Configure run
train_path = parse_data_cfg(data_cfg)['train']
# Dataset
dataset = LoadImagesAndLabels(train_path, img_size=img_size, augment=True)

In each iteration, run the forward pass, compute the loss, and backpropagate

for i, (imgs, targets, _, _) in enumerate(dataloader):
    imgs = imgs.to(device)  # assumed: move the batch to the training device
    targets = targets.to(device)  # assumed, as above

    # Run model
    pred = model(imgs)

    # Build targets
    target_list = build_targets(model, targets)

    # Compute loss
    loss, loss_dict = compute_loss(pred, target_list)

    # Compute gradient
    loss.backward()

Test Code

The test code runs the model on each batch, computes the loss, and applies non-maximum suppression (NMS).

for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc='Computing mAP')):
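    targets = targets.to(device)  # assumed: move the batch to the evaluation device
    imgs = imgs.to(device)  # assumed, as above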
    # Run model
    inf_out, train_out = model(imgs)  # inference and training outputs
    # Build targets
    target_list = build_targets(model, targets)

    # Compute loss
    loss_i, _ = compute_loss(train_out, target_list)
    loss += loss_i.item()
    # Run NMS
    output = non_max_suppression(inf_out, conf_thres=conf_thres, nms_thres=nms_thres)
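For readers unfamiliar with the NMS step, here is a minimal greedy version in plain Python. It is a sketch of the general technique, not the repository's non_max_suppression, which works on tensors and may handle classes differently; the box format (x1, y1, x2, y2, conf) and the threshold defaults are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, conf_thres=0.5, nms_thres=0.45):
    """Drop low-confidence boxes, then keep a box only if it does not overlap a kept one."""
    candidates = sorted((d for d in detections if d[4] >= conf_thres),
                        key=lambda d: d[4], reverse=True)
    kept = []
    for det in candidates:
        if all(iou(det, k) <= nms_thres for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections: two near-duplicates, one separate box, one weak box.
detections = [(10, 10, 110, 110, 0.9), (12, 12, 112, 112, 0.8),
              (200, 200, 300, 300, 0.7), (0, 0, 50, 50, 0.2)]
kept = greedy_nms(detections)
print(kept)
```

The second box is suppressed because it overlaps the higher-scoring first box, and the last box is removed by the confidence threshold before overlap is even considered.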

Compute the AP for each class, then average over classes to get mAP
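As a rough illustration of what the per-class AP involves, the sketch below computes the area under a precision-recall curve (with the usual monotone precision envelope) from true-positive flags sorted by descending confidence, then averages over classes. The function name and the toy data are hypothetical, and the repository's ap_per_class may differ in detail.

```python
def average_precision(tp_flags, num_targets):
    """AP from TP/FP flags sorted by descending confidence."""
    if num_targets == 0 or not tp_flags:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_targets)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical per-class results: (flags sorted by confidence, number of ground-truth objects).
per_class = {"cat": ([True, False, True], 2), "dog": ([True, True], 2)}
aps = {name: average_precision(flags, n) for name, (flags, n) in per_class.items()}
mean_ap = sum(aps.values()) / len(aps)
print(aps, mean_ap)
```

A class whose detections are all correct and complete scores an AP of 1.0, while a false positive ranked above a true positive lowers it.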

# Compute statistics
stats_np = [np.concatenate(x, 0) for x in list(zip(*stats))]
nt = np.bincount(stats_np[3].astype(np.int64), minlength=nc)  # number of targets per class
if len(stats_np):
    p, r, ap, f1, ap_class = ap_per_class(*stats_np)
    mp, mr, map, mf1 = p.mean(), r.mean(), ap.mean(), f1.mean()

Detection Code

The detection code loads a trained model and runs detection.

    # Initialize model
    model = Darknet(cfg, img_size)

    # Load weights
    if weights.endswith('.pt'):  # pytorch format
        model.load_state_dict(torch.load(weights, map_location=device)['model'])
    else:  # darknet format
        _ = load_darknet_weights(model, weights)

Each image is fed through the model; the detected classes are printed to the screen and the bounding boxes are drawn on the image.

for i, (path, img, im0, vid_cap) in enumerate(dataloader):
    # Get detections
    img = torch.from_numpy(img).unsqueeze(0).to(device)
    pred, _ = model(img)
    detections = non_max_suppression(pred, conf_thres, nms_thres)[0]
    if detections is not None and len(detections) > 0:
        # Rescale boxes from 416 to true image size
        scale_coords(img_size, detections[:, :4], im0.shape).round()

        # Print results to screen
        for c in detections[:, -1].unique():
            n = (detections[:, -1] == c).sum()
            print('%g %ss' % (n, classes[int(c)]), end=', ')

        # Draw bounding boxes and labels of detections
        for *xyxy, conf, cls_conf, cls in detections:
            if save_txt:  # Write to file
                with open(save_path + '.txt', 'a') as file:
                    file.write(('%g ' * 6 + '\n') % (*xyxy, cls, conf))

            # Add bbox to the image
            label = '%s %.2f' % (classes[int(cls)], conf)
            plot_one_box(xyxy, im0, label=label, color=colors[int(cls)])

    print('Done. (%.3fs)' % (time.time() - t))

