This project is the major assignment for the course "Optimal Control" taught by Le Xinyi at Shanghai Jiao Tong University in the fall semester of 2020. Most of the content is based on the thesis of a senior student, Fang Xiaomeng, "Research on Multi-Manipulator Cooperative Control Technology Based on a Neural Network Algorithm". While preparing my own thesis proposal recently, I revisited reinforcement learning, so I am sharing here what I learned in the course.
Project introduction
In this project, a UR5 manipulator in the simulation environment inserts a peg mounted on the robot's end effector into a fixed hole, as shown in the figure below. Unlike the direct assembly in the original thesis, this scheme adopts two improvements:
1) The initial position is moved to the vicinity of the hole opening (plus a random disturbance). In practice this can be achieved with visual servoing, which is a relatively mature technique (a minimal sketch of such a randomized reset is given after the table below).
2) The insertion strategy is changed from inserting directly into the hole to searching for the hole first and then inserting.
Based on the problems found in the original thesis, the following improvements are proposed:
Problems in the original thesis | Improvements |
---|---|
Training time is too long (4 days and 4 nights) | Reduce the number of steps per episode and the number of episodes |
Training quality is poor and the success rate is low | Change the initial position so that as much effective data as possible is collected |
Low learning efficiency | Reduce the replay_buffer depth |
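As a concrete illustration of the first improvement, the sketch below shows one way a reset could place the peg just above the hole and add a small random disturbance. It is only a minimal sketch under assumed names: `sample_initial_pose`, `hole_position`, and the noise scales are hypothetical and are not taken from the project's `vrepEnv` code.

```python
import numpy as np

# Hypothetical sketch of a randomized reset near the hole opening.
# hole_position and the noise scales are assumptions, not values from the original project.
def sample_initial_pose(hole_position, xy_noise=0.005, z_offset=0.03, rng=None):
    """Return a start position slightly above the hole, with a small random XY disturbance."""
    rng = np.random.default_rng() if rng is None else rng
    start = np.array(hole_position, dtype=float)
    start[:2] += rng.uniform(-xy_noise, xy_noise, size=2)  # jitter around the hole centre (metres)
    start[2] += z_offset                                   # hover above the hole opening
    return start

# Example with a made-up hole position in the robot base frame
print(sample_initial_pose([0.4, 0.0, 0.0]))
```

In the project itself this idea would live inside `env.reset()` of `vrepEnv`; the sketch only makes the randomization explicit.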
System configuration
CoppeliaSim (V-REP simulation environment)
This project runs on Ubuntu 16.04 with CoppeliaSim V4.2.0 rev5. For other systems, please select the corresponding version from the official CoppeliaSim download page. After downloading, the CoppeliaSim_Edu_V4_2_0_Ubuntu16_04 folder contains the simulation environment.
Open a terminal in the CoppeliaSim_Edu_V4_2_0_Ubuntu16_04 folder and run ./coppeliaSim.sh:
```bash
way@way:~/CoppeliaSim_Edu_V4_2_0_Ubuntu16_04$ ./coppeliaSim.sh
```
Creating a virtual environment using conda
This project uses TensorFlow 1.14.0 (1.x) with GPU support (CUDA and cuDNN); the graphics card is a GeForce GTX 1070/PCIe/SSE2. Anaconda3 is recommended for creating a virtual environment.
First, export the Anaconda3 path. It is convenient to define aliases in .bashrc for the path export and for activating the corresponding environments:
```bash
alias condaenv='export PATH="/home/way/anaconda3/bin:$PATH"'
alias condaact='source activate'
# To activate a specific environment, use:
alias tf='conda activate tf114py36'
alias pytorch='conda activate pytorch'
```
Here tf is the environment we use. It can be created with conda as follows:
```bash
# Create a new environment
conda create -n tf114py36 python=3.6.0
# Install tensorflow-gpu 1.14.0
conda install -n tf114py36 tensorflow-gpu==1.14.0
# Delete an existing environment
conda remove -n xxx --all
# Switch environment
conda activate xxx
# Exit the environment
conda deactivate
# List the installed packages
conda list
```
Once everything is installed, entering the environment takes only three commands: condaenv, condaact, tf.
Program running
- First run V-REP, open scene → ur5_force_vision_Insert-v0.ttt, and import the peg-and-hole models; their dimensions are listed in the table below.
FILE NAME | SHAPE | SIZE |
---|---|---|
peg0 | circular | $\phi 20 \times 30$ |
hole0 | circular | $\phi 22 \times 30$ |
peg1 | circular | $\phi 20 \times 30$ |
hole1 | circular | $\phi 23 \times 30$ |
peg2 | square | $20 \times 20 \times 30$ |
hole2 | square | $23 \times 23 \times 30$ |
- Run the training script in the tf environment, taking TD3 as an example: python main_TD3.py
```python
from vrepEnv import ArmEnv
import numpy as np
import os
from logger import Logger
import time
import datetime
import torch
import utils
import TD3

MAX_EPISODES = 10000
MAX_EP_STEPS = 10
ON_TRAIN = False  # set to True for training, False for testing
MEMORY_CAPACITY = 1000
BATCH_SIZE = 200

logging_directory = os.path.abspath('logs')
# logger = Logger(logging_directory)

# Set environment
env = ArmEnv()
state_dim = env.state_dim
action_dim = env.action_dim
a_bound = env.action_bound

seed = 0
torch.manual_seed(seed)
np.random.seed(seed)

pn = a_bound * 0.2
nc = a_bound * 0.5

# Set method: pd/RL
rl = TD3.TD3(state_dim, action_dim, a_bound, policy_noise=pn, noise_clip=nc)
replay_buffer = utils.ReplayBuffer(state_dim, action_dim, MEMORY_CAPACITY)


def train():
    """ Training the actor """
    logger = Logger(logging_directory)
    env.start_simulation()
    action_var = a_bound  # Action noise factor
    reward_value = []     # Record the reward value for each episode
    for episode in range(MAX_EPISODES):
        s = env.reset()
        episode_reward = 0.  # Total reward for an episode
        for step in range(MAX_EP_STEPS):
            print('step: %i ' % (step))
            pn = action_var * 0.2
            nc = action_var * 0.5
            a = (
                rl.select_action(np.array(s))
                + np.random.normal(0, pn, size=action_dim)
            ).clip(-action_var, action_var)
            print('Action: ', a)
            # Perform action
            s_, r, done, safety = env.step(a)
            # Store data in replay buffer: s, a, s_, r, done -> not_done
            replay_buffer.add(s, a, s_, r, done)
            s = s_
            episode_reward += r
            mean_episode_reward = episode_reward / (step + 1)
            if episode > MEMORY_CAPACITY:
                print('the memory is full')
                rl.train(replay_buffer, BATCH_SIZE)  # Start to learn once the memory is full
                action_var *= .999                   # Reduce the action randomness once learning starts
            if done or step == MAX_EP_STEPS - 1 or safety == 1:
                print('Episode: %i | %s | %s | MER: %f | Steps: %i | Var: %f' % (
                    episode,
                    'Not inserted' if not done else 'Inserted',
                    'Unsafe' if safety == 1 else 'Safe',
                    mean_episode_reward, step, action_var))
                break
        reward_value.append(mean_episode_reward)
        logger.save_reward_value(reward_value)
    rl.save(logger.models_directory)


def test():
    models_directory = '/home/way/py/RL_UR_Project/UR5_RL_Assembly-inserted_discrete/logs/2020-12-21.22:51:59/models'
    rl.load(models_directory)
    s = env.reset()
    while True:
        a = rl.select_action(s)
        s, r, done, _ = env.step(a)
        if done and np.abs(s[12]) <= 10 and np.abs(s[13]) <= 10 and np.abs(s[14]) <= 10:
            env.move_to(np.array([s[0], s[1], 0.011]), np.array([0, 0, 0]))
            time.sleep(3)
            break


if __name__ == '__main__':
    timestamp = time.time()
    timestamp_value = datetime.datetime.fromtimestamp(timestamp)
    print("The program starts at: ", timestamp_value.strftime('%Y-%m-%d.%H:%M:%S'))
    if ON_TRAIN:
        train()
    else:
        test()
```
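The script above relies on a `utils.ReplayBuffer` that is constructed as `ReplayBuffer(state_dim, action_dim, MEMORY_CAPACITY)`, filled with `add(s, a, s_, r, done)`, and consumed by `rl.train(replay_buffer, BATCH_SIZE)`. The `utils` module itself is not shown here; the sketch below is only my assumption of a minimal buffer compatible with those calls (the `sample` method and internal layout are guesses, not the project's implementation).

```python
import numpy as np
import torch

class ReplayBuffer:
    """Minimal fixed-size circular replay buffer; an assumed stand-in for utils.ReplayBuffer."""

    def __init__(self, state_dim, action_dim, max_size=1000):
        self.max_size = max_size
        self.ptr = 0
        self.size = 0
        self.state = np.zeros((max_size, state_dim))
        self.action = np.zeros((max_size, action_dim))
        self.next_state = np.zeros((max_size, state_dim))
        self.reward = np.zeros((max_size, 1))
        self.not_done = np.zeros((max_size, 1))

    def add(self, state, action, next_state, reward, done):
        # Overwrite the oldest transition once the buffer is full
        self.state[self.ptr] = state
        self.action[self.ptr] = action
        self.next_state[self.ptr] = next_state
        self.reward[self.ptr] = reward
        self.not_done[self.ptr] = 1. - float(done)
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample(self, batch_size):
        # Uniformly sample a mini-batch of stored transitions as torch tensors
        idx = np.random.randint(0, self.size, size=batch_size)
        return (
            torch.FloatTensor(self.state[idx]),
            torch.FloatTensor(self.action[idx]),
            torch.FloatTensor(self.next_state[idx]),
            torch.FloatTensor(self.reward[idx]),
            torch.FloatTensor(self.not_done[idx]),
        )
```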
Comparison of experimental results
DDPG simulation effect
Result of successful training
Result of failed training: periodic oscillation occurs and convergence is difficult
DDPG simulation results
Assembly time under one minute; maximum assembly force under 14 N; success rate: 85%.
TD3 simulation effect
Result of successful training
Result of failed training: the final result diverges
TD3 simulation results
Assembly time under 30 seconds; maximum assembly force under 20 N with small variance; success rate: 60%.
TD3 vs DDPG
Item | TD3 | DDPG |
---|---|---|
Success rate | 60% | 85% |
Failure mode | Exponential divergence | Periodic oscillation, difficult to converge |
Maximum assembly force | 20 N | 14 N |
Average assembly time | 15 s | 30 s |
Variance of assembly force | Small | Large |
Training time | One night | One night |
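One structural difference behind the contrasting failure modes is TD3's target policy smoothing, which is what the `policy_noise` (pn = a_bound*0.2) and `noise_clip` (nc = a_bound*0.5) arguments in main_TD3.py feed into. The snippet below is a generic illustration of how TD3 typically forms its target action with clipped Gaussian noise; it is not taken from the project's TD3.py, and the function and tensor names are assumptions.

```python
import torch

def smoothed_target_action(actor_target, next_state, max_action,
                           policy_noise=0.2, noise_clip=0.5):
    """Generic TD3 target policy smoothing: add clipped Gaussian noise to the
    target actor's action, then clamp the result to the action bounds."""
    with torch.no_grad():
        action = actor_target(next_state)                        # deterministic target action
        noise = (torch.randn_like(action) * policy_noise          # Gaussian smoothing noise
                 ).clamp(-noise_clip, noise_clip)                  # clip the noise
        return (action + noise).clamp(-max_action, max_action)    # respect the action bounds
```

The target critics then evaluate this smoothed action, which regularizes the value estimate against sharp peaks; together with twin critics and delayed actor updates, this is what distinguishes TD3 from DDPG.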