Reinforcement Learning in Practice | Minesweeper in a Custom Gym Environment

Posted by George Botley on Thu, 27 Jan 2022 05:17:03 +0100


Before starting

Consider a few questions first:

  • Q1: How do we expand the mine-free area when an empty cell is opened?
  • Q2: How do we compute the hint number of each cell?
  • Q3: How do we represent the state of a Minesweeper game?

A1: You can use a recursive function or a stack.
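As a minimal sketch of the recursive variant (the implementation later in this article uses a stack instead), assuming NumPy matrices state_num and state_open, and a neighbor helper get_round_set in the spirit of the getRoundSet method defined below:

def expand(x, y, state_num, state_open, get_round_set):
    # Open (x, y); if its hint number is 0, recursively open its neighbors
    state_open[x, y] = 1
    if state_num[x, y] == 0:
        for i, j in get_round_set(x, y):
            if state_open[i, j] == 0:  # skip cells that are already open
                expand(i, j, state_num, state_open, get_round_set)

On a large board the recursion can get deep enough to hit Python's default recursion limit, which is one reason to prefer an explicit stack.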

A2: The usual approach is to count the mines around a cell whenever it needs to be opened. But if a two-dimensional convolution function is available, there is a more concise method: convolve the mine matrix with a 3 × 3 kernel once to obtain every cell's hint number at the same time:

\begin{bmatrix} 1 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 1\\ 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 1 \end{bmatrix} \bigstar \begin{bmatrix} 1 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 & 1 & 2\\ 3 & 3 & 3 & 3 & 1\\ 1 & 3 & 1 & 2 & 1\\ 2 & 3 & 2 & 2 & 1\\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}
Here ★ (\bigstar) denotes the two-dimensional convolution operation. The 5 × 5 matrix on the left of the equals sign is the mine distribution, where a value of 1 means a mine and 0 means no mine; the 3 × 3 matrix is the convolution kernel (also called a filter or feature extractor) that counts the neighboring mines; the matrix on the right of the equals sign contains the hint numbers of all cells.

The code is also very simple to implement:

from scipy import signal
import numpy as np

state_mine = np.array([[1,0,0,1,0], [0,1,0,0,1], [1,0,1,0,0], [0,0,0,0,0], [0,1,0,0,1]])
KERNAL = np.array([[1,1,1], [1,0,1], [1,1,1]])
state_num = signal.convolve2d(state_mine, KERNAL, 'same')

A3: From the player's point of view the game state is only partially observed, so we must distinguish the observation state from the environment state. The environment state consists of the mine distribution matrix and the hint number matrix (the two matrices in the formula above). The observation state is the part of the environment state visible to the player: the hint number matrix must be partially masked according to the cells' open states. The observation state does not include the mine distribution matrix, because the game ends as soon as a mine is opened, so every non-terminal state contains no opened mines.

So for an M × N Minesweeper game, the environment state can be represented as an M × N × 2 tensor: channel 1 is the mine distribution matrix and channel 2 is the hint number matrix. The observation state can likewise be represented as an M × N × 2 tensor: channel 1 is the matrix of cell open states (1 for open, 0 for closed), and multiplying this matrix element-wise with the hint number matrix yields the masked hint numbers that serve as channel 2. With NumPy arrays, element-wise multiplication is easy:

observe_num = state_num * state_open
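To make the two-channel construction concrete, here is a minimal sketch that stacks the channels with np.stack; state_mine, KERNAL, and state_num are the arrays from the snippet above, while state_open is a hypothetical open-state matrix added purely for illustration:

import numpy as np
from scipy import signal

state_mine = np.array([[1,0,0,1,0], [0,1,0,0,1], [1,0,1,0,0], [0,0,0,0,0], [0,1,0,0,1]])
KERNAL = np.array([[1,1,1], [1,0,1], [1,1,1]])
state_num = signal.convolve2d(state_mine, KERNAL, 'same')

# Hypothetical open state: suppose the player has opened three (mine-free) cells
state_open = np.zeros((5, 5), dtype=int)
state_open[0, 2] = state_open[2, 4] = state_open[3, 3] = 1

env_state = np.stack([state_mine, state_num], axis=-1)             # M x N x 2
observe = np.stack([state_open, state_num * state_open], axis=-1)  # M x N x 2
print(env_state.shape, observe.shape)  # (5, 5, 2) (5, 5, 2)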

Take the game state shown in the following figure as an example.

The environment state (channel 1: the mine distribution, channel 2: the hint numbers) is:

\begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 1 & 1 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 1 & 1 & 0 & 0\\ 1 & 0 & 1 & 0 & 0\\ 1 & 1 & 2 & 1 & 1\\ 2 & 2 & 2 & 0 & 1\\ 1 & 1 & 2 & 1 & 1 \end{bmatrix}
The observation state (the masked hint numbers and the cell open states) is:

\begin{bmatrix} 1 & 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 0 & 0\\ 1 & 0 & 2 & 1 & 1\\ 2 & 2 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1 & 1\\ 1 & 1 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 0 \end{bmatrix}
But this representation is not unique. For example, we can split the hint number matrix into 9 channels representing the hint values 0 through 8 respectively; the observation state then becomes an M × N × 10 tensor:

\begin{bmatrix} 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},\quad \cdots,\quad \begin{bmatrix} 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1 & 1\\ 1 & 1 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 0 \end{bmatrix}

(The matrices shown are the channels for hint values 0, 1, 2, and 3, an ellipsis for the all-zero channels up to 8, and finally the open-state channel.)
The design of the state space is flexible; the only evaluation criterion is the performance of the complete learning system. With the multi-channel state space design above, subsequent learning can conveniently be carried out with a convolutional neural network. Alternatively, the tensor can be flattened into a one-dimensional vector and processed with a fully connected neural network. The rest of this article uses the M × N × 2 state space representation.
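For illustration only (the rest of this article sticks with the two-channel representation), here is a sketch of building the 10-channel tensor with NumPy, where observe_num and state_open are the masked hint numbers and open-state matrix from above:

import numpy as np

def to_multichannel(observe_num, state_open):
    # One-hot channels for hint values 0-8; the extra state_open test keeps
    # closed cells (whose masked hint is also 0) out of the hint-0 channel
    one_hot = [((observe_num == k) & (state_open == 1)).astype(int) for k in range(9)]
    # Append the open-state matrix as the 10th channel -> M x N x 10
    return np.stack(one_hot + [state_open], axis=-1)

# e.g. to_multichannel(observe_num, state_open).shape == (M, N, 10)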

Step 1: create new files

To run PyTorch, I used Anaconda to create a conda environment named pytorch1.0, and OpenAI Gym is installed in that environment. So I went to the directory D:\Anaconda\envs\pytorch1.0\lib\site-packages\gym\envs\user and created two new files: __init__.py and MineSweeper_env.py.

Step 2: write the file MineSweeper_env.py

A standard Gym Env class implements three methods: reset(), step(action), and render().

  • reset() initializes the environment;
  • step(action) returns four values: state, reward, done, and info, so all of the Minesweeper game logic has to be implemented in this function;
  • render() visualizes the environment. I could not find a native Gym rendering method that can display text (if you know of one, please leave a comment, thanks!), so the many digit characters are drawn with pyglet plus dynamically generated variable names. For details, see Reinforcement Learning in Practice | Displaying Strings in a Custom Gym Environment.

MineSweeper_ env. The overall code of Py is as follows:

import gym
import random
import time
import numpy as np
from scipy import signal  # two-dimensional convolution
import pyglet  # text display
from gym.envs.classic_control import rendering


class DrawText:  # used to display text in render()
    def __init__(self, label: pyglet.text.Label):
        self.label = label

    def render(self):
        self.label.draw()


class MineSweeperEnv(gym.Env):
    def __init__(self):
        self.MINE_NUM = 20
        self.ROW, self.COL = 12, 12
        self.SIZE = 40
        WIDTH = self.COL * self.SIZE
        HEIGHT = self.ROW * self.SIZE
        self.viewer = rendering.Viewer(WIDTH, HEIGHT)
        self.state_mine = None
        self.state_num = None
        self.state_open = None
        self.gameOver = False

    def reset(self):
        # Initialization: mine distribution
        MINE_NUM = self.MINE_NUM
        self.state_mine = np.zeros(self.ROW * self.COL)
        self.state_mine[:MINE_NUM] = 1
        random.shuffle(self.state_mine)
        self.state_mine = self.state_mine.reshape(self.ROW, self.COL)
        # Initialization: hint numbers
        KERNAL = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
        self.state_num = signal.convolve2d(self.state_mine, KERNAL, 'same')
        # Initialization: open state
        self.state_open = np.zeros((self.ROW, self.COL))
        # Initialization: whether the game is over
        self.gameOver = False

    def getRoundSet(self, x, y):
        # Neighbors of (x, y) inside the board, excluding (x, y) itself
        roundSet = []
        for i in range(x - 1, x + 2):
            for j in range(y - 1, y + 2):
                if 0 <= i < self.ROW and 0 <= j < self.COL and (i, j) != (x, y):
                    roundSet.append((i, j))
        return roundSet

    def step(self, action):
        # Execute the action
        x, y = action
        # If the opened cell's hint number is not 0, just open it
        if self.state_num[x, y] >= 1:
            self.state_open[x, y] = 1
        # If a mine-free cell with hint number 0 is opened, expand the mine-free area
        # (the mine check guards against flood-filling from a mine whose hint is 0)
        if self.state_mine[x, y] == 0 and self.state_num[x, y] == 0:
            stack = [(x, y)]
            while len(stack):
                row, col = stack.pop()
                self.state_open[row, col] = 1
                for one in self.getRoundSet(row, col):
                    # Skip cells that are already open
                    if self.state_open[one] == 1:
                        continue
                    if self.state_num[one] >= 1:
                        self.state_open[one] = 1
                    else:
                        stack.append(one)

        # Win or lose / reward
        done, reward = False, 0
        # Opening a mine loses the game
        if self.state_mine[x, y] == 1:
            self.state_open[x, y] = 1
            self.gameOver = True
            done, reward = True, -1
        # If the number of unopened cells equals the mine count, the game is won
        if self.ROW * self.COL - self.state_open.sum() == self.MINE_NUM:
            self.gameOver = True
            done, reward = True, 1

        # Info (keeping the standard format of gym's step)
        info = {}
        # Observation state
        observe_num = self.state_num * self.state_open
        observe = [observe_num, self.state_open]
        return observe, reward, done, info

    def render(self, mode='human'):
        ROW, COL, SIZE = self.ROW, self.COL, self.SIZE
        # Draw the tiles
        for i in range(ROW):
            for j in range(COL):
                X, Y = j * SIZE, (ROW - i - 1) * SIZE
                tile = rendering.make_polygon([(X, Y), (X + SIZE, Y), (X + SIZE, Y + SIZE), (X, Y + SIZE)], filled=True)
                if self.state_open[i, j] == 0:
                    tile.set_color(106/255, 116/255, 166/255)
                if self.state_open[i, j] == 1 and self.state_mine[i, j] == 0:
                    tile.set_color(255/255, 242/255, 204/255)
                if self.state_open[i, j] == 1 and self.state_mine[i, j] == 1:
                    tile.set_color(220/255, 20/255, 60/255)
                self.viewer.add_geom(tile)
        # Draw the grid lines
        WIDTH = COL * SIZE
        HEIGHT = ROW * SIZE
        for i in range(ROW + 1):
            line = rendering.Line((0, i * SIZE), (WIDTH, i * SIZE))
            line.set_color(80/255, 80/255, 80/255)
            self.viewer.add_geom(line)
        for j in range(COL + 1):
            line = rendering.Line((j * SIZE, 0), (j * SIZE, HEIGHT))
            line.set_color(80/255, 80/255, 80/255)
            self.viewer.add_geom(line)
        # Draw the hint numbers (dynamic variable names, see the article linked above)
        for i in range(ROW):
            for j in range(COL):
                exec('label_{}_{} = {}'.format(i, j, None))
                names = locals()
                NUM = int(self.state_num[i, j])
                COLOR = (255, 255, 255, 255)
                if NUM == 1:
                    COLOR = (46, 117, 182, 255)
                elif NUM == 2:
                    COLOR = (84, 130, 53, 255)
                elif NUM == 3:
                    COLOR = (192, 0, 0, 255)
                elif NUM == 4:
                    COLOR = (112, 48, 160, 255)
                elif NUM == 5:
                    COLOR = (132, 60, 12, 255)
                elif NUM == 6:
                    COLOR = (191, 144, 0, 255)
                elif NUM == 7:
                    COLOR = (32, 56, 100, 255)
                elif NUM == 8:
                    COLOR = (13, 13, 13, 255)
                names['label_' + str(i) + '_' + str(j)] = pyglet.text.Label(
                    '{}'.format(NUM), font_size=15,
                    x=(j + 0.32) * SIZE, y=(ROW - i - 1 + 0.23) * SIZE,
                    anchor_x='left', anchor_y='bottom', color=COLOR)
                label = names['label_{}_{}'.format(i, j)]
                label.draw()
                if self.state_mine[i, j] == 0 and self.state_open[i, j] == 1 and self.state_num[i, j] >= 1:
                    self.viewer.add_geom(DrawText(label))
                # Draw the mines (revealed when the game is over)
                if self.gameOver == True:
                    if self.state_mine[i, j] == 1:
                        mine = rendering.make_circle(10, 6, filled=True)
                        mine.set_color(30/255, 30/255, 30/255)
                        translation = rendering.Transform(translation=((j + 0.5) * SIZE, (ROW - i - 1 + 0.5) * SIZE))
                        mine.add_attr(translation)
                        self.viewer.add_geom(mine)

        return self.viewer.render(return_rgb_array=mode == 'rgb_array')


# Test code: take actions with a random strategy
if __name__ == '__main__':
    MineSweeper = MineSweeperEnv()
    ROW, COL = MineSweeper.ROW, MineSweeper.COL
    MineSweeper.reset()
    MineSweeper.render()
    while MineSweeper.gameOver is not True:
        while True:
            rand = random.choice(range(ROW * COL))
            x, y = rand // COL, rand % COL
            if MineSweeper.state_open[x, y] == 0:
                action = (x, y)
                break
        state, reward, done, info = MineSweeper.step(action)
        MineSweeper.render()
        time.sleep(0.5)

Run the file directly to execute the test code (taking actions with a random strategy):

Step 3: write __init__.py

In __init__.py, add:

from gym.envs.user.MineSweeper_env import MineSweeperEnv

Step 4: register environment

Go to the directory D:\Anaconda\envs\pytorch1.0\lib\site-packages\gym, open __init__.py, and add the following code:

register(
    id="MineSweeperEnv-v0",
    entry_point="gym.envs.user:MineSweeperEnv",
    max_episode_steps=200,
)
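If you prefer not to modify Gym's own __init__.py, the same registration can also be done from your own script before calling gym.make; this is a sketch assuming the gym.envs.user package created in Step 1 is importable:

from gym.envs.registration import register

register(
    id="MineSweeperEnv-v0",
    entry_point="gym.envs.user:MineSweeperEnv",
    max_episode_steps=200,
)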

Step 5: test the environment

In the same conda environment, enter the code:

import gym

env = gym.make('MineSweeperEnv-v0')
env.reset()
env.render()

If no error is reported, the gym environment registration is successful.
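As a further check, a short random-play episode through the registered environment could look like the sketch below; it uses only the reset/step/render interface defined above, plus env.unwrapped to reach the state_open matrix through Gym's TimeLimit wrapper (note that step expects an (x, y) tuple, as in the test code):

import random
import time
import gym

env = gym.make('MineSweeperEnv-v0')
env.reset()
env.render()
done = False
while not done:
    # Sample a random cell and retry until we find a closed one
    x = random.randrange(env.unwrapped.ROW)
    y = random.randrange(env.unwrapped.COL)
    if env.unwrapped.state_open[x, y] == 1:
        continue
    observe, reward, done, info = env.step((x, y))
    env.render()
    time.sleep(0.5)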
