Line Following Sbot Using Reinforcement Learning
Our goal is to make Sbot learn to follow a line. Several algorithms can do this, but we will focus on reinforcement learning (RL).
Sbot has three sensors (we added one for our purposes), each of which reports the intensity of reflected light. In our environment the line is black and the background is white, so each sensor reading can be reduced to one of two states: "on the line" and "off the line". The readings from the two outer sensors are the inputs that the RL algorithm uses to decide which action to take, and the middle sensor provides the reinforcement. The set of actions consists of three actions: moreLeft, moreRight and balance. There are 4 different conditions (2 inputs, each in one of two states), so 12 different rules (if condition then action) can be constructed. A policy assigns one action to each condition, so there are 81 (3^4) different policies, from which we will try to find the best one. Each of the 12 rules is assigned a weight (or quality), and the RL algorithm adjusts these weights based on the reward that Sbot gets for its previous action. At the end of training, the best rules are those with the highest weights.
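The counting above can be checked with a short sketch. It is only an illustration in Python, not the code that runs on Sbot: the names `SENSOR_STATES`, `CONDITIONS`, `RULES`, `POLICIES` and `best_policy` are our own, and the single non-zero weight is a made-up value, not a learned one.

```python
from itertools import product

# Two states per outer sensor and three actions, as described in the text.
SENSOR_STATES = ("on_line", "off_line")
ACTIONS = ("moreLeft", "moreRight", "balance")

# A condition is a pair (left sensor state, right sensor state): 2 * 2 = 4.
CONDITIONS = list(product(SENSOR_STATES, repeat=2))

# A rule pairs one condition with one action: 4 * 3 = 12.
RULES = list(product(CONDITIONS, ACTIONS))

# A policy picks one action for each of the 4 conditions: 3 ** 4 = 81.
POLICIES = list(product(ACTIONS, repeat=len(CONDITIONS)))


def best_policy(weights):
    """For each condition, return the action whose rule has the highest weight."""
    return {
        cond: max(ACTIONS, key=lambda action: weights[(cond, action)])
        for cond in CONDITIONS
    }


# Hypothetical weights: all zero except one illustrative (not learned) value.
weights = {rule: 0.0 for rule in RULES}
weights[(("on_line", "off_line"), "moreLeft")] = 1.0

print(len(CONDITIONS), len(RULES), len(POLICIES))
print(best_policy(weights)[("on_line", "off_line")])
```

Reading the final policy off the weight table in this way matches the statement that the best rules are those with the highest weights. How the weights are updated from the reward is not shown here.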
Before programming, we have to choose how Sbot will receive its reward. We have these options:
- Place photocells on the track along the line to monitor the robot's progress. Each time the robot reaches the next photocell, it receives a positive reinforcement signal.
- Mount an overhead camera above the line, place a colored shape on top of the robot so that the camera can easily recognize its position, and set up a sequence of reward locations. When the robot reaches these positions, it receives a positive reinforcement signal.
- Add a third sensor between the two mounted sensors and use the measurement from that middle sensor as a binary reward: if the sensor sees the line, the robot gets a reward; otherwise it does not.
The last option seems to be the simplest one, so we will use it in our project.
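The binary reward from the middle sensor could look roughly like the following sketch. It assumes that a reading is an integer in which a lower value means less reflected light (the black line), and the threshold `LINE_THRESHOLD = 512` is a hypothetical value that would have to be calibrated on the real sensors. The function names are ours as well.

```python
# Hypothetical threshold between "black line" and "white background".
# Assumption: readings are integers and darker surfaces give lower values.
LINE_THRESHOLD = 512


def sees_line(reading, threshold=LINE_THRESHOLD):
    """Return True if the sensor reading is dark enough to count as the line."""
    return reading < threshold


def reward(middle_reading, threshold=LINE_THRESHOLD):
    """Binary reward: 1 if the middle sensor sees the line, otherwise 0."""
    return 1 if sees_line(middle_reading, threshold) else 0


print(reward(100))  # dark reading, middle sensor over the line
print(reward(900))  # bright reading, middle sensor over the background
```

The same `sees_line` test could also reduce the two outer sensors to their "on the line" and "off the line" states. If the sensor on the real robot reports higher values for darker surfaces, the comparison has to be reversed.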
A well-known problem in reinforcement learning is the exploration vs. exploitation trade-off. In our program, we select a random action with probability <math>p</math>, which is computed as follows: <math>p = p_{min} + \frac{\sum_{k=1}^w r_k}{r_{max}}</math> We will remember the last <math>o</math> given rewards and