Line Following Sbot Using Reinforcement Learning
Our goal is to make Sbot learn to follow a line. There are several algorithms for this task, but I'll focus on a reinforcement learning (RL) algorithm.
Sbot has two light sensors, from which we can obtain information about the intensity of light. In our environment the line is black and the background is white, so we can convert each sensor reading into one of two states: "on line" and "out of line". These are the inputs the RL algorithm will use to decide which action to take from the set of actions: moreLeft, moreRight and balance. With 4 different conditions (2 inputs, each with two states) and 3 actions, there are 12 condition-action rules and 81 (3^4) different policies, from which we will try to find the best one. Each of the 12 rules has an assigned weight (or quality), and the RL algorithm corrects these weights based on the reward Sbot gets for its previous action. At the end of training, the best rules are those with the highest weights.
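A minimal Python sketch of this rule-weight scheme is shown below. The light-intensity threshold, the exploration rate, the learning rate and all function names are assumptions for illustration only, since the text does not fix a particular update rule:

<pre>
import random

STATES = [("on", "on"), ("on", "out"), ("out", "on"), ("out", "out")]  # 4 conditions
ACTIONS = ["moreLeft", "moreRight", "balance"]                         # 3 actions

# 12 rules (condition-action pairs), each with a weight/quality, initially equal
weights = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def read_state(left_intensity, right_intensity, threshold=0.5):
    """Convert raw light intensities into the two-valued inputs
    'on' (dark line) / 'out' (white background). The threshold is an assumption."""
    return ("on" if left_intensity < threshold else "out",
            "on" if right_intensity < threshold else "out")

def choose_action(state, epsilon=0.1):
    """Pick the highest-weighted rule for this condition,
    exploring a random action with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: weights[(state, a)])

def update_weight(state, action, reward, alpha=0.1):
    """Move the weight of the rule that fired toward the received reward."""
    weights[(state, action)] += alpha * (reward - weights[(state, action)])
</pre>

Each condition keeps its own weight per action; the weight of the rule that fired is nudged toward the reward it produced, so after enough trials the highest-weighted rule in each condition forms the learned policy.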
Before programming, we have to choose how Sbot will receive its reward. We have these options:
1. Place photocells on the track along the line to monitor the robot's progress: each time the robot reaches the next photocell, it receives a positive reinforcement signal.
2. Mount an overhead camera above the line and place a colored shape on top of the robot, so the camera can easily recognize its position, and set up a sequence of reward locations. When the robot reaches one of these positions, it receives a positive reinforcement signal.
3. Without any other devices, the robot can use its own sensors to determine whether it is following the line: we can measure the velocity of both wheels and check whether the bottom light sensors see the line frequently.
We can experiment with the reward function. We are inspired by the work of Chavas et al. [1], who use the following fitness function for training the robot to avoid obstacles:
<math>
f = \sum_t (0.5+\frac{v_l(t) + v_r(t)} {4 \cdot V_{max}}) \cdot (1-\frac{|v_l(t) - v_r(t)|} {2 \cdot V_{max}}) \cdot (1 - \frac{\sum_{front} p_i(t)} {4 \cdot P_{max}})
</math>
For line following, we can adapt this to a reward of the form
<math>
f = \sum_t \left(k_1+\frac{v_l(t) + v_r(t)} {4 \cdot v_m}\right) \cdot \left(k_2-\frac{|v_l(t) - v_r(t)|} {2 \cdot v_m}\right) \cdot \left(\,\cdots\,\right)
</math>
where the third factor, based on the bottom light sensors seeing the line, is left open for experimentation.
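As a sketch, the per-time-step summand of this reward could be computed as below. The function name is an assumption; k_1, k_2 and v_m follow the formula above, and the third, line-sensing factor is passed in as a plain number because the text leaves its exact form open:

<pre>
def step_reward(v_left, v_right, line_factor, v_max, k1=0.5, k2=1.0):
    """One summand of the reward: the first factor grows with average wheel
    speed, the second shrinks as the wheels turn at different speeds.
    'line_factor' stands for the third, line-sensing factor, which the text
    leaves open for experimentation."""
    speed_term = k1 + (v_left + v_right) / (4.0 * v_max)
    straight_term = k2 - abs(v_left - v_right) / (2.0 * v_max)
    return speed_term * straight_term * line_factor
</pre>

With k1 = 0.5, k2 = 1 and line_factor = 1 this reduces to the obstacle-avoidance fitness above without its proximity term: driving straight at full speed gives a per-step value of 1, while spinning in place gives 0.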
References
[1] Chavas, J. et al. (1998). Incremental Evolution of Neural Controllers for Robust Obstacle-Avoidance in Khepera. First European Workshop on Evolutionary Robotics, Paris, 1998.