Difference between revisions of "Line Following Sbot Using Reinforcement Learning"

From RoboWiki

Revision as of 11:55, 15 June 2009

Sbot following line

Our goal is to make Sbot learn to follow a line. Several algorithms can do this, but I'll focus on reinforcement learning (RL).

Sbot has three sensors, each of which reports the intensity of reflected light. In our environment the line is black and the background is white, so we can reduce each sensor reading to two states: "on line" and "out of line". The measurements from two of the sensors are the inputs that the RL algorithm uses to decide which action to take, and the center sensor is used to provide reinforcement. The set of actions consists of three actions: moreLeft, moreRight and balance. We have 4 different conditions (2 inputs, each of which can be in two states), so 12 different rules (if condition then action) can be constructed (4 conditions × 3 actions), and there are 81 (3^4) different policies, from which we will try to find the best one. Each of the 12 rules is assigned a weight (or quality), and the RL algorithm corrects these weights based on the reward Sbot gets for its previous action. At the end of training, the best rules will be those with the highest weights.
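The counts above can be checked with a short Python sketch. It is only an illustration, not the code that runs on Sbot: the names `CONDITIONS`, `weights`, `POLICIES` and `greedy_policy` are hypothetical, and the initial weight of 0.0 is an assumption.

```python
from itertools import product

# Each of the two input sensors (left, right) is reduced to one of two states.
STATES = ("on line", "out of line")
ACTIONS = ("moreLeft", "moreRight", "balance")

# A condition is a pair (left sensor state, right sensor state): 2 * 2 = 4.
CONDITIONS = list(product(STATES, repeat=2))

# A rule is "if condition then action": 4 * 3 = 12 rules, one weight each.
# Starting every weight at 0.0 is an assumption made for this sketch.
weights = {(cond, act): 0.0 for cond in CONDITIONS for act in ACTIONS}

# A policy picks exactly one action for every condition: 3 ** 4 = 81 policies.
POLICIES = [
    dict(zip(CONDITIONS, choice))
    for choice in product(ACTIONS, repeat=len(CONDITIONS))
]


def greedy_policy(weights):
    """For each condition, return the action whose rule has the highest weight."""
    return {
        cond: max(ACTIONS, key=lambda act: weights[(cond, act)])
        for cond in CONDITIONS
    }
```

After training, `greedy_policy(weights)` would read off the policy made of the highest-weighted rule for each condition, which is one of the 81 entries in `POLICIES`.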

Before programming, we have to choose how Sbot will get a reward. We have these options:

  1. Place some photocells on the track along the line to monitor the progress of the robot. Each time the robot reaches the next photocell, it receives a positive reinforcement signal.
  2. Mount an overhead camera above the line, place a colored shape on top of the robot so that the camera can easily recognize its position, and set up a sequence of reward locations. When the robot reaches one of these positions, it receives a positive reinforcement signal.
  3. Without any other devices, the robot can use its own sensors to determine whether it is following the line. We can measure the velocity of both wheels and check whether the bottom light sensors see the line frequently. We can experiment with the reward function. We are inspired by the work of Chavas et al. [1], who use the following fitness function for training the robot to avoid obstacles:
    <math>
 f = \sum_t (0.5+\frac{v_l(t) + v_r(t)} {4 \cdot V_{max}}) \cdot (1-\frac{|v_l(t) - v_r(t)|} {2 \cdot V_{max}}) \cdot (1 - \frac{\sum_{front} p_i(t)} {4 \cdot P_{max}})

</math>
Where <math>v_l(t)</math> and <math>v_r(t)</math> are the velocities of the left and right wheels, respectively; <math>V_{max}</math> is the maximum absolute velocity; <math>p_i(t)</math> is the proximity measure returned by each sensor <math>i</math> among the four front sensors; and <math>P_{max}</math> is the largest value that a sensor can return.
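As a quick check of how a single term of this sum behaves (a worked example, assuming no obstacle is detected, so every <math>p_i(t) = 0</math>): a robot driving straight ahead at full speed, <math>v_l = v_r = V_{max}</math>, obtains
<math>
 (0.5+\frac{2 \cdot V_{max}} {4 \cdot V_{max}}) \cdot (1-\frac{0} {2 \cdot V_{max}}) \cdot (1 - 0) = 1
</math>
while a robot spinning in place, <math>v_l = -v_r = V_{max}</math>, obtains
<math>
 (0.5+\frac{0} {4 \cdot V_{max}}) \cdot (1-\frac{2 \cdot V_{max}} {2 \cdot V_{max}}) \cdot (1 - 0) = 0
</math>
So the function rewards fast, straight motion and, through the third factor, keeping away from obstacles.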
For our task we'll use a modified version of this function:
<math>

 f = \sum_{t_1} (k_1+\frac{v_l(t_1) + v_r(t_1)} {4 \cdot v_m}) \cdot (k_2-\frac{|v_l(t_1) - v_r(t_1)|} {2 \cdot v_m}) \cdot (
 1 - k_3 \cdot \prod_{t_2} (1 - s_l(t_2)) \cdot ( 1 - s_r(t_2)))

</math>
Where <math>k_i</math> are constants, and <math>s_l(t_2)</math> and <math>s_r(t_2)</math> are the measurements of the left and right sensors at time <math>t_2</math>.
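The modified function can be sketched in Python as follows. This is only an illustration under several assumptions that the text above does not fix: sensor readings are 1 when the sensor sees the line and 0 otherwise, the product over <math>t_2</math> runs over the last `window` time steps up to <math>t_1</math>, `v_max` stands for <math>v_m</math>, and the default values of `k1`, `k2` and `k3` simply mirror the constants of the original function. The function name and its parameters are hypothetical.

```python
def line_following_fitness(v_left, v_right, s_left, s_right, v_max,
                           k1=0.5, k2=1.0, k3=1.0, window=5):
    """Sum the per-step reward over all time steps t1.

    v_left, v_right: wheel velocities, one value per time step.
    s_left, s_right: sensor readings per time step
                     (assumed 1 = line seen, 0 = line not seen).
    window:          assumed number of recent steps covered by the product.
    """
    total = 0.0
    for t1 in range(len(v_left)):
        # First factor: grows with the forward speed of the robot.
        speed = k1 + (v_left[t1] + v_right[t1]) / (4.0 * v_max)
        # Second factor: shrinks when the wheels turn at different speeds.
        straightness = k2 - abs(v_left[t1] - v_right[t1]) / (2.0 * v_max)
        # Product over t2: stays 1 only if neither sensor saw the line
        # at any step of the recent window.
        missed = 1.0
        for t2 in range(max(0, t1 - window + 1), t1 + 1):
            missed *= (1.0 - s_left[t2]) * (1.0 - s_right[t2])
        total += speed * straightness * (1.0 - k3 * missed)
    return total
```

With these assumptions, a robot that drives fast and straight but never sees the line in the window gets no reward for that step, while a single sighting of the line by either sensor within the window removes the penalty.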


The last option seems to be the simplest one because it does not depend on any other devices, so we will use it in our project.

References

[1] Chavas J. et al (1999) Incremental Evolution of Neural Controllers for Robust Obstacle-Avoidance in Khepera. First European Workshop on Evolutionary Robotics, Paris 1998