Line Following Sbot Using Reinforcement Learning - Revision history

Robot at 13:47, 27 November 2009

2009-11-27T13:47:11Z

2009-11-26T12:21:20Z

2009-06-16T08:51:00Z

2009-06-16T08:45:56Z

2009-06-16T08:45:41Z

2009-06-16T08:43:33Z

2009-06-15T13:56:33Z

2009-06-15T13:49:29Z

Algorithm

2009-06-15T13:37:00Z

Algorithm

2009-06-15T13:32:05Z

Algorithm

← Older revision		Revision as of 13:47, 27 November 2009
Line 41:		Line 41:
	* [[Media:sbotrl2.3gp\|Same robot and program, follows line on a track different than it was trained on]]		* [[Media:sbotrl2.3gp\|Same robot and program, follows line on a track different than it was trained on]]

−	~~Zdrojový kód~~: [[{{ns:media}}:sbot.zip]]	+	Source code: [[{{ns:media}}:sbot.zip]]

← Older revision		Revision as of 12:21, 26 November 2009
Line 40:		Line 40:
	* [[Media:sbotrl1.3gp\|Robot follows line after it has been trained on this track using the RL algorithm described above]]		* [[Media:sbotrl1.3gp\|Robot follows line after it has been trained on this track using the RL algorithm described above]]
	* [[Media:sbotrl2.3gp\|Same robot and program, follows line on a track different than it was trained on]]		* [[Media:sbotrl2.3gp\|Same robot and program, follows line on a track different than it was trained on]]
		+
		+	Zdrojový kód: [[{{ns:media}}:sbot.zip]]

@@ Line 38: / Line 38: @@
 Video (from mobile phone = poor quality):
-* [[Media:sbotrl1.3pg|Robot follows line after it has been trained on this track using the RL algorithm described above]]
+* [[Media:sbotrl1.3gp|Robot follows line after it has been trained on this track using the RL algorithm described above]]
-* [[Media:sbotrl2.3pg|Same robot and program, follows line on a track different than it was trained on]]
+* [[Media:sbotrl2.3gp|Same robot and program, follows line on a track different than it was trained on]]

@@ Line 39: / Line 39: @@
 Video (from mobile phone = poor quality):
 * [[Media:sbotrl1.3pg|Robot follows line after it has been trained on this track using the RL algorithm described above]]
-* [[Media:sbotrl2.3pg]Same robot and program, follows line on a track different than it was trained on]]
+* [[Media:sbotrl2.3pg|Same robot and program, follows line on a track different than it was trained on]]

@@ Line 38: / Line 38: @@
 Video (from mobile phone = poor quality):
-* [[Media:sbotrl1.3pg]]
+* [[Media:sbotrl1.3pg|Robot follows line after it has been trained on this track using the RL algorithm described above]]
-* [[Media:sbotrl2.3pg]]
+* [[Media:sbotrl2.3pg]Same robot and program, follows line on a track different than it was trained on]]

← Older revision		Revision as of 08:43, 16 June 2009
Line 36:		Line 36:

	Random actions are necessary in the learning process. But if we want the sbot to follow the line, each random action can lead it off the line. So we decided to separate the training and testing phases. While in the training phase the robot does not follow the line all the way, in the testing phase (after 20-30 trials) it is able to follow the line all the way.		Random actions are necessary in the learning process. But if we want the sbot to follow the line, each random action can lead it off the line. So we decided to separate the training and testing phases. While in the training phase the robot does not follow the line all the way, in the testing phase (after 20-30 trials) it is able to follow the line all the way.
		+
		+	Video (from mobile phone = poor quality):
		+	* [[Media:sbotrl1.3pg]]
		+	* [[Media:sbotrl2.3pg]]

← Older revision		Revision as of 13:56, 15 June 2009
Line 34:		Line 34:

	The robot learns only when the state changes. This is important for a correct learning process: otherwise we can get different sensor readings in the same state, which confuses the sbot and makes learning inefficient.		The robot learns only when the state changes. This is important for a correct learning process: otherwise we can get different sensor readings in the same state, which confuses the sbot and makes learning inefficient.
		+
		+	Random actions are necessary in the learning process. But if we want the sbot to follow the line, each random action can lead it off the line. So we decided to separate the training and testing phases. While in the training phase the robot does not follow the line all the way, in the testing phase (after 20-30 trials) it is able to follow the line all the way.

← Older revision		Revision as of 13:49, 15 June 2009
Line 28:		Line 28:

	<math>Q(s,a) = \max_a Q(s,a) \cdot \alpha</math>		<math>Q(s,a) = \max_a Q(s,a) \cdot \alpha</math>
		+
		+	We wait 80 ms after each update, so the sbot produces approximately 10 actions per second.
		+
		+	Note that the actions left and right stop one wheel while the other keeps running. As a result, the robot also moves forward while turning left or right, so we need not compute the reward with respect to the robot's last actions.
		+
		+	The robot learns only when the state changes. This is important for a correct learning process: otherwise we can get different sensor readings in the same state, which confuses the sbot and makes learning inefficient.
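
The timing and the learn-only-on-state-change rule added in this revision can be sketched roughly as follows. This is a minimal Python illustration, not the code from sbot.zip: the function name `run_steps` and the callbacks `read_state`, `act` and `learn` are hypothetical, and pausing 80 ms at the end of every step (whether or not an update happened) is an assumption.

```python
import time

STEP_DELAY_S = 0.08  # 80 ms pause per step, as stated in the text


def run_steps(read_state, act, learn, n_steps, sleep=time.sleep):
    """Run n_steps control steps; call learn only when the state changed.

    read_state, act and learn are hypothetical callbacks standing in for
    the sensor reading, the motor command and the weight update.
    Returns the number of learning updates performed.
    """
    updates = 0
    state = read_state()
    for _ in range(n_steps):
        action = act(state)
        new_state = read_state()
        if new_state != state:
            # Only a changed state triggers learning.
            learn(state, action, new_state)
            updates += 1
        sleep(STEP_DELAY_S)  # assumed placement of the 80 ms wait
        state = new_state
    return updates
```

Passing a no-op `sleep` (for example `lambda _: None`) makes the loop easy to exercise without real delays.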

@@ Line 20: / Line 20: @@
 If probability p is greater than a random value between 0 and 1, a random action is chosen. Otherwise, we select a new action stochastically from the rules that have the current state in their condition.
-After the action is performed, we take a measurement from the middle sensor, compute the reward and save it into the array of the last w rewards (the oldest reward is replaced). Then we update the weights of the rules. If the reward was positive, the weight is updated by the formula:
+After the action is performed, we take a measurement from the middle sensor, compute the reward and save it into the array of the last w rewards (the oldest reward is replaced). Then we update the weights of the rules. If the reward was 1, the weight is updated by the formula:
 <math>Q(s,a) = Q(s,a)\cdot(1-\alpha)\cdot \alpha</math>
-where <math>\alpha</math> is a constant (set to 0.2 in our program)
+where <math>\alpha</math> is a constant (set to 0.2 in our program).
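
The action selection and the window of the last w rewards described in this hunk could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the action name `forward`, the `weights` dictionary keyed by `(state, action)`, and the helper names are hypothetical, and sampling in proportion to the rule weights is one possible reading of "stochastically" (the text does not say which distribution is used). The weight update itself is not included.

```python
import random
from collections import deque

# "left" and "right" appear in the text; "forward" is an assumed third action.
ACTIONS = ("left", "forward", "right")


def choose_action(weights, state, p, rng=random):
    """Pick an action for `state`.

    If p is greater than a random value in [0, 1), take a uniformly random
    action. Otherwise sample among the rules for this state, here with
    probability proportional to their weights (an assumption).
    Weights must be non-negative and not all zero.
    """
    if p > rng.random():
        return rng.choice(ACTIONS)
    rule_weights = [weights[(state, a)] for a in ACTIONS]
    return rng.choices(ACTIONS, weights=rule_weights, k=1)[0]


def make_reward_window(w):
    """Buffer of the last w rewards; appending drops the oldest one."""
    return deque(maxlen=w)
```

Setting `p` to 0 disables random actions entirely, which is one way the separate testing phase could be run.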