direct use of particle filters for decision making

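Ryuichi Ueda, Chiba Institute of Technology
An attempt to use particle filters directly for action decisions, not only for estimation
Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT), Venture Research Group
1st "Reading Ahead from Aspects of Motion" workshop
@ Nagoya Institute of Technology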

metacognition [Flavell 1979]
• knowledge or cognition about cognitive phenomena
– evaluation of the extent of its own knowledge
• how to implement it on robots
– probability (Bayes') theory
– implementation
• methods of probabilistic robotics, methods of machine learning,
artificial neural networks, ...
Aug. 1, 2017, 1st "Reading Ahead from Aspects of Motion" workshop

probability expression of knowledge
• state variables: 𝒙 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)
– 𝑛 = 3 : mobile robot self-localization
– 𝑛 = 10 𝑁: SLAM (mapping)
– The actual 𝒙 is unknown.
• 𝑏𝑒𝑙(𝒙): the belief of the robot about 𝒙
– a probability density function
[Figure: three example beliefs on the 𝑥𝑦-plane: a heavy-tailed distribution (unconfident), a peaky distribution (confident), and a peaky distribution with several peaks]

particle filters
• a popular method for self-localization
– Monte Carlo localization: particle filter for self-localization
• used for all of the methods in this presentation
• representation of the belief
• updates of the particles
– Sensor information narrows the distribution of the particles.
– Robot motion moves the particles accordingly.
[Figure, by courtesy of Ryoma Aoki (Ueda lab.): particles (candidates of the pose) around the actual pose (unknown)]
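The two particle updates above can be sketched in a few lines. This is a minimal one-dimensional illustration, not the code used in the presentation: the corridor, the single range landmark, the noise levels, and all function names are assumptions made for the example.

```python
import math
import random


def motion_update(particles, u, noise_sd, rng):
    """Robot motion: move every particle by the displacement u plus Gaussian noise."""
    return [x + u + rng.gauss(0.0, noise_sd) for x in particles]


def sensor_update(particles, weights, z, landmark, sensor_sd):
    """Sensor information: reweight particles by the likelihood of the measured range z."""
    new_w = []
    for x, w in zip(particles, weights):
        expected = landmark - x  # range the sensor would report from pose x
        new_w.append(w * math.exp(-0.5 * ((z - expected) / sensor_sd) ** 2))
    total = sum(new_w)
    if total == 0.0:  # every likelihood underflowed: keep the old weights
        return list(weights)
    return [w / total for w in new_w]


def resample(particles, weights, rng):
    """Draw a new particle set in proportion to the weights, then reset the weights."""
    n = len(particles)
    return rng.choices(particles, weights=weights, k=n), [1.0 / n] * n


def spread(particles):
    """Standard deviation of the particles, used here as a rough measure of confidence."""
    mean = sum(particles) / len(particles)
    return math.sqrt(sum((x - mean) ** 2 for x in particles) / len(particles))


def demo(seed=0):
    rng = random.Random(seed)
    landmark, true_x, n = 10.0, 2.0, 500
    particles = [rng.uniform(0.0, 10.0) for _ in range(n)]  # broad initial belief
    weights = [1.0 / n] * n
    initial_spread = spread(particles)
    for _ in range(10):
        true_x += 0.5  # the actual pose (unknown to the filter)
        particles = motion_update(particles, 0.5, 0.05, rng)
        z = landmark - true_x + rng.gauss(0.0, 0.2)  # noisy range measurement
        weights = sensor_update(particles, weights, z, landmark, 0.2)
        particles, weights = resample(particles, weights, rng)
    estimate = sum(particles) / len(particles)
    return initial_spread, spread(particles), estimate, true_x
```

Calling `demo()` returns the spread of the particles before and after ten motion/sensor cycles together with the mean estimate and the true position, so the narrowing of the belief can be checked directly.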

an example: Tsukuba Challenge
• 2 km run by autonomous robots
– a standard method for winning
• put a LIDAR on the robot
• make a map with a SLAM method beforehand
• probabilistic self-localization with the map
• motion planning with non-probabilistic methods
[Figure callout: the robot decides its action based on the most reliable particle.]
[Photo: Hayashibara laboratory's team of Chiba Inst. of Technology in 2017 (completes 2km run.)]

more severe cases
• RoboCup
– small camera
– vibration and collision
– few landmarks
• a micromouse in the maze
– only four range sensors
– perceptual aliasing
⇒ Robots must decide their motion based on uncertain 𝑏𝑒𝑙s.

decision with broad beliefs
• Is it possible?
– easy for human beings
• "You sense that you do not yet know a certain chapter in your text
well enough to pass tomorrow's exam, so you read it through once
more." [Flavell 1979]
• Intelligent robots in the real world must be able to ...
– find an action that is effective even if the belief is broad
– find an action to reduce the uncertainty
• 2 (+ 1) cases are presented from our study.

CASE 1: REAL-TIME QMDP
R. Ueda, T. Arai, K. Sakamoto, Y. Jitsukawa, K. Umeda, H. Osumi, T. Kikuchi
and M. Komura: "Real-time decision making with state-value function
under uncertainty of state estimation – Evaluation with Local Maxima and
Discontinuity," IEEE ICRA, 2005.

a navigation problem
with multiple destinations
• the problem:
– There is more than one destination.
– The robot knows its uncertainty of self-localization (𝑏𝑒𝑙).
– The robot must decide an effective action.
[Figure: a robot with two candidate destinations, destination 1 and destination 2. Which is easier to reach?]

a goalie task
for the RoboCup four-legged robot league
• three kinds of "destinations" (sub-tasks)
a) staying in the goal (ball: invisible)
b) punching the ball (ball: near to the goal)
c) closing a goalpost (ball: at a side of the goal)
[Figure: ERS-210 goalie in front of the goal. Labels: axes 𝑥 and 𝑦, pose (𝑥, 𝑦, 𝜃), polar coordinates (𝑟, 𝜑), and the sub-tasks (a), (b), (c)]

difference of accuracy requirement
• requirement to reach the sub-tasks
– (a) accurate self-localization only relative to the goal
– (b) no need of accurate self-localization
– (c) accurate self-localization
[Figure: the sub-tasks (a), (b), (c) around the goal. Callout: 𝑏𝑒𝑙 is broad, but the relative pose toward the goal is accurate.]
⇒ The robot must choose its sub-task in real time, taking the accuracy requirement into account.

real-time QMDP
• QMDP value method: described in [Littman 95]
• composed of offline and online calculation
– offline
• calculate the value (cost-to-go) function
without considering uncertainty
• state variables: 𝒙 = (𝑥, 𝑦, 𝜃, 𝑥ball, 𝑦ball)
– online
1. place all particles on the value function
2. choose an action that maximizes
the average value of the particles
[Figure: value plotted over the state space (5D), with regions labeled a, b, c]
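The offline/online split can be illustrated with a toy problem. The sketch below is an assumption-laden miniature, not the RoboCup implementation: it uses a one-dimensional corridor of cells instead of the 5D state space, a value equal to minus the distance to the nearest destination, and hypothetical function names.

```python
def value_function(n_cells, goals):
    """Offline: value of each cell, computed with no regard to uncertainty.

    Here the value is simply minus the distance to the nearest goal cell.
    """
    return [-min(abs(c - g) for g in goals) for c in range(n_cells)]


def step(cell, action, n_cells):
    """Deterministic state transition, clipped at both ends of the corridor."""
    return min(max(cell + action, 0), n_cells - 1)


def qmdp_action(particles, values, actions=(-1, 0, 1)):
    """Online: place the particles on the value function and pick the action
    whose resulting average value over all particles is highest."""
    n = len(values)

    def average_value(action):
        return sum(values[step(p, action, n)] for p in particles) / len(particles)

    return max(actions, key=average_value)


values = value_function(11, goals=[0, 10])   # two destinations, at both ends
print(qmdp_action([6, 7, 8], values))        # belief near the right end
print(qmdp_action([1, 2, 9], values))        # broad belief, mostly on the left
```

Because the online step is only a table lookup and an average per action, its cost grows linearly with the number of particles, which is what makes the method usable on a slow CPU.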

calculated value function
• 3,000,000 discrete states in the 5D state space
• 49 min of calculation on a 3.6 GHz CPU
[Figure: a part of the value function (values on the 𝑥𝑦-plane with a fixed (𝜃, 𝑥ball, 𝑦ball)); the ball and the goal are marked]

motion with real-time QMDP
• Motions corresponding to the sub-tasks can be observed.
– real-time calculation with 1000 particles on a 192 MHz CPU
– A detailed evaluation is given in [Ueda ICRA2005].
[Movie (x2): waiting in the goal, closing a goalpost, punching the ball]
https://youtu.be/fsQicKXE5AU

CASE 2:
PROBABILISTIC FLOW CONTROL
Ryuichi Ueda: Generation of Compensation Behavior of Autonomous
Robot for Uncertainty of Information with Probabilistic Flow Control,
Advanced Robotics, 29(11), pp. 721-734, June, 2015.

motivation
• When I go to bed at midnight without a light,
how do I behave?
– I search for a wall with my hand, and then follow the wall.
• a representative study: "coastal navigation" [Roy 99]
– planning that evaluates uncertainty in the offline calculation
[Figure: a robot with pose (𝑥, 𝑦, 𝜃) heading to the goal along a wall. Labels: "A degree of freedom is erased.", "possibility of getting lost", wall, goal]

how to realize the behavior
with real-time QMDP
• problems:
– no strategy for obtaining information
• no consideration of uncertainty in the offline phase
• no consideration of future observations in the online phase
– deadlocks
• The robot stops when no motion can improve the
average value.
• not fatal in RoboCup but fatal in navigation
⇒ A small modification yields an interesting behavior.

probabilistic flow control (PFC)
• an additional assumption
– The robot can detect through a sensor whether or not
it has reached a goal.
• modification of calculation
– The average value is
weighted by the value.
– Particles near a goal
have a priority.
[Figure: value over the state space; particles near the goal are weighted high, particles far from it are weighted low]
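A rough sketch of the idea follows, on the same kind of toy corridor as a plain particle average would use. The exact weighting of [Ueda 2015] is not reproduced here: the inverse-power weight `1 / cost**m`, the exponent `m`, the removal of particles on goal cells after a "not at goal" reading, and all function names are assumptions for illustration.

```python
def value_function(n_cells, goals):
    """Offline value: minus the distance to the nearest goal cell."""
    return [-min(abs(c - g) for g in goals) for c in range(n_cells)]


def step(cell, action, n_cells):
    """Deterministic transition, clipped at both ends of the corridor."""
    return min(max(cell + action, 0), n_cells - 1)


def average_action(particles, values, actions=(0, -1, 1)):
    """Plain particle average; staying (0) is listed first, so it wins ties."""
    n = len(values)
    return max(actions, key=lambda a: sum(values[step(p, a, n)] for p in particles))


def pfc_action(particles, values, actions=(0, -1, 1), m=2.0):
    """Average of the value change, weighted so that particles near a goal dominate."""
    n = len(values)
    # The goal sensor says "not at a goal", so particles on goal cells are ruled out.
    alive = [p for p in particles if values[p] < 0]
    if not alive:
        return 0

    def weighted_gain(action):
        total = 0.0
        for p in alive:
            cost = -values[p]                 # distance to the nearest goal (>= 1)
            weight = 1.0 / cost ** m          # assumed weighting: near-goal priority
            total += weight * (values[step(p, action, n)] - values[p])
        return total

    return max(actions, key=weighted_gain)


values = value_function(11, goals=[0, 10])
belief = [2, 7]                               # two hypotheses on opposite sides
print(average_action(belief, values))         # no motion improves the average
print(pfc_action(belief, values))             # pulled toward the nearer hypothesis
```

In this example every motion leaves the plain average unchanged, which is the deadlock case, while the weighted version commits to the hypothesis closest to a goal. If that hypothesis turns out to be wrong, the goal sensor removes it and the next-closest particles take over.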

a navigation problem with one landmark
• state variables: 𝒙 = (𝑥, 𝑦, 𝜃)
• information:
– landmark observation
– goal or not
• Particles do not converge
most of the time.
[Figure: the destination, the landmark, and a spread of particles. Callout: the poses of these particles never contradict the sensor data.]

application of PFC
• The robot moves as if it were
dragged by some particles
near the goal.
• real-time QMDP
– 73 deadlocks in 100 trials

other applications
• Some unpublished modifications are applied.
– (I must write them up, but ...)
[Movie: searching behavior of a manipulator with (modified) PFC; the position of the rod is unknown, and red color shows the likelihood of the rod's existence]
[Movie: wandering behavior of a Raspberry Pi Mouse toward the goal with (modified) PFC]
these movies: https://blog.ueda.tech/?page_id=10034

CASE 3: PARTICLE FILTER ON EPISODE
• Ryuichi Ueda, Kotaro Mizuta, Hiroshi Yamakawa and Hiroyuki Okada:
Particle Filter on Episode for Learning Decision Making Rule, Proc. of
The 14th International Conference on Intelligent Autonomous
Systems (IAS-14), Shanghai, July, 2016.
• The 35th Annual Conference of the RSJ (to appear)

motivation
• decision making before/without environmental maps
• Memory comes first, and a map follows.
– The hippocampus of mammals generates a sequence of memories,
and the sequence becomes a map once the temporal order drops out.
– information: memory > map
• Robots can store seemingly unlimited memory.
– unlike living creatures
no need for SLAM for intelligent decision making (?)

particle filter on episode (PFoE)
• procedure
1. record I/O and reward
2. calculate the similarity between the current situation and
each past state
3. choose an action that maximizes future reward
[Figure: the episode laid out along the time axis from the past to the present time, as a sequence of states (sensors) s, actions a, and rewards (e.g. 1, -1); the belief is represented by particles placed on past time steps]
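The three-step procedure can be sketched as follows. This is a simplified illustration rather than the algorithm of the IAS-14 paper: the episode is a list of hypothetical `(state, action, reward)` records, similarity is reduced to an exact match of states, and the horizon and discount factor are assumed values.

```python
import random


def update_belief(particles, episode, current_state, rng):
    """Step 2: keep particles on past time steps whose recorded state matches
    the current one (binary similarity), then resample to the same count."""
    matching = [t for t in particles if episode[t][0] == current_state]
    if not matching:  # belief lost: reset over every matching time step
        matching = [t for t, rec in enumerate(episode) if rec[0] == current_state]
    if not matching:  # the current state was never experienced before
        return list(particles)
    return [rng.choice(matching) for _ in particles]


def future_reward(episode, t, horizon, gamma):
    """Discounted sum of the rewards recorded from time step t onward."""
    return sum((gamma ** k) * rec[2] for k, rec in enumerate(episode[t:t + horizon]))


def choose_action(particles, episode, actions, horizon=4, gamma=0.9):
    """Step 3: each particle votes for the action recorded at its time step,
    with a vote equal to the reward that followed in the episode."""
    score = {a: 0.0 for a in actions}
    for t in particles:
        action = episode[t][1]
        if action in score:
            score[action] += future_reward(episode, t, horizon, gamma)
    return max(actions, key=lambda a: score[a])


# Step 1: a recorded episode of three T-maze trials (the goal is on the left here).
trial_left = [("start", "forward", 0), ("junction", "left", 1),
              ("after turn", "forward", 0), ("end", "forward", 0)]
trial_right = [("start", "forward", 0), ("junction", "right", -1),
               ("after turn", "forward", 0), ("end", "forward", 0)]
episode = trial_left + trial_right + trial_left

rng = random.Random(0)
particles = [rng.randrange(len(episode)) for _ in range(100)]
particles = update_belief(particles, episode, "junction", rng)
print(choose_action(particles, episode, actions=("left", "right")))
```

Since the belief lives on the time axis of the episode rather than on a map, no map has to be built before the robot starts choosing actions.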

a simple application
• The robot goes from the bottom of the T-shaped maze
to the goal, which is placed alternately on one of the two arms.
• conditions (very simplified)
– rewards:
• 1: the robot turned to the goal arm
• -1: it turned to the wrong arm
– only four states:
start, T-junction, after turn,
end of an arm
– only one chance of decision:
right or left at T-junction

another version of PFoE
• used for teaching
• presented in the 35th
Annual Conference of the RSJ
– details are not public yet
these movies on the web:
https://blog.ueda.tech/?page_id=10021

conclusion of this presentation
• Real-time QMDP can choose appropriate locations of a
goalkeeper in accordance with the belief of the robot.
– on a 192MHz CPU, 32MB DRAM
– It was actually used in RoboCup competitions for some years.
• PFC compensates for a lack of information through the motion of the robot.
– Robots with PFC show "searching behavior."
• We are trying to build a cognitive/metacognitive model
for robots with poor computing resources.
