Prediction and Planning for Self-Driving @ Waymo (Google)
Yu Huang
Sunnyvale, California
yu.huang07@gmail.com
References
• ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
• MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
• VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
• TNT: Target-driven Trajectory Prediction
• Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
• Identifying Driver Interactions via Conditional Behavior Prediction
• Peeking into the Future: Predicting Future Person Activities and Locations in Videos
• STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
• Train a policy for autonomous driving via imitation learning that is robust enough to drive a real vehicle.
• Standard behavior cloning is insufficient for handling complex driving scenarios, even when leveraging a perception system for preprocessing the input and a controller for executing the output on the car: 30 million examples are still not enough.
• The learner is therefore exposed to synthesized data in the form of perturbations to the expert's driving, which creates interesting situations such as collisions and/or going off the road (a sketch of such a perturbation follows below).
• Rather than purely imitating all data, the imitation loss is augmented with additional losses that penalize undesirable events and encourage progress; the perturbations then provide an important signal for these losses and lead to robustness of the learned model.
• The ChauffeurNet model can handle complex situations in simulation.
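To make the perturbation idea concrete, here is a minimal numpy sketch, not the paper's implementation: it laterally displaces the waypoint at the current step and blends smoothly back onto the expert trajectory. The (x, y) waypoint representation, the raised-cosine blending, and all names are illustrative assumptions.

```python
import numpy as np

def perturb_trajectory(waypoints, t_now, max_lateral_m=1.0, half_window=10, seed=0):
    """Illustrative ChauffeurNet-style perturbation (hypothetical helper):
    laterally displace the pose at index t_now, then blend smoothly back to
    the expert path. waypoints: (N, 2) expert (x, y) positions, uniform in time."""
    rng = np.random.default_rng(seed)
    wp = waypoints.astype(float).copy()
    # Lateral direction = unit normal of the local heading at t_now.
    heading = wp[min(t_now + 1, len(wp) - 1)] - wp[max(t_now - 1, 0)]
    normal = np.array([-heading[1], heading[0]])
    normal /= np.linalg.norm(normal) + 1e-9
    offset = rng.uniform(-max_lateral_m, max_lateral_m)
    # Raised-cosine weights: full offset at t_now, zero at the window edges,
    # so the perturbed path starts and ends on the original trajectory.
    for i in range(max(0, t_now - half_window), min(len(wp), t_now + half_window + 1)):
        w = 0.5 * (1.0 + np.cos(np.pi * (i - t_now) / half_window))
        wp[i] += w * offset * normal
    return wp

# Example: perturb a straight 20-step path around step 10; off-road or
# collision outcomes of such perturbations are then penalized by the
# additional losses rather than imitated.
perturbed = perturb_trajectory(np.stack([np.arange(20.0), np.zeros(20)], 1), t_now=10)
```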
MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
• Predicting human behavior is a difficult and crucial task required for motion planning.
• It is challenging in large part because of the highly uncertain and multimodal set of possible outcomes in real-world domains such as autonomous driving.
• Beyond single MAP (maximum a posteriori) trajectory prediction, obtaining an accurate probability distribution over the future is an area of active interest.
• MultiPath leverages a fixed set of future state-sequence anchors that correspond to modes of the trajectory distribution.
• At inference, the model predicts a discrete distribution over the anchors and, for each anchor, regresses offsets from anchor waypoints along with uncertainties, yielding a Gaussian mixture at each time step.
• The model is efficient, requiring only one forward inference pass to obtain multimodal future distributions, and the output is parametric, allowing compact communication and analytical probabilistic queries.
MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
• MultiPath estimates the distribution over future trajectories per agent in a scene as follows:
• 1) Based on a top-down scene representation, the scene CNN extracts mid-level features that encode the state of individual agents and their interactions.
• 2) For each agent in the scene, an agent-centric view of the mid-level feature representation is cropped, and the probabilities over the fixed set of K predefined anchor trajectories are predicted.
• 3) For each anchor, the model regresses offsets from the anchor states and uncertainty distributions for each future time step.
• The distribution is parameterized by anchor trajectories A. Directly learning a mixture suffers from mode collapse, so, as is common practice in other domains such as object detection and human pose estimation, the anchors are estimated a priori and then fixed while the remaining parameters are learned; a practical choice is the k-means algorithm as a simple approximation to obtain A (see the sketch below).
• The model is trained via imitation learning by fitting its parameters to maximize the log-likelihood of recorded driving trajectories.
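To make the anchors and the training target concrete, here is a minimal numpy sketch under simplifying assumptions: k-means over flattened future trajectories for the anchors, hard assignment of the ground truth to its closest anchor, and diagonal Gaussians per waypoint. Function names and shapes are illustrative.

```python
import numpy as np

def kmeans_anchors(trajs, k=16, iters=50, seed=0):
    """Approximate the anchor set A with k-means over flattened future
    trajectories. trajs: (M, T, 2) -> anchors: (k, T, 2)."""
    rng = np.random.default_rng(seed)
    flat = trajs.reshape(len(trajs), -1)
    centers = flat[rng.choice(len(flat), k, replace=False)].copy()
    for _ in range(iters):
        assign = ((flat[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = flat[assign == j].mean(0)
    return centers.reshape(k, trajs.shape[1], 2)

def multipath_nll(gt, anchors, logits, offsets, log_sigma):
    """Negative log-likelihood of one ground-truth trajectory: classify the
    closest anchor and score per-step Gaussian offsets from it.
    gt: (T, 2); anchors: (K, T, 2); logits: (K,); offsets, log_sigma: (K, T, 2)."""
    k = ((anchors - gt) ** 2).sum(-1).sum(-1).argmin()  # hard anchor assignment
    log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax over anchors
    mu = anchors[k] + offsets[k]                        # refined waypoints
    var = np.exp(2 * log_sigma[k])
    # Per-dimension Gaussian NLL, summed over time steps and x/y.
    reg = 0.5 * ((gt - mu) ** 2 / var + 2 * log_sigma[k] + np.log(2 * np.pi))
    return -log_probs[k] + reg.sum()
```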
MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
• A history of dynamic and static scene context is represented as a 3-dimensional array of data rendered from a top-down orthographic perspective.
• The first two dimensions represent spatial locations in the top-down image.
• The channels in the depth dimension hold static and time-varying (dynamic) content for a fixed number of previous time steps.
• Agent observations are rendered as oriented bounding-box binary images, one channel per time step.
• Other dynamic context such as traffic light state, and static context of the road (lane connectivity and type, stop lines, speed limit, etc.), form additional channels (a sketch of assembling such a raster follows below).
• An important benefit of using such a top-down representation is the simplicity of representing contextual information like the agents' spatial relationships to each other and semantic road information.
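A minimal sketch of assembling such an input raster, using axis-aligned boxes in place of the oriented boxes the paper renders; resolution, sizes, and names are illustrative:

```python
import numpy as np

def rasterize_agents(agent_boxes_per_step, H=200, W=200, res=0.5):
    """Build one binary channel per history step from agent boxes,
    approximated here as axis-aligned rectangles. Static map channels
    (lanes, stop lines, speed limits) would be stacked the same way.
    agent_boxes_per_step: list over time of (N, 4) arrays of
    (cx, cy, length, width) in meters, ego-centered. Returns (H, W, T)."""
    raster = np.zeros((H, W, len(agent_boxes_per_step)), np.float32)
    for t, boxes in enumerate(agent_boxes_per_step):
        for cx, cy, l, w in boxes:
            # Meters -> pixels, with the raster origin at its center.
            x0 = int((cx - l / 2) / res + W / 2); x1 = int((cx + l / 2) / res + W / 2)
            y0 = int((cy - w / 2) / res + H / 2); y1 = int((cy + w / 2) / res + H / 2)
            raster[max(y0, 0):min(y1, H), max(x0, 0):min(x1, W), t] = 1.0
    return raster
```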
MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction
Top: logged trajectories of all agents are displayed in cyan; the focused agent is highlighted by a red circle. Bottom: MultiPath shows up to 5 trajectories with uncertainty ellipses. Trajectory probabilities (softmax outputs) are encoded in the color map shown to the right. MultiPath can predict uncertain future trajectories for various speeds (1st column), different intents at intersections (2nd and 3rd columns), and lane changes (4th and 5th columns), where the regression baseline predicts only a single intent.
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
• Behavior prediction in dynamic, multi-agent systems is an important problem for self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g., pedestrians and vehicles) and road context information (e.g., lanes, traffic lights).
• This paper introduces VectorNet, a hierarchical graph neural network (GNN) that first exploits the spatial locality of individual road components represented by vectors, and then models the high-order interactions among all components.
• In contrast to most recent approaches, which render trajectories of moving agents and road context information as bird's-eye images and encode them with convolutional neural networks (ConvNets), this approach operates on a vector representation.
• By operating on vectorized high-definition (HD) maps and agent trajectories, it avoids lossy rendering and computationally intensive ConvNet encoding steps.
• To further boost VectorNet's capability in learning context features, a novel auxiliary task is proposed: recovering randomly masked-out map entities and agent trajectories based on their context.
• VectorNet also outperforms the state of the art on the Argoverse dataset.
https://github.com/DQSSSSS/VectorNet
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
• Most of the annotations in an HD map are in the form of splines (e.g., lanes), closed shapes (e.g., regions of intersections), and points (e.g., traffic lights), with additional attribute information such as the semantic labels of the annotations and their current states (e.g., color of the traffic light, speed limit of the road).
• Agent trajectories are in the form of directed splines with respect to time.
• All of these elements can be approximated as sequences of vectors: for map features, pick a starting point and direction, uniformly sample key points from the splines at the same spatial distance, and sequentially connect the neighboring key points into vectors; for trajectories, sample key points at a fixed temporal interval (0.1 second) starting from t = 0 and connect them into vectors (see the sketch below).
• Given small enough spatial or temporal intervals, the resulting polylines serve as close approximations of the original map and trajectories.
• To exploit the spatial and semantic locality of the nodes, VectorNet takes a hierarchical approach, first constructing subgraphs at the vector level, where all vector nodes belonging to the same polyline are connected with each other.
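A minimal sketch of this vectorization step; the node feature layout (start point, end point, a single attribute, polyline id) is an illustrative simplification of the paper's features:

```python
import numpy as np

def polyline_to_vectors(points, attr, poly_id):
    """Turn one sampled polyline (K, 2) into vector nodes of the form
    [x_start, y_start, x_end, y_end, attr, poly_id]. Nodes sharing a
    poly_id later form one fully connected subgraph."""
    starts, ends = points[:-1], points[1:]
    n = len(starts)
    return np.concatenate(
        [starts, ends, np.full((n, 1), attr), np.full((n, 1), poly_id)], axis=1)

# A lane centerline sampled at a fixed spatial distance, and an agent
# track sampled at a fixed temporal interval, become one common node set.
lane = polyline_to_vectors(np.stack([np.arange(10.0), np.zeros(10)], 1), attr=0, poly_id=0)
track = polyline_to_vectors(np.stack([np.arange(5.0), np.arange(5.0)], 1), attr=1, poly_id=1)
nodes = np.concatenate([lane, track])
```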
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
The computation flow on the vector nodes of the same polyline. The polyline subgraph network can be seen as a generalization of PointNet. However, by embedding the ordering information into vectors, constraining the connectivity of subgraphs based on the polyline groupings, and encoding attributes as node features, this method is particularly suitable for encoding structured map annotations and agent trajectories. One such layer is sketched below.
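The encode/pool/concatenate pattern the caption describes can be sketched in a few lines; a single linear layer with ReLU stands in for the paper's MLP, and all shapes are illustrative:

```python
import numpy as np

def subgraph_layer(node_feats, weight):
    """One polyline-subgraph layer over the nodes of a single polyline:
    per-node encoding, permutation-invariant max-pool across the polyline,
    then concatenation of the pooled context back onto every node.
    node_feats: (N, D); weight: (D, D) -> (N, 2 * D)."""
    enc = np.maximum(node_feats @ weight, 0.0)   # per-node encoder
    pooled = enc.max(axis=0, keepdims=True)      # polyline-wide context
    return np.concatenate([enc, np.repeat(pooled, len(enc), axis=0)], axis=1)

# Stacking a few such layers and max-pooling once more yields a single
# polyline-level feature for the global interaction graph.
```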
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
• To encourage the global interaction graph to better capture interactions among different trajectories and map polylines, VectorNet introduces an auxiliary graph completion task.
• In order to identify an individual polyline node when its corresponding feature is masked out, the minimum values of the start coordinates over all of the polyline's vectors are computed to obtain an identifier embedding.
• The graph completion objective is closely related to the widely successful BERT method for natural language processing (NLP), which predicts missing tokens based on bidirectional context from discrete, sequential text data.
• Unlike methods that generalize the BERT objective to unordered image patches with pre-computed visual features, the proposed node features are jointly optimized in an end-to-end framework.
• The final multi-task training objective is L = L_traj + α L_node, where L_traj is the negative Gaussian log-likelihood of the ground-truth future trajectories, L_node is the Huber loss between predicted node features and ground-truth masked node features, and α = 1.0 is a scalar that balances the two loss terms (see the sketch below).
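A minimal sketch of this objective, assuming the Huber loss is applied elementwise to the node-feature residuals and the trajectory NLL is computed elsewhere:

```python
import numpy as np

def huber(x, delta=1.0):
    """Elementwise Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def vectornet_objective(nll_traj, pred_nodes, gt_nodes, alpha=1.0):
    """L = L_traj + alpha * L_node, with L_node the Huber loss between
    predicted and ground-truth masked node features."""
    return nll_traj + alpha * huber(pred_nodes - gt_nodes).mean()
```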
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Figure: predicted trajectories and the learned attention for road and agent polylines.
TNT: Target-driven Trajectory Prediction
• The key insight is that, for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states.
• This leads to the target-driven trajectory prediction (TNT) framework.
• TNT has three stages, which are trained end-to-end (see the sketch after this list):
• It first predicts an agent's potential target states T steps into the future, by encoding its interactions with the environment and the other agents.
• TNT then generates trajectory state sequences conditioned on targets.
• A final stage estimates trajectory likelihoods and selects a final compact set of trajectory predictions.
• This is in contrast to previous work, which models agent intents as latent variables and relies on test-time sampling to generate diverse trajectories.
• TNT is benchmarked on trajectory prediction of vehicles and pedestrians, outperforming the state of the art on Argoverse Forecasting, INTERACTION, Stanford Drone, and an in-house pedestrian-at-intersection dataset.
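The three stages can be sketched as a selection procedure over the network's outputs; shapes, names, and the omission of the trajectory de-duplication step are all illustrative simplifications:

```python
import numpy as np

def tnt_select(candidates, target_scores, trajs_per_target, traj_scores, M=50, K=6):
    """Sketch of TNT's three stages on precomputed network outputs.
    candidates: (C, 2) target points; target_scores: (C,) target logits;
    trajs_per_target: (C, T, 2) one trajectory regressed toward each target;
    traj_scores: (C,) trajectory likelihood logits."""
    top_m = np.argsort(-target_scores)[:M]       # (a) keep the M best targets
    trajs = trajs_per_target[top_m]              # (b) trajectories toward them
    order = np.argsort(-traj_scores[top_m])      # (c) rank trajectory hypotheses
    # The paper additionally suppresses near-duplicate trajectories before
    # keeping the final compact set of K predictions.
    return candidates[top_m][order[:K]], trajs[order[:K]]
```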
TNT: Target-driven Trajectory Prediction
Illustration of the TNT framework applied to the vehicle future trajectory prediction task. TNT consists of three stages: (a) target prediction, which proposes a set of plausible targets (stars) among all candidates (diamonds); (b) target-conditioned motion estimation, which estimates a trajectory (distribution) towards each selected target; (c) scoring and selection, which ranks trajectory hypotheses and selects a final set of trajectory predictions with likelihood scores.
TNT: Target-driven Trajectory Prediction
TNT model overview. Scene context is first encoded as the model's input. Then follow the three core stages of TNT: (a) target prediction, which proposes an initial set of M targets; (b) target-conditioned motion estimation, which estimates a trajectory for each target; (c) scoring and selection, which ranks trajectory hypotheses and outputs a final set of K predicted trajectories.
TNT: Target-driven Trajectory Prediction
TNT supports flexible choices of targets. Vehicle target candidate points
are sampled from the lane centerlines. Pedestrian target candidate
points are sampled from a virtual grid centered on the pedestrian.
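A minimal sketch of the two candidate samplers the caption describes; strides, grid extent, and spacing are illustrative:

```python
import numpy as np

def vehicle_candidates(centerlines, stride=2):
    """Target candidates for vehicles: every `stride`-th point of each
    densely sampled lane centerline, given as (K_i, 2) arrays."""
    return np.concatenate([c[::stride] for c in centerlines])

def pedestrian_candidates(center, half_extent=5.0, spacing=0.5):
    """Target candidates for pedestrians: a virtual grid centered on the
    pedestrian's current position."""
    ax = np.arange(-half_extent, half_extent + spacing, spacing)
    gx, gy = np.meshgrid(ax, ax)
    return np.stack([gx.ravel() + center[0], gy.ravel() + center[1]], axis=1)
```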
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
• This is the most diverse interactive motion dataset so far, and it provides specific labels for interacting objects suitable for developing joint prediction models.
• With over 100,000 scenes, each 20 seconds long at 10 Hz, the dataset contains more than 570 hours of unique data over 1750 km of roadways.
• It was collected by mining for interesting interactions between vehicles, pedestrians, and cyclists across six cities within the United States.
• A high-accuracy 3D auto-labeling system generates high-quality 3D bounding boxes for each road agent, and corresponding high-definition 3D maps are provided for each scene.
• A new set of metrics provides a comprehensive evaluation of both single-agent and joint-agent interaction motion forecasting models.
• Finally, strong baseline models are provided for individual-agent prediction and joint prediction.
• https://waymo.com/open/data/motion/
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
Examples of interactions between agents in a scene in the Waymo Open Motion Dataset. Each
example highlights how predicting
the joint behavior of agents aids in
predicting likely future scenarios.
Solid and dashed lines indicate the
road graph and associated lanes.
Each numeral indicates a unique
agent in the scene.
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
• Compared to its onboard counterpart, offboard perception has two major advantages:
• 1) it can afford much more powerful models running on ample computational resources;
• 2) it can maximally aggregate complementary information from different views by exploiting the full point cloud sequence, including both history and future.
• The offboard perception system employed contains three steps:
• (1) a 3D object detector generates object proposals from each lidar frame;
• (2) a multi-object tracker links detected objects throughout the lidar sequence;
• (3) for each object, an object-centric refinement network processes the tracked object boxes and their point clouds across all frames in the track, and outputs temporally consistent and accurate 3D bounding boxes of the object in each frame.
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
Comparison of popular behavior prediction and motion forecasting datasets: Lyft Level 5, nuScenes, Argoverse, INTERACTION, and the Waymo Open Motion Dataset are compared across multiple dimensions.
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
• The dataset provides high-quality object tracks generated using an offboard perception system, along with both static and dynamic map features, to provide context for the road environment.
• Interesting scenarios are mined by first hand-crafting semantic predicates involving agents' relationships, e.g., "agent A changed lanes at time t" and "agents A and B crossed paths with a time gap t and relative heading difference".
• These predicates can be composed to retrieve more complex queries in an efficient SQL and relational database framework, over an overall data corpus orders of magnitude larger than the resulting curated Waymo Open Motion Dataset.
• Pairwise interaction scenarios include merges, lane changes, unprotected turns, intersection left turns, intersection right turns, pedestrian-vehicle interactions, cyclist-vehicle interactions, interactions with close proximity, and interactions with high accelerations.
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
Diagram of the baseline architecture: an illustration of the architecture employed for the family of learned models, with a base LSTM encoder for agent states. The three detachable components are a road graph polyline encoder, a traffic state LSTM encoder, and a high-order interactions encoder. The trajectories are predicted through an MLP head with a min-of-k loss (sketched below).
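A minimal sketch of a min-of-k trajectory loss as mentioned in the caption, using average displacement error as the per-hypothesis distance (the exact form used by the baselines may differ):

```python
import numpy as np

def min_of_k_loss(pred, gt):
    """Average displacement error of each of the k hypotheses; only the
    closest hypothesis incurs loss, which lets the k heads specialize to
    different futures. pred: (k, T, 2); gt: (T, 2)."""
    ade = np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)  # (k,)
    return ade.min()
```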
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
• First, a constant velocity model is considered, which assumes the agent maintains its velocity at the current timestamp for all future steps.
• Second, a family of deep-learned models with various encoders is considered, with a base architecture of an LSTM encoding a 1-second history of observed state; this includes agents' positions, velocities, and 3D bounding boxes.
• In order to measure the importance of particular additional features, additional information is selectively provided:
• Road graph (rg): encodes the 3D map information with polylines.
• Traffic signals (ts): encodes the traffic signal states with an LSTM encoder as an additional feature.
• High-order interactions (hi): models the high-order interactions between agents with a global interaction graph.
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
• Conditional behavior prediction (CBP) is used to quantify the interactivity in the dataset.
• A model can produce either unconditional predictions or predictions conditioned on a "query trajectory" for one of the agents in the scene.
• If two agents are not interacting, then one's actions have no effect on the other, so knowledge of that agent's future should not change predictions for the other agent.
• The degree of influence agent A has on agent B is defined as the KL divergence between unconditional predictions for B and predictions for B conditioned on A's ground-truth future trajectory.
• This is applied to the interactive and standard validation datasets, computing the KL divergence between unconditional and conditional predictions for every query-agent/target-agent pair in the dataset.
• KL divergences are much larger in the interactive validation dataset than in the standard validation dataset.
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
The dataset contains many agents including
pedestrians and cyclists. Top: 46% of scenes have
more than 32 agents, and 11% of scenes have
more than 64 agents. Bottom: In the standard
validation set, 33.5% of scenes require at least
one pedestrian to be predicted, and 10.4% of
scenes require at least one cyclist to be predicted.
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
Agents selected to be predicted have diverse
trajectories. Left: Ground truth trajectory of each
predicted agent in a frame of reference where all
agents start at the origin with heading pointing along
the positive X axis (pointing up). Right: Distribution
of maximum speeds achieved by all of the agents
along their 9-second trajectories. The plots depict the variety in trajectory shapes and speed profiles.
Identifying Driver Interactions Via Conditional
Behavior Prediction
• Interactive driving scenarios, such as lane changes, merges and unprotected turns, are some
of the most challenging situations for autonomous driving.
• Planning in interactive scenarios requires accurately modeling the reactions of other agents
to different future actions of the ego agent.
• It develops end-to-end models for conditional behavior prediction (CBP) that take as input a query future trajectory for an ego agent, and predict distributions over future trajectories for other agents conditioned on the query.
• Leveraging such a model, a general-purpose agent interactivity score is derived from probabilistic first principles.
• The interactivity score makes it possible to find interesting interactive scenarios for training and evaluating behavior prediction models.
Identifying Driver Interactions Via Conditional
Behavior Prediction
• An agent trajectory S is defined as a fixed-length, time-discretized sequence of agent states up to a finite time horizon.
• All quantities in this work consider a pair of agents A and B.
• Without loss of generality, A is taken to be the query agent, whose plan for the future can potentially affect B, the target agent.
• The future trajectories of A and B are random variables S_A and S_B.
• The marginal probability of a particular realization s_B of agent B's trajectory is given by p(S_B = s_B), also indicated by the shorthand p(s_B).
• The conditional distribution of agent B's future trajectory given a realization s_A of agent A's trajectory is given by p(S_B = s_B | S_A = s_A), indicated by the shorthand p(s_B | s_A).
Identifying Driver Interactions Via Conditional
Behavior Prediction
• Interactions are quantified by estimating the change in the log-likelihood of the target's ground-truth future s_B when conditioning on the query agent's trajectory.
• A large change in the log-likelihood indicates a situation in which the likelihood of the target agent's trajectory changes significantly as a result of the query agent's action.
• The degree of influence exerted on B by a query trajectory s_A is quantified by the KL divergence between the conditional and marginal distributions of the target's predicted future trajectory: D_KL( p(S_B | S_A = s_A) || p(S_B) ).
• The mutual information between the two agents' future trajectories S_A and S_B is then the expectation of this divergence over the query's future: I(S_A, S_B) = E_{s_A ~ p(S_A)} [ D_KL( p(S_B | s_A) || p(S_B) ) ].
• This mutual information is the interactivity score between agents A and B (see the sketch below).
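A minimal Monte Carlo sketch of this score; every callable stands in for a trained CBP model and is hypothetical:

```python
import numpy as np

def interactivity_score(query_futures, query_probs, sample_cond, logp_cond, logp_marg):
    """I(S_A, S_B) approximated as a weighted sum over query futures s_A of
    KL(p(S_B | s_A) || p(S_B)), each KL estimated from samples of the
    conditional. sample_cond(s_a) draws target futures s_B ~ p(S_B | s_a);
    logp_cond / logp_marg evaluate their log-densities under the two models."""
    mi = 0.0
    for s_a, w in zip(query_futures, query_probs):
        s_b = sample_cond(s_a)  # samples from the conditional distribution
        # Monte Carlo KL: E_{s_B ~ p(.|s_a)}[log p(s_B|s_a) - log p(s_B)]
        mi += w * np.mean(logp_cond(s_b, s_a) - logp_marg(s_b))
    return mi
```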
Identifying Driver Interactions Via Conditional
Behavior Prediction
• A CBP model predicts p(S_B | S_A = s_A, x), the distribution over future trajectories for B conditioned on s_A and scene context x.
• Gaussian uncertainty is placed over the positions of the trajectory waypoints.
• The output is a Gaussian mixture model (GMM) whose mixture weights are fixed over all time steps of the same trajectory.
• Computing the interactivity score also requires estimating the marginal distributions.
Identifying Driver Interactions Via Conditional
Behavior Prediction
• The most likely 6 modes of the marginal distribution's GMM are used, as in standard motion forecasting metrics, rather than drawing N samples from the marginal distribution.
• The distribution parameters are learned via supervised learning with the negative log-likelihood loss.
• An additional loss function encourages the model to respect that agents cannot occupy the same future location in space-time (an illustrative form is sketched below).
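An illustrative form of such a penalty (the paper's exact loss may differ): a hinge on the distance between two agents' predicted waypoints at the same time step:

```python
import numpy as np

def overlap_penalty(traj_a, traj_b, radius=2.0):
    """Penalize predicted waypoints of two agents that come within `radius`
    meters of each other at the same time step. traj_a, traj_b: (T, 2)."""
    d = np.linalg.norm(traj_a - traj_b, axis=-1)  # per-step distance
    return np.maximum(radius - d, 0.0).sum()      # hinge below the radius
```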
Identifying Driver Interactions Via Conditional
Behavior Prediction
A conditional behavior prediction model describes
how one agent’s predicted future trajectory can
shift due to the actions of other agents.
The architecture of the conditional behavior
prediction model.
Identifying Driver Interactions Via Conditional
Behavior Prediction
Histogram of interactivity score (mutual
information) between 8,919,306 pairs of
agents in the validation dataset.
Identifying Driver Interactions Via Conditional
Behavior Prediction
Two examples of interacting agents found by
sorting examples by mutual information and
wADE. The marginal (left) and conditional
predictions (right) are shown with the query in
solid green, and predictions in dashed cyan lines.
Identifying Driver Interactions Via Conditional
Behavior Prediction
An example in which the query and target agents slow down in parallel
lanes as a result of a traffic light change. The marginal (left) and
conditional predictions (right) are shown with the query in solid green.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
• Deciphering human behaviors to predict their future paths/trajectories and their future activities from videos is important in many applications.
• Therefore, this work studies predicting a pedestrian's future path jointly with future activities.
• An end-to-end, multi-task learning system called Next is proposed, utilizing rich visual features about human behavioral information and interaction with the surroundings.
• It encodes a person through rich semantic features about visual appearance, body movement, and interaction with the surroundings, motivated by the fact that humans make such predictions by relying on similar visual cues.
• To facilitate training, the network is learned with an auxiliary task of predicting the future location in which the activity will happen.
• For the auxiliary task, a discretized grid called the Manhattan grid is designed as the location prediction target for the system.
https://github.com/JunweiLiang/social-distancing-prediction
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
The goal is to jointly predict a person's future path and activity. The green and yellow lines show two possible future trajectories, and two possible activities are shown in the green and yellow boxes. Depending on the future activity, the person (top right) may take different paths, e.g., the yellow path for "loading" and the green path for "object transfer".
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
• Humans navigate through public spaces often with specific purposes in mind, ranging from simple ones like entering a room to more complicated ones like putting things into a car.
• Such intention, however, is mostly neglected in existing work.
• The joint prediction model can have two benefits:
• 1) Learning the activity together with the path may benefit future path prediction; intuitively, humans are able to read others' body language to anticipate whether they are going to cross the street or continue walking along the sidewalk.
• 2) The joint model advances the capability of understanding not only the future path but also the future activity by taking into account the rich semantic context in videos; this increases the capabilities of automated video analytics for social good, such as safety applications like anticipating pedestrian movement at traffic intersections, or a road robot helping humans transport goods to a car.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
Overview of the Next model. Given a sequence of frames containing the person for prediction, the model utilizes a person behavior module and a person interaction module to encode rich visual semantics into a feature tensor.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
• Four key components:
• The person behavior module extracts visual information from the behavioral sequence of the person.
• The person interaction module looks at the interaction between a person and their surroundings.
• The trajectory generator summarizes the encoded visual features and predicts the future trajectory with an LSTM decoder using focal attention.
• The activity prediction module utilizes rich visual semantics to predict the future activity label for the person.
• In addition, the scene is divided into a discretized grid of multiple scales, called the Manhattan grid, on which classification and regression are computed for robust activity location prediction.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
To model appearance changes of a person, a pre-trained object detection model with RoIAlign is used to extract fixed-size CNN features for each person bounding box. These features are averaged along the spatial dimensions for each person and fed into an LSTM encoder, yielding a feature representation of size T_obs × d, where d is the hidden size of the LSTM. To capture body movement, a person keypoint detection model extracts person keypoint information; a linear transformation embeds the keypoint coordinates before they are fed into the LSTM encoder, and the encoded feature again has shape T_obs × d (see the sketch below). These appearance and movement features are commonly used in a wide variety of studies and thus do not introduce new concerns about machine learning fairness.
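A minimal sketch of the two encoding branches described above, up to (but not including) the LSTM encoders; shapes and names are illustrative:

```python
import numpy as np

def appearance_features(roi_feats):
    """Appearance branch: RoIAlign features for one person over T_obs
    frames, averaged over the spatial dimensions before the LSTM encoder.
    roi_feats: (T_obs, h, w, c) -> (T_obs, c)."""
    return roi_feats.mean(axis=(1, 2))

def keypoint_features(keypoints, weight):
    """Movement branch: per-frame keypoint coordinates flattened and
    linearly embedded before the LSTM encoder.
    keypoints: (T_obs, K, 2); weight: (2 * K, d) -> (T_obs, d)."""
    return keypoints.reshape(len(keypoints), -1) @ weight
```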
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
The person-objects feature captures how far away the person is from other people and from cars. The person-scene feature captures whether the person is near the sidewalk or grass. This information is given to the model in the hope that it learns, for example, that a person walks more often on the sidewalk than on the grass and tends to avoid bumping into cars.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
• An LSTM decoder directly predicts the future trajectory in xy-coordinates.
• The hidden state of this decoder is initialized with the last state of the person's trajectory LSTM encoder.
• An auxiliary task, activity location prediction, is added in addition to predicting the future activity label of the person.
• At each time instant, the xy-coordinate is computed from the decoder state by a fully connected layer.
• The model employs an effective focal attention, originally proposed to carry out multimodal inference over a sequence of images for visual question answering; its key idea is to project multiple features into a space of correlation, where discriminative features can be more easily captured by the attention mechanism.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
To bridge the gap between trajectory generation and activity label prediction, an activity location prediction (ALP) module predicts the final location where the person will engage in the future activity. Activity location prediction includes two tasks, location classification and location regression (see the sketch below).
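A minimal sketch of constructing the two ALP targets on one scale of such a grid; cell size and coordinate conventions are illustrative:

```python
import numpy as np

def manhattan_grid_targets(final_xy, grid_w_cells, cell=32.0):
    """Classification target: index of the grid cell containing the final
    activity location. Regression target: offset of that location from the
    cell center. final_xy is in pixels; one entry per grid scale."""
    gx, gy = int(final_xy[0] // cell), int(final_xy[1] // cell)
    cls = gy * grid_w_cells + gx                        # flattened cell index
    center = np.array([(gx + 0.5) * cell, (gy + 0.5) * cell])
    reg = np.asarray(final_xy, float) - center          # offset within the cell
    return cls, reg
```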
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
Qualitative comparison between this method and the baselines. The yellow path is the observable trajectory and the green path is the ground-truth trajectory during the prediction period. Predictions are shown as blue heatmaps.
STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction
• Detecting pedestrians and predicting their future trajectories are critical tasks for numerous applications, such as autonomous driving.
• Previous methods either treat detection and prediction as separate tasks or simply add a trajectory regression head on top of a detector.
• This work proposes an end-to-end two-stage network: the Spatio-Temporal-Interactive Network (STINet).
• In addition to 3D geometry modeling of pedestrians, the temporal information is modeled for each pedestrian.
• STINet predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames and comprehensive spatio-temporal information can be captured in the second stage.
• It also models the interaction among objects with an interaction graph, to gather information among neighboring objects.
• Comprehensive experiments are reported on the Lyft dataset and the recently released large-scale Waymo Open Dataset, for both object detection and future trajectory prediction.
STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction
The overview. STINet takes a sequence of point clouds as input and simultaneously detects pedestrians and predicts their future trajectories. The point clouds are processed by pillar feature encoding to generate pillar features. Each pillar feature is fed into a backbone ResUNet to get backbone features. A Temporal Region Proposal Network (T-RPN) takes the backbone features and generates temporal proposals with past and current boxes for each object. The Spatio-Temporal-Interactive (STI) feature extractor learns features for each temporal proposal, which are used for final detection and trajectory prediction.
STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction
Backbone. Upper: overview of the backbone. The input point cloud sequence is fed to voxelization and PointNet to generate pseudo-images, which are then processed by a ResNet U-Net to generate the final backbone feature sequence. Lower: detailed design of the ResNet U-Net.
STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction
Spatio-Temporal-Interactive Feature Extractor (STI-FE): local geometry, local dynamics, and history path features are extracted given a temporal proposal. For the local geometry and local dynamics features, the yellow areas are used for feature extraction. Relational reasoning is performed across proposals' local features to generate interactive features.
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEslot gacor bisa pakai pulsa
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 

Recently uploaded (20)

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 

Prediction and planning for self driving at waymo

• 13. MULTIPATH: MULTIPLE PROBABILISTIC ANCHOR TRAJECTORY HYPOTHESES FOR BEHAVIOR PREDICTION
• MULTIPATH ESTIMATES THE DISTRIBUTION OVER FUTURE TRAJECTORIES PER AGENT IN A SCENE, AS FOLLOWS:
• 1) BASED ON A TOP-DOWN SCENE REPRESENTATION, THE SCENE CNN EXTRACTS MID-LEVEL FEATURES THAT ENCODE THE STATE OF INDIVIDUAL AGENTS AND THEIR INTERACTIONS.
• 2) FOR EACH AGENT IN THE SCENE, CROP AN AGENT-CENTRIC VIEW OF THE MID-LEVEL FEATURE REPRESENTATION AND PREDICT THE PROBABILITIES OVER THE FIXED SET OF K PREDEFINED ANCHOR TRAJECTORIES.
• 3) FOR EACH ANCHOR, THE MODEL REGRESSES OFFSETS FROM THE ANCHOR STATES AND UNCERTAINTY DISTRIBUTIONS FOR EACH FUTURE TIME STEP.
• THE DISTRIBUTION IS PARAMETERIZED BY ANCHOR TRAJECTORIES A; DIRECTLY LEARNING A MIXTURE SUFFERS FROM MODE COLLAPSE, SO, AS IS COMMON PRACTICE IN OTHER DOMAINS SUCH AS OBJECT DETECTION AND HUMAN POSE ESTIMATION, THE ANCHORS ARE ESTIMATED A PRIORI AND FIXED BEFORE LEARNING THE REST OF THE PARAMETERS; A PRACTICAL WAY TO OBTAIN A IS THE K-MEANS ALGORITHM AS A SIMPLE APPROXIMATION (SEE THE SKETCH BELOW).
• IT TRAINS THE MODEL VIA IMITATION LEARNING BY FITTING PARAMETERS TO MAXIMIZE THE LOG-LIKELIHOOD OF RECORDED DRIVING TRAJECTORIES.
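A minimal sketch of the anchor-fitting step, assuming a set of logged future trajectories already resampled to a fixed number of waypoints in each agent's own frame; the function name, array shapes, and choice of K are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_anchor_trajectories(trajectories: np.ndarray, k: int = 16) -> np.ndarray:
    """Cluster logged futures into K anchor trajectories (the modes).

    trajectories: [N, T, 2] array of N future paths with T (x, y) waypoints.
    Returns: [K, T, 2] anchor waypoint sequences (cluster centroids).
    """
    n, t, d = trajectories.shape
    flat = trajectories.reshape(n, t * d)           # one flat vector per trajectory
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(flat)
    return km.cluster_centers_.reshape(k, t, d)     # centroids back to waypoint form

# toy usage: 1000 random-walk futures, 8 anchors
rng = np.random.default_rng(0)
anchors = fit_anchor_trajectories(np.cumsum(rng.normal(size=(1000, 10, 2)), axis=1), k=8)
print(anchors.shape)  # (8, 10, 2)
```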
• 14. MULTIPATH: MULTIPLE PROBABILISTIC ANCHOR TRAJECTORY HYPOTHESES FOR BEHAVIOR PREDICTION
• THEY STILL REPRESENT A HISTORY OF DYNAMIC AND STATIC SCENE CONTEXT AS A 3-DIMENSIONAL ARRAY OF DATA RENDERED FROM A TOP-DOWN ORTHOGRAPHIC PERSPECTIVE.
• THE FIRST TWO DIMENSIONS REPRESENT SPATIAL LOCATIONS IN THE TOP-DOWN IMAGE.
• THE CHANNELS IN THE DEPTH DIMENSION HOLD STATIC AND TIME-VARYING (DYNAMIC) CONTENT OF A FIXED NUMBER OF PREVIOUS TIME STEPS.
• AGENT OBSERVATIONS ARE RENDERED AS ORIENTED BOUNDING BOX BINARY IMAGES, ONE CHANNEL FOR EACH TIME STEP (SEE THE SKETCH BELOW).
• OTHER DYNAMIC CONTEXT SUCH AS TRAFFIC LIGHT STATE AND STATIC CONTEXT OF THE ROAD (LANE CONNECTIVITY AND TYPE, STOP LINES, SPEED LIMIT, ETC.) FORM ADDITIONAL CHANNELS.
• AN IMPORTANT BENEFIT OF USING SUCH A TOP-DOWN REPRESENTATION IS THE SIMPLICITY OF REPRESENTING CONTEXTUAL INFORMATION LIKE THE AGENTS' SPATIAL RELATIONSHIPS TO EACH OTHER AND SEMANTIC ROAD INFORMATION.
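A rough illustration of the agent-channel rendering described above: the sketch rasterizes one agent's history of oriented boxes into per-time-step binary channels. The resolution, image size, and helper names are assumptions for the example, not details from the paper:

```python
import numpy as np
import cv2

def render_agent_history(states, size=200, res=0.5):
    """Rasterize oriented agent boxes into one binary channel per past time step.

    states: list of (x, y, heading, length, width) tuples in meters/radians.
    size: output image side in pixels; res: meters per pixel.
    Returns: [T, size, size] uint8 array, top-down, world origin at image center.
    """
    channels = np.zeros((len(states), size, size), dtype=np.uint8)
    for t, (x, y, yaw, l, w) in enumerate(states):
        # box corners in the agent frame, then rotate + translate into the world
        corners = np.array([[l/2, w/2], [l/2, -w/2], [-l/2, -w/2], [-l/2, w/2]])
        rot = np.array([[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]])
        world = corners @ rot.T + np.array([x, y])
        pix = (world / res + size / 2).astype(np.int32)   # world -> pixel coords
        cv2.fillPoly(channels[t], [pix], 1)               # fill the oriented box
    return channels

hist = [(0.0, 0.0, 0.0, 4.5, 2.0), (1.0, 0.0, 0.1, 4.5, 2.0)]
print(render_agent_history(hist).sum(axis=(1, 2)))  # nonzero pixels per time step
```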
• 15. MULTIPATH: MULTIPLE PROBABILISTIC ANCHOR TRAJECTORY HYPOTHESES FOR BEHAVIOR PREDICTION
Top: Logged trajectories of all agents are displayed in cyan. The focused agent is highlighted by a red circle. Bottom: MultiPath showing up to 5 trajectories with uncertainty ellipses. Trajectory probabilities (softmax outputs) are encoded in a color map shown to the right. MultiPath can predict uncertain future trajectories for various speeds (1st column), different intents at intersections (2nd and 3rd columns) and lane changes (4th and 5th columns), where the regression baseline only predicts a single intent.
• 16. VECTORNET: ENCODING HD MAPS AND AGENT DYNAMICS FROM VECTORIZED REPRESENTATION
• BEHAVIOR PREDICTION IN DYNAMIC, MULTI-AGENT SYSTEMS IS AN IMPORTANT PROBLEM IN THE CONTEXT OF SELF-DRIVING CARS, DUE TO THE COMPLEX REPRESENTATIONS AND INTERACTIONS OF ROAD COMPONENTS, INCLUDING MOVING AGENTS (E.G. PEDESTRIANS AND VEHICLES) AND ROAD CONTEXT INFORMATION (E.G. LANES, TRAFFIC LIGHTS).
• THIS PAPER INTRODUCES VECTORNET, A HIERARCHICAL GRAPH NEURAL NETWORK (GNN) THAT FIRST EXPLOITS THE SPATIAL LOCALITY OF INDIVIDUAL ROAD COMPONENTS REPRESENTED BY VECTORS AND THEN MODELS THE HIGH-ORDER INTERACTIONS AMONG ALL COMPONENTS.
• IN CONTRAST TO MOST RECENT APPROACHES, WHICH RENDER TRAJECTORIES OF MOVING AGENTS AND ROAD CONTEXT INFORMATION AS BIRD'S-EYE IMAGES AND ENCODE THEM WITH CONVOLUTIONAL NEURAL NETWORKS (CONVNETS), THIS APPROACH OPERATES ON A VECTOR REPRESENTATION.
• BY OPERATING ON THE VECTORIZED HIGH DEFINITION (HD) MAPS AND AGENT TRAJECTORIES, IT AVOIDS LOSSY RENDERING AND COMPUTATIONALLY INTENSIVE CONVNET ENCODING STEPS.
• TO FURTHER BOOST VECTORNET'S CAPABILITY IN LEARNING CONTEXT FEATURES, IT PROPOSES A NOVEL AUXILIARY TASK TO RECOVER THE RANDOMLY MASKED-OUT MAP ENTITIES AND AGENT TRAJECTORIES BASED ON THEIR CONTEXT.
• IT ALSO OUTPERFORMS THE STATE OF THE ART ON THE ARGOVERSE DATASET.
https://github.com/DQSSSSS/VectorNet
  • 17. VECTORNET: ENCODING HD MAPS AND AGENT DYNAMICS FROM VECTORIZED REPRESENTATION
• 18. VECTORNET: ENCODING HD MAPS AND AGENT DYNAMICS FROM VECTORIZED REPRESENTATION
• MOST OF THE ANNOTATIONS FROM AN HD MAP ARE IN THE FORM OF SPLINES (E.G. LANES), CLOSED SHAPES (E.G. REGIONS OF INTERSECTIONS) AND POINTS (E.G. TRAFFIC LIGHTS), WITH ADDITIONAL ATTRIBUTE INFO SUCH AS THE SEMANTIC LABELS OF THE ANNOTATIONS AND THEIR CURRENT STATES (E.G. COLOR OF THE TRAFFIC LIGHT, SPEED LIMIT OF THE ROAD).
• FOR AGENTS, THEIR TRAJECTORIES ARE IN THE FORM OF DIRECTED SPLINES WITH RESPECT TO TIME.
• ALL OF THESE ELEMENTS CAN BE APPROXIMATED AS SEQUENCES OF VECTORS: FOR MAP FEATURES, PICK A STARTING POINT AND DIRECTION, UNIFORMLY SAMPLE KEY POINTS FROM THE SPLINES AT THE SAME SPATIAL DISTANCE, AND SEQUENTIALLY CONNECT THE NEIGHBORING KEY POINTS INTO VECTORS; FOR TRAJECTORIES, JUST SAMPLE KEY POINTS WITH A FIXED TEMPORAL INTERVAL (0.1 SECOND), STARTING FROM T = 0, AND CONNECT THEM INTO VECTORS (SEE THE SKETCH BELOW).
• GIVEN SMALL ENOUGH SPATIAL OR TEMPORAL INTERVALS, THE RESULTING POLYLINES SERVE AS CLOSE APPROXIMATIONS OF THE ORIGINAL MAP AND TRAJECTORIES.
• TO EXPLOIT THE SPATIAL AND SEMANTIC LOCALITY OF THE NODES, IT TAKES A HIERARCHICAL APPROACH BY FIRST CONSTRUCTING SUBGRAPHS AT THE VECTOR LEVEL, WHERE ALL VECTOR NODES BELONGING TO THE SAME POLYLINE ARE CONNECTED WITH EACH OTHER.
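The vectorization of a map polyline can be sketched as below, assuming a simple 2D polyline and uniform arc-length resampling; the attribute features (semantic label, timestamps, polyline id) that VectorNet attaches to each vector are omitted for brevity:

```python
import numpy as np

def polyline_to_vectors(points: np.ndarray, step: float = 1.0) -> np.ndarray:
    """Approximate a map spline as a sequence of vectors.

    points: [N, 2] ordered polyline key points (e.g. a lane centerline).
    step:   spatial sampling interval in meters.
    Returns: [M, 4] rows (x_start, y_start, x_end, y_end) connecting
             uniformly resampled neighboring key points.
    """
    seg = np.diff(points, axis=0)
    dist = np.concatenate([[0.0], np.cumsum(np.linalg.norm(seg, axis=1))])
    s = np.arange(0.0, dist[-1] + 1e-9, step)       # uniform arc-length samples
    xs = np.interp(s, dist, points[:, 0])
    ys = np.interp(s, dist, points[:, 1])
    resampled = np.stack([xs, ys], axis=1)
    return np.concatenate([resampled[:-1], resampled[1:]], axis=1)

lane = np.array([[0, 0], [5, 0], [10, 2], [15, 6]], dtype=float)
print(polyline_to_vectors(lane, step=2.0).shape)    # (M, 4)
```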
• 19. VECTORNET: ENCODING HD MAPS AND AGENT DYNAMICS FROM VECTORIZED REPRESENTATION
The computation flow on the vector nodes of the same polyline. The polyline subgraph network can be seen as a generalization of PointNet. However, by embedding the ordering information into vectors, constraining the connectivity of subgraphs based on the polyline groupings, and encoding attributes as node features, this method is particularly suitable for encoding structured map annotations and agent trajectories.
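A minimal sketch of one such PointNet-style subgraph layer, assuming equal-length polylines for simplicity (a real implementation would mask padded nodes); the module name and layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class SubgraphLayer(nn.Module):
    """Polyline-subgraph layer in the spirit of VectorNet's PointNet-like block:
    encode each vector node with a shared MLP, max-pool over the polyline,
    and concatenate the pooled context back onto every node."""

    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.LayerNorm(hidden), nn.ReLU())

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: [P, L, in_dim] = P polylines, each with L vector nodes
        h = self.mlp(nodes)                                 # per-node encoding
        pooled = h.max(dim=1, keepdim=True).values          # [P, 1, hidden] context
        return torch.cat([h, pooled.expand_as(h)], dim=-1)  # [P, L, 2*hidden]

layer = SubgraphLayer(in_dim=4, hidden=32)
out = layer(torch.randn(8, 10, 4))   # 8 polylines, 10 vectors each
print(out.shape)                     # torch.Size([8, 10, 64])
```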
• 20. VECTORNET: ENCODING HD MAPS AND AGENT DYNAMICS FROM VECTORIZED REPRESENTATION
• TO ENCOURAGE THE GLOBAL INTERACTION GRAPH TO BETTER CAPTURE INTERACTIONS AMONG DIFFERENT TRAJECTORIES AND MAP POLYLINES, IT INTRODUCES AN AUXILIARY GRAPH COMPLETION TASK.
• IN ORDER TO IDENTIFY AN INDIVIDUAL POLYLINE NODE WHEN ITS CORRESPONDING FEATURE IS MASKED OUT, IT COMPUTES THE MINIMUM VALUES OF THE START COORDINATES FROM ALL OF ITS BELONGING VECTORS TO OBTAIN THE IDENTIFIER EMBEDDING.
• THE GRAPH COMPLETION OBJECTIVE IS CLOSELY RELATED TO THE WIDELY SUCCESSFUL BERT METHOD FOR NATURAL LANGUAGE PROCESSING (NLP), WHICH PREDICTS MISSING TOKENS BASED ON BIDIRECTIONAL CONTEXT FROM DISCRETE AND SEQUENTIAL TEXT DATA.
• UNLIKE METHODS THAT GENERALIZE THE BERT OBJECTIVE TO UNORDERED IMAGE PATCHES WITH PRE-COMPUTED VISUAL FEATURES, THE PROPOSED NODE FEATURES ARE JOINTLY OPTIMIZED IN AN E2E FRAMEWORK.
• THE FINAL MULTI-TASK TRAINING OBJECTIVE IS L = LTRAJ + α·LNODE, WHERE LTRAJ IS THE NEGATIVE GAUSSIAN LOG-LIKELIHOOD FOR THE GROUND TRUTH FUTURE TRAJECTORIES, LNODE IS THE HUBER LOSS BETWEEN PREDICTED NODE FEATURES AND GROUND TRUTH MASKED NODE FEATURES, AND α = 1.0 IS A SCALAR THAT BALANCES THE TWO LOSS TERMS (A SKETCH FOLLOWS BELOW).
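A hedged sketch of this objective, with the trajectory term simplified to a fixed-variance Gaussian NLL and the node term a Huber (smooth L1) loss; the function name and tensor shapes are assumptions for the example:

```python
import torch
import torch.nn.functional as F

def vectornet_style_loss(pred_traj, gt_traj, pred_nodes, gt_nodes, alpha=1.0):
    """Multi-task objective L = L_traj + alpha * L_node, as described above.

    Simplification: the trajectory term uses a unit-variance Gaussian NLL;
    the node term is a Huber loss on the masked-out polyline node features.
    """
    var = torch.ones_like(pred_traj)                      # unit variance assumption
    l_traj = F.gaussian_nll_loss(pred_traj, gt_traj, var)
    l_node = F.smooth_l1_loss(pred_nodes, gt_nodes)       # Huber loss
    return l_traj + alpha * l_node

loss = vectornet_style_loss(torch.randn(4, 30, 2), torch.randn(4, 30, 2),
                            torch.randn(4, 64), torch.randn(4, 64))
print(loss.item())
```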
• 21. VECTORNET: ENCODING HD MAPS AND AGENT DYNAMICS FROM VECTORIZED REPRESENTATION
Visualization of predictions and of the attention for road and agent.
• 22. TNT: TARGET-DRIVEN TRAJECTORY PREDICTION
• THE KEY INSIGHT IS THAT FOR PREDICTION WITHIN A MODERATE TIME HORIZON, THE FUTURE MODES CAN BE EFFECTIVELY CAPTURED BY A SET OF TARGET STATES.
• THIS LEADS TO THE TARGET-DRIVEN TRAJECTORY PREDICTION (TNT) FRAMEWORK.
• TNT HAS THREE STAGES WHICH ARE TRAINED END-TO-END.
• IT FIRST PREDICTS AN AGENT'S POTENTIAL TARGET STATES T STEPS INTO THE FUTURE, BY ENCODING ITS INTERACTIONS WITH THE ENVIRONMENT AND THE OTHER AGENTS.
• TNT THEN GENERATES TRAJECTORY STATE SEQUENCES CONDITIONED ON TARGETS.
• A FINAL STAGE ESTIMATES TRAJECTORY LIKELIHOODS, AND A FINAL COMPACT SET OF TRAJECTORY PREDICTIONS IS SELECTED.
• THIS IS IN CONTRAST TO PREVIOUS WORK WHICH MODELS AGENT INTENTS AS LATENT VARIABLES, AND RELIES ON TEST-TIME SAMPLING TO GENERATE DIVERSE TRAJECTORIES.
• TNT IS BENCHMARKED ON TRAJECTORY PREDICTION OF VEHICLES AND PEDESTRIANS, OUTPERFORMING THE STATE OF THE ART ON ARGOVERSE FORECASTING, INTERACTION, STANFORD DRONE AND AN IN-HOUSE PEDESTRIAN-AT-INTERSECTION DATASET.
• 23. TNT: TARGET-DRIVEN TRAJECTORY PREDICTION
Illustration of the TNT framework when applied to the vehicle future trajectory prediction task. TNT consists of three stages: (a) target prediction, which proposes a set of plausible targets (stars) among all candidates (diamonds); (b) target-conditioned motion estimation, which estimates a trajectory (distribution) towards each selected target; (c) scoring and selection, which ranks trajectory hypotheses and selects a final set of trajectory predictions with likelihood scores.
• 24. TNT: TARGET-DRIVEN TRAJECTORY PREDICTION
TNT model overview. Scene context is first encoded as the model's input. Then follow the three core stages of TNT: (a) target prediction, which proposes an initial set of M targets; (b) target-conditioned motion estimation, which estimates a trajectory for each target; (c) scoring and selection, which ranks trajectory hypotheses and outputs a final set of K predicted trajectories.
• 25. TNT: TARGET-DRIVEN TRAJECTORY PREDICTION
TNT supports flexible choices of targets. Vehicle target candidate points are sampled from the lane centerlines. Pedestrian target candidate points are sampled from a virtual grid centered on the pedestrian.
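The two sampling schemes can be sketched as follows; the sampling intervals and grid extent are illustrative choices, not values from the paper:

```python
import numpy as np

def vehicle_target_candidates(centerline: np.ndarray, step: float = 1.0) -> np.ndarray:
    """Sample vehicle target candidates along a lane centerline ([N, 2])."""
    seg = np.diff(centerline, axis=0)
    dist = np.concatenate([[0.0], np.cumsum(np.linalg.norm(seg, axis=1))])
    s = np.arange(0.0, dist[-1], step)                  # uniform arc-length samples
    return np.stack([np.interp(s, dist, centerline[:, 0]),
                     np.interp(s, dist, centerline[:, 1])], axis=1)

def pedestrian_target_candidates(pos, radius=10.0, cell=1.0):
    """Sample pedestrian target candidates on a virtual grid centered on the agent."""
    offsets = np.arange(-radius, radius + 1e-9, cell)
    gx, gy = np.meshgrid(offsets, offsets)
    return np.stack([gx.ravel() + pos[0], gy.ravel() + pos[1]], axis=1)

print(vehicle_target_candidates(np.array([[0., 0.], [20., 5.]])).shape)
print(pedestrian_target_candidates((3.0, 4.0)).shape)
```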
• 27. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
• This is the most diverse interactive motion dataset so far, and provides specific labels for interacting objects suitable for developing joint prediction models.
• With over 100,000 scenes, each 20 seconds long at 10 Hz, this dataset contains more than 570 hours of unique data over 1750 km of roadways.
• It was collected by mining for interesting interactions between vehicles, pedestrians, and cyclists across six cities within the United States.
• A high-accuracy 3D auto-labeling system is used to generate high-quality 3D bounding boxes for each road agent, and corresponding high-definition 3D maps are provided for each scene.
• It introduces a new set of metrics that provides a comprehensive evaluation of both single-agent and joint agent interaction motion forecasting models.
• Finally, it provides strong baseline models for individual agent prediction and joint prediction.
https://waymo.com/open/data/motion/
• 28. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
Examples of interactions between agents in a scene in the WAYMO OPEN MOTION DATASET. Each example highlights how predicting the joint behavior of agents aids in predicting likely future scenarios. Solid and dashed lines indicate the road graph and associated lanes. Each numeral indicates a unique agent in the scene.
• 29. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
• Compared to its onboard counterpart, offboard perception has two major advantages:
• 1) it can afford much more powerful models running on ample computational resources;
• 2) it can maximally aggregate complementary information from different views by exploiting the full point cloud sequence, including both history and future.
• The offboard perception system employed contains three steps:
• (1) a 3D object detector generates object proposals from each lidar frame;
• (2) a multi-object tracker links detected objects throughout the lidar sequence;
• (3) for each object, an object-centric refinement network processes the tracked object boxes and its point clouds across all frames in the track, and outputs temporally consistent and accurate 3D bounding boxes of the object in each frame.
• 30. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
Comparison of popular behavior prediction and motion forecasting datasets. Specifically, Lyft Level 5, NuScenes, Argoverse, INTERACTION, and the Waymo motion dataset are compared across multiple dimensions.
• 31. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
• The dataset provides high-quality object tracks generated using an offboard perception system, along with both static and dynamic map features to provide context for the road environment.
• Interesting scenarios are mined by first hand-crafting semantic predicates involving agents' relationships, e.g., "Agent A changed lanes at time t", and "agents A and B crossed paths with a time gap t and relative heading difference" (see the toy sketch below).
• These predicates can be composed to retrieve more complex queries in an efficient SQL and relational database framework, on an overall data corpus orders of magnitude larger than the resulting curated WAYMO OPEN MOTION DATASET.
• Pairwise interaction scenarios: merges, lane changes, unprotected turns, intersection left turns, intersection right turns, pedestrian-vehicle interactions, cyclist-vehicle interactions, interactions with close proximity, and interactions with high accelerations.
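A toy Python rendering of one such predicate (the real system composes predicates in SQL over a far larger corpus); the Track container, the crossed_paths helper, and the thresholds are all hypothetical:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Track:
    """Hypothetical agent track: positions [T, 2] and headings [T] at 10 Hz."""
    xy: np.ndarray
    heading: np.ndarray

def crossed_paths(a: Track, b: Track, gap_s: float = 5.0, res_m: float = 2.0) -> bool:
    """Toy predicate: did A and B pass within res_m of the same spot within gap_s?
    (This simplified version ignores the relative heading difference.)"""
    for i in range(len(a.xy)):
        d = np.linalg.norm(b.xy - a.xy[i], axis=1)
        close = np.where(d < res_m)[0]
        if close.size and np.abs(close - i).min() <= gap_s * 10:  # 10 Hz frames
            return True
    return False

rng = np.random.default_rng(1)
a = Track(np.cumsum(rng.normal(size=(50, 2)), 0), np.zeros(50))
b = Track(np.cumsum(rng.normal(size=(50, 2)), 0), np.zeros(50))
print(crossed_paths(a, b))
```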
• 32. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
Diagram of the baseline architecture, employed for the family of learned models, with a base LSTM encoder for agent states. The three detachable components are a road graph polyline encoder, a traffic state LSTM encoder, and a high-order interactions encoder, each following prior work. The trajectories are predicted through an MLP with a min-of-k loss.
• 33. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
• First, consider a constant velocity model, in which the agent is assumed to maintain its velocity at the current timestamp for all future steps (a minimal sketch follows below).
• Second, consider a family of deep-learned models using various encoders, with a base architecture of an LSTM to encode a 1-second history of observed state; this includes agents' positions, velocities, and 3D bounding boxes.
• In order to measure the importance of particular additional features, additional information is selectively provided:
• Road graph (rg): encode the 3D map information with polylines, following prior work.
• Traffic signals (ts): encode the traffic signal states with an LSTM encoder as an additional feature.
• High-order interactions (hi): model the high-order interactions between agents with a global interaction graph, following prior work.
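A minimal sketch of the constant velocity baseline, assuming positions sampled at 10 Hz; the function name and shapes are illustrative:

```python
import numpy as np

def constant_velocity_forecast(history_xy: np.ndarray, horizon: int, dt: float = 0.1):
    """Baseline: propagate the most recent velocity for all future steps.

    history_xy: [T, 2] observed positions sampled every dt seconds.
    Returns: [horizon, 2] future positions.
    """
    v = (history_xy[-1] - history_xy[-2]) / dt            # velocity at current timestamp
    steps = np.arange(1, horizon + 1)[:, None] * dt
    return history_xy[-1] + steps * v

hist = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]])    # ~5 m/s mostly along x
print(constant_velocity_forecast(hist, horizon=3))
```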
• 34. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
• Conditional behavior prediction (CBP) is used to quantify the interactivity in the dataset.
• A model can produce either unconditional predictions or predictions conditioned on a "query trajectory" for one of the agents in the scene.
• If two agents are not interacting, then one's actions have no effect on the other, so knowledge of that agent's future should not change predictions for the other agent.
• The degree of influence agent A has on agent B is defined as the KL divergence between unconditional predictions for B and the predictions for B conditioned on A's ground truth future trajectory.
• This is applied to the interactive and standard validation datasets, computing the KL divergence between unconditional and conditional predictions for every query agent/target agent pair in the dataset.
• KL divergences are much larger in the interactive validation dataset than in the standard validation dataset.
• 35. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
The dataset contains many agents, including pedestrians and cyclists. Top: 46% of scenes have more than 32 agents, and 11% of scenes have more than 64 agents. Bottom: In the standard validation set, 33.5% of scenes require at least one pedestrian to be predicted, and 10.4% of scenes require at least one cyclist to be predicted.
• 36. Large Scale Interactive Motion Forecasting For Autonomous Driving: The WAYMO OPEN MOTION DATASET
Agents selected to be predicted have diverse trajectories. Left: Ground truth trajectory of each predicted agent in a frame of reference where all agents start at the origin with heading pointing along the positive X axis (pointing up). Right: Distribution of maximum speeds achieved by all of the agents along their 9-second trajectories. The plots depict variety in trajectory shapes and speed profiles.
• 37. Identifying Driver Interactions Via Conditional Behavior Prediction
• Interactive driving scenarios, such as lane changes, merges and unprotected turns, are some of the most challenging situations for autonomous driving.
• Planning in interactive scenarios requires accurately modeling the reactions of other agents to different future actions of the ego agent.
• This work develops end-to-end models for conditional behavior prediction (CBP) that take as input a query future trajectory for an ego agent, and predict distributions over future trajectories for other agents conditioned on the query.
• Leveraging such a model, a general-purpose agent interactivity score is derived from probabilistic first principles.
• The interactivity score makes it possible to find interesting interactive scenarios for training and evaluating behavior prediction models.
• 38. Identifying Driver Interactions Via Conditional Behavior Prediction
• Define an agent trajectory S as a fixed-length, time-discretized sequence of agent states up to a finite time horizon.
• All quantities in this work consider a pair of agents A and B.
• Without loss of generality, consider A to be the query agent whose plan for the future can potentially affect B, the target agent.
• The future trajectories of A and B are random variables S_A and S_B.
• The marginal probability of a particular realization s_B of agent B's trajectory is given by p(S_B = s_B), also indicated by the shorthand p(s_B).
• The conditional distribution of agent B's future trajectory given a realization s_A of agent A's trajectory is given by p(S_B = s_B | S_A = s_A), indicated by the shorthand p(s_B | s_A).
• 39. Identifying Driver Interactions Via Conditional Behavior Prediction
• Interactions are quantified by estimating the change in log-likelihood of the target's ground-truth future s_B: Δ = log p(s_B | s_A) − log p(s_B).
• A large change in the log-likelihood indicates a situation in which the likelihood of the target agent's trajectory changes significantly as a result of the query agent's action.
• The KL divergence between the conditional and marginal distributions for the target's predicted future trajectory S_B quantifies the degree of influence exerted on B by a trajectory s_A: D_KL( p(S_B | S_A = s_A) || p(S_B) ).
• The mutual information between the two agents' future trajectories S_A and S_B is computed as I(S_A, S_B) = E_{s_A ~ p(S_A)} [ D_KL( p(S_B | s_A) || p(S_B) ) ].
• This mutual information serves as the interactivity score between agents A and B.
• 40. Identifying Driver Interactions Via Conditional Behavior Prediction
• A CBP model predicts p(S_B | S_A = s_A, x), the distribution of future trajectories for B conditioned on s_A.
• It places Gaussian uncertainty over the positions of the trajectory waypoints.
• The output is a Gaussian mixture model (GMM), with mixture weights fixed over all time steps of the same trajectory.
• The computation of the interactivity score also requires the estimation of marginal distributions (see the sketch below).
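Since the KL divergence between two GMMs has no closed form, one way to estimate the influence score is Monte Carlo sampling. The sketch below collapses each trajectory distribution to a single 2D GMM (e.g. over the endpoint) for brevity; all names and shapes are illustrative, not from the paper:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_logpdf(x, weights, means, covs):
    """Log density of a Gaussian mixture at points x ([N, D])."""
    comp = np.stack([multivariate_normal.logpdf(x, m, c) for m, c in zip(means, covs)])
    return np.log(np.einsum("k,kn->n", weights, np.exp(comp)) + 1e-12)

def kl_influence(cond, marg, n_samples=2000, rng=None):
    """Monte Carlo estimate of KL(conditional || marginal) between two GMMs,
    i.e. the influence score described above. Each GMM is (weights, means, covs)."""
    if rng is None:
        rng = np.random.default_rng(0)
    w, mu, cov = cond
    ks = rng.choice(len(w), size=n_samples, p=w)          # pick mixture components
    xs = np.array([rng.multivariate_normal(mu[k], cov[k]) for k in ks])
    return np.mean(gmm_logpdf(xs, *cond) - gmm_logpdf(xs, *marg))

eye = np.eye(2)
cond = (np.array([1.0]), np.array([[2.0, 0.0]]), np.array([eye]))
marg = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [2.0, 0.0]]), np.array([eye, eye]))
print(kl_influence(cond, marg))  # > 0: knowing A's future shifts B's prediction
```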
• 41. Identifying Driver Interactions Via Conditional Behavior Prediction
• The most likely 6 modes of the marginal distribution's GMM are used, as in standard motion forecasting metrics, rather than sampling N samples from the marginal distribution.
• The distribution parameters are learned via supervised learning with the negative log-likelihood loss.
• An additional loss term encourages the model to respect the constraint that agents cannot occupy the same future location in space-time.
• 42. Identifying Driver Interactions Via Conditional Behavior Prediction
A conditional behavior prediction model describes how one agent's predicted future trajectory can shift due to the actions of other agents. The architecture of the conditional behavior prediction model.

• 43. Identifying Driver Interactions Via Conditional Behavior Prediction
Histogram of the interactivity score (mutual information) between 8,919,306 pairs of agents in the validation dataset.

• 44. Identifying Driver Interactions Via Conditional Behavior Prediction
Two examples of interacting agents found by sorting examples by mutual information and wADE. The marginal (left) and conditional (right) predictions are shown with the query in solid green, and predictions in dashed cyan lines.

• 45. Identifying Driver Interactions Via Conditional Behavior Prediction
An example in which the query and target agents slow down in parallel lanes as a result of a traffic light change. The marginal (left) and conditional (right) predictions are shown with the query in solid green.
• 46. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
• Deciphering human behaviors to predict their future paths/trajectories and what they would do from videos is important in many applications.
• Therefore, this work studies predicting a pedestrian's future path jointly with future activities.
• It proposes an end-to-end, multi-task learning system, called Next, utilizing rich visual features about human behavioral information and interaction with their surroundings.
• It encodes a person through rich semantic features about visual appearance, body movement and interaction with the surroundings, motivated by the fact that humans derive such predictions by relying on similar visual cues.
• To facilitate the training, the network is learned with an auxiliary task of predicting the future location in which the activity will happen.
• For the auxiliary task, a discretized grid, called the Manhattan grid, is designed as the location prediction target for the system.
https://github.com/JunweiLiang/social-distancing-prediction
• 47. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
The goal is to jointly predict a person's future path and activity. The green and yellow lines show two possible future trajectories, and two possible activities are shown in the green and yellow boxes. Depending on the future activity, the person (top right) may take different paths, e.g. the yellow path for "loading" and the green path for "object transfer".
• 48. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
• HUMANS NAVIGATE THROUGH PUBLIC SPACES OFTEN WITH SPECIFIC PURPOSES IN MIND, RANGING FROM SIMPLE ONES LIKE ENTERING A ROOM TO MORE COMPLICATED ONES LIKE PUTTING THINGS INTO A CAR.
• SUCH INTENTION, HOWEVER, IS MOSTLY NEGLECTED IN EXISTING WORK.
• THE JOINT PREDICTION MODEL CAN HAVE TWO BENEFITS:
• 1) LEARNING THE ACTIVITY TOGETHER WITH THE PATH MAY BENEFIT FUTURE PATH PREDICTION; INTUITIVELY, HUMANS ARE ABLE TO READ FROM OTHERS' BODY LANGUAGE TO ANTICIPATE WHETHER THEY ARE GOING TO CROSS THE STREET OR CONTINUE WALKING ALONG THE SIDEWALK.
• 2) THE JOINT MODEL ADVANCES THE CAPABILITY OF UNDERSTANDING NOT ONLY THE FUTURE PATH BUT ALSO THE FUTURE ACTIVITY BY TAKING INTO ACCOUNT THE RICH SEMANTIC CONTEXT IN VIDEOS; THIS INCREASES THE CAPABILITIES OF AUTOMATED VIDEO ANALYTICS FOR SOCIAL GOOD, SUCH AS SAFETY APPLICATIONS LIKE ANTICIPATING PEDESTRIAN MOVEMENT AT TRAFFIC INTERSECTIONS OR A ROAD ROBOT HELPING HUMANS TRANSPORT GOODS TO A CAR.
• 49. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
Overview of the Next model. Given a sequence of frames containing the person for prediction, this model utilizes a person behavior module and a person interaction module to encode rich visual semantics into a feature tensor.
• 50. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
• 4 KEY COMPONENTS:
• THE PERSON BEHAVIOR MODULE EXTRACTS VISUAL INFORMATION FROM THE BEHAVIORAL SEQUENCE OF THE PERSON.
• THE PERSON INTERACTION MODULE LOOKS AT THE INTERACTION BETWEEN A PERSON AND THEIR SURROUNDINGS.
• THE TRAJECTORY GENERATOR SUMMARIZES THE ENCODED VISUAL FEATURES AND PREDICTS THE FUTURE TRAJECTORY BY THE LSTM DECODER WITH FOCAL ATTENTION.
• ACTIVITY PREDICTION UTILIZES RICH VISUAL SEMANTICS TO PREDICT THE FUTURE ACTIVITY LABEL FOR THE PERSON.
• IN ADDITION, THE SCENE IS DIVIDED INTO A DISCRETIZED GRID OF MULTIPLE SCALES, CALLED THE MANHATTAN GRID, TO COMPUTE CLASSIFICATION AND REGRESSION FOR ROBUST ACTIVITY LOCATION PREDICTION.
• 51. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
To model appearance changes of a person, a pre-trained object detection model with "RoIAlign" is utilized to extract fixed-size CNN features for each person bounding box. The features are averaged along the spatial dimensions for each person and fed into an LSTM encoder, giving a feature representation of shape Tobs × d, where d is the hidden size of the LSTM. To capture body movement, a person keypoint detection model is utilized to extract person keypoint information. A linear transformation embeds the keypoint coordinates before they are fed into the LSTM encoder; the encoded feature also has shape Tobs × d. These appearance and movement features are commonly used in a wide variety of studies and thus do not introduce new concerns about machine learning fairness. (A sketch of the appearance branch follows below.)
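A sketch of the appearance branch under simplifying assumptions (unit spatial scale for RoIAlign, one box per frame); the module and parameter names are illustrative:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class PersonBehaviorEncoder(nn.Module):
    """RoIAlign-pool a person's box from each frame's feature map,
    average spatially, then encode the sequence with an LSTM."""

    def __init__(self, feat_channels: int = 256, d: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_channels, d, batch_first=True)

    def forward(self, feature_maps: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # feature_maps: [T, C, H, W]; boxes: [T, 4] (x1, y1, x2, y2), one per frame
        t = feature_maps.shape[0]
        idx = torch.arange(t, dtype=feature_maps.dtype)[:, None]   # frame index per box
        rois = torch.cat([idx, boxes], dim=1)                      # [T, 5] RoI format
        crops = roi_align(feature_maps, rois, output_size=(7, 7))  # [T, C, 7, 7]
        seq = crops.mean(dim=(2, 3)).unsqueeze(0)                  # spatial average: [1, T, C]
        out, _ = self.lstm(seq)                                    # [1, T, d]
        return out.squeeze(0)                                      # Tobs x d

enc = PersonBehaviorEncoder()
feats = torch.randn(8, 256, 48, 64)                  # 8 observed frames
boxes = torch.tensor([[10., 10., 30., 40.]]).repeat(8, 1)
print(enc(feats, boxes).shape)                       # torch.Size([8, 128])
```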
• 52. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
The person-objects feature can capture how far away the person is from other people and cars. The person-scene feature can capture whether the person is near the sidewalk or grass. This information is provided to the model in the hope that it learns things like: a person walks more often on the sidewalk than on the grass, and tends to avoid bumping into cars.
• 53. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
• IT USES AN LSTM DECODER TO DIRECTLY PREDICT THE FUTURE TRAJECTORY IN XY-COORDINATES.
• THE HIDDEN STATE OF THIS DECODER IS INITIALIZED USING THE LAST STATE OF THE PERSON'S TRAJECTORY LSTM ENCODER.
• IT ADDS AN AUXILIARY TASK, I.E. ACTIVITY LOCATION PREDICTION, IN ADDITION TO PREDICTING THE FUTURE ACTIVITY LABEL OF THE PERSON.
• AT EACH TIME INSTANT, THE XY-COORDINATE IS COMPUTED FROM THE DECODER STATE BY A FULLY CONNECTED LAYER.
• IT EMPLOYS AN EFFECTIVE FOCAL ATTENTION, ORIGINALLY PROPOSED TO CARRY OUT MULTIMODAL INFERENCE OVER A SEQUENCE OF IMAGES FOR VISUAL QUESTION ANSWERING; THE KEY IDEA IS TO PROJECT MULTIPLE FEATURES INTO A SPACE OF CORRELATION, WHERE DISCRIMINATIVE FEATURES ARE EASIER TO CAPTURE BY THE ATTENTION MECHANISM.
• 54. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
To bridge the gap between trajectory generation and activity label prediction, an activity location prediction (ALP) module is proposed to predict the final location where the person will engage in the future activity. Activity location prediction includes two tasks: location classification and location regression (a sketch of the target construction follows below).
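A sketch of how the classification and regression targets on such a grid could be constructed, with made-up grid size and scene extent (the paper uses a multi-scale grid):

```python
import numpy as np

def manhattan_grid_targets(final_xy, scene_wh=(64.0, 36.0), grid=(32, 18)):
    """Build classification + regression targets for activity location prediction.

    final_xy: (x, y) where the future activity happens, in scene coordinates.
    Returns: (cell_index, (dx, dy)) -- the grid cell containing the point and
             the offset from that cell's center, which the two heads predict.
    """
    cw, ch = scene_wh[0] / grid[0], scene_wh[1] / grid[1]      # cell size
    cx, cy = int(final_xy[0] // cw), int(final_xy[1] // ch)
    cell_index = cy * grid[0] + cx                             # flattened class id
    center = ((cx + 0.5) * cw, (cy + 0.5) * ch)
    offset = (final_xy[0] - center[0], final_xy[1] - center[1])
    return cell_index, offset

print(manhattan_grid_targets((40.2, 10.7)))
```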
• 55. PEEKING INTO THE FUTURE: PREDICTING FUTURE PERSON ACTIVITIES AND LOCATIONS IN VIDEOS
Qualitative comparison between this method and the baselines. The yellow path is the observable trajectory and the green path is the ground truth trajectory during the prediction period. Predictions are shown as blue heatmaps.
• 56. STINET: SPATIO-TEMPORAL-INTERACTIVE NETWORK FOR PEDESTRIAN DETECTION AND TRAJECTORY PREDICTION
• DETECTING PEDESTRIANS AND PREDICTING THEIR FUTURE TRAJECTORIES ARE CRITICAL TASKS FOR NUMEROUS APPLICATIONS, SUCH AS AUTONOMOUS DRIVING.
• PREVIOUS METHODS EITHER TREAT DETECTION AND PREDICTION AS SEPARATE TASKS OR SIMPLY ADD A TRAJECTORY REGRESSION HEAD ON TOP OF A DETECTOR.
• THIS IS AN END-TO-END TWO-STAGE NETWORK: THE SPATIO-TEMPORAL-INTERACTIVE NETWORK (STINET).
• IN ADDITION TO 3D GEOMETRY MODELING OF PEDESTRIANS, IT MODELS THE TEMPORAL INFORMATION FOR EACH OF THE PEDESTRIANS.
• IT PREDICTS BOTH CURRENT AND PAST LOCATIONS IN THE FIRST STAGE, SO THAT EACH PEDESTRIAN CAN BE LINKED ACROSS FRAMES AND THE COMPREHENSIVE SPATIO-TEMPORAL INFORMATION CAN BE CAPTURED IN THE SECOND STAGE.
• IT ALSO MODELS THE INTERACTION AMONG OBJECTS WITH AN INTERACTION GRAPH, TO GATHER THE INFORMATION AMONG NEIGHBORING OBJECTS.
• COMPREHENSIVE EXPERIMENTS ARE CONDUCTED ON THE LYFT DATASET AND THE RECENTLY RELEASED LARGE-SCALE WAYMO OPEN DATASET, FOR BOTH OBJECT DETECTION AND FUTURE TRAJECTORY PREDICTION.
• 57. STINET: SPATIO-TEMPORAL-INTERACTIVE NETWORK FOR PEDESTRIAN DETECTION AND TRAJECTORY PREDICTION
Overview. The network takes a sequence of point clouds as input, detects pedestrians and predicts their future trajectories simultaneously. The point clouds are processed by Pillar Feature Encoding to generate Pillar Features. Each Pillar Feature is then fed into a backbone ResUNet to get backbone features. A Temporal Region Proposal Network (T-RPN) takes the backbone features and generates temporal proposals with past and current boxes for each object. The Spatio-Temporal-Interactive (STI) Feature Extractor learns features for each temporal proposal, which are used for final detection and trajectory prediction.
• 58. STINET: SPATIO-TEMPORAL-INTERACTIVE NETWORK FOR PEDESTRIAN DETECTION AND TRAJECTORY PREDICTION
Backbone. Upper: overview of the backbone. The input point cloud sequence is fed to voxelization and a point net to generate pseudo-images, which are then processed by a ResNet U-Net to generate the final backbone feature sequence. Lower: detailed design of the ResNet U-Net.
• 59. STINET: SPATIO-TEMPORAL-INTERACTIVE NETWORK FOR PEDESTRIAN DETECTION AND TRAJECTORY PREDICTION
Spatio-Temporal-Interactive Feature Extractor (STI-FE): local geometry, local dynamics and history path features are extracted given a temporal proposal. For the local geometry and local dynamics features, the yellow areas are used for feature extraction. Relational reasoning is performed across proposals' local features to generate interactive features.
  • 60. STINET: SPATIO-TEMPORAL-INTERACTIVE NETWORK FOR PEDESTRIAN DETECTION AND TRAJECTORY PREDICTION