Driving Behaviors for ADAS
and Autonomous Driving XIII
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles (10.10)
• Traject. Predict. for Auto. Driving based on Multi-Head Attention with Joint Agent-Map Representation (6.4)
• MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction (6.5)
• CoverNet: Multimodal behavior prediction using trajectory sets (CVPR.6.14)
• Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge (6.8)
• Learning Situational Driving (CVPR.6.14)
• AMENet: Attentive Maps Encoder Network for Trajectory Prediction (6.15)
• MCENET: Multi-Context Encoder Network for Homogeneous Agent Traj. Pred. in Mixed Traffic (6.23)
• Multi-Head Attention based Probabilistic Vehicle Trajectory Prediction (7.4)
• Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles (7.6)
• Traffic Agent Trajectory Prediction Using Social Convolution and Attention Mechanism (7.6)
• Planning on the fast lane: Learning to interact using attention mechanisms in path integral IRL (7.11)
• Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models (7.14)
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• The motion planners used in self-driving vehicles need to generate trajectories that are
safe, comfortable, and obey the traffic rules.
• This is usually achieved by two modules: behavior planner, which handles high-level
decisions and produces a coarse trajectory, and trajectory planner that generates a
smooth, feasible trajectory for the duration of the planning horizon.
• These planners, however, are typically developed separately, and changes in the behavior
planner might affect the trajectory planner in unexpected ways.
• Furthermore, the final trajectory outputted by the trajectory planner might differ
significantly from the one generated by the behavior planner, as they do not share the
same objective.
• This work presents a jointly learnable behavior and trajectory planner.
• Unlike most existing learnable motion planners that either address only behavior
planning, or use an uninterpretable neural network to represent the entire logic from
sensors to driving commands, this approach features an interpretable cost function on
top of perception, prediction and vehicle dynamics, and a joint learning algorithm that
learns a shared cost function employed by both the behavior and trajectory components.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
The learnable motion planner has
discrete and continuous components,
minimizing the same cost function with
the same set of learned cost weights.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
A: Given a scenario, a set of possible SDV behaviors is generated. B: The left and right lane
boundaries and the driving path relevant to the intended behavior are considered in the cost
function. C: The SDV geometry for the spatiotemporal overlap cost is approximated using circles.
D: The SDV yields to pedestrians through stop lines on the driving paths.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• Motion planners of modern self-driving cars are composed of two modules.
• The behavioral planner is responsible for making high level decisions.
• The trajectory planner takes the decision of the behavioral planner and a coarse trajectory and
produces a smooth trajectory for the duration of the planning horizon.
• Unfortunately, these planners are typically developed separately, and changes in the behavioral
planner might affect the trajectory planner in unexpected ways.
• Furthermore, the trajectory outputted by the trajectory planner might differ in terms of behavior
from the one returned by the behavioral planner as they do not share the same objective.
• In this motion planner, both the behavioral and trajectory planners share the same objective.
• At each planning iteration, depending on the SDV location on the map, a subset of these
behaviors, denoted by B(W), is allowed by traffic rules and hence considered for evaluation.
• Low-level realizations of the high-level behaviors are then generated as a set of
trajectories T(b) relative to these paths.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• A safe trajectory for the SDV should not only be collision-free, but also satisfy a safety distance to
the surrounding obstacles, including both static and dynamic objects such as vehicles,
pedestrians, cyclists, and unknown objects.
• Costs are defined to capture the spatiotemporal (S-T) overlap and the violation of the safety
distance, respectively.
• For this, the SDV polygon is approximated by a set of circles with the same radius along the vehicle;
the distance from each circle center to the object polygon is then used to evaluate the cost (see the sketch below).
• The SDV is expected to adhere to the structure of the road, so sub-costs are introduced to
measure such violations.
• The driving-path and boundaries considered for these sub-costs depend on the candidate
behavior.
• The driving-path cost is the squared distance to the driving path, and the lane-boundary cost
is the squared violation distance of a safety threshold.
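As a rough illustration of the circle decomposition described above, the following sketch computes a squared safety-distance violation cost for a 2-D pose and obstacle polygon. All helper names, parameter values, and the safety margin are illustrative stand-ins, not values from the paper; the interior-penetration case is deliberately ignored.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (2-D arrays)."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-9), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def point_to_polygon_distance(p, poly):
    """Distance from p to the boundary of polygon `poly` ((N, 2) vertices).
    Note: this sketch ignores deeper penetration when p lies inside poly."""
    n = len(poly)
    return min(point_to_segment_distance(p, poly[i], poly[(i + 1) % n])
               for i in range(n))

def obstacle_cost(sdv_pose, obstacle_polygon, num_circles=5,
                  sdv_length=4.8, sdv_width=2.0, safety_margin=0.5):
    """Squared violation of the safety distance, with the SDV footprint
    approximated by equal-radius circles along its longitudinal axis."""
    x, y, heading = sdv_pose
    radius = sdv_width / 2.0
    # Circle centers spread along the vehicle's longitudinal axis.
    offsets = np.linspace(-sdv_length / 2 + radius,
                          sdv_length / 2 - radius, num_circles)
    centers = np.stack([x + offsets * np.cos(heading),
                        y + offsets * np.sin(heading)], axis=1)
    cost = 0.0
    for c in centers:
        d = point_to_polygon_distance(c, obstacle_polygon)
        cost += max(0.0, radius + safety_margin - d) ** 2
    return cost
```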
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
Left: Headway cost penalizes unsafe distance to leading vehicles. Right:
for each sampled trajectory, a weight function determines how relevant
an obstacle is to the SDV in terms of its lateral offset.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• When the SDV is driving behind a leading vehicle, in either lane-following or lane-change behavior, it
should keep a safe longitudinal distance that depends on the speeds of the SDV and the leading vehicle.
• The headway cost is computed as the violation of the safety distance after applying a comfortable
constant deceleration, assuming that the leading vehicle applies a hard brake; which vehicles are
leading the SDV is decided at each time-step in the planning horizon (see the sketch below).
• A weight function of the lateral distance between the SDV and other vehicles determines
how relevant they are for the headway cost.
• Distance-violation costs incurred by vehicles laterally aligned with the SDV dominate the
cost, which is compatible with lane-change maneuvers, where deciding the lead vehicle can be difficult.
• Pedestrians are vulnerable road users and hence require extra caution, so a yield cost is defined.
• The mission route is represented as a sequence of lanes, from which all lanes that are
on the route, or connected to it by permitted lane-changes, are specified.
• Additional terms include a cost-to-go function that captures the value of the final state, a speed-limit
cost that penalizes trajectories exceeding a lane's eligible speed, and costs for comfortable driving.
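A minimal sketch of the headway idea under the brake-reaction model described above. The deceleration constants, reaction time, and Gaussian lateral weighting are hand-picked placeholders; `headway_cost` and `lateral_relevance` are hypothetical names, not the paper's.

```python
import numpy as np

def headway_cost(v_sdv, v_lead, gap, a_comf=2.0, a_hard=8.0, react_time=0.5):
    """Squared violation of the safe headway: assume the leading vehicle
    hard-brakes at a_hard while the SDV reacts after `react_time` and then
    brakes at the comfortable deceleration a_comf. Values are illustrative."""
    d_sdv = v_sdv * react_time + v_sdv ** 2 / (2.0 * a_comf)  # SDV stopping distance
    d_lead = v_lead ** 2 / (2.0 * a_hard)                     # lead stopping distance
    required_gap = max(0.0, d_sdv - d_lead)
    return max(0.0, required_gap - gap) ** 2

def lateral_relevance(lateral_offset, sigma=1.0):
    """Weight in [0, 1] letting laterally aligned vehicles dominate the
    headway cost (a Gaussian of the lateral offset is one plausible choice)."""
    return float(np.exp(-0.5 * (lateral_offset / sigma) ** 2))
```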
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
Behavioral decisions include obstacle side
assignment and lane information, which are sent
through the behavioral-trajectory interface.
Example trajectories in a nudging scenario
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
The max-margin objective uses a surrogate loss to learn
the sub-cost weights, since selecting the optimal trajectory
within a discrete set is not differentiable. In contrast, the
iterative optimization in the trajectory planner is a
differentiable module, where gradients of the imitation loss
function can be computed using the backpropagation through
time (BPTT) algorithm. Since unrolling the full optimization can
be computationally expensive, unroll only for a truncated
number of steps after we obtain a solution. Perform M gradient
descent steps after obtaining the optimal trajectory, and
backpropagate through these M steps only. If the control
obtained from the continuous optimization converges to the
optimum, then backpropagating through a truncated number
of steps approximates the gradient given by the inverse Hessian at the optimum (see the sketch below).
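A hedged PyTorch sketch of the truncated unrolling: starting from the detached optimizer solution, M differentiable gradient steps are unrolled so the imitation loss can backpropagate into the learned cost weights. `cost_fn` is an assumed differentiable planner cost; the real planner's optimizer and loss are more elaborate.

```python
import torch

def truncated_unroll(traj_opt, cost_fn, weights, traj_expert, M=5, lr=0.1):
    """Unroll M gradient steps of the trajectory cost after the (detached)
    continuous-optimizer solution, then evaluate an imitation loss whose
    gradient reaches `weights` through the unrolled steps."""
    traj = traj_opt.detach().requires_grad_(True)
    for _ in range(M):
        cost = cost_fn(traj, weights)
        # create_graph=True keeps the update differentiable w.r.t. weights.
        grad, = torch.autograd.grad(cost, traj, create_graph=True)
        traj = traj - lr * grad
    imitation_loss = ((traj - traj_expert) ** 2).mean()
    return imitation_loss  # .backward() propagates into `weights`
```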
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• Behavioral with max-margin ("B+M") learns the weight vector through max-margin (+M) learning
on the behavioral planner only.
• Full Inference ("B+M +J") uses the trained weights of "B+M" and runs the joint inference algorithm
(+J) at test time.
• Full Learning & Inference ("B+M +J +I") learns the weight vector using the combination of the
max-margin (+M) and imitation (+I) objectives, and runs the joint inference algorithm (+J) at test time.
Traject. Predict. for Auto. Driving based on Multi-Head
Attention with Joint Agent-Map Representation
• Predicting the trajectories of surrounding agents is an essential ability for robots navigating
complex real-world environments.
• Autonomous vehicles (AV) in particular, can generate safe and efficient path plans by predicting
the motion of surrounding road users.
• Future trajectories of agents can be inferred using two tightly linked cues: the locations and past
motion of agents, and the static scene structure.
• The configuration of the agents may uncover which part of the scene is more relevant, while the
scene structure can determine the relative influence of agents on each other's motion.
• To better model the interdependence of the two cues, this work proposes a multi-head
attention-based model that uses a joint representation of the static scene and agent configuration
for generating both keys and values for the attention heads (see the sketch below).
• To address the multimodality of future agent motion, use each attention head to generate a
distinct future trajectory of the agent.
• The visualization of attention maps adds a layer of interpretability to the trajectories predicted by
the model.
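A minimal sketch of the joint key/value construction, assuming agent encodings and scene features already gathered at agent locations (a simplification of the paper's full spatial grid; dimensions and module names are illustrative). Note that in MHA-JAM each head's context feeds a separate decoder, whereas `nn.MultiheadAttention` here mixes heads internally.

```python
import torch
import torch.nn as nn

class JointAgentMapAttention(nn.Module):
    """Keys/values come from the concatenation of agent encodings with scene
    features; the query is the target agent's encoding."""
    def __init__(self, d_agent=64, d_map=64, d_model=128, n_heads=4):
        super().__init__()
        self.kv_proj = nn.Linear(d_agent + d_map, d_model)
        self.q_proj = nn.Linear(d_agent, d_model)
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, target_enc, agent_encs, map_feats):
        # target_enc: (B, d_agent); agent_encs: (B, N, d_agent)
        # map_feats: (B, N, d_map) scene features at agent locations
        joint = self.kv_proj(torch.cat([agent_encs, map_feats], dim=-1))
        q = self.q_proj(target_enc).unsqueeze(1)          # (B, 1, d_model)
        ctx, attn = self.mha(q, joint, joint)
        return ctx.squeeze(1), attn                        # context, weights
```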
Traject. Predict. for Auto. Driving based on Multi-Head
Attention with Joint Agent-Map Representation
MHA-JAM (MHA with Joint Agent-Map representation): each LSTM encoder generates an encoding
vector of the recent motion of one surrounding agent. The CNN backbone transforms the input map
image into a 3D tensor of scene features. A combined representation of the context is built by
concatenating the surrounding agents' motion encodings and the scene features. Each attention head
models a possible mode of interaction between the target (green car) and the combined context features.
Each LSTM decoder receives a context vector and the target vehicle encoding, and generates a
distribution over a possible predicted trajectory conditioned on its context.
Traject. Predict. for Auto. Driving based on Multi-Head
Attention with Joint Agent-Map Representation
Off-road loss: an auxiliary loss function that
penalizes locations predicted by the model that fall
outside the drivable area. It is proportional to the
distance of a predicted location from the nearest
point of the drivable area.
Regression loss: to avoid penalizing plausible trajectories
generated by the model that do not correspond to
the ground truth, a variant of the best-of-L
regression loss is used for training. The
negative log-likelihood (NLL) of the ground-truth
trajectory is computed under each of the L modes output by the
model, and the minimum of the L NLL values is taken
as the regression loss.
Classification loss: in addition to the regression loss,
a cross-entropy loss is applied over the modes (all three losses are sketched below).
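A hedged sketch of these three losses with diagonal-Gaussian modes (constants are dropped from the NLL; `dist_to_drivable` is an assumed lookup that returns the distance to the drivable area, zero inside it):

```python
import torch
import torch.nn.functional as F

def best_of_l_nll(pred_means, pred_log_sigmas, gt_traj):
    """Best-of-L regression loss: per-mode Gaussian NLL of the ground truth,
    minimized over L modes. pred_means: (B, L, T, 2), gt_traj: (B, T, 2)."""
    gt = gt_traj.unsqueeze(1)                              # (B, 1, T, 2)
    nll = 0.5 * (((gt - pred_means) / pred_log_sigmas.exp()) ** 2
                 + 2 * pred_log_sigmas).sum(dim=(-1, -2))  # (B, L), up to consts
    min_nll, best_mode = nll.min(dim=1)
    return min_nll.mean(), best_mode

def offroad_loss(pred_means, dist_to_drivable):
    """Penalize predictions outside the drivable area, proportionally to
    their distance from it (assumed precomputed distance-field lookup)."""
    return dist_to_drivable(pred_means).mean()

def classification_loss(mode_logits, best_mode):
    """Cross-entropy on which mode best matched the ground truth."""
    return F.cross_entropy(mode_logits, best_mode)
```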
Traject. Predict. for Auto. Driving based on Multi-Head
Attention with Joint Agent-Map Representation
Joint vs. separate agent-map representation for the
attention heads. Two models are compared: (1) a baseline where
attention weights are generated separately for the map
and agent features, with keys and values produced for each
set of features independently of the other (a); (2) the proposed
formulation, where each attention head generates keys
and values from a joint representation of agent and
map features (b).
Traject. Predict. for Auto. Driving based on Multi-Head
Attention with Joint Agent-Map Representation
MANTRA: Memory Augmented Networks for
Multiple Trajectory Prediction
• Autonomous vehicles are expected to drive in complex scenarios with several independent,
non-cooperating agents.
• Path planning for safely navigating such environments cannot rely only on perceiving the present
location and motion of other agents.
• Instead, it requires predicting such variables in a sufficiently distant future: this is the problem
of multimodal trajectory prediction, addressed here with a Memory Augmented Neural Network.
• This method learns past and future trajectory embeddings using RNNs and exploits an associative
external memory to store and retrieve such embeddings.
• Trajectory prediction is then performed by decoding in-memory future encodings conditioned
on the observed past.
• It incorporates scene knowledge in the decoding stage by learning a CNN on top of semantic
scene maps.
• Memory growth is limited by learning a writing controller based on the predictive capability of
existing embeddings (see the sketch below).
• Thanks to the non-parametric nature of the memory module, the trained system can continuously
improve by ingesting novel patterns.
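A minimal sketch of the associative memory with a novelty-based write rule. MANTRA learns its writing controller; the fixed reconstruction-error threshold below is a hand-set stand-in for that learned decision.

```python
import torch
import torch.nn.functional as F

class TrajectoryMemory:
    """Keys are past encodings, values are future encodings; reads return
    the top-k futures whose stored pasts best match the query past."""
    def __init__(self, dim, threshold=0.1):
        self.keys = torch.empty(0, dim)
        self.values = torch.empty(0, dim)
        self.threshold = threshold  # stand-in for the learned controller

    def read(self, past_enc, k=5):
        # Cosine similarity between the query past and all stored pasts.
        sim = F.cosine_similarity(past_enc.unsqueeze(0), self.keys, dim=1)
        topk = sim.topk(min(k, len(self.keys))).indices
        return self.values[topk]            # candidate future encodings

    def write(self, past_enc, future_enc):
        if len(self.keys) == 0:
            err = float("inf")
        else:
            retrieved = self.read(past_enc, k=1)
            err = F.mse_loss(retrieved[0], future_enc).item()
        if err > self.threshold:            # store only novel patterns
            self.keys = torch.cat([self.keys, past_enc.unsqueeze(0)])
            self.values = torch.cat([self.values, future_enc.unsqueeze(0)])
```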
MANTRA: Memory Augmented Networks for
Multiple Trajectory Prediction
MANTRA addresses multimodal trajectory
prediction: multiple future predictions are
obtained for an observed past by relying on a
Memory Augmented Neural Network.
MANTRA: Memory Augmented Networks for
Multiple Trajectory Prediction
Architecture of MANTRA. The encoding of an observed past trajectory is used as key to read
likely future encodings from memory. A multimodal prediction is obtained by decoding each
future encoding, conditioned on the observed past. The surrounding context is processed
by a CNN and fed to the Refinement Module to adjust predictions.
MANTRA: Memory Augmented Networks for
Multiple Trajectory Prediction
Representation learning: the encoders learn to map past and future points into a
meaningful representation, and the decoder learns to reproduce the future. Instead of
using just the future as input, the reconstruction process is also conditioned on an
encoding of the past. Past and future trajectories are encoded separately; the decoder
reconstructs the future trajectory only.
MANTRA: Memory Augmented Networks for
Multiple Trajectory Prediction
CoverNet: Multimodal Behavior Prediction
using Trajectory Sets
• CoverNet is a new method for multimodal, probabilistic trajectory prediction for
urban driving.
• Previous work has employed a variety of methods, including multimodal
regression, occupancy maps, and 1-step stochastic policies.
• It frames the trajectory prediction problem as classification over a diverse set of
trajectories.
• The size of this set remains manageable due to the limited number of distinct
actions that can be taken over a reasonable prediction horizon.
• It structures the trajectory set to a) ensure a desired level of coverage of the state
space, and b) eliminate physically impossible trajectories (see the sketch below).
• By dynamically generating trajectory sets based on the agent's current state, the
method's efficiency is further improved.
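A rough sketch of constructing a coverage-based trajectory set and picking the classification target, using the maximum pointwise distance between trajectories (the greedy scheme and the `eps` value are illustrative, not the paper's exact construction):

```python
import numpy as np

def coverage_trajectory_set(candidates, eps=2.0):
    """Greedy coverage: keep a candidate only if it is farther than `eps`
    (max pointwise distance) from every trajectory already kept, bounding
    the set size for a desired coverage level. candidates: (N, T, 2)."""
    kept = []
    for traj in candidates:
        if all(np.max(np.linalg.norm(traj - k, axis=-1)) > eps for k in kept):
            kept.append(traj)
    return np.stack(kept)

def classify_target(traj_set, gt_traj):
    """The classification target is the set element closest to ground truth."""
    d = np.max(np.linalg.norm(traj_set - gt_traj[None], axis=-1), axis=-1)
    return int(np.argmin(d))
```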
CoverNet: Multimodal Behavior Prediction
using Trajectory Sets
CoverNet overview following MTP
CoverNet: Multimodal Behavior Prediction
using Trajectory Sets
Motion Prediction using Trajectory Sets and Self-
Driving Domain Knowledge
• Predicting the future motion of vehicles has been studied using various
techniques, including stochastic policies, generative models, and regression.
• Recent work has shown that classification over a trajectory set, which
approximates possible motions, achieves state-of-the-art performance and avoids
issues like mode collapse.
• However, map information and the physical relationships between nearby
trajectories are not fully exploited in this formulation.
• Build on classification-based approaches to motion prediction by adding an
auxiliary loss that penalizes off-road predictions.
• This auxiliary loss can easily be pretrained using only map information (e.g., off-
road area), which significantly improves performance on small datasets.
• Weighted cross-entropy losses are introduced to capture spatial-temporal relationships among
trajectories (see the sketch below).
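One plausible reading of the weighted cross-entropy, sketched below: target mass is spread over trajectory-set elements by their distance to the ground truth, so near-misses cost less than distant modes. The temperature `alpha` and the softmax weighting are assumptions, not the paper's exact formulation.

```python
import torch

def weighted_cross_entropy(logits, traj_set, gt_traj, alpha=1.0):
    """logits: (B, K); traj_set: (K, T, 2); gt_traj: (B, T, 2)."""
    # Max pointwise distance from each set element to the ground truth.
    d = torch.linalg.norm(traj_set.unsqueeze(0) - gt_traj.unsqueeze(1),
                          dim=-1).max(dim=-1).values        # (B, K)
    target = torch.softmax(-alpha * d, dim=1)                # soft target
    log_probs = torch.log_softmax(logits, dim=1)
    return -(target * log_probs).sum(dim=1).mean()
```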
Motion Prediction using Trajectory Sets and Self-
Driving Domain Knowledge
Visualization of on-road (black) and
off-road (red) trajectories
Visualization of the target distribution in the
standard cross-entropy formulation (left), and
the weighted cross-entropy loss (right)
Motion Prediction using Trajectory Sets and Self-
Driving Domain Knowledge
Results listed as Argoverse | nuScenes
Learning Situational Driving
• Human drivers have a remarkable ability to drive in diverse visual conditions and
situations, e.g., from maneuvering in rainy, limited visibility conditions with no lane
markings to turning in a busy intersection while yielding to pedestrians.
• In contrast, state-of-the-art sensorimotor driving models struggle when encountering
diverse settings with varying relationships between observation and action.
• To generalize when making decisions across diverse conditions, humans leverage multiple
types of situation-specific reasoning and learning strategies.
• Motivated by this observation, this work proposes a framework for learning a situational driving
policy that effectively captures reasoning under varying types of scenarios.
• The key idea is to learn a mixture model with a set of policies to capture multiple driving
modes (see the sketch below).
• First, the mixture model is optimized through behavior cloning.
• Then the model is refined by directly optimizing for the driving task itself, i.e., supervised with
the navigation task reward.
• It is more scalable than methods assuming access to privileged information, e.g.,
perception labels, as it only assumes demonstration and reward-based supervision.
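A minimal sketch of the mixture-of-policies idea: a gating network produces context-dependent weights over a set of expert policies, and the action is their weighted combination. Module sizes are illustrative, and the reward-based refinement stage is omitted.

```python
import torch
import torch.nn as nn

class SituationalPolicy(nn.Module):
    """Context-gated mixture over expert policies (sketch)."""
    def __init__(self, feat_dim=256, n_experts=4, act_dim=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, act_dim) for _ in range(n_experts)])
        self.gate = nn.Linear(feat_dim, n_experts)

    def forward(self, scene_feat):
        # scene_feat: (B, feat_dim) perception features of the current scene
        w = torch.softmax(self.gate(scene_feat), dim=-1)          # (B, E)
        acts = torch.stack([e(scene_feat) for e in self.experts], dim=1)
        return (w.unsqueeze(-1) * acts).sum(dim=1)                # (B, act_dim)
```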
Learning Situational Driving
Situational Driving. To address the complexity of
learning perception-to-action driving models, a
situational framework with a behavior module is
introduced. The module reasons over the current
on-road scene context when composing a set of learned
behavior policies under varying driving scenarios.
The approach improves over behavior-reflex and
privileged approaches in terms of robustness and
scalability.
Learning Situational Driving
Approach Overview. The agent learns to
combine a set of expert policies in a
context-dependent, task-optimized manner to
robustly drive in diverse scenarios.
Learning Situational Driving
AMENet: Attentive Maps Encoder Network for
Trajectory Prediction
• Trajectory prediction is a crucial task in different communities, such as intelligent
transportation systems, computer vision, and mobile robot applications.
• However, there are many challenges in predicting the trajectories of heterogeneous road
agents (e.g., pedestrians, cyclists and vehicles) at a microscopic level.
• For example, an agent might be able to choose multiple plausible paths in complex
interactions with other agents in varying environments, and the behavior of each agent is
affected by the various behaviors of its neighboring agents.
• To this end, an end-to-end generative model named Attentive Maps Encoder Network
(AMENet) for accurate and realistic multi-path trajectory prediction.
• It leverages the target road user's motion information (i.e., movement along the x- and y-axes in
Cartesian space) and the interaction information with neighboring road users at each
time step, encoded as dynamic maps centered on the target road user.
• A conditional variational auto-encoder (CVAE) module is trained to learn the latent space of
possible future paths based on the dynamic maps, and is then used to predict multiple
plausible future trajectories conditioned on the observed past trajectories (a minimal CVAE sketch follows).
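A minimal CVAE sketch in the spirit of the X-Encoder / Y-Encoder split: the posterior sees past and future encodings during training, while test-time sampling draws the latent from the prior. All dimensions and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class TrajectoryCVAE(nn.Module):
    """Minimal CVAE over future-trajectory features, conditioned on a past
    encoding (sketch; not AMENet's exact architecture)."""
    def __init__(self, d_past=128, d_future=128, d_z=32):
        super().__init__()
        self.to_mu = nn.Linear(d_past + d_future, d_z)
        self.to_logvar = nn.Linear(d_past + d_future, d_z)
        self.decoder = nn.Sequential(
            nn.Linear(d_past + d_z, 128), nn.ReLU(), nn.Linear(128, d_future))
        self.d_z = d_z

    def forward(self, past_enc, future_enc):
        h = torch.cat([past_enc, future_enc], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        recon = self.decoder(torch.cat([past_enc, z], dim=-1))
        kld = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1).mean()
        return recon, kld

    def sample(self, past_enc, n=20):
        # At test time the Y-Encoder is dropped; z is drawn from the prior.
        z = torch.randn(n, past_enc.size(0), self.d_z)
        return [self.decoder(torch.cat([past_enc, zi], dim=-1)) for zi in z]
```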
AMENet: Attentive Maps Encoder Network for
Trajectory Prediction
An overview of the proposed framework. It consists of four modules: the X-Encoder and Y-Encoder
encode the observed and the future trajectories, respectively, and have a similar structure. The
Sample Generator produces diverse samples of future generations. The Decoder module decodes
the features from the produced samples and predicts the future trajectory sequentially.
AMENet: Attentive Maps Encoder Network for
Trajectory Prediction
Structure of the X-Encoder. The encoder has
two branches: the upper one extracts
motion information of the target agent,
and the lower one learns the
interaction information among the
neighboring road users from dynamic maps
over time. Each dynamic map consists of three
layers that represent orientation, travel
speed and relative position, each
centered on the target road user.
The motion information and
the interaction information are encoded by
their own LSTMs sequentially. The last
outputs of the two LSTMs are concatenated
and forwarded to an FC layer to get the final
output of the X-Encoder.
The Y-Encoder has the same structure as the X-Encoder, but
it extracts features from the future trajectories
and is only used in the training phase.
AMENet: Attentive Maps Encoder Network for
Trajectory Prediction
MCENET: Multi-Context Encoder Network for
Homogeneous Agent Traj. Pred. in Mixed Traffic
• Trajectory prediction in urban mixed-traffic zones (a.k.a. shared spaces) is critical for
many intelligent transportation systems, such as intent detection for autonomous driving.
• However, there are many challenges in predicting the trajectories of heterogeneous road
agents (pedestrians, cyclists and vehicles) at a microscopic level.
• For example, an agent might be able to choose among multiple plausible paths in complex
interactions with other agents in varying environments.
• The Multi-Context Encoder Network (MCENET) is trained by encoding past and future
scene context, interaction context and motion information together, capturing the patterns and
variations of future trajectories with a set of stochastic latent variables.
• At inference time, the past context and motion information of the target agent are combined with
samples of the latent variables to predict multiple realistic future trajectories.
• In experiments on several datasets of varying scenes, it outperforms some recent
state-of-the-art methods for mixed-traffic trajectory prediction by a large margin
and is more robust in a very challenging environment.
MCENET: Multi-Context Encoder Network for
Homogeneous Agent Traj. Pred. in Mixed Traffic
Predicting the future trajectory (d) by observing the past trajectories (c), considering the scene (a)
and grouping context (b). Three kinds of scene context are used: (1) an aerial photograph provides an
overview of the environment, (2) a segmented map defines the accessible areas with respect to road
agents' transport modes, and (3) a motion heat map describes the prior of how different agents move
(see the sketch below). Different colors denote different agents or agent groups.
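A rough sketch of how such a motion heat map could be built from observed trajectories; grid resolution, blur width, and normalization are illustrative choices, not the paper's recipe.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def motion_heat_map(trajectories, grid_hw=(200, 200), cell=0.5, sigma=2.0):
    """Accumulate visit counts of observed agent positions on a grid, blur,
    and normalize, giving a prior over where agents tend to move."""
    H, W = grid_hw
    heat = np.zeros((H, W))
    for traj in trajectories:                    # each traj: (T, 2) in meters
        ij = np.clip((traj / cell).astype(int), 0, [H - 1, W - 1])
        for i, j in ij:
            heat[i, j] += 1.0
    heat = gaussian_filter(heat, sigma=sigma)    # smooth sparse counts
    return heat / (heat.max() + 1e-9)
```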
MCENET: Multi-Context Encoder Network for
Homogeneous Agent Traj. Pred. in Mixed Traffic
The pipeline of the method. The ground truth Y and the associated interaction and scene context are
injected into the input only in training; they are not available at inference. The latent variables are
sampled N times and concatenated with the output of the X-Encoder for predicting multiple future paths.
MCENET: Multi-Context Encoder Network for
Homogeneous Agent Traj. Pred. in Mixed Traffic
Multi-Head Attention based Probabilistic Vehicle
Trajectory Prediction
• Online-capable deep learning model for probabilistic vehicle trajectory prediction.
• A simple encoder-decoder architecture based on multi-head attention.
• It generates the distribution of the predicted trajectories for multiple vehicles in parallel.
• It models the interactions by learning to attend to a few influential vehicles in an
unsupervised manner, which can improve the interpretability of the network.
• Interpretability: The use of multi-head attention improves the interpretability of the
neural network because the model can learn the social relations of neighboring vehicles
in an unsupervised manner.
• Scalability: as the output dimension of multi-head attention is flexible with respect to the number
of vehicles, the network can be extended to very dense traffic scenarios. The network was
tested on an autonomous vehicle platform with fewer than 30 surrounding vehicles; the
average computation time is 50 ms.
• Accuracy: the method is verified using naturalistic highway trajectory data and performs
better than existing methods in terms of positional error (a sketch of the probabilistic output follows).
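A sketch of a probabilistic trajectory output head of the kind such models typically use: five parameters per waypoint defining a bivariate Gaussian, trained with its negative log-likelihood. The exact parameterization in the paper may differ.

```python
import torch

def bivariate_gaussian_nll(params, gt):
    """params: (B, T, 5) = (mu_x, mu_y, log_sx, log_sy, rho_raw); gt: (B, T, 2).
    Returns the mean bivariate-Gaussian NLL (up to an additive constant)."""
    mx, my, lsx, lsy, rho_raw = params.unbind(-1)
    sx, sy = lsx.exp(), lsy.exp()
    rho = torch.tanh(rho_raw)                  # keep correlation in (-1, 1)
    zx = (gt[..., 0] - mx) / sx
    zy = (gt[..., 1] - my) / sy
    omr2 = 1 - rho ** 2
    nll = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (2 * omr2) \
          + lsx + lsy + 0.5 * torch.log(omr2)
    return nll.mean()
```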
Multi-Head Attention based Probabilistic Vehicle
Trajectory Prediction
The road on the left denotes the input of the prediction model, which consists of the past
trajectories of surrounding vehicles, X, and the lane information, I. The road on the right denotes
the output of the prediction model, which is the distribution of the future trajectories, P(Y|X,I).
Multi-Head Attention based Probabilistic Vehicle
Trajectory Prediction
Structure of the attention layer for
both the lane and the vehicles.
Probabilistic Multi-modal Trajectory Prediction
with Lane Attention for Autonomous Vehicles
• Trajectory prediction is crucial for autonomous vehicles.
• The planning system not only needs to know the current state of the surrounding objects but also
their possible states in the future.
• As for vehicles, their trajectories are significantly influenced by the lane geometry, and how to
effectively use lane information is of active interest.
• Most of the existing works use rasterized maps to explore road information, which does not
distinguish different lanes.
• This work proposes an instance-aware lane representation.
• By integrating the lane features and trajectory features, a goal-oriented lane attention module is
proposed to predict the future locations of the vehicle (see the sketch below).
• The lane representation together with the lane attention module can be integrated into the
widely used encoder-decoder framework to generate diverse predictions.
• Most importantly, each generated trajectory is associated with a probability to handle the
uncertainty.
• It does not suffer from collapsing to a single behavior mode and can cover diverse possibilities.
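A minimal sketch of a goal-oriented lane attention module: each candidate lane feature is scored against the trajectory feature, and the context is their attention-weighted sum. Layer sizes and the scoring MLP are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LaneAttention(nn.Module):
    """Attend over per-lane features using the trajectory feature as query."""
    def __init__(self, d_traj=128, d_lane=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_traj + d_lane, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, traj_feat, lane_feats):
        # traj_feat: (B, d_traj); lane_feats: (B, L, d_lane)
        q = traj_feat.unsqueeze(1).expand(-1, lane_feats.size(1), -1)
        logits = self.score(torch.cat([q, lane_feats], dim=-1)).squeeze(-1)
        attn = torch.softmax(logits, dim=1)                  # (B, L)
        ctx = (attn.unsqueeze(-1) * lane_feats).sum(dim=1)   # lane context
        return ctx, attn
```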
Probabilistic Multi-modal Trajectory Prediction
with Lane Attention for Autonomous Vehicles
An overview of this method. The model consists of a trajectory encoder, a lane encoder,
an interaction network, a lane attention module and a final trajectory decoder.
Probabilistic Multi-modal Trajectory Prediction
with Lane Attention for Autonomous Vehicles
(a) An example of selected lanes. The blue dot represents the last location of the target vehicle; "s"
and "e" denote the start and end of a road segment, respectively. (b) The architecture of the Lane Encoder.
"conv 1, 64" means a 1D convolution with kernel size 1 and 64 output channels; the final output is a
128-d vector for each lane. (c) The structure of the proposed lane attention module.
Probabilistic Multi-modal Trajectory Prediction
with Lane Attention for Autonomous Vehicles
Traffic Agent Trajectory Prediction Using
Social Convolution and Attention Mechanism
• Trajectory prediction is significant for the decision-making of autonomous driving vehicles.
• This paper proposes a model to predict the trajectories of target agents around an autonomous
vehicle.
• The main idea is to consider the historical trajectories of the target agent and the influence of
surrounding agents on the target agent.
• It encodes the target agent's trajectory history as an attention mask and constructs a social map
to encode the interactive relationship between the target agent and its surrounding agents (see the sketch below).
• Given a trajectory sequence, LSTM networks are first utilized to extract features for all
agents, based on which the attention mask and social map are formed.
• Then, the attention mask and social map are fused to get the fusion feature map, which is
processed by the social convolution to obtain a fusion feature representation.
• Finally, this fusion feature is taken as the input of a variable-length LSTM to predict the trajectory
of the target agent.
• The variable-length LSTM enables the model to handle the case where the number of agents in the
sensing scope is highly dynamic in traffic scenes.
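A hedged sketch of the attention-mask / social-map fusion: neighbor LSTM features are scattered into a spatial grid, gated elementwise by a mask derived from the target's encoding, then convolved. Grid size, channel counts, and the sigmoid gating are assumptions.

```python
import torch
import torch.nn as nn

class SocialFusion(nn.Module):
    """Fuse a social map of neighbor features with a target-derived mask."""
    def __init__(self, d_feat=64, grid=13):
        super().__init__()
        self.grid = grid
        self.mask_head = nn.Linear(d_feat, grid * grid)
        self.conv = nn.Sequential(
            nn.Conv2d(d_feat, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())

    def forward(self, target_enc, neighbor_encs, neighbor_cells):
        # target_enc: (d_feat,); neighbor_encs: (N, d_feat)
        # neighbor_cells: (N, 2) integer grid coordinates of each neighbor
        social_map = torch.zeros(neighbor_encs.size(1), self.grid, self.grid)
        for enc, (i, j) in zip(neighbor_encs, neighbor_cells.tolist()):
            social_map[:, i, j] = enc              # scatter neighbor features
        mask = torch.sigmoid(self.mask_head(target_enc)).view(self.grid, self.grid)
        fused = social_map * mask                  # attention-gated social map
        return self.conv(fused.unsqueeze(0))       # (1, 32, grid, grid)
```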
Traffic Agent Trajectory Prediction Using
Social Convolution and Attention Mechanism
The target agent is marked by the grey
square. The blue grid region around it is its
grid cell. Input representations are generated for
all agents based on trajectory information.
These representations are passed through
LSTMs and eventually used to construct the
social map; the target agent's representation
is encoded as the attention mask. The
product of the attention mask and social map
is passed through ConvNets and then
concatenated with the target agent
tensor to produce a latent representation.
Finally, this latent representation is passed
through an LSTM to generate a trajectory
prediction for the target agent.
Traffic Agent Trajectory Prediction Using
Social Convolution and Attention Mechanism
Results for trajectory prediction on the BLVD dataset
Results of different combination models
Planning on the fast lane: Learning to interact using
attention mechanisms in path integral inverse RL
• General-purpose trajectory planning algorithms for automated driving utilize complex reward
functions to perform a combined optimization of strategic, behavioral, and kinematic features.
• The specification and tuning of a single reward function is a tedious task and does not generalize
over a large set of traffic situations.
• Deep learning approaches based on path integral inverse reinforcement learning have been
successfully applied to predict local situation-dependent reward functions using features of a set
of sampled driving policies.
• Sample-based trajectory planning algorithms are able to approximate a spatio-temporal subspace
of feasible driving policies that can be used to encode the context of a situation.
• However, the interaction with dynamic objects requires an extended planning horizon, which
requires sequential context modeling.
• This work addresses sequential reward prediction over an extended time horizon.
• It proposes a neural network architecture that uses a policy attention mechanism to generate a
low-dimensional context vector by concentrating on trajectories with a human-like driving style.
• In addition, a temporal attention mechanism identifies context switches and allows for stable
adaptation of rewards (see the sketch below).
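A minimal sketch of the temporal-attention step: attend over a history of per-cycle context vectors (produced upstream by the policy attention) and map the pooled result to situation-dependent reward weights. Sizes and the softmax-mixture output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TemporalAttentionReward(nn.Module):
    """Pool a history of context vectors with attention and predict mixture
    weights over reward components (sketch)."""
    def __init__(self, d_ctx=64, n_rewards=12):
        super().__init__()
        self.score = nn.Linear(d_ctx, 1)
        self.to_weights = nn.Linear(d_ctx, n_rewards)

    def forward(self, ctx_history):
        # ctx_history: (B, H, d_ctx), one context vector per planning cycle
        attn = torch.softmax(self.score(ctx_history).squeeze(-1), dim=1)
        pooled = (attn.unsqueeze(-1) * ctx_history).sum(dim=1)
        return torch.softmax(self.to_weights(pooled), dim=-1)  # reward mixture
```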
Planning on the fast lane: Learning to interact using
attention mechanisms in path integral inverse RL
Illustration of the planner for automated driving, which samples
policies for the deep inverse RL approach. The z-axis corresponds
to velocity, whereas the ground plane depicts spatial feature
maps such as distances from the lane centers. A subset of policies
is visualized: the green triangle shows the optimal policy
and the blue triangles highlight the highest policy attention. The
color gradient corresponds to the policy value; blue policies have
a high attention activation. The cylindrical objects represent a stop
barrier.
Planning on the fast lane: Learning to interact using
attention mechanisms in path integral inverse RL
Neural network architectures for situation-dependent reward prediction. The policy-temporal attention
architecture consists of a policy attention and a temporal attention mechanism. Inputs are a set of
planning cycles, each having a set of policies. The policy encoder generates a latent representation of
individual policies. The policy attention mechanism produces a low-dimensional context vector, which
is forwarded to the temporal attention network (TAN). The policy-temporal attention mechanism
predicts a mixture reward function given a history of context vectors.
Planning on the fast lane: Learning to interact using
attention mechanisms in path integral inverse RL
Overview of average test performance based on expected value difference (EVD), expected distance
(ED), and optimal policy distance (OPD). Tests are conducted on a test dataset, recorded by an expert-
tuned planning algorithm.
Vehicle Trajectory Prediction by Transfer
Learning of Semi-Supervised Models
• This work shows that semi-supervised models for vehicle trajectory
prediction significantly improve performance over supervised models on
state-of-the-art real-world benchmarks.
• Moving from supervised to semi-supervised models allows scaling up by
using unlabeled data, increasing the number of images in pre-training from
millions to a billion.
• It performs ablation studies comparing transfer learning of semi-supervised
and supervised models while keeping all other factors equal.
• Within semi-supervised models it compares contrastive learning with
teacher-student methods as well as networks predicting a small number of
trajectories with networks predicting probabilities over a large trajectory
set.
Vehicle Trajectory Prediction by Transfer
Learning of Semi-Supervised Models
An example of input and output representations for
mid-level (top) and low-level representations
(bottom). In the top row, the mid-level input
representation is an annotated map of the scene
(top left), with boxes representing agent positions
and colors representing semantic categories. The
output (top right) is a probability distribution over a
set of candidate trajectories. In the bottom row, a
low-level representation uses an image from the
vehicle’s front-facing camera as input (bottom left),
and predicts the future steering wheel angle (bottom
right) and speed of the vehicle.
Vehicle Trajectory Prediction by Transfer
Learning of Semi-Supervised Models
• Mid-level representation: an annotated map image to represent the driving
environment. This includes annotations for drivable areas, crosswalks and
walkways using color coding to represent semantic categories. All scenes are
oriented such that the agent under consideration is centered and directed
towards the top of the image. The positions of all agents in the scene are drawn
onto the image, using faded bounding boxes to represent past positions in a
historical window. By encoding all this information into a single map, a large
amount of information is condensed into a single image.
• Low-level representation: use front-facing camera images from the Drive360
dataset as a low-level representation of a driving environment. In addition to the
image, it includes a vector of semantic map data, which includes datapoints such
as the distance to the nearest intersection, the speed limit, and the approximate
road curvature.
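A rough sketch of rendering the agent-history portion of such a mid-level raster, with intensity fading by age. Box size, fade factor, and the single-channel simplification are illustrative; the real input also includes color-coded semantic map layers.

```python
import numpy as np

def rasterize_agents(agent_trajs, img_hw=(500, 500), history=4, fade=0.7, box=3):
    """Draw each agent's recent positions as small boxes whose intensity
    fades with age (age 0 = most recent position)."""
    H, W = img_hw
    img = np.zeros((H, W), dtype=np.float32)
    for traj in agent_trajs:                       # traj: (T, 2) pixel coords
        recent = traj[-history:]
        for age, (x, y) in enumerate(recent[::-1]):
            x, y = int(x), int(y)
            ys = slice(max(0, y - box), min(H, y + box + 1))
            xs = slice(max(0, x - box), min(W, x + box + 1))
            img[ys, xs] = np.maximum(img[ys, xs], fade ** age)
    return img
```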
Vehicle Trajectory Prediction by Transfer
Learning of Semi-Supervised Models
low-level representations
mid-level representations
Vehicle Trajectory Prediction by Transfer
Learning of Semi-Supervised Models
Comparison of semi-supervised models used in experiments. The labeled
dataset in all models consists of 1.2M images. Since SimCLR is trained
on augmentations, there is no measure of unlabeled dataset size.
Vehicle Trajectory Prediction by Transfer
Learning of Semi-Supervised Models
Results of CoverNet and MTP on the nuScenes dataset, comparing different semi-supervised and
supervised models used to encode the annotated map. For each semi-supervised model, a direct
comparison is made to a supervised model with the same architecture. Semi-supervised models
significantly outperform their supervised counterparts on most metrics.
Driving behaviors for adas and autonomous driving XIII

More Related Content

What's hot

Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningYu Huang
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymoYu Huang
 
Driving Behavior for ADAS and Autonomous Driving IX
Driving Behavior for ADAS and Autonomous Driving IXDriving Behavior for ADAS and Autonomous Driving IX
Driving Behavior for ADAS and Autonomous Driving IXYu Huang
 
Driving Behavior for ADAS and Autonomous Driving V
Driving Behavior for ADAS and Autonomous Driving VDriving Behavior for ADAS and Autonomous Driving V
Driving Behavior for ADAS and Autonomous Driving VYu Huang
 
Driving Behavior for ADAS and Autonomous Driving III
Driving Behavior for ADAS and Autonomous Driving IIIDriving Behavior for ADAS and Autonomous Driving III
Driving Behavior for ADAS and Autonomous Driving IIIYu Huang
 
Driving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XIIDriving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XIIYu Huang
 
Driving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous DrivingDriving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous DrivingYu Huang
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic SegmentationYu Huang
 
Driving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xivDriving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xivYu Huang
 
Driving Behavior for ADAS and Autonomous Driving VI
Driving Behavior for ADAS and Autonomous Driving VIDriving Behavior for ADAS and Autonomous Driving VI
Driving Behavior for ADAS and Autonomous Driving VIYu Huang
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIYu Huang
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAMYu Huang
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous drivingYu Huang
 
Fisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIFisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIYu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgYu Huang
 
Driving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VIIDriving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VIIYu Huang
 
Pedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VPedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VYu Huang
 
Deep VO and SLAM IV
Deep VO and SLAM IVDeep VO and SLAM IV
Deep VO and SLAM IVYu Huang
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Yu Huang
 
camera-based Lane detection by deep learning
camera-based Lane detection by deep learningcamera-based Lane detection by deep learning
camera-based Lane detection by deep learningYu Huang
 

What's hot (20)

Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Driving Behavior for ADAS and Autonomous Driving IX
Driving Behavior for ADAS and Autonomous Driving IXDriving Behavior for ADAS and Autonomous Driving IX
Driving Behavior for ADAS and Autonomous Driving IX
 
Driving Behavior for ADAS and Autonomous Driving V
Driving Behavior for ADAS and Autonomous Driving VDriving Behavior for ADAS and Autonomous Driving V
Driving Behavior for ADAS and Autonomous Driving V
 
Driving Behavior for ADAS and Autonomous Driving III
Driving Behavior for ADAS and Autonomous Driving IIIDriving Behavior for ADAS and Autonomous Driving III
Driving Behavior for ADAS and Autonomous Driving III
 
Driving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XIIDriving behaviors for adas and autonomous driving XII
Driving behaviors for adas and autonomous driving XII
 
Driving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous DrivingDriving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous Driving
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic Segmentation
 
Driving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xivDriving behaviors for adas and autonomous driving xiv
Driving behaviors for adas and autonomous driving xiv
 
Driving Behavior for ADAS and Autonomous Driving VI
Driving Behavior for ADAS and Autonomous Driving VIDriving Behavior for ADAS and Autonomous Driving VI
Driving Behavior for ADAS and Autonomous Driving VI
 
Deep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data IIDeep Learning’s Application in Radar Signal Data II
Deep Learning’s Application in Radar Signal Data II
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAM
 
3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving3-d interpretation from stereo images for autonomous driving
3-d interpretation from stereo images for autonomous driving
 
Fisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIFisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving III
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Driving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VIIDriving Behavior for ADAS and Autonomous Driving VII
Driving Behavior for ADAS and Autonomous Driving VII
 
Pedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving VPedestrian behavior/intention modeling for autonomous driving V
Pedestrian behavior/intention modeling for autonomous driving V
 
Deep VO and SLAM IV
Deep VO and SLAM IVDeep VO and SLAM IV
Deep VO and SLAM IV
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)
 
camera-based Lane detection by deep learning
camera-based Lane detection by deep learningcamera-based Lane detection by deep learning
camera-based Lane detection by deep learning
 

Similar to Driving behaviors for adas and autonomous driving XIII

Driving Behavior for ADAS and Autonomous Driving II
Driving Behavior for ADAS and Autonomous Driving IIDriving Behavior for ADAS and Autonomous Driving II
Driving Behavior for ADAS and Autonomous Driving IIYu Huang
 
Driving Behavior for ADAS and Autonomous Driving IV
Driving Behavior for ADAS and Autonomous Driving IVDriving Behavior for ADAS and Autonomous Driving IV
Driving Behavior for ADAS and Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Modal split analysis
Modal split analysis Modal split analysis
Modal split analysis ashahit
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigationguest90654fd
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigationguest90654fd
 
LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...
LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...
LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...cscpconf
 
Traffic assignment
Traffic assignmentTraffic assignment
Traffic assignmentMNIT,JAIPUR
 
Description of the project thesis at Fraunhofer ITWM
Description of the project thesis at Fraunhofer ITWMDescription of the project thesis at Fraunhofer ITWM
Description of the project thesis at Fraunhofer ITWMAditya Mahesh
 
_370996_1_En_1_Chapter_Author
_370996_1_En_1_Chapter_Author_370996_1_En_1_Chapter_Author
_370996_1_En_1_Chapter_AuthorNahid Mahbub
 
Online/Offline Lane Change Events Detection Algorithms
Online/Offline Lane Change Events Detection AlgorithmsOnline/Offline Lane Change Events Detection Algorithms
Online/Offline Lane Change Events Detection AlgorithmsFeras Tanan
 
IRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
IRJET-To Analyze Calibration of Car-Following Behavior of VehiclesIRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
IRJET-To Analyze Calibration of Car-Following Behavior of VehiclesIRJET Journal
 
Cloudsim t-drive enhancing driving directions with taxi drivers’ intelligence
Cloudsim  t-drive enhancing driving directions with taxi drivers’ intelligenceCloudsim  t-drive enhancing driving directions with taxi drivers’ intelligence
Cloudsim t-drive enhancing driving directions with taxi drivers’ intelligenceecway
 
Java t-drive enhancing driving directions with taxi drivers’ intelligence
Java  t-drive enhancing driving directions with taxi drivers’ intelligenceJava  t-drive enhancing driving directions with taxi drivers’ intelligence
Java t-drive enhancing driving directions with taxi drivers’ intelligenceecwayerode
 
T drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligenceT drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligenceEcway Technologies
 
Java t-drive enhancing driving directions with taxi drivers’ intelligence
Java  t-drive enhancing driving directions with taxi drivers’ intelligenceJava  t-drive enhancing driving directions with taxi drivers’ intelligence
Java t-drive enhancing driving directions with taxi drivers’ intelligenceEcway Technologies
 
Android t-drive enhancing driving directions with taxi drivers’ intelligence
Android  t-drive enhancing driving directions with taxi drivers’ intelligenceAndroid  t-drive enhancing driving directions with taxi drivers’ intelligence
Android t-drive enhancing driving directions with taxi drivers’ intelligenceecway
 
T drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligenceT drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligenceEcway Technologies
 

Similar to Driving behaviors for adas and autonomous driving XIII (20)

Driving Behavior for ADAS and Autonomous Driving II
Driving Behavior for ADAS and Autonomous Driving IIDriving Behavior for ADAS and Autonomous Driving II
Driving Behavior for ADAS and Autonomous Driving II
 
Driving Behavior for ADAS and Autonomous Driving IV
Driving Behavior for ADAS and Autonomous Driving IVDriving Behavior for ADAS and Autonomous Driving IV
Driving Behavior for ADAS and Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Hcs
HcsHcs
Hcs
 
FinalReport
FinalReportFinalReport
FinalReport
 
Modal split analysis
Modal split analysis Modal split analysis
Modal split analysis
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigation
 
Path Planning And Navigation
Path Planning And NavigationPath Planning And Navigation
Path Planning And Navigation
 
LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...
LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...
LANE CHANGE DETECTION AND TRACKING FOR A SAFE-LANE APPROACH IN REAL TIME VISI...
 
Traffic assignment
Traffic assignmentTraffic assignment
Traffic assignment
 
Description of the project thesis at Fraunhofer ITWM
Description of the project thesis at Fraunhofer ITWMDescription of the project thesis at Fraunhofer ITWM
Description of the project thesis at Fraunhofer ITWM
 
_370996_1_En_1_Chapter_Author
_370996_1_En_1_Chapter_Author_370996_1_En_1_Chapter_Author
_370996_1_En_1_Chapter_Author
 
Online/Offline Lane Change Events Detection Algorithms
Online/Offline Lane Change Events Detection AlgorithmsOnline/Offline Lane Change Events Detection Algorithms
Online/Offline Lane Change Events Detection Algorithms
 
IRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
IRJET-To Analyze Calibration of Car-Following Behavior of VehiclesIRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
IRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
 
Cloudsim t-drive enhancing driving directions with taxi drivers’ intelligence
Cloudsim  t-drive enhancing driving directions with taxi drivers’ intelligenceCloudsim  t-drive enhancing driving directions with taxi drivers’ intelligence
Cloudsim t-drive enhancing driving directions with taxi drivers’ intelligence
 
Java t-drive enhancing driving directions with taxi drivers’ intelligence
Java  t-drive enhancing driving directions with taxi drivers’ intelligenceJava  t-drive enhancing driving directions with taxi drivers’ intelligence
Java t-drive enhancing driving directions with taxi drivers’ intelligence
 
T drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligenceT drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligence
 
Java t-drive enhancing driving directions with taxi drivers’ intelligence
Java  t-drive enhancing driving directions with taxi drivers’ intelligenceJava  t-drive enhancing driving directions with taxi drivers’ intelligence
Java t-drive enhancing driving directions with taxi drivers’ intelligence
 
Android t-drive enhancing driving directions with taxi drivers’ intelligence
Android  t-drive enhancing driving directions with taxi drivers’ intelligenceAndroid  t-drive enhancing driving directions with taxi drivers’ intelligence
Android t-drive enhancing driving directions with taxi drivers’ intelligence
 
T drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligenceT drive enhancing driving directions with taxi drivers’ intelligence
T drive enhancing driving directions with taxi drivers’ intelligence
 

More from Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationYu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainYu Huang
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksYu Huang
 
3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image VYu Huang
 

More from Yu Huang (20)

Application of Foundation Model for Autonomous Driving
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
Data Closed Loop in Simulation Test of Autonomous Driving
Techniques and Challenges in Autonomous Driving
BEV Joint Detection and Segmentation
BEV Object Detection and Prediction
Fisheye based Perception for Autonomous Driving VI
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving IV
Cruise AI under the Hood
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
Scenario-Based Development & Testing for Autonomous Driving
How to Build a Data Closed-loop Platform for Autonomous Driving?
Annotation tools for ADAS & Autonomous Driving
Multi sensor calibration by deep learning
Data pipeline and data lake for autonomous driving
Open Source codes of trajectory prediction & behavior planning
Lidar in the adverse weather: dust, fog, snow and rain
Autonomous Driving of L3/L4 Commercial trucks
3-d interpretation from single 2-d image V

Driving Behaviors for ADAS and Autonomous Driving XIII

  • 1. Driving Behaviors for ADAS and Autonomous Driving XIII Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline • Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles (10.10) • Traject. Predict. for Auto. Driving based on Multi-Head Attention with Joint Agent-Map Representation (6.4) • MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction (6.5) • CoverNet: Multimodal behavior prediction using trajectory sets (CVPR.6.14) • Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge (6.8) • Learning Situational Driving (CVPR.6.14) • AMENet: Attentive Maps Encoder Network for Trajectory Prediction (6.15) • MCENET: Multi-Context Encoder Network for Homogeneous Agent Traj. Pred. in Mixed Traffic (6.23) • Multi-Head Attention based Probabilistic Vehicle Trajectory Prediction (7.4) • Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles (7.6) • Traffic Agent Trajectory Prediction Using Social Convolution and Attention Mechanism (7.6) • Planning on the fast lane: Learning to interact using attention mechanisms in path integral IRL (7.11) • Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models (7.14)
  • 3. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles • The motion planners used in self-driving vehicles need to generate trajectories that are safe, comfortable, and obey the traffic rules. • This is usually achieved by two modules: behavior planner, which handles high-level decisions and produces a coarse trajectory, and trajectory planner that generates a smooth, feasible trajectory for the duration of the planning horizon. • These planners, however, are typically developed separately, and changes in the behavior planner might affect the trajectory planner in unexpected ways. • Furthermore, the final trajectory outputted by the trajectory planner might differ significantly from the one generated by the behavior planner, as they do not share the same objective. • Here it is a jointly learnable behavior and trajectory planner. • Unlike most existing learnable motion planners that address either only behavior planning, or use an uninterpretable neural network to represent the entire logic from sensors to driving commands, this approach features an interpretable cost function on top of perception, prediction and vehicle dynamics, and a joint learning algorithm that learns a shared cost function employed by our behavior and trajectory components.
  • 4. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles The learnable motion planner has discrete and continuous components, minimizing the same cost function with a same set of learned cost weights.
  • 5. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles A: Given a scenario, generate a set of possible SDV behaviors. B: Left and right lane boundaries and the driving path that are relevant to the intended behavior are considered in the cost function. C: SDV geometry for spatiotemporal overlapping cost are approximated using circles. D: The SDV yields to pedestrians through stop lines on the driving paths.
  • 6. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles • Motion planners of modern self-driving cars are composed of two modules. • The behavioral planner is responsible for making high-level decisions. • The trajectory planner takes the decision of the behavioral planner and a coarse trajectory and produces a smooth trajectory for the duration of the planning horizon. • Unfortunately, these planners are typically developed separately, and changes in the behavioral planner might affect the trajectory planner in unexpected ways. • Furthermore, the trajectory outputted by the trajectory planner might differ in behavior from the one returned by the behavioral planner, as they do not share the same objective. • In this motion planner, the behavioral and trajectory planners share the same objective. • At each planning iteration, depending on the SDV location on the map, a subset of possible behaviors, denoted by B(W), is allowed by traffic rules and hence considered for evaluation. • The planner then generates low-level realizations of the high-level behaviors by producing a set of trajectories T(b) relative to the corresponding driving paths.
  • 7. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles • A safe trajectory for the SDV should not only be collision-free, but also maintain a safety distance to the surrounding obstacles, including both static and dynamic objects such as vehicles, pedestrians, cyclists, and unknown objects. • Costs are defined to capture the spatiotemporal (S-T) overlap and the violation of the safety distance, respectively. • For this, the SDV polygon is approximated by a set of circles of the same radius placed along the vehicle, and the distance from the center of each circle to the object polygon is used to evaluate the cost. • The SDV is expected to adhere to the structure of the road; sub-costs are therefore introduced that measure such violations. • The driving path and boundaries considered for these sub-costs depend on the candidate behavior. • The driving-path cost is the squared distance to the driving path, and the lane boundary cost is the squared violation distance of a safety threshold.
  • 8. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles Left: Headway cost penalizes unsafe distance to leading vehicles. Right: for each sampled trajectory, a weight function determines how relevant an obstacle is to the SDV in terms of its lateral offset.
  • 9. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles • As the SDV drives behind a leading vehicle, in either lane-following or lane-change behavior, it should keep a safe longitudinal distance that depends on the speed of the SDV and of the leading vehicle. • The headway cost is computed as the violation of the safety distance after applying a comfortable constant deceleration, assuming that the leading vehicle applies a hard brake; which vehicles are leading the SDV is decided at each time-step in the planning horizon. • A weight function of the lateral distance between the SDV and other vehicles determines how relevant they are for the headway cost. • The distance-violation costs incurred by vehicles that are laterally aligned with the SDV dominate the cost, which is compatible with lane-change maneuvers, where deciding the lead vehicles can be difficult. • Pedestrians are vulnerable road users and hence require extra caution, so a yield cost is defined. • The mission route is represented as a sequence of lanes, from which all lanes that are on the route or connected to the route by permitted lane-changes are specified. • A cost-to-go function captures the value of the final state; the speed limit of a lane defines a cost that penalizes trajectories exceeding the eligible speed; further costs encourage comfortable driving.
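As a concrete illustration of the headway term, the minimal sketch below computes a safety distance under a comfortable ego deceleration and a hard-braking lead vehicle, then scales the squared violation by a lateral-relevance weight. All function names and numeric parameters (decelerations, reaction time, decay) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def safety_distance(v_ego, v_lead, a_comf=2.0, a_hard=6.0, t_react=0.5):
    # Distance the SDV needs to stop with a comfortable deceleration, minus
    # the distance the lead vehicle covers under a hard brake, plus a
    # reaction-time buffer (all parameters are illustrative).
    d_ego = v_ego * t_react + v_ego**2 / (2.0 * a_comf)
    d_lead = v_lead**2 / (2.0 * a_hard)
    return max(d_ego - d_lead, 0.0)

def lateral_relevance(d_lat, half_width=1.0, decay=0.5):
    # Vehicles laterally aligned with the SDV get weight ~1; the weight
    # decays smoothly with lateral offset (e.g., during a lane change).
    return np.exp(-max(abs(d_lat) - half_width, 0.0) / decay)

def headway_cost(gap, v_ego, v_lead, d_lat):
    # Squared violation of the safety distance, weighted by lateral relevance.
    violation = max(safety_distance(v_ego, v_lead) - gap, 0.0)
    return lateral_relevance(d_lat) * violation**2
```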
  • 10. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles Behavioral decisions include obstacle side assignment and lane information, which are sent through the behavioral-trajectory interface. Example trajectories in a nudging scenario.
  • 11. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles The max-margin objective uses a surrogate loss to learn the sub-cost weights, since selecting the optimal trajectory within a discrete set is not differentiable. In contrast, the iterative optimization in the trajectory planner is a differentiable module, where gradients of the imitation loss function can be computed using the backpropagation-through-time (BPTT) algorithm. Since unrolling the full optimization can be computationally expensive, it is unrolled only for a truncated number of steps after a solution is obtained: perform M gradient-descent steps after obtaining the optimal trajectory, and backpropagate through these M steps only. If the control obtained from the continuous optimization converges to the optimum, then backpropagating through a truncated number of steps amounts to approximating the inverse Hessian at the optimum.
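A minimal sketch of this truncated unrolling, assuming a scalar cost function cost_fn(trajectory, weights) and a human demonstration human_tau (both hypothetical names): the solver output tau_star is treated as a constant, and only the final M inner gradient steps are kept in the autograd graph so the imitation loss can reach the cost weights. Step count M and step size lr are placeholders.

```python
import torch

def truncated_unroll_loss(cost_fn, tau_star, human_tau, weights, M=5, lr=0.1):
    # tau_star: solution from the (non-differentiable) solver, detached.
    # Re-run M differentiable gradient-descent steps on the trajectory so the
    # imitation loss can be backpropagated to the cost weights via BPTT.
    tau = tau_star.detach().clone().requires_grad_(True)
    for _ in range(M):
        grad_tau, = torch.autograd.grad(cost_fn(tau, weights), tau,
                                        create_graph=True)
        tau = tau - lr * grad_tau      # each step stays in the graph
    return ((tau - human_tau) ** 2).mean()  # imitation (MSE) loss
```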
  • 12. Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles • Behavioral with max-margin (“B+M”) learns the weight vector through the max-margin (+M) learning on the behavioral planner only. • Full Inference (“B+M +J”) uses the trained weights of “B+M” and runs the joint inference algorithm (+J) at test time. • Full Learning & Inference (“B+M +J +I”) learns the weight vector using the combination of the max-margin (+M) and imitation (+I) objectives, and runs the joint inference algorithm (+J) at test time.
  • 13. Traject. Predict. for Auto. Driving based on Multi-Head Attention with Joint Agent-Map Representation • Predicting the trajectories of surrounding agents is an essential ability for robots navigating complex real-world environments. • Autonomous vehicles (AVs) in particular can generate safe and efficient path plans by predicting the motion of surrounding road users. • Future trajectories of agents can be inferred using two tightly linked cues: the locations and past motion of agents, and the static scene structure. • The configuration of the agents may uncover which part of the scene is more relevant, while the scene structure can determine the relative influence of agents on each other's motion. • To better model the interdependence of the two cues, a multi-head attention-based model uses a joint representation of the static scene and the agent configuration for generating both keys and values for the attention heads. • To address the multimodality of future agent motion, each attention head is used to generate a distinct future trajectory of the agent. • The visualization of attention maps adds a layer of interpretability to the trajectories predicted by the model.
  • 14. Traject. Predict. for Auto. Driving based on Multi-Head Attention with Joint Agent-Map Representation MHA-JAM (MHA with Joint Agent-Map representation): Each LSTM encoder generates an encoding vector of the recent motion of one surrounding agent. The CNN backbone transforms the input map image into a 3D tensor of scene features. A combined representation of the context is built by concatenating the surrounding agents' motion encodings and the scene features. Each attention head models a possible mode of interaction between the target (green car) and the combined context features. Each LSTM decoder receives a context vector and the target vehicle encoding, and generates a distribution over a possible predicted trajectory conditioned on that context.
  • 15. Traject. Predict. for Auto. Driving based on Multi-Head Attention with Joint Agent-Map Representation Off-road loss: an auxiliary loss function that penalizes locations predicted by the model that fall outside the drivable area. It is proportional to the distance of a predicted location from the nearest point on the drivable area. Regression loss: to avoid penalizing plausible trajectories generated by the model that do not correspond to the ground truth, a variant of the best-of-L regression loss is used for training. Compute the negative log-likelihood (NLL) of the ground-truth trajectory under each of the L modes output by the model, and take the minimum of the L NLL values as the regression loss. Classification loss: in addition to the regression loss, a cross-entropy loss is applied over the L modes.
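A minimal sketch of this best-of-L training objective, assuming each mode outputs per-step Gaussian means and log standard deviations; constants of the Gaussian NLL are dropped, the off-road term is omitted, and tensor shapes are illustrative.

```python
import torch

def best_of_l_loss(means, log_sigmas, logits, gt):
    # means, log_sigmas: (L, T, 2) per-mode Gaussian parameters over T steps.
    # logits: (L,) mode scores; gt: (T, 2) ground-truth trajectory.
    nll = (0.5 * ((gt - means) / log_sigmas.exp()) ** 2
           + log_sigmas).sum(dim=(1, 2))            # per-mode NLL, shape (L,)
    best = nll.argmin()                             # mode closest to the GT
    reg_loss = nll[best]                            # best-of-L regression term
    cls_loss = torch.nn.functional.cross_entropy(   # push the mode scores
        logits.unsqueeze(0), best.unsqueeze(0))     # toward the best mode
    return reg_loss + cls_loss
```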
  • 16. Traject. Predict. for Auto. Driving based on Multi-Head Attention with Joint Agent-Map Representation Joint vs. separate agent-map representation for the attention heads. Two models are compared: (1) a baseline where attention weights are generated separately for the map and agent features, with keys and values for each set of features produced independently of the other (a); (2) the proposed formulation, where each attention head generates keys and values based on a joint representation of agent and map features (b).
  • 17. Traject. Predict. for Auto. Driving based on Multi-Head Attention with Joint Agent-Map Representation
  • 18. MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction • Autonomous vehicles are expected to drive in complex scenarios with several independent, non-cooperating agents. • Path planning for safely navigating such environments cannot rely only on perceiving the present location and motion of other agents. • It instead requires predicting such variables sufficiently far into the future: this is the problem of multimodal trajectory prediction, addressed here with a Memory Augmented Neural Network. • The method learns past and future trajectory embeddings using RNNs and exploits an associative external memory to store and retrieve such embeddings. • Trajectory prediction is then performed by decoding in-memory future encodings conditioned on the observed past. • Scene knowledge is incorporated in the decoding stage by learning a CNN on top of semantic scene maps. • Memory growth is limited by learning a writing controller based on the predictive capability of the existing embeddings. • Thanks to the non-parametric nature of the memory module, the trained system can continuously improve by ingesting novel patterns.
  • 19. MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction MANTRA addresses multimodal trajectory prediction: multiple future predictions are obtained for an observed past by relying on a Memory Augmented Neural Network.
  • 20. MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction Architecture of MANTRA. The encoding of an observed past trajectory is used as a key to read likely future encodings from memory. A multimodal prediction is obtained by decoding each future encoding, conditioned on the observed past. The surrounding context is processed by a CNN and fed to the Refinement Module to adjust predictions.
  • 21. MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction Representation learning: the encoders learn to map past and future points into a meaningful representation, and the decoder learns to reproduce the future. Instead of using just the future as input, the reconstruction process is also conditioned on an encoding of the past. Past and future trajectories are encoded separately; the decoder reconstructs the future trajectory only.
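The sketch below illustrates the associative read and the error-driven write controller, assuming fixed-size past/future encodings. The cosine similarity, the top-k read, and the write threshold tau are simplifying assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

class TrajectoryMemory:
    # Associative memory: keys are past encodings, values are future encodings.
    def __init__(self):
        self.keys, self.values = [], []

    def read(self, past_enc, k=5):
        # Return the future encodings whose past keys best match the query.
        if not self.keys:
            return []
        K = torch.stack(self.keys)                              # (N, D)
        sim = F.cosine_similarity(K, past_enc.unsqueeze(0), dim=1)
        topk = sim.topk(min(k, len(self.keys))).indices
        return [self.values[i] for i in topk]

    def write(self, past_enc, fut_enc, pred_err, tau=0.5):
        # Controller: only store patterns the current memory predicts poorly,
        # which bounds memory growth (threshold tau is illustrative).
        if pred_err > tau:
            self.keys.append(past_enc)
            self.values.append(fut_enc)
```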
  • 22. MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction
  • 23. CoverNet: Multimodal Behavior Prediction using Trajectory Sets • CoverNet is a method for multimodal, probabilistic trajectory prediction in urban driving. • Previous work has employed a variety of methods, including multimodal regression, occupancy maps, and 1-step stochastic policies. • CoverNet frames the trajectory prediction problem as classification over a diverse set of trajectories. • The size of this set remains manageable due to the limited number of distinct actions that can be taken over a reasonable prediction horizon. • The trajectory set is structured to a) ensure a desired level of coverage of the state space, and b) eliminate physically impossible trajectories. • By dynamically generating trajectory sets based on the agent's current state, the method's efficiency is further improved.
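The coverage construction can be approximated with a simple greedy sketch: keep adding the candidate trajectory farthest from the current set until every candidate lies within eps of a kept one. The 2 m tolerance and the max point-wise distance metric are assumptions for illustration.

```python
import numpy as np

def build_trajectory_set(candidates, eps=2.0):
    # candidates: list of (T, 2) arrays of future positions.
    # Greedy cover: the returned subset approximates all candidates to
    # within eps meters (max displacement over the horizon).
    kept = [candidates[0]]

    def dist(a, b):
        return np.linalg.norm(a - b, axis=-1).max()  # worst-case point gap

    while True:
        d = np.array([min(dist(c, k) for k in kept) for c in candidates])
        if d.max() <= eps:
            return kept
        kept.append(candidates[int(d.argmax())])     # add the worst-covered one
```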
  • 24. CoverNet: Multimodal Behavior Prediction using Trajectory Sets CoverNet overview following MTP
  • 25. CoverNet: Multimodal Behavior Prediction using Trajectory Sets
  • 26. Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge • Predicting the future motion of vehicles has been studied using various techniques, including stochastic policies, generative models, and regression. • Recent work has shown that classification over a trajectory set, which approximates possible motions, achieves state-of-the-art performance and avoids issues like mode collapse. • However, map information and the physical relationships between nearby trajectories are not fully exploited in this formulation. • This work builds on classification-based approaches to motion prediction by adding an auxiliary loss that penalizes off-road predictions. • This auxiliary loss can easily be pretrained using only map information (e.g., the off-road area), which significantly improves performance on small datasets. • Weighted cross-entropy losses are introduced to capture spatio-temporal relationships among trajectories.
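A minimal sketch of the weighted cross-entropy idea: instead of a one-hot target on the single closest set element, the target mass is spread over trajectory-set elements according to their distance to the ground truth. The softness parameter sigma and the mean-displacement distance are assumptions for illustration.

```python
import torch

def weighted_ce_loss(logits, set_trajs, gt, sigma=2.0):
    # logits: (K,) scores over the trajectory set; set_trajs: (K, T, 2);
    # gt: (T, 2) ground-truth trajectory.
    d = torch.linalg.norm(set_trajs - gt.unsqueeze(0), dim=-1).mean(dim=-1)
    target = torch.softmax(-d / sigma, dim=0)        # soft target over the set
    return -(target * torch.log_softmax(logits, dim=0)).sum()
```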
  • 27. Motion Prediction using Trajectory Sets and Self- Driving Domain Knowledge Visualization of on-road (black) and off-road (red) trajectories Visualization of the target distribution in the standard cross-entropy formulation (left), and the weighted cross-entropy loss (right)
  • 28. Motion Prediction using Trajectory Sets and Self- Driving Domain Knowledge Results listed as Argoverse | nuScenes
  • 29. Learning Situational Driving • Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited-visibility conditions with no lane markings to turning in a busy intersection while yielding to pedestrians. • In contrast, state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. • To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. • Motivated by this observation, this is a framework for learning a situational driving policy that effectively captures reasoning under varying types of scenarios. • The key idea is to learn a mixture model with a set of policies to capture multiple driving modes. • The mixture model is first optimized through behavior cloning. • The model is then refined by directly optimizing for the driving task itself, i.e., supervised with the navigation task reward. • This is more scalable than methods assuming access to privileged information, e.g., perception labels, as it only assumes demonstration- and reward-based supervision.
  • 30. Learning Situational Driving Situational Driving. To address the complexity in learning perception-to-action driving models, we introduce a situational framework using a behavior module. The module reasons over current on-road scene context when composing a set of learned behavior policies under varying driving scenarios. Our approach is used to improve over behavior reflex and privileged approaches in terms of robustness and scalability.
  • 31. Learning Situational Driving Approach Overview. The agent learns to combine a set of expert policies in a context-dependent, task-optimized manner to robustly drive in diverse scenarios.
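A minimal sketch of such a context-dependent mixture of expert policies, with linear experts and a linear gating network standing in for the learned behavior policies and the behavior module; all layer choices and sizes are assumptions.

```python
import torch
import torch.nn as nn

class MixtureDrivingPolicy(nn.Module):
    # Blends K expert policies with context-dependent weights (a simplified,
    # hypothetical stand-in for the situational policy).
    def __init__(self, feat_dim, act_dim, K=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, act_dim) for _ in range(K)])
        self.gate = nn.Linear(feat_dim, K)

    def forward(self, feats):                                  # (B, feat_dim)
        w = torch.softmax(self.gate(feats), dim=-1)            # (B, K) weights
        acts = torch.stack([e(feats) for e in self.experts], 1)  # (B, K, A)
        return (w.unsqueeze(-1) * acts).sum(dim=1)             # blended action
```

Training would first fit the mixture by behavior cloning on demonstrations and then fine-tune the gate and experts against the task reward, per the two-stage recipe above.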
  • 33. AMENet: Attentive Maps Encoder Network for Trajectory Prediction • Trajectory prediction is a crucial task in different communities, such as intelligent transportation systems, computer vision, and mobile robot applications. • However, there are many challenges in predicting the trajectories of heterogeneous road agents (e.g., pedestrians, cyclists and vehicles) at a microscopic level. • For example, an agent might be able to choose among multiple plausible paths in complex interactions with other agents in varying environments, and the behavior of each agent is affected by the various behaviors of its neighboring agents. • To this end, an end-to-end generative model named Attentive Maps Encoder Network (AMENet) is proposed for accurate and realistic multi-path trajectory prediction. • It leverages the target road user's motion information (i.e., movement along the x- and y-axes in Cartesian space) and the interaction information with the neighboring road users at each time step, encoded as dynamic maps centered on the target road user. • A conditional variational auto-encoder module is trained to learn the latent space of possible future paths based on the dynamic maps, and is then used to predict multiple plausible future trajectories conditioned on the observed past trajectories.
  • 34. AMENet: Attentive Maps Encoder Network for Trajectory Prediction An overview of the proposed framework. It consists of four modules: the X-Encoder and Y-Encoder are used for encoding the observed and the future trajectories, respectively, and have a similar structure. The Sample Generator produces diverse latent samples for future generation. The Decoder module decodes the features from the samples produced in the last step and predicts the future trajectory sequentially.
  • 35. AMENet: Attentive Maps Encoder Network for Trajectory Prediction Structure of the X-Encoder. The encoder has two branches: the upper one extracts motion information of the target agent, and the lower one learns the interaction information among the neighboring road users from dynamic maps over time. Each dynamic map consists of three layers that represent orientation, travel speed and relative position, each centered on the target road user. The motion information and the interaction information are encoded by their own LSTMs sequentially. The last outputs of the two LSTMs are concatenated and forwarded to a fully connected layer to obtain the final output of the X-Encoder. The Y-Encoder has the same structure as the X-Encoder, but it extracts features from the future trajectories and is only used in the training phase.
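A minimal sketch of the conditional-VAE objective used by this family of models, assuming Gaussian posterior parameters (mu, logvar) produced from the Y-Encoder branch and a decoded trajectory y_hat; the beta weighting is an assumption.

```python
import torch

def reparameterize(mu, logvar):
    # Sample z ~ N(mu, sigma^2) in a differentiable way.
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def cvae_loss(y_hat, y, mu, logvar, beta=1.0):
    # Reconstruct the future trajectory and keep the approximate posterior
    # q(z | past, future) close to the prior N(0, I).
    recon = ((y_hat - y) ** 2).mean()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```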
  • 36. AMENet: Attentive Maps Encoder Network for Trajectory Prediction
  • 37. MCENET: Multi-Context Encoder Network for Homogeneous Agent Traj. Pred. in Mixed Traffic • Trajectory prediction in urban mixed-traffic zones (a.k.a. shared spaces) is critical for many intelligent transportation systems, such as intent detection for autonomous driving. • However, there are many challenges in predicting the trajectories of heterogeneous road agents (pedestrians, cyclists and vehicles) at a microscopic level. • For example, an agent might be able to choose among multiple plausible paths in complex interactions with other agents in varying environments. • The Multi-Context Encoder Network (MCENET) is trained by encoding both past and future scene context, interaction context and motion information to capture the patterns and variations of the future trajectories using a set of stochastic latent variables. • At inference time, the past context and motion information of the target agent are combined with samplings of the latent variables to predict multiple realistic future trajectories. • In experiments on several datasets of varying scenes, it outperforms some of the recent state-of-the-art methods for mixed-traffic trajectory prediction by a large margin and is more robust in a very challenging environment.
  • 38. MCENET: Multi-Context Encoder Network for Homogeneous Agent Traj. Pred. in Mixed Traffic Predicting the future trajectory (d) by observing the past trajectories (c), considering the scene (a) and the grouping context (b). Three kinds of scene context are used: (1) an aerial photograph provides an overview of the environment, (2) a segmented map defines the accessible areas with respect to each road agent's transport mode, and (3) a motion heat map describes the prior of how different agents move. Different colors denote different agents or agent groups.
  • 39. MCENET: Multi-Context Encoder Network for Homogeneous Agent Traj. Pred. in Mixed Traffic The pipeline of the method. The ground truth Y and the associated interaction and scene context are injected into the input only during training; they are not available at inference. The latent variables are sampled N times and concatenated with the output of the X-Encoder to predict multiple future paths.
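A minimal sketch of that inference step, assuming a hypothetical decoder network and a fixed-size past encoding x_enc: since the future branch is unavailable at test time, the latent is drawn from the prior N(0, I) and one trajectory is decoded per sample.

```python
import torch

def predict_multipath(decoder, x_enc, n_samples=20, z_dim=32):
    # Sample the latent from the prior and decode one trajectory per sample.
    z = torch.randn(n_samples, z_dim)
    x = x_enc.unsqueeze(0).expand(n_samples, -1)   # repeat the past encoding
    return decoder(torch.cat([x, z], dim=-1))      # e.g., (n_samples, T, 2)
```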
  • 40. MCENET: Multi-Context Encoder Network for Homogeneous Agent Traj. Pred. in Mixed Traffic
  • 41. Multi-Head Attention based Probabilistic Vehicle Trajectory Prediction • An online-capable deep learning model for probabilistic vehicle trajectory prediction. • A simple encoder-decoder architecture based on multi-head attention. • It generates the distributions of the predicted trajectories for multiple vehicles in parallel. • It models the interactions by learning to attend to a few influential vehicles in an unsupervised manner, which can improve the interpretability of the network. • Interpretability: the use of multi-head attention improves the interpretability of the neural network because the model can learn the social relations of neighboring vehicles in an unsupervised manner. • Scalability: as the output dimension of multi-head attention is flexible with respect to the number of vehicles, the network can be extended to very dense traffic scenarios. The network is tested on an autonomous vehicle platform with fewer than 30 surrounding vehicles; the average computation time is 50 ms. • Accuracy: the method is verified using naturalistic trajectory data on highways, showing better performance than existing methods in terms of positional error.
  • 42. Multi-Head Attention based Probabilistic Vehicle Trajectory Prediction The road on the left denotes the input of the prediction model, which consists of the past trajectories of surrounding vehicles, X, and the lane information, I. The road on the right denotes the output of the prediction model, which is the distribution of the future trajectories, P(Y|X,I).
  • 43. Multi-Head Attention based Probabilistic Vehicle Trajectory Prediction Structure of the attention layer for both the lane and the vehicles.
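The interpretability claim rests on inspecting the attention weights themselves. A minimal sketch of one scaled dot-product attention head that also returns its weights, so that the influence assigned to each neighboring vehicle (or lane feature) can be visualized; shapes and names are illustrative.

```python
import torch

def attention_with_weights(q, k, v):
    # q: (..., Lq, d), k/v: (..., Lk, d). Returning the weights makes it
    # possible to inspect which neighbors each head attends to.
    d = q.shape[-1]
    w = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return w @ v, w   # attended values and the (..., Lq, Lk) attention map
```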
  • 44. Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles • Trajectory prediction is crucial for autonomous vehicles. • The planning system not only needs to know the current state of the surrounding objects but also their possible states in the future. • As for vehicles, their trajectories are significantly influenced by the lane geometry, and how to effectively use the lane information is of active interest. • Most existing works use rasterized maps to explore road information, which does not distinguish between different lanes. • This work proposes an instance-aware lane representation. • By integrating the lane features and trajectory features, a goal-oriented lane attention module is proposed to predict the future locations of the vehicle. • The lane representation together with the lane attention module can be integrated into the widely used encoder-decoder framework to generate diverse predictions. • Most importantly, each generated trajectory is associated with a probability to handle the uncertainty. • The method does not suffer from collapsing to one behavior mode and can cover diverse possibilities.
  • 45. Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles An overview of this method. The model consists of a trajectory encoder, a lane encoder, an interaction network, a lane attention module and a final trajectory decoder.
  • 46. Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles (a) An example of selected lanes. The blue dot represents the last location of the target vehicle; “s” and “e” denote the start and end of a road segment, respectively. (b) The architecture of the Lane Encoder. “conv 1, 64” means 1D convolution with a kernel size of 1 and 64 output channels. The final output is a 128-d vector for each lane. (c) The structure of the proposed lane attention module.
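A minimal sketch of a lane encoder in the spirit of the caption, mapping each lane polyline to a single 128-d vector with kernel-size-1 1D convolutions; the exact layer stack and pooling are assumptions beyond the "conv 1, 64" label.

```python
import torch
import torch.nn as nn

class LaneEncoder(nn.Module):
    # Encodes a lane (a polyline of 2-d points) into one 128-d vector.
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=1), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=1), nn.ReLU(),
        )

    def forward(self, lane_pts):                  # (B, N_pts, 2)
        h = self.conv(lane_pts.transpose(1, 2))   # (B, 128, N_pts)
        return h.max(dim=-1).values               # pool over points -> (B, 128)
```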
  • 47. Probabilistic Multi-modal Trajectory Prediction with Lane Attention for Autonomous Vehicles
  • 48. Traffic Agent Trajectory Prediction Using Social Convolution and Attention Mechanism • Trajectory prediction is significant for the decision-making of autonomous driving vehicles. • This paper proposes a model to predict the trajectories of target agents around an autonomous vehicle. • The main idea is to consider the history trajectories of the target agent and the influence of surrounding agents on the target agent. • The target agent's history trajectories are encoded as an attention mask, and a social map is constructed to encode the interactive relationship between the target agent and its surrounding agents. • Given a trajectory sequence, LSTM networks are first utilized to extract the features for all agents, based on which the attention mask and social map are formed. • Then, the attention mask and social map are fused to get the fusion feature map, which is processed by the social convolution to obtain a fusion feature representation. • Finally, this fusion feature is taken as the input of a variable-length LSTM to predict the trajectory of the target agent. • The variable-length LSTM enables the model to handle the case where the number of agents in the sensing scope is highly dynamic in traffic scenes.
  • 49. Traffic Agent Trajectory Prediction Using Social Convolution and Attention Mechanism The target agent is marked by the grey square; the blue grid region around it is its grid cell. Input representations are generated for all agents based on trajectory information. These representations are passed through LSTMs and eventually used to construct the social map, while the target agent's representation is encoded as the attention mask. The product of the attention mask and the social map is passed through ConvNets and then concatenated with the target agent tensor to produce a latent representation. Finally, this latent representation is passed through an LSTM to generate a trajectory prediction for the target agent.
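A minimal sketch of one plausible social-map construction, scattering each neighbor's LSTM feature into the grid cell it occupies relative to the target agent; the grid size, cell size, and last-writer-wins collision handling are all assumptions for illustration.

```python
import torch

def build_social_map(target_pos, nbr_pos, nbr_feats, grid=13, cell=4.0):
    # target_pos: (2,); nbr_pos: (N, 2); nbr_feats: (N, D) LSTM features.
    D = nbr_feats.shape[-1]
    smap = torch.zeros(grid, grid, D)
    rel = (nbr_pos - target_pos) / cell + grid // 2   # target-centered cells
    for (gx, gy), f in zip(rel.long().tolist(), nbr_feats):
        if 0 <= gx < grid and 0 <= gy < grid:
            smap[gx, gy] = f                          # drop out-of-range agents
    return smap
```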
  • 50. Traffic Agent Trajectory Prediction Using Social Convolution and Attention Mechanism Results for trajectory prediction on the BLVD dataset; results of different combination models.
  • 51. Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse RL • General-purpose trajectory planning algorithms for automated driving utilize complex reward functions to perform a combined optimization of strategic, behavioral, and kinematic features. • The specification and tuning of a single reward function is a tedious task and does not generalize over a large set of traffic situations. • Deep learning approaches based on path integral inverse reinforcement learning have been successfully applied to predict local, situation-dependent reward functions using features of a set of sampled driving policies. • Sample-based trajectory planning algorithms are able to approximate a spatio-temporal subspace of feasible driving policies that can be used to encode the context of a situation. • However, the interaction with dynamic objects requires an extended planning horizon, which in turn requires sequential context modeling. • This work addresses sequential reward prediction over an extended time horizon. • A neural network architecture uses a policy attention mechanism to generate a low-dimensional context vector by concentrating on trajectories with a human-like driving style. • In addition, a temporal attention mechanism identifies context switches and allows for stable adaptation of rewards.
  • 52. Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse RL Illustration of the planner for automated driving, which samples policies for the deep inverse RL approach. The z-axis corresponds to the velocity, whereas the ground plane depicts spatial feature maps such as distances from the lane centers. A subset of policies is visualized, where the green triangle shows the optimal policy and the blue triangles highlight the highest policy attention. The color gradient corresponds to the policy value; blue policies have a high attention activation. The cylindrical objects represent a stop barrier.
  • 53. Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse RL Neural network architectures for situation-dependent reward prediction. The policy temporal attention architecture consists of a policy attention and a temporal attention mechanism. Inputs are a set of planning cycles, each with a set of policies. The policy encoder generates a latent representation of individual policies. The policy attention mechanism produces a low-dimensional context vector, which is forwarded to the temporal attention network (TAN). The policy temporal attention mechanism predicts a mixture reward function given a history of context vectors.
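A minimal sketch of the policy attention step for a single planning cycle: the encodings of all sampled policies are attended with a learned query and compressed into one context vector that would feed the temporal attention network. The dot-product scoring and the single query vector are assumptions for illustration.

```python
import torch

def policy_attention(policy_encodings, query):
    # policy_encodings: (N_policies, D) latent policy representations;
    # query: (D,) learned query vector.
    scores = policy_encodings @ query                 # (N_policies,)
    alpha = torch.softmax(scores, dim=0)              # attention over policies
    context = alpha @ policy_encodings                # (D,) context vector
    return context, alpha                             # alpha is inspectable
```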
  • 54. Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse RL Overview of average test performance based on expected value difference (EVD), expected distance (ED), and optimal policy distance (OPD). Tests are conducted on a test dataset recorded by an expert-tuned planning algorithm.
  • 55. Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models • This work shows that semi-supervised models for vehicle trajectory prediction significantly improve performance over supervised models on state-of-the-art real-world benchmarks. • Moving from supervised to semi-supervised models allows scaling up by using unlabeled data, increasing the number of images in pre-training from millions to a billion. • Ablation studies compare transfer learning of semi-supervised and supervised models while keeping all other factors equal. • Within semi-supervised models, contrastive learning is compared with teacher-student methods, and networks predicting a small number of trajectories are compared with networks predicting probabilities over a large trajectory set.
  • 56. Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models An example of input and output for mid-level (top) and low-level (bottom) representations. In the top row, the mid-level input representation is an annotated map of the scene (top left), with boxes representing agent positions and colors representing semantic categories. The output (top right) is a probability distribution over a set of candidate trajectories. In the bottom row, a low-level representation uses an image from the vehicle's front-facing camera as input (bottom left), and predicts the future steering wheel angle (bottom right) and speed of the vehicle.
  • 57. Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models • Mid-level representation: an annotated map image represents the driving environment. This includes annotations for drivable areas, crosswalks and walkways, using color coding to represent semantic categories. All scenes are oriented such that the agent under consideration is centered and directed towards the top of the image. The positions of all agents in the scene are drawn onto the image, using faded bounding boxes to represent past positions in a historical window. By encoding all this information into a single map, a large amount of information is condensed into a single image. • Low-level representation: front-facing camera images from the Drive360 dataset serve as a low-level representation of the driving environment. In addition to the image, it includes a vector of semantic map data, with data points such as the distance to the nearest intersection, the speed limit, and the approximate road curvature.
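Operationally, the transfer-learning comparison amounts to swapping the image backbone's initialization before fine-tuning the prediction head. A minimal sketch, assuming a ResNet-50 backbone and a hypothetical checkpoint file whose keys follow torchvision naming; the checkpoint path and strict=False handling of mismatched heads are assumptions.

```python
import torch
import torchvision

# Initialize the map encoder of a CoverNet/MTP-style model from a
# semi-supervised checkpoint instead of ImageNet-supervised weights.
backbone = torchvision.models.resnet50(weights=None)
state = torch.load("simclr_pretrained.pth", map_location="cpu")  # hypothetical
backbone.load_state_dict(state, strict=False)  # pretraining head may differ
backbone.fc = torch.nn.Identity()              # expose 2048-d features
# `backbone` now feeds the trajectory-set classification head for fine-tuning.
```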
  • 58. Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models Figure panels: low-level representations and mid-level representations.
  • 59. Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models Comparison of the semi-supervised models used in the experiments. The labeled dataset in all models consists of 1.2M images. Since SimCLR is trained on augmentations, there is no measure of the unlabeled dataset size.
  • 60. Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models Results of CoverNet and MTP on the nuScenes dataset, comparing different semi-supervised and supervised models used to encode the annotated map. For each semi-supervised model, a direct comparison is made to a supervised model with the same architecture. Semi-supervised models significantly outperform their supervised counterparts on most metrics.