Keynote presentation at ICST (AIST workshop) entitled "Testing and Development Challenges for Complex Cyber-Physical Systems: Insights from the COSMOS H2020 Project"
Testing and Development Challenges for Complex Cyber-Physical Systems: Insights from the COSMOS H2020 Project
1. “Testing and Development Challenges for
Complex Cyber-Physical Systems:
Insights from the COSMOS H2020 Project”
20 April, 2023 - Ireland Co-located with ICST 2023
Sebastiano Panichella
Zurich University of Applied Sciences
https://spanichella.github.io/
AIST 2023:
3rd International Workshop on Artificial Intelligence in Software Testing
2. Zurich University of
Applied Sciences
Senior Computer Science Researcher
Since August 2018
PhD
June 2014
October 2014 - August 2018
About me
Academic & Industrial
Collaborations
Investigated Research Topics related to
- “Development Automation, Test automation” & “Human Computer Interaction” for
- Software Systems
- Cyber-physical systems (CPSs)
- and, AI-based systems
- “Software Engineering for AI”
- “AI for Software Engineering”
3. Outline
3
• DevOps shortcomings for Complex CPSs
• What types of bugs occur in open-source CPSs?
• How to enable cost-effective testing for Self-driving cars?
• How to address the Reality Gap problem when testing UAVs?
• Context: Cyber-physical Systems (CPSs)
The COSMOS Project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 957254.
4. Context
“My main research goal is to conduct industrial research, involving both industrial and
academic collaborations, to sustain the Internet of Things (IoT) vision of future "smart cities”,
with millions of smart systems connected over the internet, and/or controlled by complex
embedded software implemented for the cloud."
4
5. Context
Next 10-15 Years (and beyond):
1) Cyber-physical Systems
2) Artificial Intelligence (AI)
3) DevOps, IoT, Automated Testing (AT)
6. “Emerging Cyber-physical Systems (CPS) will play a crucial role in the quality of
life of European citizens and the future of the European economy”
Context
• CPS relevant sectors:
• Healthcare
• Automotive
• Water Monitoring
• Railway
• Manufacturing
• Avionics
• etc.
MEDICAL DELIVERY
FOOD DELIVERY
9. 9
UAVs
“But do we have, today, UAVs that would autonomously
map the disaster area at the Fukushima nuclear power
plant or spot the location of people stranded and isolated
after such a disaster?”
Fukushima disaster
Unmanned Aerial Vehicles (UAVs) - a specific case of “CPSs”
Problem Statement (1)
10. Problem Statement (2)
• Our (Software Engineering) view of DevOps and AI for IoT systems:
• DevOps and Continuous Delivery (CD): What is it?
• Present, Challenges, and Opportunities
• Relevant Research Questions
• Artificial Intelligence (AI) and Testing Automation:
• Present, Challenges, and Opportunities
• User-oriented Testing Automation
• Relevant Research Questions
“We all recognize the relevance and capacity of contemporary cyber-physical systems for building the future of our society, but ongoing research in the field is also clearly failing in making the right countermeasures to avoid that CPS usage affects human being safety”.
“Self-driving Uber kills Arizona woman in first fatal crash involving pedestrian”
“Swiss Post drone crashes in Zurich”
“A simple software update was the direct cause of the fatal crashes of the Boeing 737”
12. Challenges
“Self-driving Uber kills Arizona woman in first fatal crash involving pedestrian”
“A simple software update was
the direct cause of the fatal
crashes of the Boeing 737”
Challenge 1: Observability, testability, and predictability of the behavior
of emerging CPS are highly limited and, unfortunately, their usage in the real
world can lead to fatal crashes, sometimes tragically involving humans
13. Research Challenges and Opportunities
As reported by National Academies:
[“A 21st Century Cyber-Physical Systems Education”]
“today's practice of IoT system design and
implementation are often unable to support
the level of “complexity, scalability, security,
safety, […]” required to meet future needs”
14. Research Challenges and Opportunities
“The main problem is that contemporary
development methodologies for CPS need to
incorporate core aspects of both systems and
software engineering communities, with the
goal to explicitly embrace and consider the
several direct and indirect physical effects of
software”
[“Complexity challenges in development of cyber-physical systems”]
(Martin Törngren, Ulf Sellgren Pages 478-503)
14
Crash of
Boeing 737
15. Research Challenges and Opportunities
[“Complexity challenges in development of cyber-physical systems”]
(Martin Törngren, Ulf Sellgren Pages 478-503)
“As identified by agile methodologies, the
development of modern/emerging systems
(e.g., e-health, automotive, satellite, and IoT
manufacturing systems) should evolve with
the systems, “as development never ends””
Tools
16. Research Challenges and Opportunities
These concepts are closely related to DevOps and
Artificial Intelligence technologies, and several
researchers and practitioners advocate them as
promising solutions for the development,
maintenance, testing, and evolution of these
complex systems
17. Research Challenges and Opportunities
Challenge 1: Observability, testability, and
predictability of the behavior of emerging
CPS are highly limited and, unfortunately,
their usage in the real world can lead to fatal
crashes, sometimes tragically involving
humans
Challenge 2: Contemporary DevOps and
AI practices and tools are potentially the
right solution to this problem, but they are
not developed to be applied in CPS
domains
19. Sebastiano Panichella Sajad Khatiri
Christian Birchler
COSMOS:
DevOps for Complex Cyber-physical Systems
https://www.cosmos-devops.org/ https://twitter.com/COSMOS_DEVOPS https://lnkd.in/eUVeaYaz
20. COSMOS Vision
■ Develop novel DevOps tools,
methodologies, and techniques that enable
effective, continuous development and
evolution of CPS
■ Increase the level of reliability,
dependability, trustworthiness, and
adaptability of CPS
■ Deliver proven DevOps advantages and
benefits to Europe’s CPS development
community
27. 27
Innovation Area 1: DevOps Pipelines for CPS
WP3: Methodology for Setting-Up and Maintaining
COSMOS DevOps Pipelines
■ CI/CD Antipatterns Identification for CPS
■ Definition of a DevOps-based Methodology to Support
the Development of Self-Adaptive CPS
■ COSMOS Pipeline Optimization
COMPONENTS
28. 28
Innovation Area 2: V&V and Security Assessment of DevOps pipelines
COMPONENTS
WP4: V&V and security assessment of COSMOS DevOps
pipelines
■ Development of Automated Techniques for Software Testing
for CPS
■ Development of Run-time Verification Techniques for
Checking and Diagnosing CPS Executions
■ Development of Solutions for Detecting Security
Vulnerabilities in CPS
29. 29
Innovation Area 3: Tools for High Quality CPS Software Evolution
WP5: Development of Tools to support High Quality CPS
Software Evolution
■ Design and Development of Refactoring Framework
for Secure and Reliable CPS
■ Development of Test Case Generation Tools for Rapid
DevOps Iterations
■ Development of Tools to support User-oriented
Maintenance and Testing
COMPONENTS
30. Innovation Area 4: Tools for Monitoring, Self-healing and Self-adaptability of CPS
WP6: Development of Tools to support Monitoring, Self-
healing, and Self-adaptability of CPS in the Field
■ Development and Assessment of CPS Change &
Behavioral Models
■ Developing AI-based Solutions to Support Two-speed
DevOps Cycles for CPS
■ Automated Quality Assessment and Monitoring of CPS
in the Field
■ Development of AI-based Solutions to Increase CPS
Self-adaptability to Diverse Contexts
COMPONENTS
32. Outline
32
• DevOps shortcomings for Complex CPSs
• What types of bugs occur in open-source CPSs?
• How to enable cost-effective testing for Self-driving cars?
• How to address the Reality Gap problem when testing UAVs?
• Context: Cyber-physical Systems (CPSs)
35. 35
Bugs and Failures in CPS
A CPS bug is “a flaw in the hardware (not properly
handled by the software), or an incorrect
interaction between the software and
hardware components leading to a CPS
misbehavior”.
A CPS bug can manifest as a CPS failure,
which makes a CPS unable to deliver its
required functionality or to fulfill some
non-functional properties.
36. 36
Bugs in the PX4 Project
https://github.com/PX4/PX4-Autopilot/issues/8980
PX4 Issue 8980: Unsuccessful flight
“ Autopilot receiving noisy sensor-data…“
37. 37
Bugs in the OpenPilot Project
Openpilot Issue 2103: A CAN bus error
https://github.com/commaai/openpilot/issues/2103
“ Software update on unsupported hardware devices…“
39. 39
Fiorella Zampetti, Ritu Kapur, Massimiliano Di Penta,
Sebastiano Panichella: An Empirical Characterization
of Software Bugs in Open-Source Cyber-Physical
Systems. Journal of Systems & Software (JSS).
What types of bugs occur in Open-source CPSs?
1,151 closed issues sampled from
14 open-source CPS projects
The CPS bug taxonomy comprises
8 different high-level categories
40. 40
What types of bugs occur in Open-source CPSs?
Process for designing a taxonomy of bugs occurring in CPSs
from GitHub (Arduino, drones, robotics, automotive, etc.)
42. 42
Analyzed Projects
Fiorella Zampetti, Ritu Kapur, Massimiliano Di Penta, Sebastiano Panichella: An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical Systems. Journal of Systems & Software (JSS).
Table: 14 open-source CPS projects, with columns CPS Domains, Issues, Closed Issues, and Bug-related Issues
43. 43
What types of bugs occur in Open-source CPSs?
33% of the bugs are CPS-specific
CPS bug taxonomy: 22 different root causes,
grouped into 8 high-level categories
44. 44
Hardware Bugs in Open-source CPSs
Energy
Faulty Sensors
Hardware failure
Hardware Not Supported/Compatible
45. 45
Hardware Bugs in Open-source CPSs
Bug #21033 in openpilot points out the
presence of a CAN bus error on a
specific device (i.e., Rav4 Prime).
Main Findings:
Hardware-specific bugs are peculiar to our taxonomy and, unsurprisingly, all of
them are CPS-specific.
Recognizing (and simulating) hardware failures is of paramount importance in V&V.
Also, developers should take particular care of hardware compatibility, especially
for CPSs targeting multiple devices.
Interaction with the hardware makes the analysis of non-functional properties
such as performance, memory, and energy consumption particularly crucial.
47. 47
Network & Interface Bugs in Open-source CPSs
Bug #4302 in Arduino, where there is a memory leak while
doing repeated connections to a server, causing the loss of around
8 KB for each connection.
Bug #6546 in PX4-Autopilot, which is inherited from a third-party
library used to interface with the GPS: it concerns GPS “jamming”,
which had already been reported as an issue in the library aimed at
supporting the Intel Aero Platform.
48. 48
Network & Interface Bugs in Open-source CPSs
Main Findings:
Networking plays a paramount role for CPSs
and can be the origin of bugs.
The CPS infrastructure should include
network monitors and V&V techniques may
contemplate CPS misbehavior caused by
network-specific aspects.
Main Findings:
Interfacing bugs are challenging for
developers coping with CPSs, and testing
efforts should focus on them.
49. 49
Algorithmic Bugs in Open-source CPSs
Barometer: Bug #2620 in ArduPilot, where a specific condition of the
barometer sensor is not handled by the application:
“the barometer altitude became NaN [...] but the EKF
probably continued to use the barometer altitude…”
Acceleration: Bug #801 in ArduPilot, related to the setting of the vertical acceleration:
“Vehicle was not reaching target climb or descent rate because of
incorrectly defaulted acceleration”.
50. 50
Algorithmic Bugs in Open-source CPSs
Main Findings:
Algorithmic bugs in CPSs tend to be similar to those occurring in other types of
software systems.
Therefore, existing mutant taxonomies can be used to seed some representative
faults.
However, the way failures manifest (e.g., flaky effects on the hardware or
actuators) can make these bugs more subtle to detect...
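The barometer example (an altitude reading that becomes NaN while the estimator keeps using it) can be illustrated with a small guard and a seeded fault that removes it. This is only a sketch: `fuse_altitude` and `seeded_mutant` are hypothetical helpers written for illustration and are not ArduPilot code.

```python
import math


def fuse_altitude(baro_alt_m, last_good_alt_m):
    """Return (altitude, healthy), falling back to the last good value.

    Hypothetical helper for illustration; this is not ArduPilot's EKF logic.
    """
    if baro_alt_m is None or not math.isfinite(baro_alt_m):
        # NaN or infinite readings must never reach the estimator.
        return last_good_alt_m, False
    return baro_alt_m, True


def seeded_mutant(baro_alt_m, last_good_alt_m):
    """Seeded fault (mutant): the validity check has been removed."""
    return baro_alt_m, True


print(fuse_altitude(float("nan"), 12.0))   # falls back to the last good value
print(seeded_mutant(float("nan"), 12.0))   # NaN propagates silently
```

A test that feeds a NaN reading kills this mutant, which is the kind of representative fault that existing mutant taxonomies would seed.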
52. 52
CPS Safety Related Issues of UAVs
Andrea Di Sorbo, Fiorella Zampetti, Corrado A. Visaggio, Massimiliano Di Penta, and Sebastiano
Panichella: Automated Identification and Qualitative Characterization of Safety Concerns Reported in
UAV Software Platforms. Transactions on Software Engineering and Methodology.
What are the main Hazards and Accidents Emerging from Safety Issues
Reported in UAV Software Platforms?
53. 53
RQ1: To what extent can machine learning models
automatically identify safety-related concerns in
issue reports of UAV software platforms?
RQ2: What are the main hazards and accidents emerging
from safety issues reported in UAV software platforms?
CPS Safety Related Issues of UAVs
54. 54
Figure: Co-occurrences of hazard categories and accident categories (Hazard, Accident).
Figure: Hazard categories and corresponding occurrences in our dataset of 273 safety-related issues and pull requests.
What are the main Hazards and Accidents Emerging from Safety Issues
Reported in UAV Software Platforms?
55. 55
DevOps Challenges for Dealing with CPS Bugs and Complexity
Interview-based methodology
Interviews’
transcripts
Card Sorting
Early feedback from
COSMOS partners
Bad (and good)
practices,
Challenges,
Barriers,
Mitigation
Analysis Triangulation
Validation outside COSMOS
(survey questionnaire)
Pull Requests (PRs) Mining
20 CPS related projects
56. 56
DevOps Challenges for Dealing with CPS Bugs and Complexity
Zampetti, Fiorella; Tamburri, Damian ; Panichella, Sebastiano;
Panichella, Annibale; Canfora, Gerardo; Di Penta, Massimiliano:
Continuous Integration and Delivery practices for Cyber-Physical
systems: An interview-based study. Transactions on Software
Engineering and Methodology.
Finding Overview:
61. Outline
• Context: Cyber-physical Systems (CPSs)
• DevOps shortcomings for Complex CPSs
• What types of bugs occur in open-source CPSs?
• How to enable cost-effective testing for Self-driving cars?
• How to address the Reality Gap problem when testing UAVs?
62. How to enable cost-effective testing
for Self-driving cars?
62
63. Tesla Car
Autonomous Driving Systems (ADSs)
Multi-sensing Systems:
• Autonomous systems capture surrounding
environmental data at run-time via
multiple sensors (e.g., camera, radar, lidar)
as inputs
• They process these data with Deep Neural
Networks (DNNs) and output control
decisions (e.g., steering)
• They require robust testing that
creates realistic, diverse test cases
64. Environmental Data Collection With ADSs Sensors
Traffic Sign Recognition (TSR)
Pedestrian Protection (PP)
Lane Departure Warning (LDW)
Automated Emergency Braking (AEB)
67. Testing Steps in ADSs
67
Requirements of Testing ADSs
• Generate Diversified Test
Inputs (or Scenarios)
• Evaluation based on Failure
Detection
“Manual Testing is still
Dominant…”
70. 70
The New York Times, April 2021
npr, January 2022 Reuters, September 2021
Testing Autonomous Driving Systems
71. Traditional Development Pipeline: Coding vs. Testing
Java Class Under Test (CUT)
class Triangle {
    int a, b, c; // sides
    String type = "NOT_TRIANGLE";
    Triangle(int a, int b, int c) {…}
    String getType() { return type; } // accessor used by the test case
    void computeTriangleType() {
        if (a == b) {
            if (b == c)
                type = "EQUILATERAL";
            else
                type = "ISOSCELES";
        } else {
            if (a == c) {
                type = "ISOSCELES";
            } else {
                if (b == c)
                    type = "ISOSCELES";
                else
                    type = "SCALENE";
            }
        }
    }
}
Test Case
@Test
public void test() {
    Triangle t = new Triangle(1, 2, 3);
    t.computeTriangleType();
    String type = t.getType();
    assertTrue(type.equals("SCALENE"));
}
72. Traditional Development Pipeline: Coding vs. Testing
(same class under test and test case as on the previous slide)
Code Coverage: The main Quality Assessment Criterion
73. Traditional Development Pipeline: Coding vs. Testing
(same class under test and test case as on the previous slide)
Code Coverage: Not Sufficient as Quality Assessment Criterion
74. Challenges of Testing ADSs
74
Challenge 1: Code coverage vs. Scenario Coverage
Challenge 2: Code coverage & CPU & Memory consumption
Challenge 3: Unit-Test vs. System-level Testing
90. Regression Testing
“Regression testing is re-running
functional and non-functional tests to
ensure that previously developed and
tested software still performs after a
change”
Anirban Basu
90
92. A Typical Scenario
92
Class 1 Class 2 Class 3
Class 4 Class 5 Class 6
Let’s add a new
functionality
Class N
Production Code
Test Code
93. A Typical Scenario
93
Class 1 Class 2 Class 3
Class 4 Class 5 Class 6
Now, let’s test
the new code
Class N+1 Class N+2
Class N
Great!
Production Code
Test Code
94. A Typical Scenario
94
Class 1 Class 2 Class 3
Class 4 Class 5 Class 6
Class N
Production Code
Test Code
Let’s push our
changes to the
CI server
Class N+1 Class N+2
Great!
95. A Typical Scenario
95
Class 1 Class 2 Class 3
Class 4 Class 5 Class 6
Class N
Why? We haven’t
changed Class 6
Class N+1 Class N+2
Production Code
Test Code
96. Why Regression Testing?
Many developers don’t want to believe it, but small changes to one part of a
system often cause problems in distant (other) parts of the system
96
97. Why Regression Testing?
Changes can have an unexpected impact on other parts of the system
97
Projects can be
large (e.g., OS)
Interconnected
Components
Software always
Evolves
98. Strategies
(Strategy 1) Retest all: this is the most straightforward approach and consists in
simply executing all the existing test cases in the test suite
However, as software evolves, the test suite tends to grow, which means it may
be prohibitively expensive to execute the entire test suite.
98
111. Test Selection for Self-driving Cars
111
What does a test look like?
test.json
{
  "road_points": [
    [x0, y0],
    [x1, y1],
    [x2, y2],
    [x3, y3],
    …
  ]
}
112. Test Inputs
Environment
Position
and speed
Road Shape
Traf
fi
c lights
position and
status
112
Weather
Leading Car:
- Initial Position
- Initial Speed
Car UnderTest:
- Initial Position
- Initial Speed
119. Dataset Summary
119
More than 10,000 evaluated test scenarios in simulation!
Figure: Distribution of safe and unsafe scenarios (0% to 100%) for BeamNG AI cautious, BeamNG AI moderate, BeamNG AI reckless, and Driver.AI
123. Test Case Prioritization
Test case prioritization: executing the available test cases in a specific order
that increases the likelihood of revealing regression faults earlier.
The third strategy does not involve selection of test cases: it assumes
that all the test cases may be executed in the specified order, but that testing
may be terminated at some arbitrary point during the testing process.
Test Suite (T1 T2 T3 T4) -> Prioritization Procedure -> Ordered Test Suite (T4 T2 T1 T3)
124. Black-Box Test Case Prioritization
124
@Test
public void testTriangle_invalid2() {
    assertEquals(Triangle2.Type.INVALID,
        Triangle2.triangle(-5, 1, 3));
}
@Test
public void testTriangle_isoscele1() {
    assertEquals(Triangle2.Type.ISOSCELE,
        Triangle2.triangle(3, 3, 4));
}
@Test
public void testTriangle_isoscele2() {
    assertEquals(Triangle2.Type.ISOSCELE,
        Triangle2.triangle(3, 4, 3));
}
@Test
public void testTriangle_scalene() {
    assertEquals(Triangle2.Type.SCALENE,
        Triangle2.triangle(4, 9, 6));
}
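Input diversity (Euclidean distance between input vectors):
      T1     T2     T3     T4
T1    -      5.91   6.40   12.40
T2    5.91   -      1.41   8.54
T3    6.40   1.41   -      8.30
T4    12.40  8.54   8.30   -
Example: the Euclidean distance between the input vectors of T4 and T2 is 8.54.
Note: these values match the test inputs when T1 = (4,9,6), T2 = (3,4,3), T3 = (3,3,4), and T4 = (-5,1,3).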
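The pairwise distances can be recomputed from the inputs of the four tests. The sketch below uses descriptive keys of my own choosing rather than the slide's T1..T4 labels, and it rounds to two decimals, whereas the slide's values appear to be truncated (for example 12.40 rather than 12.41).

```python
import math
from itertools import combinations

# Input vectors (triangle sides) taken from the four tests on the slide.
inputs = {
    "invalid": (-5, 1, 3),
    "isosceles_a": (3, 3, 4),
    "isosceles_b": (3, 4, 3),
    "scalene": (4, 9, 6),
}


def euclidean(u, v):
    """Euclidean distance between two equally long input vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Distance for every unordered pair of tests, rounded to two decimals.
distances = {
    (p, q): round(euclidean(inputs[p], inputs[q]), 2)
    for p, q in combinations(inputs, 2)
}

for pair, d in distances.items():
    print(pair, d)
```

The two isosceles tests are the closest pair, so a diversity-driven prioritization would avoid running them back to back.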
129. Measuring the Road Similarity
129
Road Features
We extract these road features from
simulation-based tests
Roads with a more diverse set of
features correspond to more diverse
test case inputs
Diversity is measured using the
Euclidean distance between two
feature vectors
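As a rough illustration of road similarity, the sketch below derives a feature vector from a list of road points and compares two roads. The three features (length, total turning angle, sharpest turn) are assumptions made for this example; the slides do not list the features actually used.

```python
import math


def road_features(road_points):
    """Compute a small, hypothetical feature vector from (x, y) road points."""
    segments = list(zip(road_points, road_points[1:]))
    length = sum(math.dist(p, q) for p, q in segments)
    headings = [math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in segments]
    # Absolute heading change between consecutive segments, wrapped to [0, pi].
    turns = [
        abs(math.atan2(math.sin(b - a), math.cos(b - a)))
        for a, b in zip(headings, headings[1:])
    ]
    return (length, sum(turns), max(turns, default=0.0))


def road_distance(road_a, road_b):
    """Euclidean distance between the feature vectors of two roads."""
    return math.dist(road_features(road_a), road_features(road_b))


straight = [(0, 0), (10, 0), (20, 0), (30, 0)]
curvy = [(0, 0), (10, 0), (10, 10), (20, 10)]

print(road_features(straight))
print(road_features(curvy))
print(road_distance(straight, curvy))
```

In practice, features on very different scales (meters versus radians) would probably need normalizing before the distance is taken; that step is left out here.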
130. Multi-Objective Genetic Algorithms
We can use Genetic Algorithms to search for an optimal
permutation (prioritization) of the test cases
Initial Population
Selection
Crossover
Mutation
End?
YES NO
T1 T4 T2 T8 … Tn
We can encode a solution as a permutation of n test cases
In this example, T4 is executed before T2 and after T1
131. Multi-Objective Genetic Algorithms
We need search objectives to measure the quality of
each solution. We have two objectives:
Objective 1: the sum of the distances between each pair
of consecutive test cases t_i and t_(i-1) in the permutation
Objective 2: the sum of the (past) execution costs of all
test cases, where the cost of test t_i is divided by its
position i in the permutation
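The two objectives can be written as small functions over a permutation. This is a sketch under assumptions: the feature vectors and costs are made-up values, and the optimization direction (maximize diversity, minimize position-weighted cost) is my reading rather than something the slide states.

```python
import math


def diversity(order, inputs):
    """Objective 1: sum of distances between consecutive tests in the order."""
    return sum(
        math.dist(inputs[prev], inputs[cur])
        for prev, cur in zip(order, order[1:])
    )


def weighted_cost(order, cost):
    """Objective 2: sum of cost(t_i) / i, with i the 1-based position."""
    return sum(cost[t] / i for i, t in enumerate(order, start=1))


# Illustrative feature vectors and past execution costs (in seconds).
inputs = {"T1": (0, 0), "T2": (3, 4), "T3": (3, 0)}
cost = {"T1": 6.0, "T2": 2.0, "T3": 3.0}

order = ["T1", "T2", "T3"]
print(diversity(order, inputs))    # 5.0 + 4.0
print(weighted_cost(order, cost))  # 6/1 + 2/2 + 3/3
```

Dividing by the position means an expensive test contributes less when it runs late, so orders that start with cheap tests get a lower value for the second objective.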
132. Multi-Objective Genetic Algorithms
New solutions (offspring) are generated by recombining
the genes of two parent solutions
Parent solutions:
T1 T4 T2 T5 T6 T3
T3 T1 T2 T6 T4 T5
Offspring 1: T1 T4 T2 T3 T6 T5
Offspring 2: T3 T1 T2 T4 T5 T6
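The offspring shown are consistent with a single-point crossover for permutations: keep the first three genes of one parent and append the missing tests in the order in which they appear in the other parent. The sketch below reproduces the example under that reading; the function name and the cut point of 3 are inferred from the slide, not stated by it.

```python
def order_crossover(parent_a, parent_b, cut):
    """Keep parent_a[:cut], then append the missing tests in parent_b's order."""
    head = parent_a[:cut]
    tail = [t for t in parent_b if t not in head]
    return head + tail


p1 = ["T1", "T4", "T2", "T5", "T6", "T3"]
p2 = ["T3", "T1", "T2", "T6", "T4", "T5"]

offspring_1 = order_crossover(p1, p2, cut=3)
offspring_2 = order_crossover(p2, p1, cut=3)
print(offspring_1)  # ['T1', 'T4', 'T2', 'T3', 'T6', 'T5']
print(offspring_2)  # ['T3', 'T1', 'T2', 'T4', 'T5', 'T6']
```

Because the tail is filtered against the head, every offspring remains a valid permutation with no duplicated or missing test case.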
136. Multi-Objective Genetic Algorithms
The newly generated permutation is further
changed by applying a small random mutation
Offspring: T1 T4 T2 T3 T6 T5
Mutated Offspring: T1 T6 T2 T3 T4 T5
The swap mutation operator randomly swaps two genes in an
(offspring) solution
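A swap mutation can be sketched in a few lines. The optional `positions` argument is an addition made for this example so that the slide's mutation (swapping the second and fifth genes) can be reproduced deterministically.

```python
import random


def swap_mutation(order, positions=None, rng=random):
    """Return a copy of the permutation with two genes swapped."""
    if positions is None:
        # Pick two distinct positions at random.
        positions = rng.sample(range(len(order)), 2)
    i, j = positions
    mutated = list(order)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


offspring = ["T1", "T4", "T2", "T3", "T6", "T5"]
print(swap_mutation(offspring, positions=(1, 4)))
# ['T1', 'T6', 'T2', 'T3', 'T4', 'T5']
```

Swapping keeps the result a permutation, so no repair step is needed after mutation.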
137. (Baseline) Single-objective and Multi-Objective Test Prioritization
137
Multi-Objective Test Prioritization
Single-objective
138. Our Results
138
We compare the final solution produced by GA
against two baselines:
• Random prioritization (average of 100K
randomly generated permutations)
• Greedy algorithm, which builds a permutation
by greedily selecting the test case that is the
most diverse from the previously selected
one
Safe Driving Style
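The greedy baseline can be sketched as follows. The feature vectors are illustrative, and both the choice of the first test and the alphabetical tie-breaking are assumptions, since the slide does not say how either is handled.

```python
import math


def greedy_prioritize(inputs, start):
    """Order tests so each next test is the farthest from the previous one."""
    order = [start]
    remaining = set(inputs) - {start}
    while remaining:
        last = inputs[order[-1]]
        # Sorting first makes tie-breaking deterministic.
        nxt = max(sorted(remaining), key=lambda t: math.dist(last, inputs[t]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


inputs = {"T1": (4, 9, 6), "T2": (3, 4, 3), "T3": (3, 3, 4), "T4": (-5, 1, 3)}
print(greedy_prioritize(inputs, start="T1"))  # ['T1', 'T4', 'T2', 'T3']
```

Unlike the genetic algorithm, this baseline only looks at the previously selected test and ignores execution cost.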
141. How to go from the Simulated world
to the Physical World when Testing Self-driving cars?
141
Simulated World Physical World
Simulation-based Testing Testing on-the-road
142. Application to AICAS (Automotive) Use Case in COSMOS
142
Birchler et al.,
EMSE 2023.
Simulated World Physical World
145. CAN Bus
145
Controller Area Network
Protocol used for communication between control
units over a shared network
Used in
- Cars
- Robots
- Automated Production Facilities
- Etc
ECU
BCM
Instr.
Cluster
ABS
Airbag
HVAC
146. CAN Bus
146
Controller Area Network
One shared network between all controllers.
Only 3 cables necessary
All stations can publish and receive messages
ECU
BCM
Instr.
Cluster
ABS
Airbag
HVAC
CAN BUS
148. CAN Bus – How it works
148
Data is transmitted through voltage changes from
recessive to dominant state
CAN Low and CAN High transmit the same signal but with
opposite voltage differences:
- Positive difference for CAN High
- Negative difference for CAN Low
The signals can be decoded against GND and against
each other.
149. CAN Bus – Message
149
Message Frame contains
- ID: Denoting the frame
- DATA: Data that should be transmitted
- Error Checking and some other protocol fields
All stations receive all messages and decide on the
relevance
150. CAN Database file (DBC)
File containing templates for all messages
Each frame consists of:
- ID
- Name
- Multiple Signal Values
Mostly proprietary with some shared standards
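A frame template with an ID, a name, and several signals can be modeled roughly as below. This is a deliberately simplified sketch: real DBC files describe signals at bit level (start bit, length, byte order, offset), and the frame ID, signal names, and scale factors here are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    scale: float  # physical value = raw value * scale


@dataclass(frozen=True)
class FrameTemplate:
    frame_id: int
    name: str
    signals: tuple  # up to four unsigned 16-bit signals in this simplified layout

    def encode(self, values):
        """Pack physical values into an 8-byte little-endian payload."""
        raws = [int(round(values[s.name] / s.scale)) for s in self.signals]
        raws += [0] * (4 - len(raws))  # pad unused slots with zeros
        return struct.pack("<4H", *raws)


VEHICLE_STATUS = FrameTemplate(
    frame_id=0x123,  # hypothetical ID
    name="VEHICLE_STATUS",
    signals=(Signal("speed_kph", 0.01), Signal("engine_rpm", 0.25)),
)

payload = VEHICLE_STATUS.encode({"speed_kph": 50.0, "engine_rpm": 2000.0})
print(hex(VEHICLE_STATUS.frame_id), payload.hex())
```

Keeping the template separate from the values mirrors the idea of a DBC file: the layout is fixed once, and only the signal values change per message.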
151. CAN Bus Targeted Goal
151
Many control units communicating with each other
using CAN Bus
Testing the ability to process messages
Currently: Messages are generated manually
- Takes a lot of time
- Results depend on input
- Gets more complicated over time
ECU
BCM
Instr.
Cluster
ABS
Airbag
HVAC
152. CAN Bus Targeted Goal
DATA
Initiates
Collects
Generates
CAN Messages
Use the data created during
simulations and create CAN
Messages from that data
+ Data is generated anyway
+ Data should be realistic
+ More messages
153. Proposed Solution
153
Questions:
- What data do we get from the simulator?
- How do we get the data from the simulator?
- How to support multiple simulators and DBC
files?
- How to generate CAN Messages?
Simulator
CAN Bus Handler
[Python]
Output
needs
produces
154. Selected Simulator
154
Soft body simulator
Mainly used in academia
Restricted Simulation of
surroundings
Simulates all critical
components of the car:
- Engine
- Transmission
- Brakes
- Etc.
156. 156
CAN – BUS Handler
Test-Runner
What data do we get from the simulator?
157. CAN Bus Handler – Message Generation
Message Generation Steps:
(1) Test Runner starts a simulation and initializes the CAN Bus Translator
(2) Periodically send the current simulation data to the Translator
(3) For all frames in the DBC:
    (1) Translate the values using the DBC Map
    (2) Create the CAN Message using the template
    (3) Send the Message to the Output
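The message generation steps can be sketched as a small pipeline. This is an illustration under assumptions, not the project's actual handler: the `DBC` dictionary stands in for a parsed DBC file, the simulator field names and scaling factors are invented, and `Translator`, `ListOutput`, and `run_test` are hypothetical names. The simulation is represented by a plain list of data snapshots.

```python
import struct

# Hypothetical stand-in for a parsed DBC:
# frame ID -> (frame name, struct layout, [(simulator field, scale), ...])
DBC = {
    0x100: ("Engine", "<HB", [("engine_rpm", 4.0), ("throttle", 100.0)]),
    0x200: ("Brakes", "<B", [("brake", 100.0)]),
}


class ListOutput:
    """Collects messages in memory; a real output could be a CAN interface."""

    def __init__(self):
        self.messages = []

    def send(self, frame_id, data):
        self.messages.append((frame_id, data))


class Translator:
    def __init__(self, dbc, output):
        self.dbc = dbc
        self.output = output

    def on_sim_data(self, sim_data):
        # For all frames in the DBC ...
        for frame_id, (_name, layout, signals) in self.dbc.items():
            # ... translate the simulator values into raw signal values,
            raw = [round(sim_data[field] * scale) for field, scale in signals]
            # ... create the message payload from the frame's template,
            data = struct.pack(layout, *raw)
            # ... and send the message to the output.
            self.output.send(frame_id, data)


def run_test(simulation_steps, translator):
    """Test runner: forward each periodic snapshot to the translator."""
    for sim_data in simulation_steps:
        translator.on_sim_data(sim_data)
```

Each snapshot produces one message per frame in the DBC, so two snapshots and two frame templates yield four messages in `ListOutput.messages`.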
159. CAN Bus Handler – Result
Achievements:
• Working proof of concept
• Ability to generate “real time” CAN messages
Next Steps:
• Validate the fault detection of generated messages
• Support for other Simulators
160. Outline
• Context: Cyber-physical Systems (CPSs)
• DevOps shortcomings for Complex CPSs
• What types of bugs occur in open-source CPSs?
• How to enable cost-effective testing for Self-driving cars?
• How to address the Reality Gap problem when testing UAVs?
162. Field Testing
o Reliable
o Not Reproducible
o Limited Test Scenarios
o Expensive
o Time Consuming
o Unsafe
163. Simulation-based Testing
o Reproducible
o Scalable & Automatable
o Affordable
o Safe
o Reliable Test Results?
o Reproducing Real-world Bugs?
o Generating Realistic Test Cases?
164. UAV Test Case Generation in the Neighborhood of Real Flights
Sajad Khatiri, Sebastiano Panichella, Paolo Tonella: Simulation-based Test Case Generation for Unmanned Aerial Vehicles in
the Neighborhood of Real Flights. International Conference on Software Testing, Verification and Validation (ICST 2023).
▪Unsafe / Misbehavior
165. SURREALIST
Generate Realistic and Effective simulated test cases
1. Systematically Replicate Field Tests in simulation
2. Systematically Generate Challenging, but similar test cases
167. 1. Flight Replication
Systematically Replicate field tests in simulation
• Find optimal drone/environment configurations
• Replicate a Specific Logged Behavior
• Minimizing differences in real vs simulation records
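One way to quantify the difference between real and simulated records is sketched below. The slides do not spell out the metric at this point, so this is an assumption: trajectories are taken to be equally long lists of time-aligned (x, y, z) points, the difference is the maximum point-wise distance, and `max_distance` and `best_configuration` are hypothetical helper names.

```python
import math


def max_distance(real, simulated):
    """Largest point-wise distance between two time-aligned trajectories."""
    if len(real) != len(simulated):
        raise ValueError("trajectories must have the same number of points")
    return max(math.dist(p, q) for p, q in zip(real, simulated))


def best_configuration(real, candidates):
    """Pick the candidate whose simulated trajectory is closest to the log.

    `candidates` maps a configuration name to its simulated trajectory.
    """
    return min(candidates, key=lambda name: max_distance(real, candidates[name]))
```

Given a logged flight and the trajectories of two candidate configurations, `best_configuration` returns the name of the configuration with the smaller maximum deviation.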
[Diagram: Replicated Test, with components UAV Config., Env. Config., Commands, Expectation]
168. 1. Flight Replication
Evaluation Scenario
Given the flight log of an autonomous real-world flight,
can we place an obstacle in the simulation environment
to faithfully replicate the flight’s trajectory?
172. 1. Flight Replication
Mutations
Obstacle Move (∆y = +4)
Adapt the mutation for next round
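Adapting a mutation from round to round can be sketched as a simple hill climb over a single obstacle parameter. This is only an illustration of the idea, not the search used in the cited work: `adaptive_search` is a hypothetical function, `evaluate` stands in for running the simulation and measuring the difference to the logged flight, and the rule of reversing the direction and then halving the step is an assumption.

```python
def adaptive_search(evaluate, start, step=4.0, min_step=0.25, max_rounds=50):
    """Hill-climb one parameter; `evaluate(value)` returns a cost to minimise."""
    best, best_cost = start, evaluate(start)
    direction = 1.0
    for _ in range(max_rounds):
        if step < min_step:
            break
        candidate = best + direction * step
        cost = evaluate(candidate)
        if cost < best_cost:
            # The mutation helped: keep it and try the same move again.
            best, best_cost = candidate, cost
        elif direction > 0:
            # No improvement: try the opposite direction with the same step.
            direction = -1.0
        else:
            # Neither direction helped: shrink the mutation for the next round.
            direction = 1.0
            step /= 2.0
    return best, best_cost
```

With a toy cost such as `lambda y: abs(y - 2.0)` in place of a simulation run, the search starts with moves of 4, finds no improvement in either direction, and then converges by halving the step.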
173. Final Solution
1. Flight Replication
Main Findings
o Obstacle was moved 2 m up, rotated 30° and made 2.8 m taller
o Almost identical replication of the flight trajectory
o Less than 75 cm max distance
174. 2. Test Generation
Systematically Generate Challenging but similar test cases
• Manipulate drone/environment configurations
• According to a predefined Difficulty measure
[Diagram: Generated Test, with components UAV Config., Env. Config., Commands, Expectation]
175. 2. Test Generation
Evaluation Scenario
Given a simulated test case for autonomous UAV flight,
can we generate test cases that violate the safety distance
to obstacles by placing an additional obstacle?
176. 2. Test Generation
Seed Solution
[Figure labels: Same size box; Move (∆x, ∆y); d1; d2]
Distance = 2×d1 + d2
178. Final Solution
2. Test Generation
Main Findings
o Obstacle was moved 8 m to the left and 1 m up
o Final test revealed a nondeterministic behavior
o A critical and reproducible bug in PX4 which leads to crashes (reported to PX4 developers)
179. Evaluation Summary
RQ1 [Flight Replication]
• The information available in the flight logs allows searching
for optimal test properties that faithfully replicate UAV
flight trajectories in simulation.
RQ2 [Test Generation]
• Modifying a simulation-based test case allows generating
challenging test cases that can expose the UAV to unsafe
behaviors or even crashes.
182. Ongoing Research and Application to GMV (Avionic) Use Case in COSMOS
“Comparing Simulation and Field tests using execution logs”
Testing Path Planning Systems of airplanes considering
Simulated World vs. Physical World
183. Summary
• Context: Cyber-physical Systems (CPSs)
• DevOps shortcomings for Complex CPSs
• What types of bugs occur in open-source CPSs?
• How to enable cost-effective testing for Self-driving cars?
• How to address the Reality Gap problem when testing UAVs?
184. WP7: Integration and Validation
WP7: Integration and Validation of COSMOS Solutions on
Industrial Use Cases
■ Task T7.1: Platform Integration and Customization for Use Cases
■ Task T7.2: Design of Evaluation Methodology
■ Tasks T7.3–T7.7: COSMOS Evaluation on 5 Use Cases
CPS Use Cases
https://www.cosmos-devops.org/ https://twitter.com/COSMOS_DEVOPS https://lnkd.in/eUVeaYaz
185. Targeted Impacts (1)
■ Industrial Impacts
⧫ Decreasing percentage of changes that result in CPS failure
⧫ Reducing CPS test execution time and computational resource consumption
⧫ Replacing manually generated tests with automated CPS test coverage
⧫ Improving test effectiveness through tests able to discover more bugs
⧫ Reducing:
   - number of security vulnerabilities in CPS
   - component integration and deployment time
   - time to implement a change and make the updated CPS operational
   - downtime when deploying new CPS hardware or software
186. Targeted Impacts (2)
■ CPS DevOps Ecosystem
⧫ Project technologies available in open source with actions to build a European
community and ecosystem exploiting DevOps for CPS
■ Standardisation
⧫ Usage of existing industry standards and proposed new standards and
extensions to ensure “plug-n-play” of DevOps tools for CPS development
■ Academic impact
⧫ Partners produced publications and are contributing to educational content
187. Thanks for Your Attention!
• Any Questions?
“Testing and Development Challenges for
Complex Cyber-Physical Systems:
Insights from the COSMOS H2020 Project”
20 April, 2023 - Ireland Co-located with ICST 2023
Sebastiano Panichella
Zurich University of Applied Sciences
https://spanichella.github.io/