Testing and Development Challenges for Complex Cyber-Physical Systems: Insights from the COSMOS H2020 Project

20 April, 2023 - Ireland Co-located with ICST 2023
Sebastiano Panichella
Zurich University of Applied Sciences
https://spanichella.github.io/
AIST 2023:
3rd International Workshop on Artificial Intelligence in Software Testing

Zurich University of Applied Sciences
Senior Computer Science Researcher
Since August 2018
PhD
June 2014
October 2014 - August 2018
About me
Academic & Industrial
Collaborations
Investigated Research Topics related to
- “Development Automation, Test automation” & “Human Computer Interaction” for
- Software Systems
- Cyber-physical systems (CPSs)
- and, AI-based systems
- “Software Engineering for AI”
- “AI for Software Engineering”

Outline
• DevOps shortcomings for Complex CPSs
• What types of bugs occur in open-source CPSs?
• How to enable cost-effective testing for Self-driving cars?
• How to address the Reality Gap problem when testing UAVs?
• Context: Cyber-physical Systems (CPSs)
The COSMOS Project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 957254.

Context
“My main research goal is to conduct industrial research, involving both industrial and
academic collaborations, to sustain the Internet of Things (IoT) vision of future "smart cities”,
with millions of smart systems connected over the internet, and/or controlled by complex
embedded software implemented for the cloud."

Next 10-15 Years (and beyond)
1) Cyber-physical Systems
2) Artificial Intelligence (AI)
3) DevOps, IoT, Automated Testing (AT)

“Emerging Cyber-physical Systems (CPS) will play a crucial role in the quality of
life of European citizens and the future of the European economy”
Context
• CPS relevant sectors:
• Healthcare
• Automotive
• Water Monitoring
• Railway
• Manufacturing
• Avionics
• etc.
MEDICAL DELIVERY
FOOD DELIVERY

Background
First aerodynamic flight on another planet. Landed with Perseverance rover on 18 February 2021
SPACE EXPLORATION

UAVs
“But do we have, today UAVs, that would autonomously
map the disaster area at the Fukushima nuclear power
plant or spot the location of people stranded and isolated
after such disaster?”
Fukushima disaster
Unmanned Aerial Vehicles (UAVs) - a specific case of “CPSs”
Problem Statement (1)

• Our (Software Engineering) view of DevOps and AI for IoT systems:
• DevOps and Continuous Delivery (CD): What is it?
• Present, Challenges, and Opportunities
• Relevant Research Questions
• Artificial Intelligence (AI) and Testing Automation:
• User-oriented Testing Automation
“We all recognize the relevance and capacity of contemporary cyber-physical
systems for building the future of our society, but ongoing research in the
field is also clearly failing in making the right countermeasures to avoid
that CPS usage affects human being safety”.
“Self-driving Uber kills Arizona
woman in first fatal crash involving
pedestrian”
“Swiss Post drone
crashes in Zurich”
Problem Statement (2)
“A simple software update was
the direct cause of the fatal
crashes of the Boeing 737”

Question:
What are the main Challenges of
Testing Cyber-physical Systems?

“Self-driving Uber kills Arizona
woman in first fatal crash involving
pedestrian”
Challenges
“A simple software update was
the direct cause of the fatal
crashes of the Boeing 737”
Challenge 1: The observability, testability, and predictability of the behavior
of emerging CPS are highly limited and, unfortunately, their usage in the real
world can lead to fatal crashes, sometimes tragically involving humans

Research Challenges and Opportunities
As reported by National Academies:
[“A 21st Century Cyber-Physical Systems Education”]
“today's practice of IoT system design and
implementation are often unable to support
the level of ‘complexity, scalability, security,
safety, […]’ required to meet future needs”

“The main problem is that contemporary
development methodologies for CPS need to
incorporate core aspects of both systems and
software engineering communities, with the
goal to explicitly embrace and consider the
several direct and indirect physical effects of
software”
[“Complexity challenges in development of cyber-physical systems”]
(Martin Törngren, Ulf Sellgren Pages 478-503)
Crash of
Boeing 737

“As identified by agile methodologies, the
development of modern/emerging systems
(e.g., e-health, automotive, satellite, and IoT
manufacturing systems) should evolve with
the systems, ‘as development never ends’”

These concepts are closely related to DevOps and
Artificial Intelligence technologies, and several
researchers and practitioners advocate them as
promising solutions for the development,
maintenance, testing, and evolution of these
complex systems

Challenge 1: The observability, testability, and
predictability of the behavior of emerging
CPS are highly limited and, unfortunately,
their usage in the real world can lead to fatal
crashes, sometimes tragically involving humans
Challenge 2: Contemporary DevOps and
AI practices and tools are potentially the
right solution to this problem, but they are
not developed to be applied in CPS
domains

Traditional DevOps Pipeline
“General lack of DevOps solutions supporting the physical dimension of CPS…”
Lint

Sebastiano Panichella Sajad Khatiri
Christian Birchler
COSMOS:
DevOps for Complex Cyber-physical Systems
https://www.cosmos-devops.org/ https://twitter.com/COSMOS_DEVOPS https://lnkd.in/eUVeaYaz

COSMOS Vision
■ Develop novel DevOps tools,
methodologies, and techniques that enable
effective, continuous development and
evolution of CPS
■ Increase the level of reliability,
dependability, trustworthiness, and
adaptability of CPS
■ Deliver proven DevOps advantages and
benefits to Europe’s CPS development
community

Three Methodological Pillars
KPIs
Scientific and Technological
Foundations of COSMOS
Empirical Validation of
COSMOS Innovations
DevOps Technological
Foundations of COSMOS

Industrial Use Cases
AVIATION
E-HEALTH
WATER MONITORING
SATELLITES
AUTOMOTIVE
RAILWAYS
DRONES SELF-DRIVING CARS
Reference Use Cases
COSMOS Use Cases

COSMOS Use Cases
Sajad Khatiri
DRONES
Without Obstacle
With Obstacle

Innovation Area 1: DevOps Pipelines for CPS
WP3: Methodology for Setting-Up and Maintaining
COSMOS DevOps Pipelines
■ CI/CD Antipatterns Identification for CPS
■ Definition of a DevOps-based Methodology to Support
the Development of Self-Adaptive CPS
■ COSMOS Pipeline Optimization
COMPONENTS

Innovation Area 2: V&V and Security Assessment of DevOps pipelines
COMPONENTS
WP4: V&V and security assessment of COSMOS DevOps
pipelines
■ Development of Automated Techniques for Software Testing
for CPS
■ Development of Run-time Verification Techniques for
Checking and Diagnosing CPS Executions
■ Development of Solutions for Detecting Security
Vulnerabilities in CPS

Innovation Area 3: Tools for High Quality CPS Software Evolution
WP5: Development of Tools to support High Quality CPS
Software Evolution
■ Design and Development of Refactoring Framework
for Secure and Reliable CPS
■ Development of Test Case Generation Tools for Rapid
DevOps Iterations
■ Development of Tools to support User-oriented
Maintenance and Testing
COMPONENTS

Innovation Area 4: Tools for Monitoring, Self-healing and Self-adaptability of CPS
WP6: Development of Tools to support Monitoring, Self-
healing, and Self-adaptability of CPS in the Field
■ Development and Assessment of CPS Change &
Behavioral Models
■ Developing AI-based Solutions to Support Two-speed
DevOps Cycles for CPS
■ Automated Quality Assessment and Monitoring of CPS
in the Field
■ Development of AI-based Solutions to Increase CPS
Self-adaptability to Diverse Contexts
COMPONENTS


What types of bugs occur in open-source CPSs?
CPS
Bugs

Famous Software Failure in a CPS: The Case of Ariane 5

Bugs and Failures in CPS
A CPS bug is “a flaw in the hardware (not properly
handled by the software), or an incorrect
interaction between the software and
hardware components leading to a CPS
misbehavior”
A CPS bug can manifest as a CPS failure,
which makes a CPS unable to deliver its
required functionality or to fulfill some
non-functional properties

Bugs in the PX4 Project
https://github.com/PX4/PX4-Autopilot/issues/8980
PX4 Issue 8980: Unsuccessful flight
“Autopilot receiving noisy sensor-data…”

Bugs in the OpenPilot Project
Openpilot Issue 2103: A CAN bus error
https://github.com/commaai/openpilot/issues/2103
“ Software update on unsupported hardware devices…“

Question:
What types of bugs occur in open-source CPSs?

Fiorella Zampetti, Ritu Kapur, Massimiliano Di Penta,
Sebastiano Panichella: An Empirical Characterization
of Software Bugs in Open-Source Cyber-Physical
Systems. Journal of Systems & Software (JSS).
What types of bugs occur in Open-source CPSs?
CPS
Bugs
1,151 closed issues sampled from
14 open-source CPS projects
CPS bugs taxonomy comprises
8 different high-level categories

Process for designing a taxonomy of bugs occurring in CPSs
from GitHub (Arduino, drones, robotics, automotive, etc.)


Analyzed Projects
14 open-source CPS projects
Table columns: CPS Domains, Issues, Closed Issues, Bug-related Issues

CPS Bug taxonomy:
22 different root causes
Grouped into 8 high-level categories
33% of the bugs are CPS-specific

Hardware Bugs in Open-source CPSs
Energy
Faulty Sensors
Hardware failure
Hardware Not Supported/Compatible

Bug #2103 in openpilot points out the
presence of a CAN bus error on a
specific device (i.e., Rav4 Prime).
Main Findings:
Hardware-specific bugs are peculiar to our taxonomy, and, unsurprisingly, all of
them are CPS-specific.
Recognizing (and simulating) hardware failures is of paramount importance in V&V.
Also, developers should take particular care of hardware compatibility, especially
for CPSs targeting multiple devices.
The interaction with the hardware makes the analysis of non-functional properties
such as performance, memory, and energy consumption particularly crucial.

Network & Interface Bugs in Open-source CPSs

Bug #4302 in Arduino, where there is a memory leak while
doing repeated connections to a server, causing the loss of around
8KB for each connection.
Bug #6546 in PX4-Autopilot, inherited from the third-party library
used for interfacing with GPS (dealing with GPS ‘‘jamming’’, which
had already been reported as an issue in the library aimed at supporting
the Intel Aero Platform)

Main Findings:
Networking plays a paramount role for CPSs
and can be the origin of bugs.
The CPS infrastructure should include
network monitors and V&V techniques may
contemplate CPS misbehavior caused by
network-specific aspects.
Main Findings:
Interfacing bugs are challenging for
developers coping with CPSs, and testing
efforts should focus on them.

Algorithmic Bugs in Open-source CPSs
Barometer
Bug #2620 in ArduPilot, where the barometer sensor
in a specific condition is not handled by the application:
“the barometer altitude became NaN [...] but the EKF
probably continued to use the barometer altitude…”
Acceleration
Bug #801 in ArduPilot, related to the setting of the vertical acceleration:
“Vehicle was not reaching target climb or descent rate because of
incorrectly defaulted acceleration”.

Main Findings:
Algorithmic bugs in CPSs tend to be similar to those occurring in other types of
software systems.
Therefore, existing mutant taxonomies can be used to seed some representative
faults.
However, the way failures manifest (e.g., flaky effects on the hardware or
actuators) can make these bugs more subtle to detect...

Question:
What types of bugs are we still missing?
CPS
Bugs

52
CPS Safety Related Issues of UAVs
Andrea Di Sorbo, Fiorella Zampetti, Corrado A. Visaggio, Massimiliano Di Penta, and Sebastiano
Panichella: Automated Identification and Qualitative Characterization of Safety Concerns Reported in
UAV Software Platforms. Transactions on Software Engineering and Methodology.
What are the main Hazards and Accidents Emerging from Safety Issues
Reported in UAV Software Platforms?

CPS Safety Related Issues of UAVs
RQ1: To what extent can machine learning models
automatically identify safety-related concerns in
issue reports of UAV software platforms?
RQ2: What are the main hazards and accidents emerging
from safety issues reported in UAV software platforms?

Hazard categories and corresponding occurrences in our
dataset of 273 safety-related issues and pull requests.
Co-occurrences of hazard categories and accident categories
(Hazard, Accident)

DevOps Challenges for Dealing with CPS Bugs and Complexity
Interview-based methodology
Interviews’
transcripts
Card Sorting
Early feedback from
COSMOS partners
Bad (and good)
practices,
Challenges,
Barriers,
Mitigation
Analysis Triangulation
Validation outside COSMOS
(survey questionnaire)
Pull Requests (PRs) Mining
20 CPS related projects

Zampetti, Fiorella; Tamburri, Damian ; Panichella, Sebastiano;
Panichella, Annibale; Canfora, Gerardo; Di Penta, Massimiliano:
Continuous Integration and Delivery practices for Cyber-Physical
systems: An interview-based study. Transactions on Software
Engineering and Methodology.
Finding Overview:

Challenges vs. Mitigation Strategies
Figure labels: Challenges, Mitigation Strategies, Simulation


How to enable cost-effective testing
for Self-driving cars?

Autonomous Driving Systems (ADSs)
Tesla Car
Multi-sensing Systems:
• Autonomous systems capture surrounding
environmental data at run-time via
multiple sensors (e.g., camera, radar, lidar)
as inputs
• They process these data with Deep Neural
Networks (DNNs) and output control
decisions (e.g., steering)
• They require robust testing that creates
realistic, diverse test cases

Environmental Data Collection With ADS Sensors
Traffic Sign Recognition (TSR)
Pedestrian Protection (PP)
Lane Departure Warning (LDW)
Automated Emergency Braking (AEB)

Environmental Data Collection With ADS Sensors
Sensors / Camera:
1. Pedestrians
2. Lane Position
3. Traffic Signs
4. Other Cars
Autonomous Feature: DNNs
Actuator (Driving Actions):
• steering
• stop
• acceleration/deceleration
• …

ADSs
Traditional DevOps Pipeline
“Manual Testing is still Dominant…”

Testing Steps in ADSs
Requirements of Testing ADSs
• Generate Diversified Test Inputs (or Scenarios)
• Evaluation-based Failure Detection
“Manual Testing is still Dominant…”

Testing Autonomous Driving Systems
News sources: npr, January 2022; Reuters, September 2021; The New York Times, April 2021

class Triangle {
    int a, b, c; // sides
    String type = "NOT_TRIANGLE";

    Triangle(int a, int b, int c) {…}

    // Classifies the triangle by comparing its sides pairwise.
    void computeTriangleType() {
        if (a == b) {
            if (b == c)
                type = "EQUILATERAL";
            else
                type = "ISOSCELES";
        } else {
            if (a == c) {
                type = "ISOSCELES";
            } else {
                if (b == c)
                    type = "ISOSCELES";
                else
                    type = "SCALENE";
            }
        }
    }

    // Accessor used by the test case.
    String getType() { return type; }
}
Java Class Under Test (CUT)
@Test
public void test() {
    Triangle t = new Triangle(1, 2, 3);
    t.computeTriangleType();
    String type = t.getType();
    assertTrue(type.equals("SCALENE"));
}
Test Case
Traditional Development Pipeline:
Coding vs. Testing

Coding vs. Testing
Code Coverage:
The main Quality Assessment Criterion
Figure: the Triangle class under test and its test case

Coding vs. Testing
Code Coverage:
Not Sufficient as a Quality Assessment Criterion

Challenges of Testing ADSs
Challenge 1: Code coverage vs. Scenario Coverage
Challenge 2: Code coverage & CPU & Memory consumption
Challenge 3: Unit-Test vs. System-level Testing

Stop
Testing Target: Feature Interactions Failures

COSMOS DevOps Testing Pipeline
World of Agile, 2018
Testing on-the-road
Simulation-based Testing
Simulation-based Testing for ADSs

Simulation-Based Test Case
Simulator (Matlab/Simulink)
Test Input
Test Output
Software Under Test (SUT)

Test Output: Safe and unsafe Tests in Autonomous Driving Systems
Test Output
Safe Behaviour
Unsafe Behaviour

Testing on-the-road
Real-world testing:
➡ Realistic
➡ Trustworthy
➡ Costly
➡ Nondeterministic

Simulation is:
➡ Cheaper
➡ Faster
➡ Less reliable
➡ Complex CI/CD integration

Regression Testing
“Regression testing is the process of re-testing
software that has been modified”
P. Ammann and J. Offutt

Regression Testing
“Regression testing is re-running
functional and non-functional tests to
ensure that previously developed and
tested software still performs after a
change”
Anirban Basu

Why Do We Need
Regression Testing?

A Typical Scenario
Class 1 Class 2 Class 3
Let’s add a new
functionality
Class N
Production Code
Test Code

A Typical Scenario
Now, let’s test
the new code
Class N+1 Class N+2
Class N
Great!
Production Code
Test Code

A Typical Scenario
Class N
Production Code
Test Code
Let’s push our
changes to the
CI server
Class N+1 Class N+2
Great!

A Typical Scenario
Class N
Why? We haven’t
changed Class 6
Class N+1 Class N+2
Production Code
Test Code

Why Regression Testing?
Many developers don’t want to believe it, but small changes to one part of a
system often cause problems in distant (other) parts of the system

Why Regression Testing?
Changes can have an unexpected impact on other parts of the system
Projects can be large (e.g., OS)
Interconnected Components
Software always Evolves

Strategies
(Strategy 1) Retest all: this is the most straightforward approach and consists of
simply executing all the existing test cases in the test suite.
However, as software evolves, the test suite tends to grow, which means it may
be prohibitively expensive to execute the entire test suite.

Regression Testing Techniques
Re-test All
Test Case Selection
Test Case Prioritization (High, Med, Low)

Regression Testing
Yoo et al. 2013
Minimization
Selection
Prioritization

Regression Testing
Selection: Birchler et al., SANER 2022; Birchler et al., EMSE 2023.
github.com/ChristianBirchler/sdc-scissor
Birchler et al., TOSEM 2022

SDC-Scissor: Test Regression Pipeline
Test Generation Oracle
Training Prediction
github.com/ChristianBirchler/sdc-scissor

Soft-body Simulator: BeamNG

Test Selection
road_points=[(x,y,z),…]

Test Selection for Self-driving Cars
What does a test look like?
{
  "road_points": [
    (x0, y0),
    (x1, y1),
    (x2, y2),
    (x3, y3),
    …
  ]
}
test.json

Test Inputs
Environment:
- Road Shape
- Weather
- Traffic lights position and status
Position and speed
Leading Car:
- Initial Position
- Initial Speed
Car Under Test:
- Initial Position
- Initial Speed

Feature Interactions Failures
Stop
Min Distance
50Km

Regression Testing with SDC-scissor: Test Generation
{
  "road_points": [(x0, y0), (x1, y1), (x2, y2), (x3, y3), …]
}
test.json
{
  "road_points": [(x0, y0), (x1, y1), (x2, y2), (x3, y3), …],
  "test_outcome": "PASS"
}
test.json (one file per generated test)

Regression Testing with SDC-scissor: Test Outcome Prediction
dist, turns, angle, length,...
194, 5, 94,286, 99234,2,39
…
features.csv
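To make the feature extraction step more concrete, the following is a minimal sketch of how features such as those in the CSV header could be derived from the road points of a test. The exact definitions used by SDC-Scissor are not given on the slide, so the function name `road_features`, the turn threshold, and the meaning assigned to each feature (direct distance, number of turns, total turning angle, road length) are assumptions made for illustration.

```python
import math


def road_features(road_points, turn_threshold_deg=5.0):
    """Compute illustrative road features from a list of (x, y) points.

    The feature definitions are assumptions, not the SDC-Scissor ones.
    """
    if len(road_points) < 2:
        raise ValueError("a road needs at least two points")
    segments = list(zip(road_points, road_points[1:]))
    # Road length: sum of the lengths of all segments.
    length = sum(math.dist(p, q) for p, q in segments)
    # Direct distance between the first and the last road point.
    direct = math.dist(road_points[0], road_points[-1])
    # Heading of every segment, then heading change between segments.
    headings = [math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in segments]
    turn_angles = []
    for h0, h1 in zip(headings, headings[1:]):
        delta = math.degrees(h1 - h0)
        delta = (delta + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        turn_angles.append(abs(delta))
    turns = sum(1 for a in turn_angles if a > turn_threshold_deg)
    return {
        "dist": direct,
        "turns": turns,
        "angle": sum(turn_angles),
        "length": length,
    }


# Hypothetical road with two 90-degree turns.
features = road_features([(0, 0), (10, 0), (10, 10), (20, 10)])
```

Each test would contribute one such feature row, which a classifier can then use to predict the test outcome before running the simulation.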

Dataset Summary
More than 10,000 evaluated test scenarios in simulation!
Figure: Distribution of safe and unsafe scenarios (0% to 100%) for BeamNG AI cautious, BeamNG AI moderate, BeamNG AI reckless, and Driver.AI

SDC-Scissor Cost-effectiveness
Dataset and ML model
$$\frac{\text{Simulation time by random selection}}{\text{Simulation time by SDC-Scissor}}$$
We spend 41% less computational time to
detect the same number of faults!
57% < F1-score < 97%
Finding:
Logistic and Naïve Bayes classifiers save the most time.

Regression Testing
Minimization
Selection
Prioritization: Birchler et al., TOSEM 2022

Test Case Prioritization
Test case prioritization: executing the available test cases in a specific order
that increases the likelihood of revealing regression faults earlier.
The third strategy does not involve selection of test cases, and it assumes
that all the test cases may be executed in the specified order, but that testing
may be terminated at some arbitrary point during the testing process.
Test Suite: T1 T2 T3 T4
Prioritization Procedure
Ordered Test Suite: T4 T2 T1 T3

Black-Box Test Case Prioritization
@Test
public void testTriangle_invalid2() {
    assertEquals(Triangle2.Type.INVALID,
        Triangle2.triangle(-5, 1, 3));
}
@Test
public void testTriangle_isoscele() {
    assertEquals(Triangle2.Type.ISOSCELE,
        Triangle2.triangle(3, 3, 4));
}
@Test
public void …() {
    assertEquals(Triangle2.Type.ISOSCELE, …);
}
@Test
public void …() {
    assertEquals(Triangle2.Type.SCALENE, …);
}
Input diversity:
      T1      T2      T3      T4
T1    -       5.91    6.40    12.40
T2    5.91    -       1.41    8.54
T3    6.40    1.41    -       8.30
T4    12.40   8.54    8.30    -
Euclidean distance between
input vectors of T4 and T2
Our Intuition
Time needed to run the whole test suite
(Only the simulation-based tests)

Measuring the Road Similarity
Road Features
We extract these road features from
simulation-based tests
Roads with a more diverse set of
features correspond to more diverse
test case inputs
Diversity is measured using the
Euclidean distance between two
feature vectors

Multi-Objective Genetic Algorithms
We can use Genetic Algorithms to search for an optimal
permutation (prioritization) of the test cases
Initial Population, Selection, Crossover, Mutation, End? (YES / NO)
T1 T4 T2 T8 … Tn
We can encode a solution as a permutation of n test cases
In this example, T4 is executed before T2 and after T1

We need search objectives to measure the quality of
each solution. We have two objectives:
Objective 1: the sum of the distances between each pair
of consecutive test cases ti and ti-1 in the permutation
Objective 2: the sum of the (past) execution cost for all
test cases. The cost of the test ti is divided by its position i in
the permutation
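The two objectives can be sketched as plain functions over a permutation. This is an illustrative reading of the slide, not the authors' implementation: the function names, the use of 1-based positions, and the toy feature vectors and costs are assumptions. The slide does not state whether each objective is maximized or minimized, so the sketch only evaluates them.

```python
import math


def diversity_objective(permutation, features):
    """Objective 1: sum of distances between consecutive test cases."""
    return sum(
        math.dist(features[prev], features[curr])
        for prev, curr in zip(permutation, permutation[1:])
    )


def cost_objective(permutation, costs):
    """Objective 2: past execution cost of each test divided by its
    (1-based) position in the permutation, summed over all tests."""
    return sum(costs[t] / i for i, t in enumerate(permutation, start=1))


# Toy data (hypothetical): road feature vectors and past execution costs.
features = {"T1": (0.0, 0.0), "T2": (3.0, 4.0), "T3": (3.0, 0.0)}
costs = {"T1": 10.0, "T2": 20.0, "T3": 30.0}
order = ["T1", "T2", "T3"]

objective_1 = diversity_objective(order, features)  # 5 + 4
objective_2 = cost_objective(order, costs)          # 10/1 + 20/2 + 30/3
```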

New solutions (offspring) are generated by recombining
the genes of two parent solutions
Parent solutions
T1 T4 T2 T5 T6 T3
T3 T1 T2 T6 T4 T5

Offspring 1
T1 T4 T2 T3 T6 T5
Offspring 2
T3 T1 T2 T4 T5 T6
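The offspring shown on the slide are consistent with a single-point crossover for permutations: keep the first genes of one parent and append the missing test cases in the order in which they appear in the other parent. The sketch below reproduces the slide's example under that assumption; the function name `order_crossover` and the cut point of 3 are inferred from the example, not stated in the source.

```python
def order_crossover(parent_a, parent_b, cut):
    """Keep parent_a[:cut], then append the remaining test cases
    in the order in which they appear in parent_b."""
    head = list(parent_a[:cut])
    kept = set(head)
    tail = [t for t in parent_b if t not in kept]
    return head + tail


parent_1 = ["T1", "T4", "T2", "T5", "T6", "T3"]
parent_2 = ["T3", "T1", "T2", "T6", "T4", "T5"]

offspring_1 = order_crossover(parent_1, parent_2, cut=3)
offspring_2 = order_crossover(parent_2, parent_1, cut=3)
```

Because the tail only contains test cases missing from the head, every offspring is again a valid permutation of the test suite.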

The newly generated permutation will be further
changed by applying a small random mutation
Offspring
T1 T4 T2 T3 T6 T5
Mutated Offspring
T1 T6 T2 T3 T4 T5
The swap mutation operator randomly swaps two genes in an
(offspring) solution
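A swap mutation can be sketched in a few lines. The helper names are hypothetical; `swap_genes` reproduces the slide's example deterministically (positions 1 and 4, zero-based), while `swap_mutation` picks the two positions at random as the operator description suggests.

```python
import random


def swap_genes(permutation, i, j):
    """Return a copy of the permutation with positions i and j swapped."""
    mutated = list(permutation)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def swap_mutation(permutation, rng):
    """Swap two distinct, randomly chosen positions."""
    i, j = rng.sample(range(len(permutation)), 2)
    return swap_genes(permutation, i, j)


offspring = ["T1", "T4", "T2", "T3", "T6", "T5"]
mutated_from_slide = swap_genes(offspring, 1, 4)
random_mutant = swap_mutation(offspring, random.Random(42))
```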

(Baseline) Single-objective and Multi-Objective Test Prioritization
Multi-Objective Test Prioritization
Single-objective

Our Results
We compare the final solution produced by the GA
against two baselines:
• Random prioritization (average of 100K
randomly generated permutations)
• Greedy algorithm, which builds a permutation
by greedily selecting the test case that is the
most diverse from the previously selected
one
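The greedy baseline can be sketched as follows. How the first test case is chosen and how ties are broken are not specified on the slide, so the `first` parameter and the alphabetical tie-breaking are assumptions; the feature vectors are toy data.

```python
import math


def greedy_prioritization(features, first):
    """Build a permutation by repeatedly picking the remaining test
    whose feature vector is farthest from the last selected one."""
    remaining = set(features) - {first}
    order = [first]
    while remaining:
        last = features[order[-1]]
        # sorted() makes tie-breaking deterministic (an assumption).
        nxt = max(sorted(remaining), key=lambda t: math.dist(last, features[t]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical one-feature-per-axis vectors for four tests.
features = {"T1": (0, 0), "T2": (1, 0), "T3": (10, 0), "T4": (4, 0)}
order = greedy_prioritization(features, first="T1")
```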
Safe Driving Style

Cost-effectiveness with
Safe Driving Style Aggressive Driving Style Driving Style of a Different AI

Cost-effectiveness
SO-SDC-Prioritizer
MO-SDC-Prioritizer
Greedy

How to go from the Simulated World
to the Physical World when Testing Self-driving cars?
Simulated World: Simulation-based Testing
Physical World: Testing on-the-road

Application to AICAS (Automotive) Use Case in COSMOS
Birchler et al.,
EMSE 2023.
Simulated World Physical World

CAN Bus
Control units: ECU, BCM, Instr. Cluster, ABS, Airbag, HVAC

CAN Bus
Traditionally
Multiple connections between each control unit to
exchange data
- Complicated / heavy cable harness
- Error prone
- Expensive

CAN Bus
Controller Area Network
Protocol used for communication between control
units over a shared network
Used in
- Cars
- Robots
- Automated Production Facilities
- Etc.

CAN Bus
Controller Area Network
One shared network between all controllers.
Only 3 cables necessary
All stations can publish and receive messages

CAN Bus
Lines: GND, CAN LOW, CAN HIGH

CAN Bus – How it works
Data is transmitted through voltage changes from
recessive to dominant state
CAN Low and CAN High transmit the same signal but with
opposite voltage difference:
- Positive difference for CAN High
- Negative difference for CAN Low
The signal can be decoded against GND and against
each other.

CAN Bus – Message
Message Frame contains
- ID: Denoting the frame
- DATA: Data that should be transmitted
- Error Checking and some other protocol fields
All stations receive all messages and decide on the
relevance
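The frame structure and the receive-side filtering can be illustrated with a simplified model. The sketch assumes a classical CAN data frame with a standard 11-bit identifier and at most 8 data bytes; it ignores extended identifiers, CAN FD, error checking, and the other protocol fields. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanFrame:
    """Simplified classical CAN data frame: identifier plus payload."""
    can_id: int
    data: bytes

    def __post_init__(self):
        if not 0 <= self.can_id <= 0x7FF:
            raise ValueError("standard CAN identifiers are 11 bits wide")
        if len(self.data) > 8:
            raise ValueError("classical CAN frames carry at most 8 data bytes")


def is_relevant(frame, accepted_ids):
    """Every station sees every frame and decides by ID whether to use it."""
    return frame.can_id in accepted_ids


frame = CanFrame(can_id=0x123, data=bytes([0x01, 0x02]))
abs_station_accepts = is_relevant(frame, accepted_ids={0x123, 0x200})
hvac_station_accepts = is_relevant(frame, accepted_ids={0x300})
```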

CAN Database File (DBC)
File containing templates for all messages
Each frame consists of:
- ID
- Name
- Multiple Signal Values
Mostly proprietary with some shared standards

CAN Bus: Targeted Goal
Many control units communicating with each other
using CAN Bus
Testing the ability to process messages
Currently: Messages are generated manually
- Takes a lot of time
- Results depend on input
- Gets more complicated over time

Use the data created during
simulations and create CAN
Messages from that data
+ Data is generated anyway
+ Data should be realistic
+ More messages
Figure labels: Initiates, Collects DATA, Generates CAN Messages

Proposed Solution
Questions:
- What data do we get from the simulator?
- How do we get the data from the simulator?
- How to support multiple simulators and DBC files?
- How to generate CAN Messages?
Figure labels: Simulator, CAN Bus Handler [Python], Output (needs, produces)

Selected Simulator
Soft body simulator
Mainly used in academia
Restricted simulation of surroundings
Simulates all critical components of the car:
- Engine
- Transmission
- Brakes
- Etc.

What data do we get from the simulator?
Figure labels: Test-Runner, CAN Bus Handler

CAN Bus Handler – Message Generation
Message Generation Steps:
(1) Test Runner starts a simulation and initializes
the CAN Bus Translator
(2) Periodically send the current simulation data to
the Translator
(3) For all frames in the DBC:
    (a) Translate the values using the DBC Map
    (b) Create the CAN Message using the template
    (c) Send the Message to the Output
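The inner loop of the steps above can be sketched with the standard library only. Everything in this example is hypothetical: the `DBC` dictionary stands in for a parsed DBC file, the signal names, scale factors, and frame IDs are invented, and each signal is encoded as an unsigned 16-bit big-endian value for simplicity. A real handler would read the layout from the DBC file instead.

```python
import struct

# Hypothetical stand-in for a parsed DBC file: frame ID -> template.
# Each signal maps a simulator field to a scale (raw = value / scale).
DBC = {
    0x100: {"name": "VEHICLE_SPEED", "signals": [("speed_kmh", 0.01)]},
    0x101: {"name": "ENGINE", "signals": [("rpm", 0.25), ("throttle_pct", 0.5)]},
}


def translate(value, scale):
    """Translate a physical value to a raw unsigned 16-bit signal value."""
    raw = round(value / scale)
    return max(0, min(raw, 0xFFFF))


def generate_messages(sim_data, dbc=DBC):
    """Create one (frame_id, payload) message per frame template."""
    messages = []
    for frame_id, template in dbc.items():
        raw_values = [
            translate(sim_data[field], scale)
            for field, scale in template["signals"]
        ]
        payload = struct.pack(f">{len(raw_values)}H", *raw_values)
        messages.append((frame_id, payload))
    return messages


output = []  # stands in for the real output channel

# One periodic update with hypothetical simulation data.
snapshot = {"speed_kmh": 50.0, "rpm": 3000.0, "throttle_pct": 20.0}
for message in generate_messages(snapshot):
    output.append(message)
```

Keeping the DBC description as data rather than code is one way to support several simulators and DBC files with the same translation loop.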

CAN Bus Handler – Integration Results

CAN Bus Handler – Result
Achievements:
• Working proof of concept
• Ability to generate “real time” CAN
messages
Next Steps:
• Validate the fault detection of
generated messages
• Support for other Simulators


How to address the Reality Gap problem when
testing UAVs?

Field Testing
o Reliable
o Not Reproducible
o Limited Test Scenarios
o Expensive
o Time Consuming
o Unsafe

o Reproducible
o Scalable & Automatable
o Affordable
o Safe
o Reliable Test Results?
o Reproducing Real-world Bugs?
o Generating Realistic Test Cases?

UAV Test Case Generation in the Neighborhood of Real Flights
Sajad Khatiri, Sebastiano Panichella, Paolo Tonella: Simulation-based Test Case Generation for Unmanned Aerial Vehicles in
the Neighborhood of Real Flights. International Conference on Software Testing, Verification and Validation. (ICST 2023)
▪Unsafe / Misbehavior

SURREALIST
Generate Realistic and Effective simulated test cases
1. Systematically Replicate Field Tests in simulation
2. Systematically Generate Challenging, but similar test cases

UAV System Test
UAV Configurations
- Autopilot Parameter
- Config Files (mission plan)
Environment Configurations
- Weather Condition
- Surrounding Objects
Runtime Commands
- Radio Controller Commands
- Starting Mission
Expected Behavior
- Flight Trajectory
- Safety Requirements

1. Flight Replication
Systematically Replicate field tests in simulation
• Find optimal drone/environment configurations
• Replicate a Specific Logged Behavior
• Minimize differences between real and simulation records
Replicated Test: UAV Config., Env. Config., Commands, Expectation

Evaluation Scenario
Given the flight log of an autonomous real-world flight, can we place an obstacle in the simulation environment to faithfully replicate the flight’s trajectory?

3×3×3 box
Seed Solution
Distance: DTW (Dynamic Time Warping)
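As an illustration of the distance measure named above, here is a minimal sketch of the classic Dynamic Time Warping recurrence applied to two trajectories given as lists of coordinate tuples. The function name `dtw_distance` and the choice of Euclidean point distance are assumptions for the example. The slides do not state which DTW variant or implementation was used.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Classic O(n*m) DTW between two sequences of coordinate tuples."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i and first j points
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])  # Euclidean distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # repeat a point of traj_b
                cost[i][j - 1],      # repeat a point of traj_a
                cost[i - 1][j - 1],  # advance both trajectories
            )
    return cost[n][m]


if __name__ == "__main__":
    real = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
    simulated = [(0, 0, 0), (1, 0, 0), (1, 0, 0), (2, 0, 0)]
    print(dtw_distance(real, simulated))
```

Because DTW aligns points in time before summing distances, two flights that follow the same path at different speeds still score as close, which suits comparing a logged flight with its simulated replica.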

Mutations
Move (∆x, ∆y)
Resize (∆l, ∆w, ∆h)
Rotate (∆r)

Mutations
Obstacle Move (∆y = +4)
Obstacle Move (∆y = -4)

Mutations
Obstacle Move (∆y = +4)
Adapt the mutation for next round
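One way to picture the mutate-and-adapt loop on these slides is a simple local search that tries a positive and a negative step for each obstacle parameter, keeps whichever improves the fitness, and shrinks the step when neither helps. The sketch below is an assumption about the general idea, not the SURREALIST algorithm itself: `adaptive_search`, the halving rule, and the parameter names are hypothetical. In the real setting the fitness would require running a simulation and comparing trajectories, for which a toy function stands in here.

```python
def adaptive_search(seed, fitness, init_step=4.0, min_step=0.25, max_rounds=100):
    """Greedy +/- mutation of each parameter with step halving (illustrative)."""
    best = dict(seed)
    best_cost = fitness(best)
    keys = list(seed)
    steps = {k: init_step for k in keys}
    for _ in range(max_rounds):
        improved = False
        for key in keys:
            for delta in (+steps[key], -steps[key]):
                candidate = dict(best)
                candidate[key] += delta
                cost = fitness(candidate)
                if cost < best_cost:  # keep the improving mutation
                    best, best_cost, improved = candidate, cost, True
                    break
            else:
                steps[key] /= 2  # neither direction helped: adapt the step
        if not improved and all(s < min_step for s in steps.values()):
            break
    return best, best_cost


if __name__ == "__main__":
    # Toy stand-in for "distance between real and simulated trajectory".
    def toy_fitness(p):
        return (p["x"] - 2.0) ** 2 + (p["y"] + 3.0) ** 2

    print(adaptive_search({"x": 0.0, "y": 0.0}, toy_fitness))
```

The same loop shape applies to resize and rotate mutations by adding their parameters to the seed dictionary.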

Final Solution
Main Findings
o The obstacle was moved 2 m up, rotated 30°, and made 2.8 m taller
o Almost identical replication of the flight trajectory
o Less than 75 cm maximum distance

2. Test Generation
Systematically Generate Challenging but similar test cases
• Manipulate drone/environment configurations
• According to a predefined Difficulty measure
Generated Test: UAV Config., Env. Config., Commands, Expectation

2. Test Generation
Evaluation Scenario
Given a simulated test case for autonomous UAV flight, can we generate test cases that violate safety distance to obstacles by placing an additional obstacle?

2. Test Generation
Seed Solution
Same size box
Distance = 2×d1 + d2
Mutations
Move (∆x, ∆y)

2. Test Generation
Mutations
Obstacle Move (∆x = -8)
Obstacle Move (∆x = -4)

2. Test Generation
Final Solution
Main Findings
o The obstacle was moved 8 m to the left and 1 m up
o The final test revealed nondeterministic behavior
o A critical and reproducible bug in PX4 that leads to crashes (reported to PX4 developers)

Evaluation Summary
RQ1 [Flight Replication]
• The information available in the flight logs allows searching for optimal test properties that faithfully replicate UAV flight trajectories in simulation.
RQ2 [Test Generation]
• Modifying a simulation-based test case allows generating challenging test cases that can expose the UAV to unsafe behaviors or even crashes.

2. Test Generation
Final Solution
Generated failing and flaky test cases

Final Solution
Almost identical replication of the flight trajectory

Ongoing Research and Application to GMV (Avionic) Use Case in COSMOS
“Comparing Simulation and Field tests using execution logs”
Testing Path Planning Systems of airplanes considering the Simulated World vs. the Physical World

Summary
• Context: Cyber-physical Systems (CPSs)
CPS Bugs

WP7: Integration and Validation
WP7: Integration and Validation of COSMOS Solutions on Industrial Use Cases
■ Task T7.1: Platform Integration and Customization for Use Cases
■ Task T7.2: Design of Evaluation Methodology
■ Tasks T7.3 to T7.7: COSMOS Evaluation on 5 Use Cases
CPS Use Cases

Targeted Impacts (1)
■ Industrial Impacts
⧫ Decreasing percentage of changes that result in CPS failure
⧫ Reducing CPS test execution time and computational resource consumption
⧫ Replacing manually generated tests with automated CPS test coverage
⧫ Improving test effectiveness through tests able to discover more bugs
⧫ Reducing
  o number of security vulnerabilities in CPS
  o component integration and deployment time
  o time to implement a change and make updated CPS operational
  o downtime when deploying new CPS hardware or software

Targeted Impacts (2)
■ CPS DevOps Ecosystem
⧫ Project technologies available in open source with actions to build a European community and ecosystem exploiting DevOps for CPS
■ Standardisation
⧫ Usage of existing industry standards and proposed new standards and extensions to ensure “plug-n-play” of DevOps tools for CPS development
■ Academic impact
⧫ Partners produced publications and are contributing to educational content

Thanks for the Attention!
• Any Questions?
