Sensitivity Analysis of Checkpointing Strategies for Multimemetic Algorithms on Unstable Complex Networks

10th International Conference on Large-Scale Scientific Computations
Rafael Nogueras, Carlos Cotta
Departamento de Lenguajes y Ciencias de la Computación
Universidad de Málaga, Spain
LSSC 2015, Sozopol, 8-12 June 2015

Introduction Model Description Experimental Analysis Conclusions
Parallel Computing & EAs
Use of parallel and distributed
models of EAs (GAs, MAs,
MMAs, etc.) to improve solution
quality and reduce computational
times.
The island model spatially
organizes populations into
partially isolated panmictic
demes.
[Figure: four islands (island1 to island4) exchanging migrants.]

Emergent Parallel Environments
Two emergent computational environments are offering new
opportunities to EAs:
- P2P networks: Equally privileged computing nodes carry out a
  distributed computation without need for central coordination.
- Desktop Grids: Distributed networks of heterogeneous systems
  which typically contribute computing cycles while they are
  inactive (volunteer computing platforms).
Churn
The combined effect of multiple computing nodes leaving and
entering the system over time.

Scope
Some mechanism is required to deal with resource volatility.
1. Use of fault-tolerance strategies (e.g. restoration checkpoints).
2. Use of reactive policies to self-adapt the MMA.
Goal
Study EAs running on unstable computational environments with
SF/SW topologies:
- Use of restoration checkpoints with different frequencies.
- Impact on performance and comparison with random
  strategies.

Network Topology
Scale-free Networks
- SF networks are observed in many natural processes.
- They feature a power-law distribution of node degrees.
- We use the Barabási-Albert (BA) model.
- This model uses preferential attachment to grow a network.
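A minimal sketch of preferential attachment, assuming a seed clique of m + 1 nodes (the slides do not say how the network is seeded) and using the values n = 32 and m = 2 from the experimental settings. The function names are illustrative, not the authors' code.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph with n nodes; each new node attaches to m existing ones."""
    rng = random.Random(seed)
    edges = set()
    # Every node appears in this pool once per incident edge, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    pool = []
    # Assumption: start from a fully connected seed of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.add((u, v))
            pool.extend((u, v))
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:          # m distinct, degree-biased targets
            chosen.add(rng.choice(pool))
        for target in sorted(chosen):
            edges.add((target, new))
            pool.extend((target, new))
    return edges


def degrees(edges):
    """Degree of every node that appears in the edge set."""
    return Counter(node for edge in edges for node in edge)


edges = barabasi_albert(n=32, m=2, seed=1)
deg = degrees(edges)
print(len(edges), max(deg.values()), min(deg.values()))
```

Early nodes tend to accumulate many links while late nodes keep a degree close to m, which is the heavy-tailed pattern the slide refers to.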

Network Topology
Small-world Networks
- SW networks have very small, O(log n), average distances among
  nodes.
- We use a variant of the Barmpoutis-Murray (BM) model.
- This model uses a backtracking procedure to build the largest
  clique that leaves enough links for the rest of the network.
- Each clique is connected using random vertices in the first
  clique to make the network more resilient.

Instability
Algorithms must be executed on platforms with multiple
computing elements (processors)...
...but distributed platforms are prone to errors.
[Figure: survival probability versus time (0 to 1000) for k = 1, 2, 5, 10, 20.]
We assume node availability follows a Weibull distribution [a]:
$$p(t_1 \mid t_0) = \exp\left(-\left[\left(\frac{t_1}{\beta}\right)^{\eta} - \left(\frac{t_0}{\beta}\right)^{\eta}\right]\right)$$
If the shape parameter η > 1, the hazard rate increases with time.
[a] J Grid Comput, doi: 10.1007/s10723-014-9315-6, 2015
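The conditional survival probability can be evaluated directly. The sketch below assumes time is measured in arbitrary simulation steps and uses β = 100 purely as an illustrative scale; only η = 1.5 comes from the slides.

```python
import math


def conditional_survival(t1, t0, beta, eta):
    """P(node still alive at t1 | alive at t0) under a Weibull lifetime, t1 >= t0."""
    if t1 < t0:
        raise ValueError("t1 must not precede t0")
    return math.exp(-((t1 / beta) ** eta - (t0 / beta) ** eta))


# With eta > 1 the chance of surviving one more step shrinks as the node ages.
young = conditional_survival(2, 1, beta=100.0, eta=1.5)
old = conditional_survival(11, 10, beta=100.0, eta=1.5)
print(young, old)

# With eta = 1 the distribution is memoryless: the one-step value is constant.
flat_a = conditional_survival(2, 1, beta=100.0, eta=1.0)
flat_b = conditional_survival(11, 10, beta=100.0, eta=1.0)
```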

Coping with Instability
Computing in an unstable environment requires fault-tolerance.
Two strategies are considered:
- rand: nodes that become available again are initialized from
  scratch.
- checkpoint: periodic backups of each subpopulation in
  external safe storage, intended to recover the last state when
  a node is reactivated.
The rand strategy is simpler and has the advantage of boosting
diversity.

Coping with Instability
[Figure: Island i saves its state to cloud storage at intervals of λ and loads the last saved state on recovery after a fault.]
The checkpoint strategy saves the previous progress of the search.
It requires external safe storage, which implies:
- centralization
- overhead
The latter drawback could be mitigated by tuning parameter λ.
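The two recovery strategies can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: λ is taken to count evaluations between snapshots, a plain dict stands in for the external safe storage, and the `Island` class and its method names are hypothetical.

```python
import copy
import random


class Island:
    """One deme of mu binary individuals (the genome layout is hypothetical)."""

    def __init__(self, mu, genome_len, lam, seed=None):
        self.mu = mu
        self.genome_len = genome_len
        self.lam = lam                      # assumed unit: evaluations per snapshot
        self.rng = random.Random(seed)
        self.population = self._random_population()
        self.evals_since_save = 0

    def _random_population(self):
        return [[self.rng.randint(0, 1) for _ in range(self.genome_len)]
                for _ in range(self.mu)]

    def after_evaluation(self, storage, key):
        """Count one evaluation and snapshot the population every lam evaluations."""
        self.evals_since_save += 1
        if self.evals_since_save >= self.lam:
            storage[key] = copy.deepcopy(self.population)
            self.evals_since_save = 0

    def reactivate(self, storage, key, strategy):
        """Restore the last snapshot ('checkpoint') or restart from scratch ('rand')."""
        if strategy == "checkpoint" and key in storage:
            self.population = copy.deepcopy(storage[key])
        else:
            self.population = self._random_population()
        self.evals_since_save = 0


storage = {}                                # stand-in for external safe storage
island = Island(mu=16, genome_len=128, lam=16, seed=0)
for _ in range(16):
    island.after_evaluation(storage, "island-0")
saved = copy.deepcopy(island.population)
island.population[0][0] ^= 1                # progress after the snapshot is lost on failure
island.reactivate(storage, "island-0", "checkpoint")
```

A larger λ means fewer writes to storage but more lost progress per failure, which is the trade-off the sensitivity analysis explores.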

Benchmark and Settings
Parameters for island-based model:
- nι = 32 islands and µ = 16 individuals (at the beginning).
- m = 2 (SF/SW topologies).
Node deactivation/reactivation:
- shape parameter η = 1.5.
- scale parameters β = −1/log(p) for p = 1 − (k·nι)^(−1),
  k ∈ {1, 2, 5, 10, 20}.
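As a worked example, the scale parameters implied by these settings can be computed directly. The constant names are illustrative; the formulas and values are the ones listed above.

```python
import math

N_ISLANDS = 32                 # n_iota in the slides
K_VALUES = (1, 2, 5, 10, 20)


def scale_parameter(k, n_islands=N_ISLANDS):
    """beta = -1 / log(p), with p = 1 - 1 / (k * n_islands)."""
    p = 1.0 - 1.0 / (k * n_islands)
    return -1.0 / math.log(p)


betas = {k: scale_parameter(k) for k in K_VALUES}
for k, beta in betas.items():
    print(f"k = {k:2d}  beta = {beta:8.3f}")
```

Larger k gives a larger β and therefore longer-lived nodes, so k = 1 is the most volatile scenario and 1/k can be read as a churn rate.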
Problems used:
- Deb's trap function (concatenating 32 four-bit traps).
- HIFF and HXOR functions (using 128 bits).
- MMDP (using 24 six-bit blocks).
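A sketch of the concatenated trap function follows. The slide gives only the block structure (32 blocks of four bits); the per-block values used here (1 for a block of all ones, otherwise 0.2 per missing one short of three) are an assumption, chosen as a common scaling that yields an optimum of 32.

```python
BLOCK_SIZE = 4


def trap_block(block):
    """Deceptive 4-bit trap: best at all ones, second best at all zeros."""
    u = sum(block)
    if u == BLOCK_SIZE:
        return 1.0
    # Assumed slope: fitness falls as ones are added, pulling search toward 0000.
    return 0.2 * (BLOCK_SIZE - 1 - u)


def trap(bits):
    """Sum of trap_block over consecutive 4-bit blocks."""
    if len(bits) % BLOCK_SIZE:
        raise ValueError("length must be a multiple of the block size")
    return sum(trap_block(bits[i:i + BLOCK_SIZE])
               for i in range(0, len(bits), BLOCK_SIZE))


optimum = trap([1] * 128)       # 32 blocks, each contributing 1
deceptive = trap([0] * 128)     # every block sits at the deceptive attractor
```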
25 runs of 50,000 evaluations each are performed for each problem,
algorithm, churn scenario, network topology and
λ ∈ {µ, 10µ, 100µ}.

Numerical Results
Approximation to the Optimum
Deviation from the optimum as a function of the churn rate.
[Figure: two panels of deviation from optimum (%) versus 1/k for rand, λ = 16, λ = 160 and λ = 1600.]
Performance degrades with increasing churn rates, but not in the
same way for the different λ values: the degradation grows with λ.

Numerical Results
Statistical Analysis
i   strategy    z-statistic   p-value     α/i
1   λ = 160     2.598e+00     4.687e-03   5.000e-02
2   λ = 1600    7.015e+00     1.151e-12   2.500e-02
3   rand        8.747e+00     1.097e-18   1.667e-02
- Degradation is statistically significant according to the Quade
  test (p-value ≈ 0).
- The use of checkpoint with λ = µ is statistically superior
  according to the Holm test at the α = 0.05 level.
- Strategies with less frequent snapshots are not capable of
  dealing with churn.
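The Holm comparison in the table can be re-checked from the z-statistics alone. The sketch assumes the p-values are one-sided normal tail probabilities, which is consistent with the first row of the table; the dictionary keys are illustrative labels for the three strategies compared against the control (checkpoint with λ = µ).

```python
import math


def one_sided_p(z):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def holm(z_scores, alpha=0.05):
    """Step-down Holm procedure; index i = 1 is the largest p-value, as in the table."""
    ranked = sorted(z_scores.items(), key=lambda item: item[1])
    results = {}
    rejecting = True
    # Start from the smallest p-value (largest i) and stop rejecting at the first failure.
    for i in range(len(ranked), 0, -1):
        name, z = ranked[i - 1]
        p = one_sided_p(z)
        rejecting = rejecting and p < alpha / i
        results[name] = (i, p, alpha / i, rejecting)
    return results


results = holm({"lambda=160": 2.598, "lambda=1600": 7.015, "rand": 8.747})
for name, (i, p, threshold, rejected) in sorted(results.items(), key=lambda r: r[1][0]):
    print(i, name, f"{p:.3e}", f"{threshold:.3e}", rejected)
```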

Numerical Results
Evolution of Best Fitness
Evolution of best fitness on the TRAP function for different churn
rates with SF topology. (Left) k = 20 (Center) k = 5 and (Right)
k = 1.
[Figure: three panels of best fitness versus evaluations (×10^4) for λ = 16, λ = 160 and λ = 1600.]
As churn increases, the fitness gap widens in favour of λ = µ.
Low λ values withstand the degradation better as churn increases.

Numerical Results
Evolution of Genetic Diversity
Evolution of genetic diversity on the TRAP function for different
churn rates with SF topology. (Left) k = 20 (Center) k = 5 and
(Right) k = 1.
[Figure: three panels of entropy versus evaluations (×10^4) for λ = 16, λ = 160 and λ = 1600.]
The MMA has convergence problems as churn increases when λ is high.
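The slides do not define how entropy is computed. One plausible reading, shown here purely as an assumption, is the Shannon entropy of the bit distribution at each locus, averaged over loci and expressed in bits so that it ranges from 0 (all individuals identical) to 1 (every locus evenly split).

```python
import math


def population_entropy(population):
    """Mean per-locus Shannon entropy (base 2) of a population of bit lists."""
    n = len(population)
    length = len(population[0])
    total = 0.0
    for locus in range(length):
        ones = sum(individual[locus] for individual in population)
        for count in (ones, n - ones):
            if count:                       # 0 * log(0) is taken as 0
                q = count / n
                total -= q * math.log2(q)
    return total / length


converged = population_entropy([[1, 0, 1, 1]] * 16)
mixed = population_entropy([[0, 0, 0, 0]] * 8 + [[1, 1, 1, 1]] * 8)
```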

Conclusions
Resilience is a key feature in unstable computational environments.
Policies to cope with the loss of information must be introduced.
One possibility is to create periodic backups of the state of
the nodes to recover from failures.
Large churn rates require frequent backups to cope with node
volatility.
Ongoing work:
- Autonomous or self-adaptive approaches to react to churn.
- Use of purely local strategies.

Thank You!
AnySelf Project
Please find us on Facebook
http://facebook.com/AnySelfProject
and on Twitter
@anyselfproject
