An automated technique based on distance-based statistics has recently been proposed for transferring learning in the hierarchical Bayesian optimization algorithm (hBOA). The technique enables practitioners to improve hBOA efficiency by collecting statistics from probabilistic models obtained in previous hBOA runs and using these statistics to bias future hBOA runs on similar problems. The purpose of this paper is threefold: (1) test the technique on several classes of NP-complete problems, including MAXSAT, spin glasses, and minimum vertex cover; (2) demonstrate that the technique is effective even when the previous runs were done on problems of a different size; (3) provide empirical evidence that combining transfer learning with other efficiency enhancement techniques can often yield nearly multiplicative speedups.
Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA
Martin Pelikan
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
E-mail: martin@martinpelikan.net
WWW: http://martinpelikan.net/

Mark W. Hauschild
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
E-mail: mwh308@umsl.edu
WWW: http://medal-lab.org

Pier Luca Lanzi
Dipartimento di Elettronica e Informazione
Politecnico di Milano, Milano, Italy
E-mail: pierluca.lanzi@polimi.it
WWW: http://www.pierlucalanzi.net/
Background

• Model-directed optimizers (MDOs), such as estimation of distribution algorithms, learn and use models to solve difficult optimization problems scalably and reliably.

• MDOs often provide more than the solution; they provide a set of models that reveal information about the problem. Why not use that information in future runs?

Purpose

• Focus on the hBOA algorithm and additively decomposable functions, although the approach can be generalized to other MDOs and other problem classes.

• Extend previous work to demonstrate mainly that:

• Previous MDO runs on smaller problems can be used to bias runs on larger problems.

• Previous MDO runs for one problem class can be used to bias runs for another problem class.

• Combine prior models with a problem-specific distance metric to solve new problem instances with increased speed, accuracy, and reliability.
For additively decomposable problems, the bias is based on a problem-specific distance metric. However, note that the framework can be applied to many other model-directed optimization techniques, and the function γ can be defined in many other ways. To illustrate this, we outline how this approach can be extended to several other model-directed optimization techniques in section 6.

4 Distance-Based Bias

4.1 Additively Decomposable Functions

For many optimization problems, the objective function (fitness function) can be expressed in the form of an additively decomposable function (ADF) of m subproblems:

    f(X_1, \ldots, X_n) = \sum_{i=1}^{m} f_i(S_i),    (5)

where (X1, . . . , Xn) are the problem's decision variables, fi is the i-th subfunction, and Si ⊂ {X1, X2, . . . , Xn} is the subset of variables contributing to fi (the subsets {Si} may overlap). While there may often exist multiple ways of decomposing the problem using additive decomposition, one would typically prefer decompositions that minimize the sizes of the subsets {Si}. As an example, consider the following objective function for a problem with 6 variables:

    f_example(X1, X2, X3, X4, X5, X6) = f1(X1, X2, X5) + f2(X3, X4) + f3(X2, X5, X6).

In the above objective function, there are three subsets of variables, S1 = {X1, X2, X5}, S2 = {X3, X4}, and S3 = {X2, X5, X6}, and three subfunctions {f1, f2, f3}, each of which can be defined arbitrarily.
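To make the notation concrete, here is a minimal Python sketch (not code from the paper) that stores an ADF as a list of (variable subset, subfunction) pairs and evaluates it by summing the subfunction contributions; the subsets follow f_example above, while the subfunction bodies are arbitrary placeholders.

    # Minimal sketch (not from the paper): an ADF as (subset, subfunction) pairs.
    from typing import Callable, Sequence

    def evaluate_adf(x: Sequence[int],
                     subproblems: Sequence[tuple[tuple[int, ...], Callable]]) -> float:
        """Evaluate f(x) = sum_i f_i(S_i) for the assignment x."""
        return sum(f_i(tuple(x[v] for v in variables)) for variables, f_i in subproblems)

    # Decomposition of f_example (0-based indices: X1 -> 0, ..., X6 -> 5).
    # The subfunction bodies are arbitrary placeholders; the paper leaves them unspecified.
    subproblems = [
        ((0, 1, 4), lambda s: float(sum(s))),       # f1(X1, X2, X5)
        ((2, 3),    lambda s: float(s[0] ^ s[1])),  # f2(X3, X4)
        ((1, 4, 5), lambda s: float(max(s))),       # f3(X2, X5, X6)
    ]

    print(evaluate_adf([1, 0, 1, 1, 0, 1], subproblems))  # evaluates one assignment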
Figure 9: Speedups obtained on NK landscapes and 2D spin glasses using bias from prior runs on problems of the same size, without using local search: (a) NK landscapes with nearest neighbors (n = 50, 60, 70; k = 5); (b) 2D ±J Ising spin glasses (n = 64, 100, 144). Each panel plots the multiplicative CPU-time speedup with respect to the base case (no speedup) and the percentage of instances with improved execution time against kappa (the strength of the bias).

It is of note that the difficulty of ADFs is not fully determined by the order (size) of the subproblems, but also by the definition of the subproblems and their interaction. In fact, there exist a number of NP-complete problems that can be formulated as ADFs with subproblems of order 2 or 3, such as MAXSAT for 3-CNF formulas. On the other hand, one may easily define ADFs with subproblems of order n that can be solved by simple bit-flip hill climbing in low-order polynomial time.
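To make the MAXSAT formulation above concrete, here is a minimal Python sketch (not taken from the paper) that treats a 3-CNF formula as an ADF: each clause contributes one subfunction of order 3, and the objective is the number of satisfied clauses. The specific formula below is a made-up example.

    # Minimal sketch (not from the paper): MAXSAT on a 3-CNF formula as an ADF.
    # Each clause is one order-3 subfunction returning 1 if the clause is satisfied;
    # the ADF value is the number of satisfied clauses.

    clauses = [[1, -2, 3], [-1, 2, 4], [2, -3, -4]]  # made-up formula; +i is X_i, -i is NOT X_i

    def clause_value(assignment, clause):
        """Return 1 if the clause is satisfied under the 0/1 assignment (1-based literals)."""
        return int(any(assignment[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause))

    def maxsat_adf(assignment):
        """f(X) = sum of the clause subfunctions, i.e., the number of satisfied clauses."""
        return sum(clause_value(assignment, c) for c in clauses)

    print(maxsat_adf([1, 0, 0, 1]))  # X1=1, X2=0, X3=0, X4=1 satisfies all three clauses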
4.2 Measuring Distance for ADFs

The definition of the distance between two variables of an ADF used in this paper is based on the work of Hauschild and Pelikan (2008) and Hauschild et al. (2012). Given an additively decomposable problem with n variables, we define the distance between two variables using a graph G of n nodes, one node per variable. For any two variables Xi and Xj in the same subset Sk, that is, Xi, Xj ∈ Sk, we create an edge in G between the nodes Xi and Xj. See Fig. 2 for an example of an ADF and the corresponding graph. Denoting by l_{i,j} the length of the shortest path between Xi and Xj in G (in terms of the number of edges), we define the distance between the two variables as

    D(X_i, X_j) = l_{i,j}  if a path between X_i and X_j exists, and
    D(X_i, X_j) = n        otherwise.    (6)
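The following Python sketch (not code from the paper) computes this metric under the assumption that the ADF is given as a list of variable-index subsets: it builds G by connecting every pair of variables that share a subproblem and obtains l_{i,j} by breadth-first search, returning n when the two variables are disconnected.

    # Minimal sketch (not from the paper): the distance D(Xi, Xj) of equation (6).
    from collections import deque
    from itertools import combinations

    def build_graph(n, subsets):
        """One node per variable; an edge for every pair of variables sharing a subproblem."""
        adjacency = [set() for _ in range(n)]
        for subset in subsets:
            for a, b in combinations(subset, 2):
                adjacency[a].add(b)
                adjacency[b].add(a)
        return adjacency

    def distance(n, adjacency, i, j):
        """Shortest-path length l_ij in edges, or n if Xi and Xj are disconnected."""
        if i == j:
            return 0
        seen, queue = {i}, deque([(i, 0)])
        while queue:
            node, depth = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor == j:
                    return depth + 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
        return n  # disconnected variables get the maximum distance n

    # The example ADF from section 4.1 (0-based indices: X1 -> 0, ..., X6 -> 5).
    adj = build_graph(6, [(0, 1, 4), (2, 3), (1, 4, 5)])
    print(distance(6, adj, 0, 5))  # X1 and X6: path X1-X2-X6, so distance 2
    print(distance(6, adj, 0, 2))  # X1 and X3 are disconnected, so distance n = 6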
Fig. 2 illustrates the distance metric on a simple example. The above distance measure makes variables in the same subproblem close to each other, whereas for the remaining variables the distances correspond to the length of the chain of subproblems that relates the two variables. The distance is maximal for variables that are completely independent (the value of one variable does not influence the contribution of the other variable in any way). Other problem-specific distance metrics can be used for other problem classes, for example for QAP or scheduling problems.

Figure 10: Speedups obtained on all test problems except for minimum vertex cover, using bias from prior runs on problems of smaller size, compared to the base case (for example, n = 200 with bias from n = 150 for NK landscapes, n = 400 with bias from n = 324 for 2D spin glasses, and n = 343 with bias from n = 216 for 3D spin glasses).

A further figure (caption not fully recovered) reports speedups obtained when the bias comes from prior models of another problem class (models from NK landscapes, from minimum vertex cover with c = 2.0, and from minimum vertex cover with c = 4.0), with panels (a) NK landscapes with nearest neighbors, n = 200, k = 5; (b) minimum vertex cover, n = 200, c = 2; (c) minimum vertex cover, n = 200, c = 4; (d) 2D ±J Ising spin glass, n = 20 × 20 = 400; (e) 3D ±J Ising spin glass, n = 7 × 7 × 7 = 343.
Since interactions between problem variables are encoded mainly in the subproblems of the additive problem decomposition, the above distance metric should typically correspond closely to the likelihood of dependencies between problem variables in the probabilistic models discovered by EDAs. Specifically, variables located closer to each other with respect to the metric should be more likely to interact with each other. Fig. 3 illustrates this on two ADFs discussed later in this paper, including the NK landscape with nearest neighbor interactions.
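The excerpt does not include the paper's actual bias formula, so the following Python sketch is only an illustration of the kind of distance-based statistic involved: it estimates, from Bayesian networks learned in prior runs, how often variable pairs at each distance were connected, and exposes a hypothetical prior term whose strength is controlled by a parameter kappa (the figures refer to kappa as the strength of the bias; the exponentiation used here is an assumption, not the paper's definition).

    # Illustrative sketch only (not the paper's formula): distance-based statistics
    # gathered from prior-run models and turned into a bias for future model building.
    from collections import defaultdict

    def dependency_rate_by_distance(prior_models):
        """Estimate P(dependency | D(Xi, Xj) = d) from prior-run Bayesian networks.

        prior_models: list of (edges, dist, n) triples, one per prior run, where edges
        is a set of (i, j) pairs from the learned network, dist(i, j) is the distance
        metric of that instance, and n is its number of variables.
        """
        used = defaultdict(int)
        possible = defaultdict(int)
        for edges, dist, n in prior_models:
            for i in range(n):
                for j in range(i + 1, n):
                    d = dist(i, j)
                    possible[d] += 1
                    if (i, j) in edges or (j, i) in edges:
                        used[d] += 1
        return {d: used[d] / possible[d] for d in possible}

    def edge_prior(rate_by_distance, d, kappa=5.0):
        """Hypothetical bias term: favor dependencies at distances that were frequently
        connected in prior runs, with kappa controlling the strength of the bias."""
        return rate_by_distance.get(d, 0.0) ** kappa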
Hierarchical Bayesian optimization algorithm, hBOA

• In each generation, hBOA selects promising solutions from the current population, builds a Bayesian network as their model, and samples the network to create the new population.

• Models allow hBOA to learn and use problem structure.

• To build models, hBOA uses Bayesian metrics that can incorporate prior knowledge about the model structure.
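For readers unfamiliar with the algorithm, here is a heavily simplified Python sketch of one hBOA-style generation loop; the univariate model, truncation selection, and full replacement below are placeholders for hBOA's actual Bayesian networks with local structures and restricted tournament replacement, and the model-building step is where the distance-based bias described above would be applied.

    # Heavily simplified sketch of an hBOA-style generation loop (not the reference
    # implementation): the univariate model below is a stand-in for hBOA's Bayesian
    # network with local structures, which is where a distance-based prior would enter.
    import random

    def hboa_sketch(fitness, n_bits, pop_size=100, generations=50):
        population = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            # Selection: keep the better half of the current population.
            selected = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
            # Model building: placeholder for learning a Bayesian network from the
            # selected solutions (a structural prior/bias would be applied here).
            freqs = [sum(ind[i] for ind in selected) / len(selected) for i in range(n_bits)]
            # Sampling: generate the new population from the model.
            offspring = [[int(random.random() < freqs[i]) for i in range(n_bits)]
                         for _ in range(pop_size)]
            # Replacement: full replacement here; hBOA uses restricted tournament replacement.
            population = offspring
        return max(population, key=fitness)

    print(hboa_sketch(sum, n_bits=20))  # onemax as an example fitness function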
Selected results

• Problem classes: nearest-neighbor NK landscapes; spin glasses (2D and 3D); MAXSAT for transformed graph coloring; minimum vertex cover for random graphs.

• Use bias from smaller problems on bigger problems.

• Use bias from another problem class.

Summary of results: the distance-based transfer-learning technique was effective on the tested NP-complete problem classes, remained effective when prior runs used problems of a different size, and, combined with other efficiency enhancement techniques, often yielded nearly multiplicative speedups.