Identification of unknown parameters and prediction
of missing values. Comparison of approaches
Alexander Litvinenko1,
(joint work with V. Berikov2,3, R. Kriemann4 and KAUST people)
Eurasian Conference on Applied Mathematics-2021
Novosibirsk, Akademgorodok, December 16-21, 2021
1RWTH Aachen, Germany, 2Novosibirsk State University,
3Sobolev Institute of Mathematics, 4MPI for Mathematics in the Sciences in Leipzig,
Germany
Analysis of large datasets and prediction
Goals: identify unknown parameters in a large dataset and make predictions.
MLE deals with a dense covariance matrix C of size 2,000,000 × 2,000,000.
[Figure: boxplots of the estimated ν (range 0.42-0.58) versus the H-matrix rank k ∈ {3, 7, 9, 12}, and the H-matrix block structure of C with the rank of each low-rank sub-block printed inside the block.]
We show how to:
1. reduce the storage cost from 32 TB to 16 GB,
2. approximate the Cholesky factorization of C, its determinant, and its inverse in 8 minutes on a modern desktop computer,
3. make predictions.
The structure of the talk
1. Motivation: improve statistical models, data analysis,
prediction
2. Identification of unknown parameters via maximizing Gaussian
log-likelihood (MLE)
3. Tools: Hierarchical matrices [Hackbusch 1999]
4. Matérn covariance function, joint Gaussian log-likelihood
5. Error analysis
6. Prediction at new locations
7. Comparison with machine learning methods
Identifying unknown parameters via MLE method
Given:
Let s1, . . . , sn be locations.
Z = (Z(s1), . . . , Z(sn))ᵀ, where Z(s) is a Gaussian random field
indexed by a spatial location s ∈ Rᵈ, d ≥ 1.
Assumption: Z has mean zero and a stationary parametric covariance
function, e.g. Matérn:
$$C(\theta) = \frac{2\sigma^2}{\Gamma(\nu)}\left(\frac{r}{2\ell}\right)^{\nu} K_\nu\!\left(\frac{r}{\ell}\right) + \tau^2 I, \qquad \theta = (\sigma^2, \nu, \ell, \tau^2).$$
To identify: the unknown parameters θ := (σ², ν, ℓ, τ²).
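To make the formula concrete, here is a minimal NumPy/SciPy sketch that assembles this dense Matérn covariance matrix for a given θ = (σ², ν, ℓ, τ²); the function and variable names are ours, not taken from the talk's code.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import kv, gamma

def matern_cov(locs, sigma2, nu, ell, tau2):
    """Dense Matern covariance C(theta) for an (n, d) array of locations."""
    r = cdist(locs, locs)                # pairwise distances r_ij
    C = np.full_like(r, sigma2)          # the r -> 0 limit of the kernel is sigma^2
    m = r > 0
    C[m] = (2.0 * sigma2 / gamma(nu)) * (r[m] / (2.0 * ell))**nu * kv(nu, r[m] / ell)
    return C + tau2 * np.eye(len(locs))  # nugget term tau^2 * I
```

For n = 2,000,000 this dense matrix is exactly what becomes infeasible to store, which motivates the H-matrix approximation below.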
Identifying unknown parameters via MLE method
Statistical inference about θ is then based on the Gaussian
log-likelihood function:
$$\mathcal{L}(C(\theta)) = \mathcal{L}(\theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|C(\theta)| - \frac{1}{2}\, Z^{\top} C(\theta)^{-1} Z, \qquad (1)$$
where the covariance matrix C(θ) has entries C(si − sj; θ), i, j = 1, . . . , n.
The maximum likelihood estimator of θ is the value θ̂ that maximizes (1).
H-matrix approximations of C, L and C⁻¹
[Figure: three H-matrix block structures with the rank of each low-rank sub-block printed inside the block.]
H-matrix approximations of the exponential covariance matrix (left), its hierarchical Cholesky factor L̃ (middle), and the zoomed upper-left corner of the matrix (right); n = 4000, ℓ = 0.09, ν = 0.5, σ² = 1.
What will change after H-matrix approximation?
Approximate C by C_H.
1. How do the eigenvalues of C and C_H differ?
2. How does det(C) differ from det(C_H)?
3. How does L differ from L_H? [Mario Bebendorf et al.]
4. How does C⁻¹ differ from (C_H)⁻¹? [Mario Bebendorf et al.]
5. How does L̃(θ, k) differ from L(θ)?
6. What is the optimal H-matrix rank?
7. How does θ_H differ from θ?
For theory, and for estimates of the rank and accuracy, see the works of
Bebendorf, Grasedyck, Le Borne, Hackbusch, ...
Details of the parameter identification
To maximize the log-likelihood function we use Brent's method (which
combines the bisection method, the secant method, and inverse quadratic
interpolation), or any other optimizer.
1. C(θ) ≈ C̃(θ, ε).
2. C̃(θ, k) = L̃(θ, k) L̃(θ, k)ᵀ.
3. Zᵀ C̃⁻¹ Z = Zᵀ (L̃L̃ᵀ)⁻¹ Z = vᵀ · v, where v is the solution of L̃(θ, k) v(θ) := Z.
$$\log\det\{\tilde{C}\} = \log\det\{\tilde{L}\tilde{L}^{\top}\} = \log\det\Big\{\prod_{i=1}^{n}\lambda_i^2\Big\} = 2\sum_{i=1}^{n}\log\lambda_i,$$
$$\tilde{\mathcal{L}}(\theta, k) = -\frac{n}{2}\log(2\pi) - \sum_{i=1}^{n}\log\{\tilde{L}_{ii}(\theta, k)\} - \frac{1}{2}\big(v(\theta)^{\top}\cdot v(\theta)\big). \qquad (2)$$
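A minimal dense-matrix sketch of this procedure, assuming a plain NumPy/SciPy stand-in for the H-matrix arithmetic: the log-likelihood (2) is evaluated via a Cholesky factor, and a single parameter (here ℓ, for the exponential covariance, i.e. Matérn with ν = 1/2) is identified with SciPy's bounded variant of Brent's method. The data and sizes are illustrative, not from the talk.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.optimize import minimize_scalar
from scipy.spatial.distance import cdist

def neg_log_likelihood(C, Z):
    """Negative of the log-likelihood (2), evaluated via the Cholesky factor."""
    L = cholesky(C, lower=True)             # C = L L^T
    v = solve_triangular(L, Z, lower=True)  # solve L v = Z
    return (0.5 * len(Z) * np.log(2.0 * np.pi)
            + np.sum(np.log(np.diag(L))) + 0.5 * (v @ v))

rng = np.random.default_rng(0)
locs = rng.random((500, 2))
r = cdist(locs, locs)
# Synthetic data with true ell = 0.2 and a small nugget for stability:
Z = cholesky(np.exp(-r / 0.2) + 1e-4 * np.eye(500), lower=True) @ rng.standard_normal(500)
res = minimize_scalar(
    lambda ell: neg_log_likelihood(np.exp(-r / ell) + 1e-4 * np.eye(500), Z),
    bounds=(0.05, 1.0), method='bounded')   # bounded Brent search
print(res.x)                                # estimated ell, close to 0.2
```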
Error analysis
Theorem (1)
Let C̃ be an H-matrix approximation of a matrix C ∈ ℝⁿˣⁿ such that
ρ(C̃⁻¹C − I) ≤ ε < 1. Then
$$|\log\det C - \log\det\tilde{C}| \leq -n\log(1-\varepsilon). \qquad (3)$$
Proof: See [Ballani, Kressner 14] and [Ipsen 05].
Remark: the factor n is pessimistic and is not really observed
numerically.
Error in the log-likelihood
Theorem (2)
Let C̃ ≈ C ∈ ℝⁿˣⁿ, and let Z be a vector with ‖Z‖ ≤ c₀ and ‖C⁻¹‖ ≤ c₁.
Let ρ(C̃⁻¹C − I) ≤ ε < 1. Then it holds that
$$|\tilde{\mathcal{L}}(\theta) - \mathcal{L}(\theta)| = \frac{1}{2}\big(\log|C| - \log|\tilde{C}|\big) + \frac{1}{2}\,\big|Z^{\top}\big(C^{-1} - \tilde{C}^{-1}\big)Z\big|$$
$$\leq -\frac{1}{2}\, n\log(1-\varepsilon) + \frac{1}{2}\,\big|Z^{\top}\big(C^{-1}C - \tilde{C}^{-1}C\big)C^{-1}Z\big|
\leq -\frac{1}{2}\, n\log(1-\varepsilon) + \frac{1}{2}\, c_0^2\, c_1\, \varepsilon.$$
H-matrix approximation
ε = accuracy in each sub-block, n = 16641, ν = 0.5, c.r. = compression ratio.

ε      |log|C| − log|C̃||   |log|C| − log|C̃|| / |log|C̃||   ‖C − C̃‖_F   ‖C − C̃‖₂/‖C‖₂   ‖I − (L̃L̃ᵀ)⁻¹C‖₂   c.r. in %
ℓ = 0.0334
1e-1   3.2e-4               1.2e-4                          7.0e-3       7.6e-3           2.9                9.16
1e-2   1.6e-6               6.0e-7                          1.0e-3       6.7e-4           9.9e-2             9.4
1e-4   1.8e-9               7.0e-10                         1.0e-5       7.3e-6           2.0e-3             10.2
1e-8   4.7e-13              1.8e-13                         1.3e-9       6e-10            2.1e-7             12.7
ℓ = 0.2337
1e-4   9.8e-5               1.5e-5                          8.1e-5       1.4e-5           2.5e-1             9.5
1e-8   1.45e-9              2.3e-10                         1.1e-8       1.5e-9           4e-5               11.3

log|C| = 2.63 for ℓ = 0.0334 and log|C| = 6.36 for ℓ = 0.2337.
Boxplots for unknown parameters
Moisture data example. Boxplots of the 100 estimates of (ℓ, ν, σ²), respectively, for n = 32K, 16K, 8K, 4K, 2K; H-matrix with a fixed rank k = 11. Green horizontal lines denote the 25% and 75% quantiles for n = 32K.
Time and memory for parallel H-matrix approximation
Maximal number of cores is 40; ν = 0.325, ℓ = 0.64, σ² = 0.98.

n           C̃: time (sec.)   size (MB)   kB/dof   L̃L̃ᵀ: time (sec.)   size (MB)   ‖I − (L̃L̃ᵀ)⁻¹C̃‖₂
32,000      3.3               162         5.1      2.4                  172.7       2.4 · 10⁻³
128,000     13.3              776         6.1      13.9                 881.2       1.1 · 10⁻²
512,000     52.8              3420        6.7      77.6                 4150        3.5 · 10⁻²
2,000,000   229               14790       7.4      473                  18970       1.4 · 10⁻¹

Dell workstation, 20 × 2 cores, 128 GB RAM, bought in 2013 for 10,000 USD.
Prediction
Let Z = (Z1, Z2) have mean zero and a stationary covariance; Z1 is known, Z2 is unknown. Partition the covariance matrix accordingly:
$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.$$
We compute the predicted values Z2 = C21 C11⁻¹ Z1.
Z2 has the conditional distribution with mean value C21 C11⁻¹ Z1 and covariance matrix C22 − C21 C11⁻¹ C12.
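In dense linear algebra this prediction is a few lines; a sketch is below (the talk applies the same formulas with C11⁻¹ replaced by its H-matrix approximation; the function names are ours).

```python
import numpy as np

def conditional_mean(C11, C21, Z1):
    """Predicted values Z2 = C21 C11^{-1} Z1 (conditional mean)."""
    return C21 @ np.linalg.solve(C11, Z1)

def conditional_cov(C11, C12, C21, C22):
    """Conditional covariance C22 - C21 C11^{-1} C12."""
    return C22 - C21 @ np.linalg.solve(C11, C12)
```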
2021 KAUST Competition
We participated in the 2021 KAUST Competition on Spatial Statistics for Large Datasets.
You can download the datasets and view the final results at
https://cemse.kaust.edu.sa/stsds/2021-kaust-competition-spatial-statistics-large-datasets
For more details, see
https://www.eccomasproceedia.org/conferences/thematic-conferences/uncecomp-2021/8027
Machine learning methods to make prediction
k-nearest neighbours (kNN): for each x find its k nearest neighbours x1, . . . , xk with respect to some metric, and set the prediction ŷ = (1/k) Σᵢ₌₁ᵏ yᵢ, where yᵢ is the observed response at xᵢ. The value of k is determined by a cross-validation procedure; we tried k ∈ {1, . . . , 20} (see the sketch below).
Random Forest (RF): a large number of decision (or regression) trees are generated independently on random sub-samples of the data. The final decision for x is computed over the ensemble of trees by averaging the predicted outcomes. We tried {100, . . . , 150} trees.
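A scikit-learn sketch of the kNN predictor with cross-validated k; the library choice and the synthetic data are our illustration, not from the talk.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the competition data: X are 2D locations,
# y are the observed responses at those locations.
rng = np.random.default_rng(0)
X = rng.random((1000, 2))
y = np.sin(6.0 * X[:, 0]) * np.cos(6.0 * X[:, 1])

# Choose k in {1, ..., 20} by cross-validation, as described above.
search = GridSearchCV(KNeighborsRegressor(), {'n_neighbors': range(1, 21)}, cv=5)
search.fit(X, y)
y_new = search.predict(rng.random((10, 2)))  # predictions at new locations
```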
Machine learning methods to make prediction
Deep Neural Network (DNN): a fully connected neural network (FCNN).
The input layer consists of two neurons (input feature dimensionality), and the output layer consists of one neuron (predicted feature dimensionality).
We examined a few variants of the FCNN architecture with {3, 4, . . . , 10} hidden layers and 50-100 neurons in each hidden layer.
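One possible realization of such an FCNN in Keras; the framework choice is our assumption (the talk does not name one), and the 7 × 100 tanh configuration is the one reported as best later in the talk.

```python
import tensorflow as tf

# Fully connected network: 2 input neurons (location) -> 1 output neuron.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(2,))]
    + [tf.keras.layers.Dense(100, activation='tanh') for _ in range(7)]
    + [tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
# Training setup from the talk:
# model.fit(X_train, y_train, epochs=500, batch_size=10000)
```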
Datasets:
For all tests below we used datasets from the statistical competition
https://cemse.kaust.edu.sa/stsds/2021-kaust-competition-spatial-statistics-large-datasets
The spatial domain is D := [0.0, 1.0] × [0.0, 1.0].
Each dataset includes 90% training samples and 10% testing samples, at which the predictions are to be made.
Comparison of kNN, random forest and FCNN
The kNN method showed the best MC cross-validation results (over 100 runs) in comparison with the other ML methods.
We found that k = 3 neighbours is optimal for Test 2a (moderate sample size, 100K) and k = 7 for the large Test 2b (1M samples).
Trying different numbers of decision trees in the random forest, we found that an ensemble of 120 regression trees is optimal.
FCNN architecture: 7 hidden layers with 100 neurons in each layer; #training epochs = 500, batch size = 10000. We use the tanh(·) activation function and the Adam optimizer.
The average calculation time is 0.07 sec. for kNN, 12 sec. for RF, and 173 sec. for FCNN.
Because of its efficiency, we decided to run only the kNN method for the larger datasets.
Prediction by kNN
Test 2b, dataset 1: the yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
Less accurate prediction by H-MLE method
H-MLE: Test 2b, dataset 1. The yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
Prediction by kNN
Test 2b, dataset 2: the yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
Prediction by the H-MLE method
H-MLE: Test 2b, dataset 2. The yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
Prediction by the H-MLE method
The yellow points at 90,000 locations were used for training, and 10,000 values at new locations (blue points) were predicted.
H-MLE: Test 2a, dataset 1.
Prediction by the H-MLE method
The yellow points at 90,000 locations were used for training, and 10,000 values at new locations (blue points) were predicted.
H-MLE: Test 2a, dataset 2.
Prediction by the H-MLE method
H-MLE: Test 1b, dataset 4. The yellow points at 90,000 locations were used for training, and the blue points were predicted at 10,000 locations.
Prediction by the H-MLE method
H-MLE: Test 1b, dataset 7. The yellow points at 90,000 locations were used for training, and the blue points were predicted at 10,000 locations.
RMSE error: comparison for all methods
Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n_t}\sum_{i=1}^{n_t}\big(\hat{Z}(s_i) - Z(s_i)\big)^2}, \qquad (4)$$
where Ẑ(sᵢ) and Z(sᵢ) are, respectively, the predicted and true realization values at the location sᵢ in the testing dataset, and n_t is the total number of locations in the testing dataset.
           Test 2a          Test 2b
dataset    1       2        1       2
H-MLE      0.057   0.14     0.25    0.021
kNN        0.129   0.357    0.007   0.04
RF         0.226   0.607    —       —
FCNN       0.243   0.74     —       —

RMSE errors for the H-MLE, kNN, RF, and FCNN methods.
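Eq. (4) as a short NumPy helper (the name is ours):

```python
import numpy as np

def rmse(z_pred, z_true):
    """Root mean square error, Eq. (4)."""
    d = np.asarray(z_pred) - np.asarray(z_true)
    return np.sqrt(np.mean(d**2))
```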
Conclusion
- With H-matrices you can approximate Matérn covariance matrices and Gaussian log-likelihoods, identify unknown parameters, and make predictions.
- The MLE estimate and the predictions depend on the H-matrix accuracy.
- The parameter identification problem has multiple solutions.
- We investigated the dependence of the H-matrix approximation error on the estimated parameters.
- Each ML method needs a fine-tuning stage to optimize its hyperparameters or architecture.
- kNN is easy to implement and shows promising results.
Open questions and TODOs
- The Gaussian log-likelihood function has some drawbacks for very large matrices.
- How to skip/avoid redundant data?
- A good starting point for the optimization is needed.
- A "preconditioner" (a simple covariance matrix) is needed.
- H-matrices become expensive for a large number of parameters to be identified.
All tests are reproducible:
https://github.com/litvinen/large_random_fields.git
Literature
1. A. Litvinenko, R. Kriemann, V. Berikov, Identification of unknown
parameters and prediction with hierarchical matrices, UNCECOMP 2021
Conf. Proc., Uncertainty Quantification in Computational Sciences and
Engineering, M. Papadrakakis, V. Papadopoulos, G. Stefanou (Eds.), pp
129-144, https://2021.uncecomp.org, ISBN: 978-618-85072-6-5, 2021
2. A. Litvinenko, R. Kriemann, M.G. Genton, Y. Sun, D.E. Keyes, HLIBCov:
Parallel hierarchical matrix approximation of large covariance matrices
and likelihoods with applications in parameter identification, MethodsX 7,
100600, 2020
3. A. Litvinenko, Y. Sun, M.G. Genton, D.E. Keyes, Likelihood
approximation with hierarchical matrices for large spatial datasets,
Computational Statistics & Data Analysis 137, 115-132, 2019
4. M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, E. Zander,
Iterative algorithms for the post-processing of high-dimensional data,
Journal of Computational Physics 410, 109396, 2020
5. A. Litvinenko, D. Keyes, V. Khoromskaia, B.N. Khoromskij, H.G.
Matthies, Tucker tensor analysis of Matérn functions in spatial statistics,
Computational Methods in Applied Mathematics 19 (1), 101-122, 2019
6. M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, E. Zander,
Efficient analysis of high dimensional data in tensor formats, Sparse Grids
and Applications, 31-56, Springer, Berlin, 2013
Literature
7. Berikov V., Litvinenko A. (2021) Weakly Supervised Regression Using
Manifold Regularization and Low-Rank Matrix Representation. In:
Pardalos P., Khachay M., Kazakov A. (eds) , MOTOR 2021. LNCS, vol
12755. Springer, Cham., doi: 10.1007/978-3-030-77876-7_30
8. Adeli, Ehsan et al. , Convolutional generative adversarial imputation
networks for spatio-temporal missing data in storm surge simulations.
ArXiv abs/2111.02823 (2021), https://arxiv.org/abs/2111.02823
9. E. Adeli, B. Rosic, H.G. Matthies, S. Reinstädler, D. Dinkler, Bayesian
Parameter Determination of a CT-Test described by a
Viscoplastic-Damage Model considering the Model Error, Metals 2020 10
(9), 1141, 2020
10. E. Adeli, B. Rosic, H.G. Matthies, S. Reinstädler, D. Dinkler, Comparison
of Bayesian Methods on Parameter Identification for a Viscoplastic Model
with Damage, Probabilistic Engineering Mechanics 62, 103083, 2020
11. A. Litvinenko, Y. Marzouk, H.G. Matthies, M. Scavino, A. Spantini,
Computing f-Divergences and Distances of High-Dimensional Probability
Density Functions–Low-Rank Tensor Approximations, preprint
arXiv:2111.07164, https://arxiv.org/abs/2111.07164, 2021
12. E. Adeli, B. Rosic, H.G. Matthies, Identification of a Visco-plastic Model
with Uncertain Parameters using Bayesian Methods, International Journal
of Multiphase Flow 75, 124-136, 2018
Acknowledgement
V. Berikov was supported by the state contract of the Sobolev
Institute of Mathematics (project no 0314-2019-0015), and by
RFBR grant 19-29-01175.
A. Litvinenko was supported by funding from the Alexander von
Humboldt Foundation.
Appendix
Remark: stabilization with nugget
To avoid instability in computing the Cholesky factorization, we add a nugget: C̃ₘ = C̃ + τ²I.
Let λᵢ be the eigenvalues of C̃; then the eigenvalues of C̃ₘ are λᵢ + τ², and
$$\log\det(\tilde{C}_m) = \log\prod_{i=1}^{n}(\lambda_i + \tau^2) = \sum_{i=1}^{n}\log(\lambda_i + \tau^2).$$
(left) Dependence of the negative log-likelihood on the parameter ℓ with nuggets {0.01, 0.005, 0.001} for the Gaussian covariance. (right) Zoom of the left figure near the minimum; n = 2000 random points from the moisture example, rank k = 14, τ² = 1, ν = 0.5.
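A dense stand-in sketch of this stabilization (the talk applies it to the H-matrix C̃; the helper name is ours):

```python
import numpy as np
from scipy.linalg import cholesky

def logdet_with_nugget(C, tau2):
    """log det(C + tau^2 I) via Cholesky; the nugget shifts every
    eigenvalue of C by tau^2, which stabilizes the factorization."""
    L = cholesky(C + tau2 * np.eye(C.shape[0]), lower=True)
    return 2.0 * np.sum(np.log(np.diag(L)))
```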
How much memory do H-matrices need?
[Figure: H-matrix size in bytes versus ℓ (left, ν = 0.325, σ² = 0.98) and versus ν (right, ℓ = 0.58, σ² = 0.98).]
(left) Dependence of the matrix size on the covariance length ℓ, and (right) on the smoothness ν, for two different H-accuracies ε ∈ {10⁻⁴, 10⁻⁶}.
H-matrices : Error convergence
[Figure: log of the relative error versus rank k, in the spectral and Frobenius norms, for L ∈ {0.1, 0.2, 0.5}.]
Convergence of the H-matrix approximation errors for covariance lengths {0.1, 0.2, 0.5}; (left) ν = 1 and (right) ν = 0.5; computational domain [0, 1]².
Dependence of the log-likelihood ingredients on the parameters, n = 4225; k = 8 in the first row and k = 16 in the second.
How does log-likelihood depend on n?
Fig.: Dependence of the negative log-likelihood function on ν for different numbers of locations n ∈ {2000, 4000, 8000, 16000, 32000}, in log scale.

More Related Content

What's hot

Logarithm Bases IA
Logarithm Bases IALogarithm Bases IA
Logarithm Bases IAbank8787
 
A common fixed point theorem for two random operators using random mann itera...
A common fixed point theorem for two random operators using random mann itera...A common fixed point theorem for two random operators using random mann itera...
A common fixed point theorem for two random operators using random mann itera...Alexander Decker
 
Multi nomial pdf
Multi nomial pdfMulti nomial pdf
Multi nomial pdfsabbir11
 
Potencias resueltas 1eso (1)
Potencias resueltas 1eso (1)Potencias resueltas 1eso (1)
Potencias resueltas 1eso (1)Lina Manriquez
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Tensorizing Neural Network
Tensorizing Neural NetworkTensorizing Neural Network
Tensorizing Neural NetworkRuochun Tzeng
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive modelsAkira Tanimoto
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...David Gleich
 
Mixed Spectra for Stable Signals from Discrete Observations
Mixed Spectra for Stable Signals from Discrete ObservationsMixed Spectra for Stable Signals from Discrete Observations
Mixed Spectra for Stable Signals from Discrete Observationssipij
 
Radix-3 Algorithm for Realization of Discrete Fourier Transform
Radix-3 Algorithm for Realization of Discrete Fourier TransformRadix-3 Algorithm for Realization of Discrete Fourier Transform
Radix-3 Algorithm for Realization of Discrete Fourier TransformIJERA Editor
 
Measures of different reliability parameters for a complex redundant system u...
Measures of different reliability parameters for a complex redundant system u...Measures of different reliability parameters for a complex redundant system u...
Measures of different reliability parameters for a complex redundant system u...Alexander Decker
 
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...TELKOMNIKA JOURNAL
 
On Optimization of Manufacturing of a Sense-amplifier Based Flip-flop
On Optimization of Manufacturing of a Sense-amplifier Based Flip-flopOn Optimization of Manufacturing of a Sense-amplifier Based Flip-flop
On Optimization of Manufacturing of a Sense-amplifier Based Flip-flopBRNSS Publication Hub
 
Numerical analysis stationary variables
Numerical analysis  stationary variablesNumerical analysis  stationary variables
Numerical analysis stationary variablesSHAMJITH KM
 

What's hot (20)

Logarithm Bases IA
Logarithm Bases IALogarithm Bases IA
Logarithm Bases IA
 
A common fixed point theorem for two random operators using random mann itera...
A common fixed point theorem for two random operators using random mann itera...A common fixed point theorem for two random operators using random mann itera...
A common fixed point theorem for two random operators using random mann itera...
 
Lecture9 xing
Lecture9 xingLecture9 xing
Lecture9 xing
 
Multi nomial pdf
Multi nomial pdfMulti nomial pdf
Multi nomial pdf
 
Potencias resueltas 1eso (1)
Potencias resueltas 1eso (1)Potencias resueltas 1eso (1)
Potencias resueltas 1eso (1)
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Talk litvinenko prior_cov
Talk litvinenko prior_covTalk litvinenko prior_cov
Talk litvinenko prior_cov
 
Tensorizing Neural Network
Tensorizing Neural NetworkTensorizing Neural Network
Tensorizing Neural Network
 
Estimating structured vector autoregressive models
Estimating structured vector autoregressive modelsEstimating structured vector autoregressive models
Estimating structured vector autoregressive models
 
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
 
Mixed Spectra for Stable Signals from Discrete Observations
Mixed Spectra for Stable Signals from Discrete ObservationsMixed Spectra for Stable Signals from Discrete Observations
Mixed Spectra for Stable Signals from Discrete Observations
 
Radix-3 Algorithm for Realization of Discrete Fourier Transform
Radix-3 Algorithm for Realization of Discrete Fourier TransformRadix-3 Algorithm for Realization of Discrete Fourier Transform
Radix-3 Algorithm for Realization of Discrete Fourier Transform
 
Measures of different reliability parameters for a complex redundant system u...
Measures of different reliability parameters for a complex redundant system u...Measures of different reliability parameters for a complex redundant system u...
Measures of different reliability parameters for a complex redundant system u...
 
D04302031042
D04302031042D04302031042
D04302031042
 
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...Direct split-radix algorithm for fast computation of type-II discrete Hartley...
Direct split-radix algorithm for fast computation of type-II discrete Hartley...
 
On Optimization of Manufacturing of a Sense-amplifier Based Flip-flop
On Optimization of Manufacturing of a Sense-amplifier Based Flip-flopOn Optimization of Manufacturing of a Sense-amplifier Based Flip-flop
On Optimization of Manufacturing of a Sense-amplifier Based Flip-flop
 
Numerical analysis stationary variables
Numerical analysis  stationary variablesNumerical analysis  stationary variables
Numerical analysis stationary variables
 
Chap 4 complex numbers focus exam ace
Chap 4 complex numbers focus exam aceChap 4 complex numbers focus exam ace
Chap 4 complex numbers focus exam ace
 
Ada
AdaAda
Ada
 
Irjet v2i170
Irjet v2i170Irjet v2i170
Irjet v2i170
 

Similar to Identification of unknown parameters and prediction via hierarchical matrices

Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...Alexander Litvinenko
 
Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...Alexander Litvinenko
 
Overview of sparse and low-rank matrix / tensor techniques
Overview of sparse and low-rank matrix / tensor techniques Overview of sparse and low-rank matrix / tensor techniques
Overview of sparse and low-rank matrix / tensor techniques Alexander Litvinenko
 
Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Alexander Litvinenko
 
Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...
Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...
Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...Alexander Litvinenko
 
Numerical Methods: curve fitting and interpolation
Numerical Methods: curve fitting and interpolationNumerical Methods: curve fitting and interpolation
Numerical Methods: curve fitting and interpolationNikolai Priezjev
 
Litvinenko, Uncertainty Quantification - an Overview
Litvinenko, Uncertainty Quantification - an OverviewLitvinenko, Uncertainty Quantification - an Overview
Litvinenko, Uncertainty Quantification - an OverviewAlexander Litvinenko
 
Nelson maple pdf
Nelson maple pdfNelson maple pdf
Nelson maple pdfNelsonP23
 
"SSumM: Sparse Summarization of Massive Graphs", KDD 2020
"SSumM: Sparse Summarization of Massive Graphs", KDD 2020"SSumM: Sparse Summarization of Massive Graphs", KDD 2020
"SSumM: Sparse Summarization of Massive Graphs", KDD 2020KyuhanLee4
 
Complete Factoring Rules.ppt
Complete Factoring Rules.pptComplete Factoring Rules.ppt
Complete Factoring Rules.pptJasmin679773
 
Complete Factoring Rules in Grade 8 Math.ppt
Complete Factoring Rules in Grade 8 Math.pptComplete Factoring Rules in Grade 8 Math.ppt
Complete Factoring Rules in Grade 8 Math.pptElmabethDelaCruz2
 
My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...Alexander Litvinenko
 
Semana 11 numeros complejos ii álgebra-uni ccesa007
Semana 11   numeros complejos ii   álgebra-uni ccesa007Semana 11   numeros complejos ii   álgebra-uni ccesa007
Semana 11 numeros complejos ii álgebra-uni ccesa007Demetrio Ccesa Rayme
 
The sexagesimal foundation of mathematics
The sexagesimal foundation of mathematicsThe sexagesimal foundation of mathematics
The sexagesimal foundation of mathematicsMichielKarskens
 
Hierarchical matrix approximation of large covariance matrices
Hierarchical matrix approximation of large covariance matricesHierarchical matrix approximation of large covariance matrices
Hierarchical matrix approximation of large covariance matricesAlexander Litvinenko
 
Image scalar hw_algorithm
Image scalar hw_algorithmImage scalar hw_algorithm
Image scalar hw_algorithmsean chen
 
08 - 25 Jan - Divide and Conquer
08 - 25 Jan - Divide and Conquer08 - 25 Jan - Divide and Conquer
08 - 25 Jan - Divide and ConquerNeeldhara Misra
 
Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...
Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...
Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...IRJET Journal
 
Fast and accurate metrics. Is it actually possible?
Fast and accurate metrics. Is it actually possible?Fast and accurate metrics. Is it actually possible?
Fast and accurate metrics. Is it actually possible?Bogdan Storozhuk
 

Similar to Identification of unknown parameters and prediction via hierarchical matrices (20)

Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...
 
Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...
 
Overview of sparse and low-rank matrix / tensor techniques
Overview of sparse and low-rank matrix / tensor techniques Overview of sparse and low-rank matrix / tensor techniques
Overview of sparse and low-rank matrix / tensor techniques
 
Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...
 
Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...
Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...
Application of Parallel Hierarchical Matrices in Spatial Statistics and Param...
 
Numerical Methods: curve fitting and interpolation
Numerical Methods: curve fitting and interpolationNumerical Methods: curve fitting and interpolation
Numerical Methods: curve fitting and interpolation
 
Litvinenko, Uncertainty Quantification - an Overview
Litvinenko, Uncertainty Quantification - an OverviewLitvinenko, Uncertainty Quantification - an Overview
Litvinenko, Uncertainty Quantification - an Overview
 
Nelson maple pdf
Nelson maple pdfNelson maple pdf
Nelson maple pdf
 
"SSumM: Sparse Summarization of Massive Graphs", KDD 2020
"SSumM: Sparse Summarization of Massive Graphs", KDD 2020"SSumM: Sparse Summarization of Massive Graphs", KDD 2020
"SSumM: Sparse Summarization of Massive Graphs", KDD 2020
 
Complete Factoring Rules.ppt
Complete Factoring Rules.pptComplete Factoring Rules.ppt
Complete Factoring Rules.ppt
 
Complete Factoring Rules in Grade 8 Math.ppt
Complete Factoring Rules in Grade 8 Math.pptComplete Factoring Rules in Grade 8 Math.ppt
Complete Factoring Rules in Grade 8 Math.ppt
 
Control charts
Control chartsControl charts
Control charts
 
My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...My presentation at University of Nottingham "Fast low-rank methods for solvin...
My presentation at University of Nottingham "Fast low-rank methods for solvin...
 
Semana 11 numeros complejos ii álgebra-uni ccesa007
Semana 11   numeros complejos ii   álgebra-uni ccesa007Semana 11   numeros complejos ii   álgebra-uni ccesa007
Semana 11 numeros complejos ii álgebra-uni ccesa007
 
The sexagesimal foundation of mathematics
The sexagesimal foundation of mathematicsThe sexagesimal foundation of mathematics
The sexagesimal foundation of mathematics
 
Hierarchical matrix approximation of large covariance matrices
Hierarchical matrix approximation of large covariance matricesHierarchical matrix approximation of large covariance matrices
Hierarchical matrix approximation of large covariance matrices
 
Image scalar hw_algorithm
Image scalar hw_algorithmImage scalar hw_algorithm
Image scalar hw_algorithm
 
08 - 25 Jan - Divide and Conquer
08 - 25 Jan - Divide and Conquer08 - 25 Jan - Divide and Conquer
08 - 25 Jan - Divide and Conquer
 
Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...
Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...
Finite Element Solution On Effects Of Viscous Dissipation & Diffusion Thermo ...
 
Fast and accurate metrics. Is it actually possible?
Fast and accurate metrics. Is it actually possible?Fast and accurate metrics. Is it actually possible?
Fast and accurate metrics. Is it actually possible?
 

More from Alexander Litvinenko

litvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdflitvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdfAlexander Litvinenko
 
Density Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and PermeabilityDensity Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and PermeabilityAlexander Litvinenko
 
Uncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdfUncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdfAlexander Litvinenko
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 
Litv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfLitv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfAlexander Litvinenko
 
Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Alexander Litvinenko
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Alexander Litvinenko
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
 
Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Alexander Litvinenko
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
 
Propagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater FlowPropagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater FlowAlexander Litvinenko
 
Simulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flowSimulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flowAlexander Litvinenko
 
Approximation of large covariance matrices in statistics
Approximation of large covariance matrices in statisticsApproximation of large covariance matrices in statistics
Approximation of large covariance matrices in statisticsAlexander Litvinenko
 
Semi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster EnsembleSemi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster EnsembleAlexander Litvinenko
 
Talk Alexander Litvinenko on SIAM GS Conference in Houston
Talk Alexander Litvinenko on SIAM GS Conference in HoustonTalk Alexander Litvinenko on SIAM GS Conference in Houston
Talk Alexander Litvinenko on SIAM GS Conference in HoustonAlexander Litvinenko
 
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Alexander Litvinenko
 

More from Alexander Litvinenko (20)

litvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdflitvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdf
 
Density Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and PermeabilityDensity Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and Permeability
 
litvinenko_Gamm2023.pdf
litvinenko_Gamm2023.pdflitvinenko_Gamm2023.pdf
litvinenko_Gamm2023.pdf
 
Litvinenko_Poster_Henry_22May.pdf
Litvinenko_Poster_Henry_22May.pdfLitvinenko_Poster_Henry_22May.pdf
Litvinenko_Poster_Henry_22May.pdf
 
Uncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdfUncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdf
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
Litv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfLitv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdf
 
Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...
 
Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...
 
Propagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater FlowPropagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater Flow
 
Simulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flowSimulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flow
 
Approximation of large covariance matrices in statistics
Approximation of large covariance matrices in statisticsApproximation of large covariance matrices in statistics
Approximation of large covariance matrices in statistics
 
Semi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster EnsembleSemi-Supervised Regression using Cluster Ensemble
Semi-Supervised Regression using Cluster Ensemble
 
Talk Alexander Litvinenko on SIAM GS Conference in Houston
Talk Alexander Litvinenko on SIAM GS Conference in HoustonTalk Alexander Litvinenko on SIAM GS Conference in Houston
Talk Alexander Litvinenko on SIAM GS Conference in Houston
 
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
 

Recently uploaded

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Recently uploaded (20)

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Identification of unknown parameters and prediction via hierarchical matrices

  • 1. Identification of unknown parameters and prediction of missing values. Comparison of approaches Alexander Litvinenko1, (joint work with V. Berikov2,3, R. Kriemann4 and KAUST people) Eurasian Conference on Applied Mathematics-2021 Novosibirsk, Akademgorodok, December 16-21, 2021 1RWTH Aachen, Germany, 2Novosibirsk State University, 3Sobolev Institute of Mathematics, 4MPI for Mathematics in the Sciences in Leipzig, Germany
  • 2. 4 * Analysis of large datasets and prediction Goals: identify unknown parameters in a large dataset and prediction MLE deals with dense cov. matrix C of size 2, 000, 000 × 2, 000, 000 H-matrix rank 3 7 9 12 nu 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 3 2 4 2 9 2 4 2 17 2 2 4 2 7 3 2 18 17 10 10 2 5 2 8 2 4 2 10 3 3 2 32 14 10 18 10 36 28 3 6 2 4 2 14 2 4 2 5 2 4 2 14 18 14 18 2 4 2 8 2 5 3 19 2 4 2 6 3 6 3 65 32 19 21 18 18 29 29 61 61 2 5 3 7 3 2 5 2 18 3 6 3 6 3 17 17 12 12 3 2 7 2 4 2 2 12 3 2 4 2 32 17 12 12 12 29 29 2 5 3 2 7 2 4 2 12 2 5 3 10 10 19 12 3 3 2 10 2 5 2 5 2 4 2 119 65 32 10 15 10 19 27 27 54 54 118 118 2 5 3 4 2 15 2 5 3 11 3 6 3 15 17 10 10 2 4 2 8 2 4 2 10 3 2 3 21 14 10 7 7 33 27 2 4 3 3 2 7 3 8 7 14 7 2 3 8 2 4 2 7 3 2 10 2 5 2 59 21 8 18 8 12 21 29 59 54 3 6 3 2 5 2 12 2 5 3 2 18 12 11 11 3 2 7 3 5 2 11 2 5 3 25 9 9 16 11 30 29 3 3 9 3 4 2 6 3 9 16 9 16 3 2 4 2 7 3 2 16 3 2 7 2 4 2 121 119 59 25 18 16 10 10 25 34 59 59 119 118 80 119 2 4 2 4 2 9 2 4 2 10 3 4 2 19 10 15 10 2 5 3 2 7 3 2 15 3 4 2 5 2 31 15 15 16 15 32 34 3 2 4 2 4 2 15 2 4 3 2 5 2 15 16 15 16 2 5 3 2 12 3 6 3 20 3 2 7 3 2 6 3 58 27 18 18 9 9 31 31 66 67 2 5 3 2 6 3 9 2 4 2 12 9 18 9 3 6 3 12 3 4 2 9 2 4 2 27 12 16 12 19 27 31 3 2 3 6 3 16 2 5 2 4 2 6 3 16 20 8 8 2 4 2 7 3 2 7 3 2 8 2 3 115 58 30 15 8 15 8 28 28 57 57 119 119 2 5 2 4 2 2 15 2 4 2 6 3 9 9 15 15 3 3 9 2 5 3 2 7 3 2 27 9 16 9 11 30 28 3 5 2 5 2 11 2 5 2 16 11 13 11 3 3 2 7 3 2 13 2 5 2 4 2 48 25 5 5 17 13 23 23 58 57 2 5 2 4 2 5 2 6 3 5 8 5 15 2 3 8 3 2 2 6 3 25 8 15 8 9 25 23 2 5 3 6 3 7 3 2 9 2 4 2 15 9 23 9 3 3 6 3 15 2 4 2 5 2 10 2 5 2 90 122 115 48 32 15 15 15 17 33 33 48 60 115 119 81 234 56 89 3 2 7 2 4 2 15 3 4 2 7 3 2 15 17 6 6 2 5 2 5 3 11 2 5 3 6 3 30 11 6 19 6 30 30 3 5 2 11 3 2 7 3 2 5 2 11 13 11 17 2 4 2 5 2 13 3 5 2 6 3 65 30 13 17 13 17 30 30 61 60 2 4 3 6 3 5 2 21 2 4 2 4 2 12 3 6 3 20 20 14 14 2 4 2 8 2 4 2 3 14 2 5 2 4 2 32 17 14 15 14 29 29 2 4 2 8 2 4 2 15 2 5 2 5 2 10 10 17 15 2 5 2 10 3 6 2 4 2 5 2 125 75 32 10 18 10 19 32 29 69 61 125 134 2 5 3 2 6 3 18 3 2 6 3 8 2 4 2 17 17 18 19 2 4 2 8 2 4 2 17 3 6 3 2 6 3 39 17 19 17 18 25 25 3 2 7 2 4 2 11 2 4 3 2 18 3 6 3 6 3 10 10 15 15 3 2 3 10 2 4 3 5 2 65 30 10 15 10 12 35 25 69 69 2 4 2 9 2 4 2 12 2 2 5 2 18 12 14 12 2 5 2 10 2 4 3 2 14 3 6 2 4 2 30 15 14 16 14 30 35 2 4 2 6 3 15 3 2 4 2 5 2 15 15 15 16 2 4 2 5 2 2 15 3 6 3 8 2 4 2 3 227 111 61 31 15 21 15 16 24 24 50 50 116 116 81 224 3 6 3 9 2 4 2 16 2 5 2 6 2 19 16 5 5 2 5 3 8 2 4 2 5 2 23 16 5 7 5 27 24 3 6 2 5 2 7 2 3 10 7 16 7 3 4 2 10 3 6 3 5 2 54 23 10 17 10 11 23 25 61 50 2 2 7 2 5 3 11 2 5 2 13 11 12 11 3 2 5 2 12 2 4 2 4 2 29 13 12 13 12 27 25 2 5 2 5 2 15 2 5 3 2 7 2 4 2 15 18 9 9 2 5 2 5 2 3 9 2 4 2 111 54 33 18 9 12 9 22 22 54 59 110 110 3 2 7 2 4 2 6 3 12 2 5 3 2 9 9 13 12 2 4 2 9 3 2 6 3 33 9 13 9 13 22 22 3 2 6 3 9 2 4 2 15 2 4 2 6 3 12 12 10 10 2 2 6 3 10 2 4 2 49 23 7 7 12 10 26 22 55 59 3 7 3 2 4 2 4 2 7 8 7 16 3 2 8 3 6 3 6 3 23 8 15 8 11 23 26 3 2 7 3 2 11 2 4 2 3 12 11 15 11 3 2 5 2 12 2 4 2 6 3 8 2 4 2 We show how to: 1. reduce storage cost from 32TB to 16 GB 2. approximate Cholesky factorisation of C, its determinant, inverse in 8 minutes on modern desktop computer. 3. make prediction 2
  • 3. The structure of the talk: 1. Motivation: improve statistical models, data analysis, prediction 2. Identification of unknown parameters via maximizing the Gaussian log-likelihood (MLE) 3. Tools: hierarchical matrices [Hackbusch 1999] 4. Matérn covariance function, joint Gaussian log-likelihood 5. Error analysis 6. Prediction at new locations 7. Comparison with machine learning methods
  • 4. Identifying unknown parameters via the MLE method. Given: let s_1, ..., s_n be locations and Z = (Z(s_1), ..., Z(s_n))^T, where Z(s) is a Gaussian random field indexed by a spatial location s ∈ R^d, d ≥ 1. Assumption: Z has mean zero and a stationary parametric covariance function, e.g. Matérn:
  C(r; \theta) = \frac{2\sigma^2}{\Gamma(\nu)} \left(\frac{r}{2\ell}\right)^{\nu} K_\nu\left(\frac{r}{\ell}\right) + \tau^2 I, \quad \theta = (\sigma^2, \nu, \ell, \tau^2).
  To identify: the unknown parameters θ := (σ², ν, ℓ, τ²).
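  A minimal dense-matrix sketch of this kernel (a hypothetical helper for small n only; the point of the talk is that large n requires H-matrices), using SciPy's modified Bessel function K_ν:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import kv, gamma

def matern_cov(locs, sigma2, nu, ell, tau2):
    """Dense Matern covariance matrix C(theta) for locations locs of shape (n, d)."""
    r = cdist(locs, locs)                 # pairwise distances
    C = np.full_like(r, sigma2)           # limit of the Matern kernel at r = 0
    nz = r > 0
    C[nz] = (2.0 * sigma2 / gamma(nu)) * (r[nz] / (2.0 * ell))**nu * kv(nu, r[nz] / ell)
    return C + tau2 * np.eye(len(locs))   # nugget tau^2 * I
```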
  • 5. Identifying unknown parameters via the MLE method. Statistical inference about θ is then based on the Gaussian log-likelihood function
  L(C(\theta)) = L(\theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|C(\theta)| - \frac{1}{2} Z^T C(\theta)^{-1} Z, (1)
  where the covariance matrix C(θ) has entries C(s_i − s_j; θ), i, j = 1, ..., n. The maximum likelihood estimator of θ is the value θ̂ that maximizes (1).
  • 6. H-matrix approximations of C, L and C^{-1}. [Figure: H-matrix approximations of the exponential covariance matrix (left), its hierarchical Cholesky factor L̃ (middle), and the zoomed upper-left corner of the matrix (right); n = 4000, ℓ = 0.09, ν = 0.5, σ² = 1.]
  • 7. What changes after the H-matrix approximation? Approximate C by C_H: 1. How do the eigenvalues of C and C_H differ? 2. How does det(C) differ from det(C_H)? 3. How does L differ from L_H? [Bebendorf et al.] 4. How does C^{-1} differ from (C_H)^{-1}? [Bebendorf et al.] 5. How does L̃(θ, k) differ from L(θ)? 6. What is the optimal H-matrix rank? 7. How does θ_H differ from θ? For theory and estimates of rank and accuracy, see the works of Bebendorf, Grasedyck, Le Borne, Hackbusch, ...
  • 8. Details of the parameter identification. To maximize the log-likelihood function we use Brent's method (combining bisection, the secant method and inverse quadratic interpolation) or any other derivative-free optimizer.
  1. C(θ) ≈ C̃(θ, ε).
  2. C̃(θ, k) = L̃(θ, k) L̃(θ, k)^T.
  3. Z^T C̃^{-1} Z = Z^T (L̃ L̃^T)^{-1} Z = v^T v, where v solves L̃(θ, k) v(θ) = Z.
  Since log det C̃ = log det(L̃ L̃^T) = log ∏_{i=1}^{n} λ_i² = 2 ∑_{i=1}^{n} log λ_i with λ_i = L̃_{ii}, it holds
  \tilde{L}(\theta, k) = -\frac{n}{2}\log(2\pi) - \sum_{i=1}^{n} \log \tilde{L}_{ii}(\theta, k) - \frac{1}{2} v(\theta)^T v(\theta). (2)
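  A dense stand-in for this computation (a sketch, assuming the hypothetical matern_cov helper above and synthetic data; the H-matrix version replaces the dense Cholesky with an H-Cholesky): evaluate Eq. (2) via a triangular solve, then minimize over one parameter with SciPy's Brent-type bounded minimizer.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular
from scipy.optimize import minimize_scalar

def neg_log_likelihood(theta, locs, Z):
    """Negative Gaussian log-likelihood, Eq. (2), via a dense Cholesky factor."""
    sigma2, nu, ell, tau2 = theta
    L = cholesky(matern_cov(locs, sigma2, nu, ell, tau2), lower=True)  # C = L L^T
    v = solve_triangular(L, Z, lower=True)                             # L v = Z
    return 0.5 * len(Z) * np.log(2 * np.pi) + np.log(np.diag(L)).sum() + 0.5 * v @ v

# 1-D illustration on toy data: fit the covariance length ell by Brent-type
# bounded minimization, holding sigma2, nu, tau2 fixed at hypothetical values.
rng = np.random.default_rng(0)
locs, Z = rng.random((500, 2)), rng.standard_normal(500)
res = minimize_scalar(lambda ell: neg_log_likelihood((1.0, 0.5, ell, 0.01), locs, Z),
                      bounds=(0.01, 2.0), method='bounded')
```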
  • 9. Error analysis. Theorem 1. Let C̃ be an H-matrix approximation of the matrix C ∈ R^{n×n} such that ρ(C̃^{-1} C − I) ≤ ε < 1. Then
  |\log\det C - \log\det \tilde{C}| \le -n \log(1 - \varepsilon). (3)
  Proof: see [Ballani, Kressner 14] and [Ipsen 05]. Remark: the factor n is pessimistic and is not actually observed numerically.
  • 10. Error in the log-likelihood. Theorem 2. Let C̃ ≈ C ∈ R^{n×n} and let Z be a vector with ‖Z‖ ≤ c_0 and ‖C^{-1}‖ ≤ c_1. Let ρ(C̃^{-1} C − I) ≤ ε < 1. Then it holds
  |\tilde{L}(\theta) - L(\theta)| \le \frac{1}{2}\,|\log|C| - \log|\tilde{C}|| + \frac{1}{2}\,|Z^T (C^{-1} - \tilde{C}^{-1}) Z| \le -\frac{n}{2}\log(1-\varepsilon) + \frac{1}{2}\, c_0^2 \, c_1 \, \varepsilon.
  • 11. H-matrix approximation; ε is the accuracy in each sub-block, n = 16641, ν = 0.5, c.r. = compression ratio.

  ε    | |log|C| − log|C̃|| | |log|C| − log|C̃|| / |log|C̃|| | ‖C − C̃‖_F | ‖C − C̃‖₂/‖C‖₂ | ‖I − (L̃L̃ᵀ)⁻¹C‖₂ | c.r. in %
  ℓ = 0.0334:
  1e-1 | 3.2e-4  | 1.2e-4  | 7.0e-3 | 7.6e-3 | 2.9    | 9.16
  1e-2 | 1.6e-6  | 6.0e-7  | 1.0e-3 | 6.7e-4 | 9.9e-2 | 9.4
  1e-4 | 1.8e-9  | 7.0e-10 | 1.0e-5 | 7.3e-6 | 2.0e-3 | 10.2
  1e-8 | 4.7e-13 | 1.8e-13 | 1.3e-9 | 6e-10  | 2.1e-7 | 12.7
  ℓ = 0.2337:
  1e-4 | 9.8e-5  | 1.5e-5  | 8.1e-5 | 1.4e-5 | 2.5e-1 | 9.5
  1e-8 | 1.45e-9 | 2.3e-10 | 1.1e-8 | 1.5e-9 | 4e-5   | 11.3

  log|C| = 2.63 for ℓ = 0.0334 and log|C| = 6.36 for ℓ = 0.2337.
  • 12. Boxplots for the unknown parameters. Moisture data example: boxplots for the 100 estimates of (ℓ, ν, σ²) when n = 32K, 16K, 8K, 4K, 2K; H-matrix with fixed rank k = 11. Green horizontal lines denote the 25% and 75% quantiles for n = 32K.
  • 13. Time and memory for the parallel H-matrix approximation; maximal number of cores is 40, ν = 0.325, ℓ = 0.64, σ² = 0.98.

  n         | C̃: time (sec.) | C̃: size (MB) | kB/dof | L̃L̃ᵀ: time (sec.) | L̃L̃ᵀ: size (MB) | ‖I − (L̃L̃ᵀ)⁻¹C̃‖₂
  32,000    | 3.3  | 162    | 5.1 | 2.4  | 172.7  | 2.4·10⁻³
  128,000   | 13.3 | 776    | 6.1 | 13.9 | 881.2  | 1.1·10⁻²
  512,000   | 52.8 | 3,420  | 6.7 | 77.6 | 4,150  | 3.5·10⁻²
  2,000,000 | 229  | 14,790 | 7.4 | 473  | 18,970 | 1.4·10⁻¹

  Dell Station, 20 × 2 cores, 128 GB RAM (bought in 2013 for 10,000 USD).
  • 14. Prediction. Let Z = (Z_1, Z_2) have mean zero and a stationary covariance; Z_1 is known, Z_2 is unknown. Partition
  C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.
  We compute the predicted values Ẑ_2 = C_{21} C_{11}^{-1} Z_1; Z_2 has the conditional distribution with mean C_{21} C_{11}^{-1} Z_1 and covariance matrix C_{22} − C_{21} C_{11}^{-1} C_{12}.
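  A dense-matrix sketch of this conditional (kriging) prediction, assuming the covariance blocks are given; the H-MLE method applies the same formula with H-matrix arithmetic in place of the dense Cholesky solve:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def predict(C11, C12, C22, Z1):
    """Conditional mean and covariance of the unobserved Z2 given the observed Z1."""
    f = cho_factor(C11)                    # dense stand-in for the H-matrix solve
    mean = C12.T @ cho_solve(f, Z1)        # C21 C11^{-1} Z1   (C21 = C12^T)
    cov = C22 - C12.T @ cho_solve(f, C12)  # C22 - C21 C11^{-1} C12
    return mean, cov
```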
  • 15. 2021 KAUST Competition. We participated in the 2021 KAUST Competition on Spatial Statistics for Large Datasets. The datasets and final results are available at https://cemse.kaust.edu.sa/stsds/2021-kaust-competition-spatial-statistics-large-datasets; see more of our details in https://www.eccomasproceedia.org/conferences/thematic-conferences/uncecomp-2021/8027
  • 16. Machine learning methods for prediction. k-nearest neighbours (kNN): for each x, find its k nearest neighbours x_1, ..., x_k with respect to some metric and set the prediction ŷ = (1/k) ∑_{i=1}^{k} y_i, where y_i is the observed response at x_i. The value of k is determined by a cross-validation procedure; we tried k ∈ {1, ..., 20}. Random Forest (RF): a large number of decision (regression) trees are generated independently on random sub-samples of the data; the final prediction for x is obtained by averaging the predicted outcomes over the ensemble of trees. We tried {100, ..., 150} trees.
  • 17. Machine learning methods for prediction. Deep Neural Network (DNN): a fully connected neural network (FCNN). The input layer consists of two neurons (input feature dimensionality) and the output layer of one neuron (predicted feature dimensionality). We examined several variants of the FCNN architecture with {3, 4, ..., 10} hidden layers and 50-100 neurons in each hidden layer.
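  A compact stand-in for such an FCNN (a sketch using scikit-learn's MLPRegressor rather than the original framework, which the slides do not name; layer sizes, tanh activation, Adam optimizer, epoch count and batch size follow the settings reported on slide 19, and the toy data are placeholders):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 2)), rng.random(1000)  # toy stand-in data
X_test = rng.random((100, 2))

# 7 hidden layers of 100 tanh neurons, Adam optimizer; max_iter caps the
# number of epochs (~500 in the talk), batch_size = 10000 as reported.
fcnn = MLPRegressor(hidden_layer_sizes=(100,) * 7, activation='tanh',
                    solver='adam', batch_size=10000, max_iter=500)
fcnn.fit(X_train, y_train)
y_pred = fcnn.predict(X_test)
```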
  • 18. Datasets. For all tests below we used the datasets from the statistical competition https://cemse.kaust.edu.sa/stsds/2021-kaust-competition-spatial-statistics-large-datasets. The spatial domain is D := [0.0, 1.0] × [0.0, 1.0]. Each dataset consists of 90% training samples and 10% testing samples, at which predictions should be made.
  • 19. Comparison of kNN, random forest and FCNN. The kNN method showed the best MC cross-validation results (over 100 runs) in comparison with the other ML methods. We found that k = 3 neighbours is optimal for Test 2a (moderate sample size, 100K) and k = 7 for the large Test 2b (1M samples). Trying different numbers of decision trees in the random forest, we found that an ensemble of 120 regression trees is optimal. FCNN architecture: 7 hidden layers with 100 neurons in each layer; number of training epochs = 500, batch size = 10,000; tanh activation function and the Adam optimizer. The average computation time is 0.07 sec. for kNN, 12 sec. for RF and 173 sec. for FCNN. Because of its efficiency, we decided to run only the kNN method for the larger datasets.
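  A minimal sketch of the kNN tuning and prediction step (scikit-learn stand-in; the 5-fold split and the toy data here are illustrative assumptions, not the competition setup):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 2)), rng.random(1000)  # toy stand-in data
X_test = rng.random((100, 2))

# Cross-validated choice of k over the range tried in the talk (1..20);
# the competition runs selected k = 3 (Test 2a) and k = 7 (Test 2b) this way.
search = GridSearchCV(KNeighborsRegressor(), {'n_neighbors': range(1, 21)}, cv=5)
search.fit(X_train, y_train)
y_pred = search.best_estimator_.predict(X_test)
```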
  • 20. Prediction by kNN, Test 2b, dataset 1: the yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
  • 21. Less accurate prediction by the H-MLE method, Test 2b, dataset 1: the yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
  • 22. Prediction by kNN, Test 2b, dataset 2: the yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
  • 23. Prediction by the H-MLE method, Test 2b, dataset 2: the yellow points at 900,000 locations were used for training, and the blue points were predicted at 100,000 new locations.
  • 24. Prediction by the H-MLE method, Test 2a, dataset 1: the yellow points at 90,000 locations were used for training, and 10,000 values at new locations (blue points) were predicted.
  • 25. Prediction by the H-MLE method, Test 2a, dataset 2: the yellow points at 90,000 locations were used for training, and 10,000 values at new locations (blue points) were predicted.
  • 26. Prediction by the H-MLE method, Test 1b, dataset 4: the yellow points at 90,000 locations were used for training, and the blue points were predicted at 10,000 new locations.
  • 27. Prediction by the H-MLE method, Test 1b, dataset 7: the yellow points at 90,000 locations were used for training, and the blue points were predicted at 10,000 new locations.
  • 28. RMSE: comparison of all methods. The root mean square error is
  \mathrm{RMSE} = \sqrt{ \frac{1}{n_t} \sum_{i=1}^{n_t} \big( \hat{Z}(s_i) - Z(s_i) \big)^2 }, (4)
  where Ẑ(s_i) and Z(s_i) are respectively the predicted and true realization values at the location s_i in the testing dataset, and n_t is the total number of testing locations.

  RMSE errors for the H-MLE, kNN, RF and FCNN methods:

  method | Test 2a, dataset 1 | Test 2a, dataset 2 | Test 2b, dataset 1 | Test 2b, dataset 2
  H-MLE  | 0.057 | 0.14  | 0.25  | 0.021
  kNN    | 0.129 | 0.357 | 0.007 | 0.04
  RF     | 0.226 | 0.607 | n/a   | n/a
  FCNN   | 0.243 | 0.74  | n/a   | n/a
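  A one-line check of Eq. (4) (a sketch; the array names are illustrative):

```python
import numpy as np

def rmse(z_pred, z_true):
    """Root mean square error over the n_t testing locations, Eq. (4)."""
    return np.sqrt(np.mean((np.asarray(z_pred) - np.asarray(z_true)) ** 2))
```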
  • 29. Conclusion
  - With H-matrices you can approximate Matérn covariance matrices and Gaussian log-likelihoods, identify unknown parameters and make predictions.
  - The MLE estimate and the predictions depend on the H-matrix accuracy.
  - The parameter identification problem may have multiple solutions.
  - We investigated the dependence of the H-matrix approximation error on the estimated parameters.
  - Each ML method needs a fine-tuning stage to optimize its hyperparameters or architecture.
  - kNN is easy to implement and shows promising results.
  • 30. Open questions and TODOs
  - The Gaussian log-likelihood function has some drawbacks for very large matrices.
  - How to skip/avoid redundant data?
  - A good starting point for the optimization is needed.
  - A "preconditioner" (a simple covariance matrix) is needed.
  - H-matrices become expensive when the number of parameters to be identified is large.
  All tests are reproducible: https://github.com/litvinen/large_random_fields.git
  • 31. Literature
  1. A. Litvinenko, R. Kriemann, V. Berikov, Identification of unknown parameters and prediction with hierarchical matrices, UNCECOMP 2021 Conf. Proc., Uncertainty Quantification in Computational Sciences and Engineering, M. Papadrakakis, V. Papadopoulos, G. Stefanou (eds.), pp. 129-144, https://2021.uncecomp.org, ISBN 978-618-85072-6-5, 2021.
  2. A. Litvinenko, R. Kriemann, M.G. Genton, Y. Sun, D.E. Keyes, HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification, MethodsX 7, 100600, 2020.
  3. A. Litvinenko, Y. Sun, M.G. Genton, D.E. Keyes, Likelihood approximation with hierarchical matrices for large spatial datasets, Computational Statistics & Data Analysis 137, 115-132, 2019.
  4. M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, E. Zander, Iterative algorithms for the post-processing of high-dimensional data, Journal of Computational Physics 410, 109396, 2020.
  5. A. Litvinenko, D. Keyes, V. Khoromskaia, B.N. Khoromskij, H.G. Matthies, Tucker tensor analysis of Matérn functions in spatial statistics, Computational Methods in Applied Mathematics 19 (1), 101-122, 2019.
  6. M. Espig, W. Hackbusch, A. Litvinenko, H.G. Matthies, E. Zander, Efficient analysis of high dimensional data in tensor formats, Sparse Grids and Applications, 31-56, Springer, Berlin, 2013.
  • 32. Literature
  7. V. Berikov, A. Litvinenko, Weakly supervised regression using manifold regularization and low-rank matrix representation, in: P. Pardalos, M. Khachay, A. Kazakov (eds.), MOTOR 2021, LNCS vol. 12755, Springer, Cham, 2021, doi: 10.1007/978-3-030-77876-7_30.
  8. E. Adeli et al., Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations, arXiv:2111.02823, 2021, https://arxiv.org/abs/2111.02823.
  9. E. Adeli, B. Rosic, H.G. Matthies, S. Reinstädler, D. Dinkler, Bayesian parameter determination of a CT-test described by a viscoplastic-damage model considering the model error, Metals 10 (9), 1141, 2020.
  10. E. Adeli, B. Rosic, H.G. Matthies, S. Reinstädler, D. Dinkler, Comparison of Bayesian methods on parameter identification for a viscoplastic model with damage, Probabilistic Engineering Mechanics 62, 103083, 2020.
  11. A. Litvinenko, Y. Marzouk, H.G. Matthies, M. Scavino, A. Spantini, Computing f-divergences and distances of high-dimensional probability density functions - low-rank tensor approximations, arXiv:2111.07164, 2021, https://arxiv.org/abs/2111.07164.
  12. E. Adeli, B. Rosic, H.G. Matthies, Identification of a visco-plastic model with uncertain parameters using Bayesian methods, International Journal of Multiphase Flow 75, 124-136, 2018.
  • 33. Acknowledgement. V. Berikov was supported by the state contract of the Sobolev Institute of Mathematics (project no. 0314-2019-0015) and by RFBR grant 19-29-01175. A. Litvinenko was supported by funding from the Alexander von Humboldt Foundation.
  • 35. Remark: stabilization with a nugget. To avoid instability in computing the Cholesky factorization, we add a nugget: C̃_m = C̃ + τ² I. If λ_i are the eigenvalues of C̃, then the eigenvalues of C̃_m are λ_i + τ², and
  \log\det(\tilde{C}_m) = \log \prod_{i=1}^{n} (\lambda_i + \tau^2) = \sum_{i=1}^{n} \log(\lambda_i + \tau^2).
  [Figure: (left) dependence of the negative log-likelihood on the parameter ℓ with nuggets {0.01, 0.005, 0.001} for the Gaussian covariance; (right) zoom of the left figure near the minimum. n = 2000 random points from the moisture example, rank k = 14, τ² = 1, ν = 0.5.]
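  A tiny numerical illustration of this identity (a sketch on a random positive semi-definite matrix; C and tau2 are placeholders, not the talk's data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
C = A @ A.T / 200                       # toy symmetric positive semi-definite matrix
tau2 = 1e-3

lam = np.linalg.eigvalsh(C)             # eigenvalues of C
logdet_m = np.sum(np.log(lam + tau2))   # log det(C + tau2 * I), finite even if C is near-singular

# Cross-check against a direct determinant computation:
sign, logdet_direct = np.linalg.slogdet(C + tau2 * np.eye(200))
```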
  • 36. How much memory do H-matrices need? [Figure: (left) dependence of the H-matrix size (in bytes) on the covariance length ℓ (ν = 0.325, σ² = 0.98), and (right) on the smoothness ν (ℓ = 0.58, σ² = 0.98), for two H-accuracies ε = {10⁻⁴, 10⁻⁶}.]
  • 37. H-matrices: error convergence. [Figure: convergence of the H-matrix approximation errors (relative error in the spectral and Frobenius norms, log scale) versus the rank k for covariance lengths {0.1, 0.2, 0.5}; (left) ν = 1 and (right) ν = 0.5; computational domain [0, 1]².]
  • 38. [Figure: dependence of the log-likelihood ingredients on the parameters, n = 4225; k = 8 in the first row and k = 16 in the second.]
  • 39. How does the log-likelihood depend on n? [Figure: dependence of the negative log-likelihood function on ν for different numbers of locations n = {2000, 4000, 8000, 16000, 32000}, in log scale.]