Artificial neural networks and learning algorithms.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
2 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
boundedness of the modified Levenberg–Marquardt algorithm can be assured based on the Lyapunov technique; therefore, the artificial neural network outputs and weights of the modified Levenberg–Marquardt algorithm remain bounded throughout the training and the testing.
In [25]–[27], there is an interesting procedure to compute the Levenberg–Marquardt and Newton algorithms for an artificial neural network with multiple hidden layers, which is useful in deep learning. Unlike the abovementioned work, this article computes the modified Levenberg–Marquardt algorithm for an artificial neural network with a single hidden layer for the following four reasons: 1) we show that the two-hidden-layer Levenberg–Marquardt and Newton algorithms are worse than the Levenberg–Marquardt and Newton algorithms because the latter present one singularity point, while the former present three singularity points; 2) there is a computational concern that computing the inverse in the Levenberg–Marquardt and Newton algorithms for an artificial neural network with multiple hidden layers would be very expensive; 3) in [28]–[30], it is shown, based on the Stone–Weierstrass theorem, that the targets can be approximated arbitrarily well by an artificial neural network with a single hidden layer and a hyperbolic tangent function; and 4) this article is mainly focused on assuring the stability of the modified Levenberg–Marquardt algorithm for an artificial neural network with a single hidden layer.
Finally, we compare the artificial neural network learning with the modified Levenberg–Marquardt algorithm, the Levenberg–Marquardt algorithm [8]–[11], the Newton algorithm [1], [2], the stable gradient algorithm in a neural network [31], [32], and the stable gradient algorithm in a radial basis function neural network [33], [34] for the learning of the electric and brain signal data sets. The electric signal data set is obtained from electricity load and price forecasting with MATLAB, where the details are explained in [35]. The brain signal data set is obtained from our laboratory, where the details are explained in [36].
The remainder of this article is organized as follows. Section II presents the Levenberg–Marquardt and Newton algorithms for the artificial neural network learning. Section III discusses the two-hidden-layer Levenberg–Marquardt and Newton algorithms for the two-hidden-layer artificial neural network learning. Section IV introduces the modified Levenberg–Marquardt algorithm for the artificial neural network learning, and the error stability and weights boundedness are assured. Section V shows the comparison results of several algorithms for the learning of the electric and brain signal data sets. In Section VI, conclusions and forthcoming work are detailed.
II. LEVENBERG–MARQUARDT AND NEWTON ALGORITHMS FOR THE ARTIFICIAL NEURAL NETWORK LEARNING
The algorithms for the artificial neural network learning frequently evaluate the first derivative of the cost function with respect to the weights. Nevertheless, there are several cases where it is interesting to evaluate the second derivatives of the cost function with respect to the weights. The second-order partial derivatives of the cost function with respect to the weights are known as the Hessian.

Fig. 1. Artificial neural network.
A. Hessian for the Artificial Neural Network Learning
In this article, we use a special artificial neural network with
one hidden layer. It could be extended to a general multilayer
artificial neural network; nevertheless, this research is focused
on a compact artificial neural network. This artificial neural
network uses hyperbolic tangent functions in the hidden layer
and linear functions in the output layer. We define the artificial neural network as

$$d_{l,k} = \sum_j q_{lj,k}\, g\Big(\sum_i p_{ji,k}\, a_{i,k}\Big) \qquad (1)$$

where $p_{ji,k}$ are the weights of the hidden layer, $q_{lj,k}$ are the weights of the output layer, $g(\cdot)$ are the activation functions, $a_{i,k}$ are the artificial neural network inputs, $d_{l,k}$ are the artificial neural network outputs, $i$ is the input layer index, $j$ is the hidden layer index, $l$ is the output layer index, and $k$ is the iteration.
We consider the artificial neural network of Fig. 1.
We define pji,k as the weights of the hidden layer and qlj,k as
the weights of the output layer.
We define the cost function $E_k$ as

$$E_k = \frac{1}{2}\sum_{l=1}^{L_T} \big(d_{l,k} - t_{l,k}\big)^2 \qquad (2)$$

where $d_{l,k}$ are the artificial neural network outputs, $t_{l,k}$ are the data set targets, and $L_T$ is the total number of outputs. The second-order partial derivatives of the cost function $E_k$ with respect to the weights $p_{ji,k}$ and $q_{lj,k}$ will be used to obtain the Newton and Levenberg–Marquardt algorithms.
We consider the forward propagation as

$$z_{j,k} = \sum_i p_{ji,k}\, a_{i,k}, \qquad c_{j,k} = g(z_{j,k})$$
$$x_{l,k} = \sum_j q_{lj,k}\, c_{j,k}, \qquad d_{l,k} = f(x_{l,k}) = x_{l,k} \qquad (3)$$
Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
RUBIO: STABILITY ANALYSIS OF THE MODIFIED LEVENBERG–MARQUARDT ALGORITHM 3
where ai,k are the artificial neural network inputs and dl,k are
the artificial neural network outputs, pji,k are hidden layer
weights, and qlj,k are output layer weights.
We consider the activation functions in the hidden layer as the hyperbolic tangent functions

$$g(z_{j,k}) = \frac{e^{z_{j,k}} - e^{-z_{j,k}}}{e^{z_{j,k}} + e^{-z_{j,k}}} = \tanh(z_{j,k}). \qquad (4)$$
The first and second derivatives of the hyperbolic tangent functions (4) are

$$g'(z_{j,k}) = \frac{4}{\big(e^{z_{j,k}} + e^{-z_{j,k}}\big)^2} = \operatorname{sech}^2(z_{j,k})$$
$$g''(z_{j,k}) = -2\,\frac{e^{z_{j,k}} - e^{-z_{j,k}}}{e^{z_{j,k}} + e^{-z_{j,k}}}\cdot\frac{4}{\big(e^{z_{j,k}} + e^{-z_{j,k}}\big)^2} = -2\tanh(z_{j,k})\operatorname{sech}^2(z_{j,k}) = -2\,g(z_{j,k})\,g'(z_{j,k}). \qquad (5)$$
We consider the activation functions of the output layer as the linear functions

$$f(x_{l,k}) = x_{l,k}. \qquad (6)$$

The first and second derivatives of the linear functions (6) are

$$f'(x_{l,k}) = 1, \qquad f''(x_{l,k}) = 0. \qquad (7)$$
The first and second derivatives of the cost function (2) are

$$\frac{\partial E_k}{\partial d_{l,k}} = \big(d_{l,k} - t_{l,k}\big), \qquad \frac{\partial^2 E_k}{\partial d_{l,k}^2} = 1. \qquad (8)$$
Using the cost function (2), we obtain the backpropagation of the output layer as

$$\frac{\partial E_k}{\partial q_{lj,k}} = \frac{\partial E_k}{\partial d_{l,k}}\frac{\partial d_{l,k}}{\partial x_{l,k}}\frac{\partial x_{l,k}}{\partial q_{lj,k}} = \big(d_{l,k}-t_{l,k}\big)\frac{\partial f(x_{l,k})}{\partial x_{l,k}}\,c_{j,k} = \big(d_{l,k}-t_{l,k}\big)(1)\,g(z_{j,k}) = \big(d_{l,k}-t_{l,k}\big)\,g(z_{j,k}) \qquad (9)$$

where $f(x_{l,k}) = x_{l,k}$ of (6) and $g(z_{j,k}) = \tanh(z_{j,k})$ of (4).
Using the cost function (2), we obtain the backpropagation of the hidden layer as

$$\frac{\partial E_k}{\partial p_{ji,k}} = \frac{\partial E_k}{\partial d_{l,k}}\frac{\partial d_{l,k}}{\partial x_{l,k}}\frac{\partial x_{l,k}}{\partial c_{j,k}}\frac{\partial c_{j,k}}{\partial z_{j,k}}\frac{\partial z_{j,k}}{\partial p_{ji,k}} = \big(d_{l,k}-t_{l,k}\big)(1)\,q_{lj,k}\,g'(z_{j,k})\,a_{i,k} = \big(d_{l,k}-t_{l,k}\big)\,q_{lj,k}\,g'(z_{j,k})\,a_{i,k} \qquad (10)$$

where $g'(z_{j,k}) = \partial c_{j,k}/\partial z_{j,k} = \partial g(z_{j,k})/\partial z_{j,k} = \operatorname{sech}^2(z_{j,k})$ of (5).
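The backpropagation formulas (9) and (10) can be verified against a finite-difference quotient; the sketch below assumes hypothetical layer sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, L = 3, 4, 2                      # hypothetical layer sizes
a = rng.standard_normal(I)             # inputs a_{i,k}
p = rng.standard_normal((J, I))        # hidden weights p_{ji,k}
q = rng.standard_normal((L, J))        # output weights q_{lj,k}
t = rng.standard_normal(L)             # targets t_{l,k}

def forward(p, q, a):
    z = p @ a                          # z_{j,k} = sum_i p_{ji,k} a_{i,k}
    c = np.tanh(z)                     # c_{j,k} = g(z_{j,k})
    return z, c, q @ c                 # d_{l,k} = x_{l,k} (linear output)

z, c, d = forward(p, q, a)
e = d - t
sech2 = 1.0 - np.tanh(z) ** 2          # g'(z_{j,k}) of Eq. (5)

dE_dq = np.outer(e, c)                 # Eq. (9): (d - t) g(z)
dE_dp = np.outer((q.T @ e) * sech2, a) # Eq. (10), summed over l

# finite-difference check on one hidden weight
h = 1e-6
pp = p.copy()
pp[1, 2] += h
E = lambda d: 0.5 * np.sum((d - t) ** 2)
num = (E(forward(pp, q, a)[2]) - E(d)) / h
assert abs(num - dE_dp[1, 2]) < 1e-4
```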
We define the second derivative of $E_k$ as the Hessian $H_k$ [25]–[27]

$$H_k = \nabla\nabla E_k = \begin{bmatrix} \dfrac{\partial^2 E_k}{\partial p_{ji,k}^2} & \dfrac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} \\[2mm] \dfrac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} & \dfrac{\partial^2 E_k}{\partial q_{lj,k}^2} \end{bmatrix} \qquad (11)$$

where the Hessian is symmetric

$$\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} = \frac{\partial^2 E_k}{\partial q_{lj,k}\partial p_{ji,k}}. \qquad (12)$$
The Hessian elements are

$$\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2\, q_{lj,k}\big[g''(z_{j,k})\,\sigma_{l,k} + \big(g'(z_{j,k})\big)^2 q_{lj,k}\, S_{l,k}\big]$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} = a_{i,k}\, g'(z_{j,k})\big[\sigma_{l,k} + c_{j,k}\, q_{lj,k}\, S_{l,k}\big]$$
$$\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = c_{j,k}^2\big[f''(x_{l,k})\,\sigma_{l,k} + \big(f'(x_{l,k})\big)^2 S_{l,k}\big] \qquad (13)$$

where

$$S_{l,k} = \frac{\partial^2 E_k}{\partial d_{l,k}^2} = 1, \qquad g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k}), \qquad f'(x_{l,k}) = 1$$
$$g''(z_{j,k}) = -2\tanh(z_{j,k})\operatorname{sech}^2(z_{j,k}), \qquad f''(x_{l,k}) = 0$$
$$c_{j,k} = \frac{\partial x_{l,k}}{\partial q_{lj,k}} = g(z_{j,k}), \qquad a_{i,k} = \frac{\partial z_{j,k}}{\partial p_{ji,k}}$$
$$g(z_{j,k}) = \tanh(z_{j,k}), \qquad f(x_{l,k}) = x_{l,k}, \qquad \sigma_{l,k} = \big(d_{l,k} - t_{l,k}\big).$$
We substitute the elements of (13) into (11); then, the Hessian is

$$H_k = \nabla\nabla E_k = \begin{bmatrix} \dfrac{\partial^2 E_k}{\partial p_{ji,k}^2} & \dfrac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} \\[2mm] \dfrac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} & \dfrac{\partial^2 E_k}{\partial q_{lj,k}^2} \end{bmatrix}$$

$$\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2\, q_{lj,k}\big[-2\,g(z_{j,k})\,g'(z_{j,k})\big(d_{l,k}-t_{l,k}\big) + \big(g'(z_{j,k})\big)^2 q_{lj,k}\big]$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} = a_{i,k}\, g'(z_{j,k})\big[\big(d_{l,k}-t_{l,k}\big) + g(z_{j,k})\, q_{lj,k}\big]$$
$$\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = \big(g(z_{j,k})\big)^2 \qquad (14)$$

where $a_{i,k}$ are the artificial neural network inputs, $d_{l,k}$ are the artificial neural network outputs, $g(z_{j,k}) = \tanh(z_{j,k})$ are the activation functions, $g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k})$ are the derivatives of the activation functions, $t_{l,k}$ are the data set targets, $z_{j,k} = \sum_i p_{ji,k} a_{i,k}$ are the hidden layer outputs, and $q_{lj,k}$ are the weights of the output layer.
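To make the Hessian entries of (14) concrete, the following sketch evaluates them for a scalar network (one input, one hidden unit, one output) and compares them with finite differences of the cost (2); all numeric values are hypothetical:

```python
import numpy as np

# Scalar network (one input, one hidden unit, one output): the Hessian
# of Eq. (11) is literally the 2x2 matrix of Eq. (14).
a, p, q, t = 0.5, 0.3, -0.8, 0.2

def E(p, q):
    # cost of Eq. (2) for a single sample
    return 0.5 * (q * np.tanh(p * a) - t) ** 2

z = p * a
g = np.tanh(z)
g1 = 1.0 - g ** 2                      # g'(z) = sech^2(z)
e = q * g - t                          # (d - t)

# Hessian entries of Eq. (14)
H_pp = a ** 2 * q * (-2.0 * g * g1 * e + g1 ** 2 * q)
H_pq = a * g1 * (e + g * q)
H_qq = g ** 2

# finite-difference cross-check
h = 1e-5
num_pp = (E(p + h, q) - 2 * E(p, q) + E(p - h, q)) / h ** 2
num_pq = (E(p + h, q + h) - E(p + h, q - h)
          - E(p - h, q + h) + E(p - h, q - h)) / (4 * h ** 2)
num_qq = (E(p, q + h) - 2 * E(p, q) + E(p, q - h)) / h ** 2
assert abs(num_pp - H_pp) < 1e-4
assert abs(num_pq - H_pq) < 1e-4
assert abs(num_qq - H_qq) < 1e-4
```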
In the next step, we evaluate the Hessian with the
Levenberg–Marquardt and Newton algorithms.
B. Newton Algorithm
The Newton algorithm constitutes the first alternative to
update the weights for the artificial neural network learning.
We represent the updating of the Newton algorithm as [1], [2]

$$\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - \alpha\,[H_k]^{-1} \begin{bmatrix} \dfrac{\partial E_k}{\partial p_{ji,k}} \\[2mm] \dfrac{\partial E_k}{\partial q_{lj,k}} \end{bmatrix}, \qquad H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \beta_{D,k} \end{bmatrix}$$

$$\beta_{C,k} = \sum_{jil}\frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \qquad \beta_{D,k} = \sum_{j}\frac{\partial^2 E_k}{\partial q_{lj,k}^2}, \qquad \beta_{E,k} = \sum_{jil}\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}}, \qquad \sum_{jil} = \sum_j\sum_i\sum_l \qquad (15)$$
where the elements $\partial^2 E_k/\partial p_{ji,k}^2$, $\partial^2 E_k/\partial q_{lj,k}^2$, and $\partial^2 E_k/\partial p_{ji,k}\partial q_{lj,k}$ are in (14), the elements $\partial E_k/\partial q_{lj,k}$ and $\partial E_k/\partial p_{ji,k}$ are in (9) and (10), $p_{ji,k}$ and $q_{lj,k}$ are the weights, and $\alpha$ is the learning factor. The Newton algorithm requires the existence of the inverse of the Hessian, $[H_k]^{-1}$.
Now, we will represent the Newton algorithm of (15) in the scalar form. First, from (15), we obtain the inverse of $H_k$ as

$$[H_k]^{-1} = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \beta_{D,k} \end{bmatrix}^{-1} = \frac{1}{\det[H_k]}\begin{bmatrix} \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \beta_{C,k} \end{bmatrix}$$
$$\det[H_k] = \beta_{C,k}\,\beta_{D,k} - \big(\beta_{E,k}\big)^2 \qquad (16)$$

with $\beta_{C,k}$, $\beta_{D,k}$, and $\beta_{E,k}$ as in (15).
We substitute $[H_k]^{-1}$ of (16) into (15) as

$$\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - \frac{\alpha}{\det[H_k]}\begin{bmatrix} \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \beta_{C,k} \end{bmatrix}\begin{bmatrix} \dfrac{\partial E_k}{\partial p_{ji,k}} \\[2mm] \dfrac{\partial E_k}{\partial q_{lj,k}} \end{bmatrix}$$
$$\det[H_k] = \beta_{C,k}\,\beta_{D,k} - \big(\beta_{E,k}\big)^2. \qquad (17)$$
Rewriting (17) in the scalar form gives

$$p_{ji,k+1} = p_{ji,k} - \beta_{Nji,k}\frac{\partial E_k}{\partial p_{ji,k}} + \gamma_{N,k}\frac{\partial E_k}{\partial q_{lj,k}}$$
$$q_{lj,k+1} = q_{lj,k} - \beta_{Nlj,k}\frac{\partial E_k}{\partial q_{lj,k}} + \gamma_{N,k}\frac{\partial E_k}{\partial p_{ji,k}}$$
$$\beta_{Nji,k} = \alpha\frac{\beta_{D,k}}{\det[H_k]}, \qquad \beta_{Nlj,k} = \alpha\frac{\beta_{C,k}}{\det[H_k]}, \qquad \gamma_{N,k} = \alpha\frac{\beta_{E,k}}{\det[H_k]}$$
$$\det[H_k]_N = \det[H_k] = \beta_{C,k}\,\beta_{D,k} - \big(\beta_{E,k}\big)^2 \qquad (18)$$

with $\beta_{C,k}$, $\beta_{D,k}$, and $\beta_{E,k}$ as in (15)
where

$$\frac{\partial E_k}{\partial p_{ji,k}} = \big(d_{l,k}-t_{l,k}\big)\,q_{lj,k}\,g'(z_{j,k})\,a_{i,k}, \qquad \frac{\partial E_k}{\partial q_{lj,k}} = g(z_{j,k})\big(d_{l,k}-t_{l,k}\big)$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2\, q_{lj,k}\big[-2\,g(z_{j,k})\,g'(z_{j,k})\big(d_{l,k}-t_{l,k}\big) + \big(g'(z_{j,k})\big)^2 q_{lj,k}\big]$$
$$\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = \big(g(z_{j,k})\big)^2$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} = a_{i,k}\, g'(z_{j,k})\big[\big(d_{l,k}-t_{l,k}\big) + g(z_{j,k})\,q_{lj,k}\big]. \qquad (19)$$
$\beta_{Nji,k}$, $\beta_{Nlj,k}$, and $\gamma_{N,k}$ are the learning rates, $p_{ji,k}$ and $q_{lj,k}$ are the weights, $\alpha$ is the learning factor, $g(z_{j,k}) = \tanh(z_{j,k})$ are the activation functions, and $g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k})$ are the derivatives of the activation functions. Equations (18) and (19) describe the Newton algorithm.
Remark 1: In the Newton algorithm of (18) and (19), we can observe that a value of zero in $\beta_{C,k}\beta_{D,k} - (\beta_{E,k})^2$ of $\det[H_k]_N$ is a singularity point in the learning rates $\beta_{Nji,k}$, $\beta_{Nlj,k}$, and $\gamma_{N,k}$. Consequently, the Newton algorithm error is not assured to be stable. Hence, it is worth considering an alternative algorithm for the artificial neural network learning.
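A minimal sketch of the scalar Newton update (18), with an explicit guard for the singularity of Remark 1 (the function name and the guard threshold are our choices, not from the article):

```python
def newton_step(grad_p, grad_q, bC, bD, bE, alpha=0.1):
    # One scalar Newton update of Eq. (18). bC, bD, bE are the summed
    # Hessian entries of Eq. (15); det[Hk]N near zero is the
    # singularity of Remark 1.
    det = bC * bD - bE ** 2
    if abs(det) < 1e-12:
        raise ZeroDivisionError("singular Hessian: det[Hk]N ~ 0")
    beta_p = alpha * bD / det          # beta_Nji,k
    beta_q = alpha * bC / det          # beta_Nlj,k
    gamma = alpha * bE / det           # gamma_N,k
    dp = -beta_p * grad_p + gamma * grad_q
    dq = -beta_q * grad_q + gamma * grad_p
    return dp, dq

# with an identity Hessian and alpha = 1 the step is plain gradient descent
dp, dq = newton_step(1.0, 2.0, 1.0, 1.0, 0.0, alpha=1.0)
assert (dp, dq) == (-1.0, -2.0)
```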
C. Levenberg–Marquardt Algorithm
The Levenberg–Marquardt algorithm constitutes the second
alternative to update the weights for the artificial neural
network learning. We represent the basic updating of the
Levenberg–Marquardt algorithm as [8]–[11]
$$\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - [H_k + \alpha I]^{-1}\begin{bmatrix} \dfrac{\partial E_k}{\partial p_{ji,k}} \\[2mm] \dfrac{\partial E_k}{\partial q_{lj,k}} \end{bmatrix}, \qquad H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \beta_{D,k} \end{bmatrix} \qquad (20)$$
where the elements $\partial^2 E_k/\partial p_{ji,k}^2$, $\partial^2 E_k/\partial q_{lj,k}^2$, and $\partial^2 E_k/\partial p_{ji,k}\partial q_{lj,k}$ are in (14), the elements $\partial E_k/\partial q_{lj,k}$ and $\partial E_k/\partial p_{ji,k}$ are in (9) and (10), $p_{ji,k}$ and $q_{lj,k}$ are the weights, $\beta_{C,k}$, $\beta_{D,k}$, and $\beta_{E,k}$ are as in (15), and $\alpha$ is the learning factor. The Levenberg–Marquardt algorithm requires the existence of the inverse of the Hessian, $[H_k + \alpha I]^{-1}$.
Now, we will represent the Levenberg–Marquardt algorithm of (20) in the scalar form. First, from (20), we obtain the inverse of $H_k + \alpha I$ as

$$[H_k + \alpha I]^{-1} = \begin{bmatrix} \alpha + \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \alpha + \beta_{D,k} \end{bmatrix}^{-1} = \frac{1}{\det[H_k + \alpha I]}\begin{bmatrix} \alpha + \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \alpha + \beta_{C,k} \end{bmatrix}$$
$$\det[H_k + \alpha I] = \big(\alpha + \beta_{C,k}\big)\big(\alpha + \beta_{D,k}\big) - \big(\beta_{E,k}\big)^2 \qquad (21)$$

with $\beta_{C,k}$, $\beta_{D,k}$, and $\beta_{E,k}$ as in (15).
We substitute $[H_k + \alpha I]^{-1}$ of (21) into (20) as

$$\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - \frac{1}{\det[H_k + \alpha I]}\begin{bmatrix} \alpha + \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \alpha + \beta_{C,k} \end{bmatrix}\begin{bmatrix} \dfrac{\partial E_k}{\partial p_{ji,k}} \\[2mm] \dfrac{\partial E_k}{\partial q_{lj,k}} \end{bmatrix}$$
$$\det[H_k + \alpha I] = \big(\alpha + \beta_{C,k}\big)\big(\alpha + \beta_{D,k}\big) - \big(\beta_{E,k}\big)^2. \qquad (22)$$
Rewriting (22) in the scalar form gives

$$p_{ji,k+1} = p_{ji,k} - \beta_{LMji,k}\frac{\partial E_k}{\partial p_{ji,k}} + \gamma_{LM,k}\frac{\partial E_k}{\partial q_{lj,k}}$$
$$q_{lj,k+1} = q_{lj,k} - \beta_{LMlj,k}\frac{\partial E_k}{\partial q_{lj,k}} + \gamma_{LM,k}\frac{\partial E_k}{\partial p_{ji,k}}$$
$$\beta_{LMji,k} = \frac{\alpha + \beta_{D,k}}{\det[H_k + \alpha I]}, \qquad \beta_{LMlj,k} = \frac{\alpha + \beta_{C,k}}{\det[H_k + \alpha I]}, \qquad \gamma_{LM,k} = \frac{\beta_{E,k}}{\det[H_k + \alpha I]}$$
$$\det[H_k]_{LM} = \det[H_k + \alpha I] = \big(\alpha + \beta_{C,k}\big)\big(\alpha + \beta_{D,k}\big) - \big(\beta_{E,k}\big)^2 \qquad (23)$$

with $\beta_{C,k}$, $\beta_{D,k}$, and $\beta_{E,k}$ as in (15)
where

$$\frac{\partial E_k}{\partial p_{ji,k}} = \big(d_{l,k}-t_{l,k}\big)\,q_{lj,k}\,g'(z_{j,k})\,a_{i,k}, \qquad \frac{\partial E_k}{\partial q_{lj,k}} = g(z_{j,k})\big(d_{l,k}-t_{l,k}\big)$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2\, q_{lj,k}\big[-2\,g(z_{j,k})\,g'(z_{j,k})\big(d_{l,k}-t_{l,k}\big) + \big(g'(z_{j,k})\big)^2 q_{lj,k}\big]$$
$$\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = \big(g(z_{j,k})\big)^2$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} = a_{i,k}\, g'(z_{j,k})\big[\big(d_{l,k}-t_{l,k}\big) + g(z_{j,k})\,q_{lj,k}\big]. \qquad (24)$$
$\beta_{LMji,k}$, $\beta_{LMlj,k}$, and $\gamma_{LM,k}$ are the learning rates, $p_{ji,k}$ and $q_{lj,k}$ are the weights, $\alpha$ is the learning factor, $g(z_{j,k}) = \tanh(z_{j,k})$ are the activation functions, and $g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k})$ are the derivatives of the activation functions. Equations (23) and (24) describe the Levenberg–Marquardt algorithm.
Remark 2: In the Levenberg–Marquardt algorithm of (23) and (24), we can observe that a value of zero in $(\alpha + \beta_{C,k})(\alpha + \beta_{D,k}) - (\beta_{E,k})^2$ of $\det[H_k]_{LM}$ is a singularity point in the learning rates $\beta_{LMji,k}$, $\beta_{LMlj,k}$, and $\gamma_{LM,k}$. Consequently, the Levenberg–Marquardt algorithm error is not assured to be stable. Hence, it is of interest to modify the Levenberg–Marquardt algorithm so as to make its error stable.
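The scalar Levenberg–Marquardt update (23) differs from the Newton step only in the damped determinant; a minimal sketch, with function and variable names of our choosing:

```python
def lm_step(grad_p, grad_q, bC, bD, bE, alpha=0.1):
    # One scalar Levenberg-Marquardt update of Eq. (23). The damping
    # alpha shifts the diagonal of the Hessian, but det[Hk]LM can
    # still vanish for indefinite Hessians (Remark 2).
    det = (alpha + bC) * (alpha + bD) - bE ** 2
    beta_p = (alpha + bD) / det        # beta_LMji,k
    beta_q = (alpha + bC) / det        # beta_LMlj,k
    gamma = bE / det                   # gamma_LM,k
    dp = -beta_p * grad_p + gamma * grad_q
    dq = -beta_q * grad_q + gamma * grad_p
    return dp, dq

# zero Hessian sums: the step reduces to gradient descent with rate 1/alpha
dp, dq = lm_step(1.0, 0.0, 0.0, 0.0, 0.0, alpha=2.0)
assert abs(dp + 0.5) < 1e-12 and dq == 0.0
```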
Fig. 2. Two-hidden-layer artificial neural network.
III. TWO-HIDDEN-LAYER LEVENBERG–MARQUARDT AND NEWTON ALGORITHMS FOR THE ARTIFICIAL NEURAL NETWORK LEARNING
In this section, the two-hidden-layer Levenberg–Marquardt
and Newton algorithms are presented as a comparison with the
Levenberg–Marquardt and Newton algorithms for the artificial
neural network learning.
A. Two-Hidden-Layer Hessian for the Artificial
Neural Network Learning
In this article, we use a two-hidden-layer artificial neural
network. This artificial neural network uses hyperbolic tangent
functions in the hidden layer and linear functions in the
output layer. We define the two-hidden-layer artificial neural
network as
$$d_{l,k} = \sum_j q_{lj,k}\, g\Big(\sum_i p_{ji,k}\, g\Big(\sum_r u_{ir,k}\, v_{r,k}\Big)\Big) \qquad (25)$$

where $p_{ji,k}$ and $u_{ir,k}$ are the weights of the two hidden layers, $q_{lj,k}$ are the weights of the output layer, $g(\cdot)$ are the activation functions, $v_{r,k}$ are the artificial neural network inputs, $d_{l,k}$ are the artificial neural network outputs, $r$ is the input layer index, $j$ and $i$ are the hidden layer indices, $l$ is the output layer index, and $k$ is the iteration.
We consider the two-hidden-layer artificial neural network
shown in Fig. 2. We define $p_{ji,k}$ and $u_{ir,k}$ as the weights of the hidden layers and $q_{lj,k}$ as the weights of the output layer.
We define the cost function $E_k$ as

$$E_k = \frac{1}{2}\sum_{l=1}^{L_T}\big(d_{l,k} - t_{l,k}\big)^2 \qquad (26)$$

where $d_{l,k}$ are the artificial neural network outputs, $t_{l,k}$ are the data set targets, and $L_T$ is the total number of outputs. The second-order partial derivatives of the cost function $E_k$ with respect to the weights $p_{ji,k}$, $u_{ir,k}$, and $q_{lj,k}$ will be used to obtain the two-hidden-layer Newton and Levenberg–Marquardt algorithms.
We consider the forward propagation as

$$w_{i,k} = \sum_r u_{ir,k}\, v_{r,k}, \qquad a_{i,k} = g(w_{i,k})$$
$$z_{j,k} = \sum_i p_{ji,k}\, a_{i,k}, \qquad c_{j,k} = g(z_{j,k})$$
$$x_{l,k} = \sum_j q_{lj,k}\, c_{j,k}, \qquad d_{l,k} = f(x_{l,k}) = x_{l,k} \qquad (27)$$
where $v_{r,k}$ are the artificial neural network inputs, $d_{l,k}$ are the artificial neural network outputs, $p_{ji,k}$ and $u_{ir,k}$ are hidden layer weights, and $q_{lj,k}$ are output layer weights.
We consider the activation functions in the two hidden layers as the hyperbolic tangent functions

$$g(w_{i,k}) = \frac{e^{w_{i,k}} - e^{-w_{i,k}}}{e^{w_{i,k}} + e^{-w_{i,k}}} = \tanh(w_{i,k})$$
$$g(z_{j,k}) = \frac{e^{z_{j,k}} - e^{-z_{j,k}}}{e^{z_{j,k}} + e^{-z_{j,k}}} = \tanh(z_{j,k}). \qquad (28)$$

We consider the activation functions of the output layer as the linear functions

$$f(x_{l,k}) = x_{l,k}. \qquad (29)$$
We define the second derivative of $E_k$ as the two-hidden-layer Hessian $H_k$ [25]–[27]

$$H_k = \nabla\nabla E_k = \begin{bmatrix} \dfrac{\partial^2 E}{\partial p_{ji}^2} & \dfrac{\partial^2 E}{\partial p_{ji}\partial q_{lj}} & \dfrac{\partial^2 E}{\partial p_{ji}\partial u_{ir}} \\[2mm] \dfrac{\partial^2 E}{\partial p_{ji}\partial q_{lj}} & \dfrac{\partial^2 E}{\partial q_{lj}^2} & \dfrac{\partial^2 E}{\partial q_{lj}\partial u_{ir}} \\[2mm] \dfrac{\partial^2 E}{\partial p_{ji}\partial u_{ir}} & \dfrac{\partial^2 E}{\partial q_{lj}\partial u_{ir}} & \dfrac{\partial^2 E}{\partial u_{ir}^2} \end{bmatrix}. \qquad (30)$$
In the next step, we evaluate the two-hidden-layer Hessian
with the two-hidden-layer Levenberg–Marquardt and Newton
algorithms.
B. Two-Hidden-Layer Newton Algorithm
The two-hidden-layer Newton algorithm constitutes one
alternative to update the weights for the two-hidden-layer
artificial neural network learning. We represent the updating
of the two-hidden-layer Newton algorithm as [1], [2]
$$\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \\ u_{ir,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \\ u_{ir,k} \end{bmatrix} - \alpha\,[H_k]^{-1}\begin{bmatrix} \dfrac{\partial E_k}{\partial p_{ji,k}} \\[2mm] \dfrac{\partial E_k}{\partial q_{lj,k}} \\[2mm] \dfrac{\partial E_k}{\partial u_{ir,k}} \end{bmatrix}, \qquad H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \beta_{F,k} \end{bmatrix}$$

$$\beta_{C,k} = \sum_{jil}\frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \qquad \beta_{D,k} = \sum_j \frac{\partial^2 E_k}{\partial q_{lj,k}^2}, \qquad \beta_{E,k} = \sum_{jil}\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}}$$
$$\beta_{F,k} = \sum_{ir}\frac{\partial^2 E}{\partial u_{ir}^2}, \qquad \beta_{G,k} = \sum_{jir}\frac{\partial^2 E}{\partial p_{ji}\partial u_{ir}}, \qquad \beta_{L,k} = \sum_{jir}\frac{\partial^2 E}{\partial q_{lj}\partial u_{ir}}$$
$$\sum_{jil} = \sum_j\sum_i\sum_l, \qquad \sum_{ir} = \sum_i\sum_r, \qquad \sum_{jir} = \sum_j\sum_i\sum_r \qquad (31)$$
where $p_{ji,k}$, $u_{ir,k}$, and $q_{lj,k}$ are the weights and $\alpha$ is the learning factor. The two-hidden-layer Newton algorithm requires the existence of the inverse of the Hessian, $[H_k]^{-1}$.
From (31), we obtain the inverse of $H_k$ as

$$[H_k]^{-1} = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \beta_{F,k} \end{bmatrix}^{-1} = \frac{1}{\det[H_k]}\begin{bmatrix} \beta_{D,k}\beta_{F,k} - \beta_{L,k}^2 & -\beta_{E,k}\beta_{F,k} + \beta_{L,k}\beta_{G,k} & \beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k} \\ -\beta_{E,k}\beta_{F,k} + \beta_{L,k}\beta_{G,k} & \beta_{C,k}\beta_{F,k} - \beta_{G,k}^2 & -\beta_{C,k}\beta_{L,k} + \beta_{G,k}\beta_{E,k} \\ \beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k} & -\beta_{C,k}\beta_{L,k} + \beta_{G,k}\beta_{E,k} & \beta_{C,k}\beta_{D,k} - \beta_{E,k}^2 \end{bmatrix}$$

$$\det[H_k]_N = \det[H_k] = \beta_{C,k}\big(\beta_{D,k}\beta_{F,k} - \beta_{L,k}^2\big) - \beta_{E,k}\big(\beta_{E,k}\beta_{F,k} - \beta_{L,k}\beta_{G,k}\big) + \beta_{G,k}\big(\beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k}\big). \qquad (32)$$
Remark 3: In the two-hidden-layer Newton algorithm of (31) and (32), we can observe that values of zero in $\beta_{D,k}\beta_{F,k} - \beta_{L,k}^2$, $\beta_{E,k}\beta_{F,k} - \beta_{L,k}\beta_{G,k}$, and $\beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k}$ of $\det[H_k]_N$ are three singularity points in the learning rates. The two-hidden-layer Newton algorithm of (31) and (32) is worse than the Newton algorithm of (18) and (19) because the latter presents one singularity point, while the former presents three singularity points.
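The determinant expansion in (32) and the three cofactors named in Remark 3 can be checked numerically; the entries below are hypothetical:

```python
import numpy as np

# Hypothetical summed Hessian entries of Eq. (31).
bC, bD, bE, bF, bG, bL = 2.0, 1.5, 0.4, 1.2, 0.3, 0.2
H = np.array([[bC, bE, bG],
              [bE, bD, bL],
              [bG, bL, bF]])

# Cofactor expansion of det[Hk]N along the first row, Eq. (32).
det = (bC * (bD * bF - bL ** 2)
       - bE * (bE * bF - bL * bG)
       + bG * (bE * bL - bD * bG))
assert abs(det - np.linalg.det(H)) < 1e-9

# The adjugate entries of Eq. (32) contain the three factors of
# Remark 3; any of them vanishing drives a learning rate toward a
# singularity even when det itself is nonzero.
cofactors = (bD * bF - bL ** 2, bE * bF - bL * bG, bE * bL - bD * bG)
```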
C. Two-Hidden-Layer Levenberg–Marquardt Algorithm
The two-hidden-layer Levenberg–Marquardt algorithm
constitutes one alternative to update the weights for the
two-hidden-layer artificial neural network learning. We rep-
resent the basic updating of the two-hidden-layer Levenberg–
Marquardt algorithm as [8]–[11]
$$\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \\ u_{ir,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \\ u_{ir,k} \end{bmatrix} - [H_k + \alpha I]^{-1}\begin{bmatrix} \dfrac{\partial E_k}{\partial p_{ji,k}} \\[2mm] \dfrac{\partial E_k}{\partial q_{lj,k}} \\[2mm] \dfrac{\partial E_k}{\partial u_{ir,k}} \end{bmatrix}, \qquad H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \beta_{F,k} \end{bmatrix} \qquad (33)$$

with $\beta_{C,k}$, $\beta_{D,k}$, $\beta_{E,k}$, $\beta_{F,k}$, $\beta_{G,k}$, and $\beta_{L,k}$ as in (31)
where $p_{ji,k}$, $u_{ir,k}$, and $q_{lj,k}$ are the weights and $\alpha$ is the learning factor. The two-hidden-layer Levenberg–Marquardt algorithm requires the existence of the inverse of the Hessian, $[H_k + \alpha I]^{-1}$.
From (33), we obtain the inverse of $H_k + \alpha I$ as

$$[H_k + \alpha I]^{-1} = \begin{bmatrix} \alpha+\beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \alpha+\beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \alpha+\beta_{F,k} \end{bmatrix}^{-1}$$
$$= \frac{1}{\det[H_k+\alpha I]}\begin{bmatrix} (\alpha+\beta_{D,k})(\alpha+\beta_{F,k})-\beta_{L,k}^2 & -\beta_{E,k}(\alpha+\beta_{F,k})+\beta_{L,k}\beta_{G,k} & \beta_{E,k}\beta_{L,k}-(\alpha+\beta_{D,k})\beta_{G,k} \\ -\beta_{E,k}(\alpha+\beta_{F,k})+\beta_{L,k}\beta_{G,k} & (\alpha+\beta_{C,k})(\alpha+\beta_{F,k})-\beta_{G,k}^2 & -(\alpha+\beta_{C,k})\beta_{L,k}+\beta_{G,k}\beta_{E,k} \\ \beta_{E,k}\beta_{L,k}-(\alpha+\beta_{D,k})\beta_{G,k} & -(\alpha+\beta_{C,k})\beta_{L,k}+\beta_{G,k}\beta_{E,k} & (\alpha+\beta_{C,k})(\alpha+\beta_{D,k})-\beta_{E,k}^2 \end{bmatrix}$$

$$\det[H_k]_{LM} = \det[H_k+\alpha I] = \big(\alpha+\beta_{C,k}\big)\big[\big(\alpha+\beta_{D,k}\big)\big(\alpha+\beta_{F,k}\big)-\beta_{L,k}^2\big] - \beta_{E,k}\big[\beta_{E,k}\big(\alpha+\beta_{F,k}\big)-\beta_{L,k}\beta_{G,k}\big] + \beta_{G,k}\big[\beta_{E,k}\beta_{L,k}-\big(\alpha+\beta_{D,k}\big)\beta_{G,k}\big]. \qquad (34)$$
Remark 4: In the two-hidden-layer Levenberg–Marquardt algorithm of (33) and (34), we can observe that values of zero in $(\alpha+\beta_{D,k})(\alpha+\beta_{F,k})-\beta_{L,k}^2$, $\beta_{E,k}(\alpha+\beta_{F,k})-\beta_{L,k}\beta_{G,k}$, and $\beta_{E,k}\beta_{L,k}-(\alpha+\beta_{D,k})\beta_{G,k}$ of $\det[H_k]_{LM}$ are three singularity points in the learning rates $\beta_{LMji,k}$, $\beta_{LMlj,k}$, and $\gamma_{LM,k}$. The two-hidden-layer Levenberg–Marquardt algorithm of (33) and (34) is worse than the Levenberg–Marquardt algorithm of (23) and (24) because the latter presents one singularity point, while the former presents three singularity points.
IV. ERROR STABILITY AND WEIGHTS BOUNDEDNESS ANALYSIS OF THE MODIFIED LEVENBERG–MARQUARDT ALGORITHM
In this section, the modified Levenberg–Marquardt algo-
rithm is introduced for the artificial neural network learning,
and the error stability and weights boundedness are analyzed.
A. Modified Levenberg–Marquardt Algorithm
The modified Levenberg–Marquardt algorithm is defined as

$$p_{ji,k+1} = p_{ji,k} - \beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}} + \gamma_{MH,k}\frac{\partial E_k}{\partial q_{lj,k}}$$
$$q_{lj,k+1} = q_{lj,k} - \beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}} + \gamma_{MH,k}\frac{\partial E_k}{\partial p_{ji,k}}$$
$$\beta_{MLM,k} = \frac{\big(\alpha+\beta_{C,k}^2\big)\big(\alpha+\beta_{D,k}^2\big)}{\det[H_k]_{MLM}}, \qquad \gamma_{MH,k} = 0$$
$$\det[H_k]_{MLM} = \big(\alpha+\beta_{A,k}^2+\beta_{B,k}^2\big)\Big[\big(\alpha+\beta_{C,k}^2\big)\big(\alpha+\beta_{D,k}^2\big)+\beta_{E,k}^2\Big]$$
$$\beta_{A,k} = \sum_{ji}\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k}-t_{l,k}}, \qquad \beta_{B,k} = \sum_{j}\frac{\partial E_k/\partial q_{lj,k}}{d_{l,k}-t_{l,k}}$$
$$\beta_{C,k} = \sum_{jil}\frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \qquad \beta_{D,k} = \sum_j\frac{\partial^2 E_k}{\partial q_{lj,k}^2}, \qquad \beta_{E,k} = \sum_{jil}\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}}$$
$$\sum_{jil} = \sum_j\sum_i\sum_l, \qquad \sum_{ji} = \sum_j\sum_i \qquad (35)$$
where

$$\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k}-t_{l,k}} = q_{lj,k}\,g'(z_{j,k})\,a_{i,k}, \qquad \frac{\partial E_k/\partial q_{lj,k}}{d_{l,k}-t_{l,k}} = g(z_{j,k})$$
$$\frac{\partial E_k}{\partial p_{ji,k}} = \big(d_{l,k}-t_{l,k}\big)\,q_{lj,k}\,g'(z_{j,k})\,a_{i,k}, \qquad \frac{\partial E_k}{\partial q_{lj,k}} = g(z_{j,k})\big(d_{l,k}-t_{l,k}\big)$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2\, q_{lj,k}\big[-2\,g(z_{j,k})\,g'(z_{j,k})\big(d_{l,k}-t_{l,k}\big)+\big(g'(z_{j,k})\big)^2 q_{lj,k}\big]$$
$$\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = \big(g(z_{j,k})\big)^2$$
$$\frac{\partial^2 E_k}{\partial p_{ji,k}\partial q_{lj,k}} = a_{i,k}\, g'(z_{j,k})\big[\big(d_{l,k}-t_{l,k}\big)+g(z_{j,k})\,q_{lj,k}\big]. \qquad (36)$$
$\beta_{MLM,k}$ is the learning rate, $p_{ji,k}$ and $q_{lj,k}$ are the weights, $\alpha$ is the learning factor, $g(z_{j,k}) = \tanh(z_{j,k})$ are the activation functions, and $g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k})$ are the derivatives of the activation functions. Equations (35) and (36) describe the modified Levenberg–Marquardt algorithm.
Remark 5: The modified Levenberg–Marquardt algorithm of (35) and (36) is based on the Levenberg–Marquardt algorithm of (23) and (24) and on the Newton algorithm of (18) and (19), but with the following two differences that assure the error stability and weights boundedness.
1) A value of zero in $\beta_{C,k}\beta_{D,k}-(\beta_{E,k})^2$ of $\det[H_k]_N$ is a singularity point in the learning rates $\beta_{Nji,k}$, $\beta_{Nlj,k}$, and $\gamma_{N,k}$ of the Newton algorithm, and a value of zero in $(\alpha+\beta_{C,k})(\alpha+\beta_{D,k})-(\beta_{E,k})^2$ of $\det[H_k]_{LM}$ is a singularity point in the learning rates $\beta_{LMji,k}$, $\beta_{LMlj,k}$, and $\gamma_{LM,k}$ of the Levenberg–Marquardt algorithm, while $\big(\alpha+\beta_{A,k}^2+\beta_{B,k}^2\big)\big[\big(\alpha+\beta_{C,k}^2\big)\big(\alpha+\beta_{D,k}^2\big)+\beta_{E,k}^2\big]$ of $\det[H_k]_{MLM}$ is never zero, so there is no singularity point in the learning rate $\beta_{MLM,k}$ of the modified Levenberg–Marquardt algorithm.
2) The Levenberg–Marquardt algorithm has three different learning rates $\beta_{LMji,k}$, $\beta_{LMlj,k}$, and $\gamma_{LM,k}$, and the Newton algorithm has three different learning rates $\beta_{Nji,k}$, $\beta_{Nlj,k}$, and $\gamma_{N,k}$, while the modified Levenberg–Marquardt algorithm has only one learning rate $\beta_{MLM,k}$.
These differences allow the error stability and weights boundedness of the modified Levenberg–Marquardt algorithm to be assured in Section IV-B.
Remark 6: The application of the modified Levenberg–
Marquardt algorithm for the artificial neural network learning
is based on the following steps: 1) obtain the artificial neural
network output dl,k of Fig. 1 with (1) and (3); 2) obtain the
backpropagation of the output layer (∂ Ek/∂qlj,k) with (9),
and the backpropagation of the hidden layer (∂ Ek/∂pji,k)
with (10); and 3) obtain the updating of the weights of the
hidden layer pji,k with (35) and (36) and the weights of the
output layer qlj,k with (35) and (36). Please note that step 3)
represents the artificial neural network learning.
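The three steps of Remark 6 can be sketched as one training-step function; this is a minimal single-output ($L = 1$) sketch under our own shape assumptions, not the article's reference implementation:

```python
import numpy as np

def mlm_step(p, q, a, t, alpha=0.5):
    # One modified Levenberg-Marquardt update, Eqs. (35)-(36), for a
    # single-output network (L = 1) so that (d - t) is a scalar.
    # Assumed shapes: p is (J, I), q is (J,), a is (I,), t is a float.
    z = p @ a                                  # step 1): Eqs. (1), (3)
    g = np.tanh(z)
    g1 = 1.0 - g ** 2                          # g'(z) = sech^2(z)
    d = q @ g
    e = d - t

    dE_dq = e * g                              # step 2): Eq. (9)
    dE_dp = np.outer(e * q * g1, a)            # step 2): Eq. (10)

    # Summed quantities of Eq. (35); the sums over (j, i) factorize.
    bA = a.sum() * (q * g1).sum()
    bB = g.sum()
    bC = (a ** 2).sum() * (q * (-2 * g * g1 * e + g1 ** 2 * q)).sum()
    bD = (g ** 2).sum()
    bE = a.sum() * (g1 * (e + g * q)).sum()

    num = (alpha + bC ** 2) * (alpha + bD ** 2)
    det = (alpha + bA ** 2 + bB ** 2) * (num + bE ** 2)
    beta = num / det               # single learning rate, no singularity
    assert beta <= 1.0 / alpha     # bound of Eq. (47)

    # step 3): weight updates with gamma_MH = 0
    return p - beta * dE_dp, q - beta * dE_dq, 0.5 * e ** 2

rng = np.random.default_rng(1)
p = 0.1 * rng.standard_normal((4, 3))
q = 0.1 * rng.standard_normal(4)
a, t = rng.standard_normal(3), 0.3
for _ in range(100):
    p, q, cost = mlm_step(p, q, a, t)
```

The single rate `beta` stays below $1/\alpha$ at every step, which is the property the stability proof of Section IV-B rests on.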
B. Error Stability and Weights Boundedness Analysis
We analyze the error stability of the modified Levenberg–Marquardt algorithm by the Lyapunov technique, as detailed in the following theorem.
Theorem 1: The errors of the modified Levenberg–Marquardt algorithm (1), (3), (35), and (36) applied for the learning of the data set targets $t_{l,k}$ are uniformly stable, and the upper bound of the average errors $o_{l,k}^2$ satisfies

$$\limsup_{T\to\infty}\frac{1}{T}\sum_{k=2}^{T} o_{l,k}^2 \le \frac{2}{\alpha}\mu_l^2 \qquad (37)$$

where $o_{l,k}^2 = \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2$, $0 < \alpha \le 1$ and $0 < \beta_{MLM,k}$ are in (35), $\big(d_{l,k-1}-t_{l,k-1}\big)$ are the errors, $\mu_l$ are the upper bounds of the uncertainties $\mu_{l,k}$, and $|\mu_{l,k}| \le \mu_l$.
Proof: Define the following positive function:

$$V_{l,k} = \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 + \sum_{ji}\tilde p_{ji,k}^2 + \sum_{j}\tilde q_{lj,k}^2 \qquad (38)$$

where $\tilde p_{ji,k}$ and $\tilde q_{lj,k}$ are the errors of the weights in (35) and (36). Then, $\Delta V_{l,k}$ is
$$\Delta V_{l,k} = \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 + \sum_{ji}\tilde p_{ji,k+1}^2 + \sum_{j}\tilde q_{lj,k+1}^2 - \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 - \sum_{ji}\tilde p_{ji,k}^2 - \sum_{j}\tilde q_{lj,k}^2. \qquad (39)$$
Now, the weight errors evolve as

$$\sum_{ji}\tilde p_{ji,k+1}^2 = \sum_{ji}\tilde p_{ji,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde p_{ji,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial p_{ji,k}}\Big)^2$$
$$\sum_{j}\tilde q_{lj,k+1}^2 = \sum_{j}\tilde q_{lj,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde q_{lj,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial q_{lj,k}}\Big)^2. \qquad (40)$$
Substituting (40) into (39) gives

$$\Delta V_{l,k} = -2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde p_{ji,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial p_{ji,k}}\Big)^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde q_{lj,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial q_{lj,k}}\Big)^2 + \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 - \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2. \qquad (41)$$
Equation (41) is rewritten as

$$\Delta V_{l,k} = \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 - \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 - 2\beta_{MLM,k}\Big[\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde p_{ji,k} + \frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde q_{lj,k}\Big] + \beta_{MLM,k}^2\Big[\Big(\frac{\partial E_k}{\partial p_{ji,k}}\Big)^2 + \Big(\frac{\partial E_k}{\partial q_{lj,k}}\Big)^2\Big]. \qquad (42)$$
Using the closed-loop dynamics $\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k}-t_{l,k}}\sum_{ji}\tilde p_{ji,k} + \frac{\partial E_k/\partial q_{lj,k}}{d_{l,k}-t_{l,k}}\sum_{j}\tilde q_{lj,k} = \big(d_{l,k}-t_{l,k}\big) - \mu_{l,k}$ of [31] and [33] in the second element of (42), it can be seen that

$$\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde p_{ji,k} + \frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde q_{lj,k} = \big(d_{l,k}-t_{l,k}\big)\Big[\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k}-t_{l,k}}\sum_{ji}\tilde p_{ji,k} + \frac{\partial E_k/\partial q_{lj,k}}{d_{l,k}-t_{l,k}}\sum_{j}\tilde q_{lj,k}\Big] = \big(d_{l,k}-t_{l,k}\big)\big[\big(d_{l,k}-t_{l,k}\big)-\mu_{l,k}\big] \qquad (43)$$
where $\mu_{l,k}$ are the uncertainties. Substituting (43) in the second element of (42) gives

$$\Delta V_{l,k} = \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 - \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 - 2\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)\big[\big(d_{l,k}-t_{l,k}\big)-\mu_{l,k}\big] + \beta_{MLM,k}^2\Big[\Big(\sum_{ji}\frac{\partial E_k}{\partial p_{ji,k}}\Big)^2 + \Big(\sum_{j}\frac{\partial E_k}{\partial q_{lj,k}}\Big)^2\Big]$$

$$\Delta V_{l,k} = \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 - \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 - 2\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 + 2\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)\mu_{l,k} + \beta_{MLM,k}^2\big(d_{l,k}-t_{l,k}\big)^2\big[\beta_{A,k}^2 + \beta_{B,k}^2\big] \qquad (44)$$
where $\beta_{A,k} = \sum_{ji}\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k}-t_{l,k}}$ and $\beta_{B,k} = \sum_{j}\frac{\partial E_k/\partial q_{lj,k}}{d_{l,k}-t_{l,k}}$. Substituting $\beta_{MLM,k}$ of (35) into the element $\beta_{MLM,k}^2\big(d_{l,k}-t_{l,k}\big)^2\big[\beta_{A,k}^2+\beta_{B,k}^2\big]$ and considering $\alpha \le 1$ gives (45), shown at the bottom of the page. Taking into account
that $2\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)\mu_{l,k} \le \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 + 2\beta_{MLM,k}\mu_{l,k}^2$ and employing (45) in (44) gives

$$\Delta V_{l,k} \le \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 - \frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 - 2\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 + \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 + 2\beta_{MLM,k}\mu_{l,k}^2 + \beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2$$
$$\Delta V_{l,k} \le -\frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 + 2\beta_{MLM,k}\mu_{l,k}^2. \qquad (46)$$
From (35)

$$\beta_{MLM,k} = \frac{\big(\alpha+\beta_{C,k}^2\big)\big(\alpha+\beta_{D,k}^2\big)}{\big(\alpha+\beta_{A,k}^2+\beta_{B,k}^2\big)\Big[\big(\alpha+\beta_{C,k}^2\big)\big(\alpha+\beta_{D,k}^2\big)+\beta_{E,k}^2\Big]} \le \frac{1}{\alpha} \qquad (47)$$

since $\frac{(\alpha+\beta_{C,k}^2)(\alpha+\beta_{D,k}^2)}{(\alpha+\beta_{C,k}^2)(\alpha+\beta_{D,k}^2)+\beta_{E,k}^2} \le 1$ and $\frac{1}{\alpha+\beta_{A,k}^2+\beta_{B,k}^2} \le \frac{1}{\alpha}$.
Employing (47) and $|\mu_{l,k}| \le \mu_l$ in (46) gives

$$\Delta V_{l,k} \le -\frac{1}{2}\beta_{MLM,k-1}\big(d_{l,k-1}-t_{l,k-1}\big)^2 + \frac{2}{\alpha}\mu_l^2. \qquad (48)$$
Employing (48), the errors of the modified Levenberg–Marquardt algorithm are uniformly stable; hence, $V_{l,k}$ is bounded. Taking into account (48) and $o_{l,k}^2$ of (37), it follows that

$$\Delta V_{l,k} \le -o_{l,k}^2 + \frac{2}{\alpha}\mu_l^2. \qquad (49)$$
Summing (49) from $k = 2$ to $T$ gives

$$\sum_{k=2}^{T}\Big(o_{l,k}^2 - \frac{2}{\alpha}\mu_l^2\Big) \le V_{l,1} - V_{l,T}. \qquad (50)$$
Employing that $V_{l,T} > 0$ is bounded,

$$\frac{1}{T}\sum_{k=2}^{T} o_{l,k}^2 \le \frac{2}{\alpha}\mu_l^2 + \frac{1}{T}V_{l,1} \;\Rightarrow\; \limsup_{T\to\infty}\frac{1}{T}\sum_{k=2}^{T} o_{l,k}^2 \le \frac{2}{\alpha}\mu_l^2. \qquad (51)$$

Equation (51) is the bound of (37).
Remark 7: The result of Theorem 1, that the errors of the modified Levenberg–Marquardt algorithm for the artificial neural network learning are assured to be stable, implies that the artificial neural network outputs $d_{l,k}$ of the modified Levenberg–Marquardt algorithm remain bounded throughout the training and the testing.
The following theorem proves the weights boundedness of the modified Levenberg–Marquardt algorithm.
Theorem 2: When the average errors $o_{l,k+1}^2$ are bigger than the uncertainties $\frac{2}{\alpha}\mu_l^2$, the weight errors are bounded by the initial weight errors as

$$o_{l,k+1}^2 \ge \frac{2}{\alpha}\mu_l^2 \;\Rightarrow\; \sum_{ji}\tilde p_{ji,k+1}^2 + \sum_{j}\tilde q_{lj,k+1}^2 \le \sum_{ji}\tilde p_{ji,1}^2 + \sum_{j}\tilde q_{lj,1}^2 \qquad (52)$$

where $\tilde p_{ji,k+1}$ and $\tilde q_{lj,k+1}$ are the weight errors, $\tilde p_{ji,1}$ and $\tilde q_{lj,1}$ are the initial weight errors, $o_{l,k+1}^2 = \frac{1}{2}\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2$, $\big(d_{l,k}-t_{l,k}\big)$ are the errors, $0 < \alpha \le 1$, $0 < \beta_{MLM,k}$, and $\mu_l$ are the upper bounds of the uncertainties $\mu_{l,k}$, $|\mu_{l,k}| \le \mu_l$.
Proof: From (40), the weight errors are written as

$$\sum_{ji}\tilde p_{ji,k+1}^2 = \sum_{ji}\tilde p_{ji,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde p_{ji,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial p_{ji,k}}\Big)^2$$
$$\sum_{j}\tilde q_{lj,k+1}^2 = \sum_{j}\tilde q_{lj,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde q_{lj,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial q_{lj,k}}\Big)^2. \qquad (53)$$
Adding $\sum_{ji}\tilde p_{ji,k+1}^2$ and $\sum_{j}\tilde q_{lj,k+1}^2$ of (53) gives

$$\sum_{ji}\tilde p_{ji,k+1}^2 + \sum_{j}\tilde q_{lj,k+1}^2 = \sum_{ji}\tilde p_{ji,k}^2 + \sum_{j}\tilde q_{lj,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde p_{ji,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial p_{ji,k}}\Big)^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde q_{lj,k} + \beta_{MLM,k}^2\Big(\frac{\partial E_k}{\partial q_{lj,k}}\Big)^2. \qquad (54)$$
Equation (54) is represented as

$$\sum_{ji}\tilde p_{ji,k+1}^2 + \sum_{j}\tilde q_{lj,k+1}^2 = \sum_{ji}\tilde p_{ji,k}^2 + \sum_{j}\tilde q_{lj,k}^2 - 2\beta_{MLM,k}\Big[\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde p_{ji,k} + \frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde q_{lj,k}\Big] + \beta_{MLM,k}^2\Big[\Big(\frac{\partial E_k}{\partial p_{ji,k}}\Big)^2 + \Big(\frac{\partial E_k}{\partial q_{lj,k}}\Big)^2\Big]. \qquad (55)$$
$$\beta_{MLM,k}^2\big(d_{l,k}-t_{l,k}\big)^2\big[\beta_{A,k}^2+\beta_{B,k}^2\big] = \beta_{MLM,k}\big[\beta_{A,k}^2+\beta_{B,k}^2\big]\,\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 = \frac{\big(\beta_{A,k}^2+\beta_{B,k}^2\big)\big(\alpha+\beta_{C,k}^2\big)\big(\alpha+\beta_{D,k}^2\big)}{\big(\alpha+\beta_{A,k}^2+\beta_{B,k}^2\big)\Big[\big(\alpha+\beta_{C,k}^2\big)\big(\alpha+\beta_{D,k}^2\big)+\beta_{E,k}^2\Big]}\,\beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2 \le \beta_{MLM,k}\big(d_{l,k}-t_{l,k}\big)^2. \qquad (45)$$
Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
Substituting $(\partial E_k/\partial p_{ji,k})\sum_{ji} p_{ji,k} + (\partial E_k/\partial q_{lj,k})\sum_{j} q_{lj,k} = (d_{l,k} - t_{l,k})\left[(d_{l,k} - t_{l,k}) - \mu_{l,k}\right]$ of (43) in the second element of (55) gives
$$\begin{aligned}\sum_{ji} p^2_{ji,k+1} + \sum_{j} q^2_{lj,k+1} &= \sum_{ji} p^2_{ji,k} + \sum_{j} q^2_{lj,k} - 2\beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) - \mu_{l,k}\right]\\ &\quad + \beta^2_{\mathrm{MLM},k}\left[\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2 + \left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2\right]\\ \sum_{ji} p^2_{ji,k+1} + \sum_{j} q^2_{lj,k+1} &= \sum_{ji} p^2_{ji,k} + \sum_{j} q^2_{lj,k} - 2\beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)\mu_{l,k}\\ &\quad + \beta^2_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)^2\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right] \end{aligned}\tag{56}$$
where $\mu_{l,k}$ are the uncertainties, $\beta_{A,k} = \sum_{ji}\left((\partial E_k/\partial p_{ji,k})/(d_{l,k} - t_{l,k})\right)$, and $\beta_{B,k} = \sum_{j}\left((\partial E_k/\partial q_{lj,k})/(d_{l,k} - t_{l,k})\right)$.
Substituting $2\beta_{\mathrm{MLM},k}(d_{l,k} - t_{l,k})\mu_{l,k} \leq (1/2)\beta_{\mathrm{MLM},k}(d_{l,k} - t_{l,k})^2 + 2\beta_{\mathrm{MLM},k}\mu^2_{l,k}$ into the third element of (56) and $\beta^2_{\mathrm{MLM},k}(d_{l,k} - t_{l,k})^2\left[(\beta_{A,k})^2 + (\beta_{B,k})^2\right] \leq \beta_{\mathrm{MLM},k}(d_{l,k} - t_{l,k})^2$ of (45) into the last element of (56) give
$$\begin{aligned}\sum_{ji} p^2_{ji,k+1} + \sum_{j} q^2_{lj,k+1} &\leq \sum_{ji} p^2_{ji,k} + \sum_{j} q^2_{lj,k} - 2\beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)^2\\ &\quad + \frac{1}{2}\beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{\mathrm{MLM},k}\mu^2_{l,k} + \beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)^2\\ \sum_{ji} p^2_{ji,k+1} + \sum_{j} q^2_{lj,k+1} &\leq \sum_{ji} p^2_{ji,k} + \sum_{j} q^2_{lj,k} - \frac{1}{2}\beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{\mathrm{MLM},k}\mu^2_{l,k}. \end{aligned}\tag{57}$$
From (47), $\beta_{\mathrm{MLM},k} \leq 1/\alpha$, and using $|\mu_{l,k}| \leq \mu_l$ in (57) gives
$$\sum_{ji} p^2_{ji,k+1} + \sum_{j} q^2_{lj,k+1} \leq \sum_{ji} p^2_{ji,k} + \sum_{j} q^2_{lj,k} - \frac{1}{2}\beta_{\mathrm{MLM},k}\left(d_{l,k} - t_{l,k}\right)^2 + \frac{2}{\alpha}\mu^2_l. \tag{58}$$
Taking into account $o^2_{l,k+1} = (1/2)\beta_{\mathrm{MLM},k}(d_{l,k} - t_{l,k})^2$ is
$$o^2_{l,k+1} \geq \frac{2}{\alpha}\mu^2_l \;\Rightarrow\; \sum_{ji} p^2_{ji,k+1} + \sum_{j} q^2_{lj,k+1} \leq \sum_{ji} p^2_{ji,k} + \sum_{j} q^2_{lj,k}. \tag{59}$$
Taking into account that $o^2_{l,k+1} \geq (2/\alpha)\mu^2_l$ is true for all the iterations up to $k$, hence
$$\sum_{ji} p^2_{ji,k+1} + \sum_{j} q^2_{lj,k+1} \leq \sum_{ji} p^2_{ji,k} + \sum_{j} q^2_{lj,k} \leq \cdots \leq \sum_{ji} p^2_{ji,1} + \sum_{j} q^2_{lj,1}. \tag{60}$$
Then, (52) is proven.
Remark 8: Theorem 2 assures that the weights of the modified Levenberg–Marquardt algorithm are bounded; consequently, the hidden layer weights $p_{ji,k}$ and output layer weights $q_{lj,k}$ of the modified Levenberg–Marquardt algorithm for the artificial neural network learning remain bounded during all the training and testing.
V. RESULTS
In this section, we compare the Newton algorithm (N) of (1), (3), (18), (19), and [1] and [2], the Levenberg–Marquardt algorithm (LM) of (1), (3), (23), (24), and [8]–[11], and the modified Levenberg–Marquardt algorithm (MLM) of (1), (3), (35), and (36) for the artificial neural network learning of the electric signal data set because they are based on the Hessian, and we compare the stable gradient algorithm in a neural network (SGNN) of [31] and [32], the stable gradient algorithm in a radial basis function neural network (SGRBFNN) of [33] and [34], and the MLM of (1), (3), (35), and (36) for the artificial neural network learning of the brain signal data set because they are based on the stability. The objective of N, LM, SGNN, SGRBFNN, and MLM is that the artificial neural network outputs $d_{l,k}$ must follow the data set targets $t_{l,k}$ as near as possible.
In this part of the article, the abovementioned algorithms are applied for the artificial neural network learning, which contains the training and testing stages. The root-mean-square error (RMSE) is utilized to show the performance accuracy for the comparisons, and it is represented as
$$E = \left[\frac{1}{T}\sum_{k=1}^{T}\sum_{l=1}^{L_T}\left(d_{l,k} - t_{l,k}\right)^2\right]^{\frac{1}{2}} \tag{61}$$
where $d_{l,k} - t_{l,k}$ are the errors, $d_{l,k}$ are the artificial neural network outputs, $t_{l,k}$ are the data set targets, $L_T$ is the total outputs number, and $T$ is the final iteration.
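As a concrete reading of (61), a minimal pure-Python implementation (the function name and list-of-lists layout are ours; `outputs[k][l]` holds $d_{l,k}$ and `targets[k][l]` holds $t_{l,k}$):

```python
import math

def rmse(outputs, targets):
    """RMSE of (61): E = sqrt((1/T) * sum_k sum_l (d_{l,k} - t_{l,k})^2)."""
    T = len(outputs)
    total = sum((d - t) ** 2
                for d_k, t_k in zip(outputs, targets)
                for d, t in zip(d_k, t_k))
    return math.sqrt(total / T)

# single output (L_T = 1), T = 2 iterations:
print(rmse([[1.0], [2.0]], [[0.0], [0.0]]))  # sqrt((1 + 4)/2) ≈ 1.5811
```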
A. Electric Signals
The electric signal data set information is obtained from Electricity Load and Price Forecasting with MATLAB, where the details are explained in [35]. The electric signal data set is the history of the electric energy usage at each hour and the temperature observations of the independent system operator (ISO) of Great Britain. The meteorological information includes the dry bulb temperature and the dew point. The hourly electric energy usage of the electric signal data set is called the electric signal.
In the electric signal data set, we consider eight inputs
described as follows: a1,k is the temperature of the dry bulb,
RUBIO: STABILITY ANALYSIS OF THE MODIFIED LEVENBERG–MARQUARDT ALGORITHM 11
Fig. 3. Training for the first electric signal data set.
a2,k is the dew point, a3,k is the hour of the day, a4,k is the day of the week, a5,k is a mark indicating whether it is a free day or a weekend day, a6,k is the medium load of the past day, a7,k is the load of the same hour in the past day, and a8,k is the load of the same hour and day of the past week, and we consider one target described as follows: t1,k is the load of the same day.
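The eight inputs above can be assembled from the raw hourly series as lagged features. The sketch below is only an illustration of the quantities just described; the function and variable names are ours, the weekend encoding is an assumption, and the exact alignment used in [35] may differ.

```python
def electric_features(load, temp_dry, temp_dew, hours):
    """Build rows [a_{1,k}, ..., a_{8,k}] from hourly series.
    load[k] is the hourly energy usage and hours[k] the absolute hour index."""
    rows = []
    for k in range(24 * 7, len(load)):  # need one week of history
        hour = hours[k] % 24
        day = (hours[k] // 24) % 7
        weekend = 1.0 if day in (5, 6) else 0.0  # a_{5,k}: free/weekend mark (assumed encoding)
        past_day = load[k - 24:k]
        rows.append([
            temp_dry[k],              # a_{1,k}: dry bulb temperature
            temp_dew[k],              # a_{2,k}: dew point
            float(hour),              # a_{3,k}: hour of the day
            float(day),               # a_{4,k}: day of the week
            weekend,                  # a_{5,k}
            sum(past_day) / 24.0,     # a_{6,k}: medium load of the past day
            load[k - 24],             # a_{7,k}: load of the same hour, past day
            load[k - 24 * 7],         # a_{8,k}: same hour and day, past week
        ])
    return rows
```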
In the artificial neural network learning, we consider eight
artificial neural network inputs denoted as a1,k, a2,k, a3,k, a4,k,
a5,k, a6,k, a7,k, and a8,k that are the same inputs of the electric
signal data set, and we consider one artificial neural network
output denoted as d1,k. We utilize 7000 iterations of the data
set for the artificial neural network training, and we utilize
1000 iterations of the data set for the artificial neural network
testing. The objective of N, LM, and MLM is that the artificial
neural network output d1,k must follow the target t1,k as near
as possible.
The N of [1] and [2] is detailed as (1), (3), (18), and (19)
with eight inputs, one output, and five neurons in the hidden
layer, α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a
random number between 0 and 1.
The LM of [8]–[11] is detailed as (1), (3), (23), and (24)
with eight inputs, one output, and five neurons in the hidden
layer, α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a
random number between 0 and 1.
The MLM is detailed as (1), (3), (35), and (36), with
eight inputs, one output, and five neurons in the hidden layer,
α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a random
number between 0 and 1.
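The common 8–5–1 architecture with random initial weights in [0, 1] can be sketched as follows. The sigmoid hidden activation is an assumption of this sketch, since the actual activation functions are fixed by (1) and (3); all names here are ours.

```python
import math
import random

def init_network(n_in=8, n_hidden=5, n_out=1, seed=0):
    """Random initial weights p_{ji,1}, q_{lj,1} in [0, 1], as in the experiments."""
    rng = random.Random(seed)
    p = [[rng.random() for _ in range(n_in)] for _ in range(n_hidden)]
    q = [[rng.random() for _ in range(n_hidden)] for _ in range(n_out)]
    return p, q

def forward(p, q, a):
    """Single-hidden-layer forward pass producing the outputs d_{l,k}."""
    hidden = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, a))))
              for row in p]
    return [sum(w * h for w, h in zip(row, hidden)) for row in q]

p, q = init_network()
d = forward(p, q, [0.1] * 8)  # one output d_{1,k} for eight inputs a_{1,k}..a_{8,k}
assert len(d) == 1 and math.isfinite(d[0])
```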
The comparisons for the training and testing of the N, LM,
and MLM for the first electric signal data set are shown in
Figs. 3 and 4. The weights of the MLM for the first electric
signal data set are shown in Figs. 5 and 6. The comparisons
for the training and testing of the N, LM, and MLM for the
second electric signal data set are shown in Figs. 7 and 8. The
weights of the MLM for the second electric signal data set
are shown in Figs. 9 and 10. The training and testing RMSE
comparisons of the performance accuracy (61) for the first
electric signal data set are shown in Table I and, for the second
electric signal data set, are shown in Table II. Please note that
the most important data are related to the output d1,k.
To improve the training and testing, more neurons in the hid-
den layer could be included; nevertheless, this decision could
increase the computational cost. From Figs. 3, 4, 7, and 8,
Fig. 4. Testing for the first electric signal data set.
Fig. 5. Hidden layer weights for the first electric signal data set.
Fig. 6. Output layer weights for the first electric signal data set.
TABLE I
RMSE FOR THE FIRST ELECTRIC SIGNAL DATA SET
it is observed that the MLM outperforms the LM and N because the signal of the MLM follows the electric signal data set more closely than the others do. From Figs. 5, 6, 9, and 10, it is observed that the
Fig. 7. Training for the second electric signal data set.
Fig. 8. Testing for the second electric signal data set.
Fig. 9. Hidden layer weights for the second electric signal data set.
TABLE II
RMSE FOR THE SECOND ELECTRIC SIGNAL DATA SET
weights of the MLM remain bounded. From Tables I and II,
it is observed that the MLM achieves better performance
accuracy for training and testing compared with LM and N
Fig. 10. Output layer weights for the second electric signal data set.
because the RMSE is the smallest for the MLM. Thus, MLM
is the best option for learning in the electric signal data set.
B. Brain Signals
The brain signal data set information is obtained from our laboratory, where the details are explained in [36]. The brain signal data set contains real brain signals. The alpha signal is studied because it has the highest probability of being found. The acquisition system is applied to a 28-year-old healthy man while his eyes are closed. Four different signals are received from the brain.
In the brain signal data set, we consider three inputs described as follows: a1,k is the brain signal of the focal point 1, a2,k is the brain signal of the focal point 2, and a3,k is the brain signal of the focal point 3, and we consider one target described as follows: t1,k is the brain signal of the focal point 4.
In the artificial neural network learning, we consider three
artificial neural network inputs denoted as a1,k, a2,k, and a3,k
that are the same inputs of the brain signal data set, and we
consider one artificial neural network output denoted as d1,k.
We utilize 7000 iterations of the data set for the artificial
neural network training, and we utilize 1000 iterations of the
data set for the artificial neural network testing. The objective
of SGNN, SGRBFNN, and MLM is that the artificial neural
network output d1,k must follow the target t1,k as near as
possible.
The SGNN of [31] and [32] is detailed with three inputs,
one output, and five neurons in the hidden layer, α = 0.9,
pji,1 = rand, qlj,1 = rand, and rand is a random number
between 0 and 1.
The SGRBFNN of [33] and [34] is detailed with three
inputs, one output, and five neurons in the hidden layer,
α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a random
number between 0 and 1.
The MLM is detailed as (1), (3), (35), and (36) with three
inputs, one output, and five neurons in the hidden layer,
α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a random
number between 0 and 1.
The comparisons for the training and testing of the SGNN,
SGRBFNN, and MLM for the first brain signal data set are
Fig. 11. Training for the first brain signal data set.
Fig. 12. Testing for the first brain signal data set.
Fig. 13. Hidden layer weights for the first brain signal data set.
shown in Figs. 11 and 12. The weights of the MLM for
the first brain signal data set are shown in Figs. 13 and 14.
The comparisons for the training and testing of the SGNN, SGRBFNN, and MLM for the second brain signal data set are shown in Figs. 15 and 16. The weights of the MLM for the second brain signal data set are shown in Figs. 17 and 18. The training and
testing RMSE comparisons of the performance accuracy (61)
for the first brain signal data set are shown in Table III and, for
the second brain signal data set, are shown in Table IV. Please
note that the most important data are related to the output d1,k.
To improve the training and testing, more neurons in the hid-
den layer could be included; nevertheless, this decision could
increase the computational cost. From Figs. 11, 12, 15, and 16,
Fig. 14. Output layer weights for the first brain signal data set.
Fig. 15. Training for the second brain signal data set.
TABLE III
RMSE FOR THE FIRST BRAIN SIGNAL DATA SET
Fig. 16. Testing for the second brain signal data set.
it is observed that the MLM outperforms the SGRBFNN and SGNN because the signal of the MLM follows the brain signal data set more closely than the others do. From Figs. 13, 14, 17, and 18,
Fig. 17. Hidden layer weights for the second brain signal data set.
Fig. 18. Output layer weights for the second brain signal data set.
TABLE IV
RMSE FOR THE SECOND BRAIN SIGNAL DATA SET
it is observed that the weights of the MLM remain bounded.
From Tables III and IV, it is observed that the MLM achieves better performance accuracy for training and testing compared with the SGRBFNN and SGNN because the RMSE is the smallest for the MLM. Thus, the MLM is the best option for learning in the brain signal data set.
Remark 9: The result of Theorem 1, that the error of the MLM is assured to be stable while the errors of the N, LM, SGNN, and SGRBFNN are not assured to be stable, can be observed mainly in the training of Figs. 3, 7, 11, and 15 and in the testing of Figs. 4, 8, 12, and 16, where the signals of the N, LM, and SGNN are unbounded during the training or testing, while the signal of the MLM remains bounded during all the training and testing.
Remark 10: The result of Theorem 2, that the weights of the MLM are bounded, can be observed mainly in the hidden layer weights of Figs. 5, 9, 13, and 17 and in the output layer weights of Figs. 6, 10, 14, and 18, where the weights of the MLM remain bounded during all the training. The weights of the MLM also remain bounded during all the testing because they take the last value obtained during the training.
VI. CONCLUSION
The objective of this article is to introduce an algorithm called the modified Levenberg–Marquardt for the artificial neural network learning. The modified Levenberg–Marquardt was compared with the Newton, Levenberg–Marquardt, and stable gradient algorithms for the learning of the electric and brain signal data sets; the modified Levenberg–Marquardt obtained the best performance accuracy because its artificial neural network output followed the data set target most closely and its RMSE was the smallest. In forthcoming work, we will propose other algorithms for the artificial neural network learning to compare with our results, or we will apply our algorithm to the learning of other robotic or mechatronic systems.
ACKNOWLEDGMENT
The author is grateful to the Editor-in-Chief, the Associate Editor, and the reviewers for their valuable comments and insightful suggestions that helped to improve this research significantly.
He would also like to thank the Instituto Politécnico Nacional,
the Secretaría de Investigación y Posgrado, the Comisión de
Operación y Fomento de Actividades Académicas, and the
Consejo Nacional de Ciencia y Tecnología for their help in
this research.
REFERENCES
[1] S. Kostić and D. Vasović, “Prediction model for compressive strength of
basic concrete mixture using artificial neural networks,” Neural Comput.
Appl., vol. 26, no. 5, pp. 1005–1024, Jul. 2015.
[2] B. Sahoo and P. K. Bhaskaran, “Prediction of storm surge and inundation
using climatological datasets for the indian coast using soft computing
techniques,” Soft Comput., vol. 23, no. 23, pp. 12363–12383, Dec. 2019.
[3] T.-L. Le, “Intelligent fuzzy controller design for antilock braking sys-
tems,” J. Intell. Fuzzy Syst., vol. 36, no. 4, pp. 3303–3315, Apr. 2019.
[4] C. Yin, S. Wu, S. Zhou, J. Cao, X. Huang, and Y. Cheng, “Design
and stability analysis of multivariate extremum seeking with Newton
method,” J. Franklin Inst., vol. 355, no. 4, pp. 1559–1578, Mar. 2018.
[5] S. Chaki, B. Shanmugarajan, S. Ghosal, and G. Padmanabham,
“Application of integrated soft computing techniques for optimisation of
hybrid CO2 laser–MIG welding process,” Appl. Soft Comput., vol. 30,
pp. 365–374, May 2015.
[6] Y. Li, H. Zhang, J. Han, and Q. Sun, “Distributed multi-agent opti-
mization via event-triggered based continuous-time Newton–Raphson
algorithm,” Neurocomputing, vol. 275, pp. 1416–1425, Jan. 2018.
[7] M. S. Salim and A. I. Ahmed, “A quasi-Newton augmented Lagrangian
algorithm for constrained optimization problems,” J. Intell. Fuzzy Syst.,
vol. 35, no. 2, pp. 2373–2382, Aug. 2018.
[8] C. Lv et al., “Levenberg–Marquardt backpropagation training of multilayer
neural networks for state estimation of a safety-critical cyber-physical
system,” IEEE Trans. Ind. Informat., vol. 14, no. 8, pp. 3436–3446,
Aug. 2018.
[9] M. J. Rana, M. S. Shahriar, and M. Shafiullah, “Levenberg–Marquardt
neural network to estimate UPFC-coordinated PSS parameters to
enhance power system stability,” Neural Comput. Appl., vol. 31,
pp. 1237–1248, Jul. 2019.
[10] A. Sarabakha, N. Imanberdiyev, E. Kayacan, M. A. Khanesar, and
H. Hagras, “Novel Levenberg–Marquardt based learning algorithm for
unmanned aerial vehicles,” Inf. Sci., vol. 417, pp. 361–380, Nov. 2017.
[11] J. S. Smith, B. Wu, and B. M. Wilamowski, “Neural network training
with Levenberg–Marquardt and adaptable weight compression,” IEEE
Trans. Neural Netw. Learn. Syst., vol. 30, no. 2, pp. 580–587, Feb. 2019.
[12] H. G. Han, Y. Li, Y. N. Guo, and J. F. Qiao, “A soft computing method to
predict sludge volume index based on a recurrent self-organizing neural
network,” Appl. Soft Comput., vol. 38, pp. 477–486, Jan. 2016.
[13] J. Qiao, L. Wang, C. Yang, and K. Gu, “Adaptive Levenberg-Marquardt
algorithm based echo state network for chaotic time series prediction,”
IEEE Access, vol. 6, pp. 10720–10732, 2018.
[14] A. Parsaie, A. H. Haghiabi, M. Saneie, and H. Torabi, “Applica-
tions of soft computing techniques for prediction of energy dissipa-
tion on stepped spillways,” Neural Comput. Appl., vol. 29, no. 12,
pp. 1393–1409, Jun. 2018.
[15] N. Zhang and D. Shetty, “An effective LS-SVM-based approach for
surface roughness prediction in machined surfaces,” Neurocomputing,
vol. 198, pp. 35–39, Jul. 2016.
[16] E. Esme and B. Karlik, “Fuzzy c-means based support vector machines
classifier for perfume recognition,” Appl. Soft Comput., vol. 46,
pp. 452–458, Sep. 2016.
[17] P. Fergus, I. Idowu, A. Hussain, and C. Dobbins, “Advanced artificial
neural network classification for detecting preterm births using EHG
records,” Neurocomputing, vol. 188, pp. 42–49, May 2016.
[18] A. Narang, B. Batra, A. Ahuja, J. Yadav, and N. Pachauri, “Classifica-
tion of EEG signals for epileptic seizures using Levenberg-Marquardt
algorithm based multilayer perceptron neural network,” J. Intell. Fuzzy
Syst., vol. 34, no. 3, pp. 1669–1677, Mar. 2018.
[19] J. Dong, K. Lu, J. Xue, S. Dai, R. Zhai, and W. Pan, “Accelerated non-
rigid image registration using improved Levenberg–Marquardt method,”
Inf. Sci., vol. 423, pp. 66–79, Jan. 2018.
[20] J. Li, W. X. Zheng, J. Gu, and L. Hua, “Parameter estimation algorithms
for Hammerstein output error systems using Levenberg–Marquardt opti-
mization method with varying interval measurements,” J. Franklin Inst.,
vol. 354, pp. 316–331, Jan. 2017.
[21] X. Yang, B. Huang, and H. Gao, “A direct maximum likelihood
optimization approach to identification of LPV time-delay systems,”
J. Franklin Inst., vol. 353, no. 8, pp. 1862–1881, May 2016.
[22] I. S. Baruch, V. A. Quintana, and E. P. Reynaud, “Complex-valued neural
network topology and learning applied for identification and control of
nonlinear systems,” Neurocomputing, vol. 233, pp. 104–115, Apr. 2017.
[23] M. Kaminski and T. Orlowska-Kowalska, “An on-line trained neural
controller with a fuzzy learning rate of the Levenberg–Marquardt
algorithm for speed control of an electrical drive with an elastic joint,”
Appl. Soft Comput., vol. 32, pp. 509–517, Jul. 2015.
[24] S. Roshan, Y. Miche, A. Akusok, and A. Lendasse, “Adaptive and online
network intrusion detection system using clustering and extreme learning
machines,” J. Franklin Inst., vol. 355, no. 4, pp. 1752–1779, Mar. 2018.
[25] C. Bishop, “Exact calculation of the hessian matrix for the multilayer
perceptron,” Neural Comput., vol. 4, no. 4, pp. 494–501, Jul. 1992.
[26] C. M. Bishop, “A fast procedure for retraining the multilayer percep-
tron,” Int. J. Neural Syst., vol. 2, no. 3, pp. 229–236, 1991.
[27] C. M. Bishop, “Curvature-driven smoothing in feedforward networks,”
in Proc. Seattle Int. Joint Conf. Neural Netw. (IJCNN), 1990, p. 749.
[28] G. Cybenko, “Approximation by superpositions of a sigmoidal function,”
Math. Control, Signals, Syst., vol. 2, no. 4, pp. 303–314, Dec. 1989.
[29] R. B. Ash, Real Analysis and Probability. New York, NY, USA:
Academic, 1972.
[30] J. S. R. Jang and C. T. Sun, Neuro-Fuzzy and Soft Computing. Upper
Saddle River, NJ, USA: Prentice-Hall, 1996.
[31] J. de Jesús Rubio, P. Angelov, and J. Pacheco, “Uniformly stable
backpropagation algorithm to train a feedforward neural network,” IEEE
Trans. Neural Netw., vol. 22, no. 3, pp. 356–366, Mar. 2011.
[32] W. Yu and X. Li, “Discrete-time neuro identification without robust mod-
ification,” IEE Proc.-Control Theory Appl., vol. 150, no. 3, pp. 311–316,
May 2003.
[33] J. D. J. Rubio, I. Elias, D. R. Cruz, and J. Pacheco, “Uniform stable
radial basis function neural network for the prediction in two mecha-
tronic processes,” Neurocomputing, vol. 227, pp. 122–130, Mar. 2017.
[34] J. D. J. Rubio, “USNFIS: Uniform stable neuro fuzzy inference system,”
Neurocomputing, vol. 262, pp. 57–66, Nov. 2017.
[35] I. Elias et al., “Genetic algorithm with radial basis mapping network
for the electricity consumption modeling,” Appl. Sci., vol. 10, no. 12,
p. 4239, Jun. 2020.
[36] J. D. J. Rubio, D. M. Vázquez, and D. Mújica-Vargas, “Acquisition
system and approximation of brain signals,” IET Sci., Meas. Technol.,
vol. 7, no. 4, pp. 232–239, Jul. 2013.
José de Jesús Rubio (Member, IEEE) is currently
a full-time Professor with the Sección de Estudios
de Posgrado e Investigación, ESIME Azcapotzalco,
Instituto Politécnico Nacional, Ciudad de México,
Mexico. He has published over 142 international
journal articles with 2214 cites from Scopus. He has
been the tutor of four Ph.D. students, 20 Ph.D.
students, 42 M.S. students, 4 S. students, and 17 B.S.
students.
Dr. Rubio was a Guest Editor of Neurocomputing,
Applied Soft Computing, Sensors, The Journal of
Supercomputing, Computational Intelligence and Neuroscience, Frontiers in
Psychology, and the Journal of Real-Time Image Processing. He also serves as
an Associate Editor for the IEEE TRANSACTIONS ON NEURAL NETWORKS
AND LEARNING SYSTEMS, the IEEE TRANSACTIONS ON FUZZY SYSTEMS,
Neural Computing and Applications, Frontiers in Neurorobotics, and Mathe-
matical Problems in Engineering.