Stability Analysis of the Modified
Levenberg–Marquardt Algorithm
for the Artificial Neural
Network Training
José de Jesús Rubio , Member, IEEE
Abstract—The Levenberg–Marquardt and Newton are two
algorithms that use the Hessian for the artificial neural network
learning. In this article, we propose a modified Levenberg–
Marquardt algorithm for the artificial neural network learning
containing the training and testing stages. The modified
Levenberg–Marquardt algorithm is based on the Levenberg–
Marquardt and Newton algorithms but with the following two
differences to assure the error stability and weights boundedness:
1) there is a singularity point in the learning rates of the Levenberg–Marquardt and Newton algorithms, while there is no singularity point in the learning rate of the modified Levenberg–Marquardt algorithm; and 2) the Levenberg–Marquardt and Newton algorithms have three different learning rates, while the modified Levenberg–Marquardt algorithm only has one learning
rate. The error stability and weights boundedness of the modi-
fied Levenberg–Marquardt algorithm are assured based on the
Lyapunov technique. We compare the artificial neural network
learning with the modified Levenberg–Marquardt, Levenberg–
Marquardt, Newton, and stable gradient algorithms for the
learning of the electric and brain signals data set.
Index Terms—Error stability, Levenberg–Marquardt, Newton,
weights boundedness.
I. INTRODUCTION
THE second-order partial derivatives of the cost function
with respect to the weights are known as the Hessian.
The Hessian of a convex function is positive semidefinite.
If the Hessian is positive definite at a point, then the convex
function attains a minimum at that point. This property of the
Hessian makes it an attractive alternative for artificial neural
network learning. The Levenberg–Marquardt and Newton are
two algorithms that use the Hessian for the artificial neural
network learning containing the training and testing stages.
There are some interesting applications of the Levenberg–
Marquardt and Newton algorithms. In [1] and [2], the Newton
algorithm is used for learning. In [3] and [4], the New-
ton algorithm is utilized for the control. In [5]–[7], the
Manuscript received November 29, 2019; revised April 7, 2020; accepted
August 5, 2020.
The author is with the Sección de Estudios de Posgrado e Investigación,
Esime Azcapotzalco, Instituto Politécnico Nacional, Ciudad de México 02250,
Mexico (e-mail: rubio.josedejesus@gmail.com).
Color versions of one or more of the figures in this article are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2020.3015200
Newton algorithm is considered for the optimization.
In [8]–[11], the Levenberg–Marquardt algorithm is used for the
learning. In [12]–[15], the Levenberg–Marquardt algorithm is
considered for the prediction. In [16]–[18], the Levenberg–
Marquardt algorithm is utilized for the classification.
In [19]–[21], the Levenberg–Marquardt algorithm is used for
the optimization. In [22] and [23], the Levenberg–Marquardt
algorithm is considered for the control. In [24], the Levenberg–
Marquardt algorithm is considered for the detection. Since the
Levenberg–Marquardt and Newton algorithms have been con-
sidered in several applications, they could be good alternatives
for the artificial neural network learning.
If the Hessian is positive definite at a point, then a con-
vex function attains a minimum at that point, but the point
must be a singular point [25]–[27]. In this article, we study
this problem presented in Levenberg–Marquardt and Newton
algorithms that use the Hessian for the artificial neural net-
work learning by the following steps: 1) we represent the
Levenberg–Marquardt and Newton algorithms in the scalar
form and 2) we show that the Levenberg–Marquardt and
Newton algorithms in the scalar form contain the main terms
denoted as the learning rates. In the Levenberg–Marquardt and
Newton algorithms, a value of zero in their determinants is
a singularity point in their learning rates. It results that the
Levenberg–Marquardt or Newton algorithms errors are not
assured to be stable. It should be interesting to find a way to
modify one of the Levenberg–Marquardt or Newton algorithms
to make its error stable.
In this article, we propose the modified Levenberg–
Marquardt algorithm for the artificial neural network learning.
The modified Levenberg–Marquardt algorithm is based on the
Levenberg–Marquardt and Newton algorithms but with the
following two differences to assure the error stability and
weights boundedness: 1) there is a singularity point in the
learning rates of the Levenberg–Marquardt and Newton algo-
rithms, while there is no singularity point in the learning rate
of the modified Levenberg–Marquardt algorithm; therefore,
the learning rate in the modified Levenberg–Marquardt algo-
rithm obtains bounded values and 2) the Levenberg–Marquardt
and Newton algorithms have three different learning rates,
while the modified Levenberg–Marquardt algorithm only has
one learning rate. It results that the error stability and weights
boundedness of the modified Levenberg–Marquardt algorithm
can be assured based on the Lyapunov technique; therefore,
the artificial neural network outputs and weights of the mod-
ified Levenberg–Marquardt algorithm remain bounded during
all the training and testing.
In [25]–[27], there is an interesting procedure to compute
the Levenberg–Marquardt and Newton algorithms for an arti-
ficial neural network with multiple hidden layers that are
useful in deep learning. Different from the abovementioned
work, this article computes the modified Levenberg–Marquardt
algorithm for an artificial neural network with a single hidden
layer because of the following four reasons: 1) we show
that the two-hidden-layer Levenberg–Marquardt and Newton
algorithms are worse than the Levenberg–Marquardt and New-
ton algorithms because the Levenberg–Marquardt and Newton
algorithms present one singularity point, while the two-hidden-
layer Levenberg–Marquardt and Newton algorithms present
three singularity points; 2) there is a computational concern
that computing the inverse of the Levenberg–Marquardt and
Newton algorithms for an artificial neural network with mul-
tiple hidden layers would be very expensive; 3) in [28]–[30],
they show based on the Stone–Weierstrass theorem that the
targets can be arbitrarily well approximated by an artificial
neural network with a single hidden layer and a hyperbolic
tangent function; and 4) this article is mainly focused on
assuring the stability of the modified Levenberg–Marquardt
algorithm for an artificial neural network with a single hidden
layer.
Finally, we compare the artificial neural network learn-
ing with the modified Levenberg–Marquardt, the Levenberg–
Marquardt algorithm [8]–[11], the Newton algorithm [1], [2],
the stable gradient algorithm in a neural network [31], [32],
and the stable gradient algorithm in a radial basis function
neural network [33], [34] for the learning of the electric and
brain signals data set. The electric signal data set information
is obtained from electricity load and price forecasting with
MATLAB where the details are explained in [35]. The brain
signal data set information is obtained from our laboratory
where the details are explained in [36].
The remainder of this article is organized as follows.
Section II presents the Levenberg–Marquardt and Newton
algorithms for artificial neural network learning. Section III
discusses the two-hidden-layer Levenberg–Marquardt and
Newton algorithms for the two-hidden-layer artificial neural
network learning. Section IV introduces the modified
Levenberg–Marquardt for the artificial neural network learn-
ing, and the error stability and weights boundedness are
assured. Section V shows the comparison results of several
algorithms for the learning of the electric and brain signals
data set. In Section VI, conclusions and forthcoming work are
detailed.
II. LEVENBERG–MARQUARDT AND NEWTON
ALGORITHMS FOR THE ARTIFICIAL
NEURAL NETWORK LEARNING
Fig. 1. Artificial neural network.

The algorithms for the artificial neural network learning frequently evaluate the first derivative of the cost function with respect to the weights. Nevertheless, there are several cases
where it is interesting to evaluate the second derivatives of the
cost function with respect to the weights. The second-order
partial derivatives of the cost function with respect to the
weights are known as the Hessian.
A. Hessian for the Artificial Neural Network Learning
In this article, we use a special artificial neural network with
one hidden layer. It could be extended to a general multilayer
artificial neural network; nevertheless, this research is focused
on a compact artificial neural network. This artificial neural
network uses hyperbolic tangent functions in the hidden layer
and linear functions in the output layer. We define the artificial
neural network as
dl,k =

j
qlj,k g


i
pji,kai,k

(1)
where pji,k are the weights of the hidden layer, qlj,k are the
weights of the output layer, g(·) are the activation functions,
ai,k are the artificial neural network inputs, dl,k are the artificial
neural network outputs, i is the input layer, j is the hidden
layer, l is the output layer, and k is the iteration.
We consider the artificial neural network of Fig. 1.
We define pji,k as the weights of the hidden layer and qlj,k as
the weights of the output layer.
We define the cost function Ek as
E_k = \frac{1}{2}\sum_{l=1}^{L_T}\left(d_{l,k} - t_{l,k}\right)^2   (2)
where dl,k are the artificial neural network outputs, tl,k are
the data set targets, and LT is the total outputs number. The
second-order partial derivatives of the cost function Ek with
respect to the weights pji,k and qlj,k will be used to obtain the
Newton and Levenberg–Marquardt algorithms.
We consider the forward propagation as
z_{j,k} = \sum_{i} p_{ji,k} a_{i,k}, \quad c_{j,k} = g\!\left(z_{j,k}\right)
x_{l,k} = \sum_{j} q_{lj,k} c_{j,k}, \quad d_{l,k} = f\!\left(x_{l,k}\right) = x_{l,k}   (3)
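As an illustration, the following Python sketch evaluates the forward propagation (1) and (3) for the single-hidden-layer network; the array names P, Q, and a (standing for p_{ji,k}, q_{lj,k}, and a_{i,k}) and the function name forward are choices made for this example only, not part of the paper.

import numpy as np

def forward(P, Q, a):
    # P: hidden layer weights p_{ji,k}, shape (J, I)
    # Q: output layer weights q_{lj,k}, shape (L, J)
    # a: input vector a_{i,k}, shape (I,)
    z = P @ a          # z_{j,k} = sum_i p_{ji,k} a_{i,k}
    c = np.tanh(z)     # c_{j,k} = g(z_{j,k}), hyperbolic tangent hidden layer
    x = Q @ c          # x_{l,k} = sum_j q_{lj,k} c_{j,k}
    d = x              # d_{l,k} = f(x_{l,k}) = x_{l,k}, linear output layer
    return z, c, d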
where ai,k are the artificial neural network inputs and dl,k are
the artificial neural network outputs, pji,k are hidden layer
weights, and qlj,k are output layer weights.
We consider the activation functions in the hidden layer as
the hyperbolic tangent functions
g\!\left(z_{j,k}\right) = \frac{e^{z_{j,k}} - e^{-z_{j,k}}}{e^{z_{j,k}} + e^{-z_{j,k}}} = \tanh\!\left(z_{j,k}\right).   (4)
The first and second derivatives of the hyperbolic tangent
functions (4) are
g'\!\left(z_{j,k}\right) = \frac{4}{\left(e^{z_{j,k}} + e^{-z_{j,k}}\right)^2} = \operatorname{sech}^2\!\left(z_{j,k}\right)
g''\!\left(z_{j,k}\right) = -2\,\frac{e^{z_{j,k}} - e^{-z_{j,k}}}{e^{z_{j,k}} + e^{-z_{j,k}}}\,\frac{4}{\left(e^{z_{j,k}} + e^{-z_{j,k}}\right)^2} = -2\tanh\!\left(z_{j,k}\right)\operatorname{sech}^2\!\left(z_{j,k}\right) = -2\,g\!\left(z_{j,k}\right)g'\!\left(z_{j,k}\right).   (5)
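A quick numerical check of (5) can be written with central finite differences, as in the following sketch; the evaluation point z and the step size h are arbitrary choices for this example.

import numpy as np

def g(z):
    return np.tanh(z)

def g_prime(z):
    return 1.0 / np.cosh(z) ** 2                     # sech^2(z), as in (5)

def g_second(z):
    return -2.0 * np.tanh(z) / np.cosh(z) ** 2       # -2 tanh(z) sech^2(z), as in (5)

z, h = 0.7, 1e-5
num_first = (g(z + h) - g(z - h)) / (2 * h)
num_second = (g(z + h) - 2 * g(z) + g(z - h)) / h ** 2
print(abs(num_first - g_prime(z)), abs(num_second - g_second(z)))   # both differences are small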
We consider the activation functions of the output layer as
the linear functions
f\!\left(x_{l,k}\right) = x_{l,k}.   (6)
The first and second derivatives of the linear functions (6) are
f'\!\left(x_{l,k}\right) = 1, \qquad f''\!\left(x_{l,k}\right) = 0.   (7)
The first and second derivatives of the cost function (2) are
\frac{\partial E_k}{\partial d_{l,k}} = \left(d_{l,k} - t_{l,k}\right), \qquad \frac{\partial^2 E_k}{\partial d_{l,k}^2} = 1.   (8)
Using the cost function (2), we obtain the backpropagation
of the output layer as
\frac{\partial E_k}{\partial q_{lj,k}} = \frac{\partial E_k}{\partial d_{l,k}}\frac{\partial d_{l,k}}{\partial x_{l,k}}\frac{\partial x_{l,k}}{\partial q_{lj,k}} = \left(d_{l,k} - t_{l,k}\right)\frac{\partial f\!\left(x_{l,k}\right)}{\partial x_{l,k}}\,c_{j,k} = \left(d_{l,k} - t_{l,k}\right)(1)\,g\!\left(z_{j,k}\right) = \left(d_{l,k} - t_{l,k}\right)g\!\left(z_{j,k}\right)   (9)

where f(x_{l,k}) = x_{l,k} is given in (6) and g(z_{j,k}) = \tanh(z_{j,k}) is given in (4).
Using the cost function (2), we obtain the backpropagation
of the hidden layer as
\frac{\partial E_k}{\partial p_{ji,k}} = \frac{\partial E_k}{\partial d_{l,k}}\frac{\partial d_{l,k}}{\partial x_{l,k}}\frac{\partial x_{l,k}}{\partial c_{j,k}}\frac{\partial c_{j,k}}{\partial z_{j,k}}\frac{\partial z_{j,k}}{\partial p_{ji,k}} = \left(d_{l,k} - t_{l,k}\right)(1)\,q_{lj,k}\,g'\!\left(z_{j,k}\right)a_{i,k} = \left(d_{l,k} - t_{l,k}\right)q_{lj,k}\,g'\!\left(z_{j,k}\right)a_{i,k}   (10)

where g'(z_{j,k}) = \partial c_{j,k}/\partial z_{j,k} = \partial g(z_{j,k})/\partial z_{j,k} = \operatorname{sech}^2(z_{j,k}) is given in (5).
We define the second derivative of Ek as the Hessian Hk
[25]–[27]
H_k = \nabla\nabla E_k = \begin{bmatrix} \dfrac{\partial^2 E_k}{\partial p_{ji,k}^2} & \dfrac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} \\[2mm] \dfrac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} & \dfrac{\partial^2 E_k}{\partial q_{lj,k}^2} \end{bmatrix}   (11)

where the Hessian is symmetrical

\frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} = \frac{\partial^2 E_k}{\partial q_{lj,k}\,\partial p_{ji,k}}.   (12)
The Hessian elements are
\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2 q_{lj,k}\left[g''\!\left(z_{j,k}\right)\sigma_{i,k} + g'\!\left(z_{j,k}\right)^2 q_{lj,k} S_{i,k}\right]
\frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} = a_{i,k}\,g'\!\left(z_{j,k}\right)\left[\sigma_{i,k} + c_{j,k} q_{lj,k} S_{i,k}\right]
\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = c_{j,k}^2\left[f''\!\left(x_{l,k}\right)\sigma_{i,k} + f'\!\left(x_{l,k}\right)^2 S_{i,k}\right]   (13)

where

S_{i,k} = \frac{\partial^2 E_k}{\partial d_{l,k}^2} = 1, \quad g'\!\left(z_{j,k}\right) = \operatorname{sech}^2\!\left(z_{j,k}\right), \quad f'\!\left(x_{l,k}\right) = 1
g''\!\left(z_{j,k}\right) = -2\tanh\!\left(z_{j,k}\right)\operatorname{sech}^2\!\left(z_{j,k}\right), \quad f''\!\left(x_{l,k}\right) = 0
c_{j,k} = \frac{\partial x_{l,k}}{\partial q_{lj,k}} = g\!\left(z_{j,k}\right), \quad a_{i,k} = \frac{\partial z_{j,k}}{\partial p_{ji,k}}
g\!\left(z_{j,k}\right) = \tanh\!\left(z_{j,k}\right), \quad f\!\left(x_{l,k}\right) = x_{l,k}, \quad \sigma_{i,k} = \left(d_{l,k} - t_{l,k}\right).
We substitute the elements of (13) into (11); then, the Hessian is

H_k = \nabla\nabla E_k = \begin{bmatrix} \dfrac{\partial^2 E_k}{\partial p_{ji,k}^2} & \dfrac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} \\[2mm] \dfrac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} & \dfrac{\partial^2 E_k}{\partial q_{lj,k}^2} \end{bmatrix}
\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2 q_{lj,k}\left[-2\,g\!\left(z_{j,k}\right)g'\!\left(z_{j,k}\right)\left(d_{l,k} - t_{l,k}\right) + g'\!\left(z_{j,k}\right)^2 q_{lj,k}\right]
\frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} = a_{i,k}\,g'\!\left(z_{j,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) + g\!\left(z_{j,k}\right) q_{lj,k}\right]
\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = g\!\left(z_{j,k}\right)^2   (14)

where a_{i,k} are the artificial neural network inputs, d_{l,k} are the artificial neural network outputs, g(z_{j,k}) = \tanh(z_{j,k}) are the activation functions, g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k}) are the derivatives of the activation functions, t_{l,k} are the data set targets, z_{j,k} = \sum_i p_{ji,k} a_{i,k} are the hidden layer weighted sums, and q_{lj,k} are the weights of the output layer.
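For reference, the backpropagation terms (9) and (10) and the Hessian entries (14) can be evaluated element by element as in the following sketch; the scalar arguments correspond to one (i, j, l) index triple, and the function names are choices made for this example only.

import numpy as np

def sech2(z):
    return 1.0 / np.cosh(z) ** 2

def gradient_terms(q_lj, a_i, z_j, d_l, t_l):
    # (9): dE/dq = (d - t) g(z);  (10): dE/dp = (d - t) q g'(z) a
    dE_dq = (d_l - t_l) * np.tanh(z_j)
    dE_dp = (d_l - t_l) * q_lj * sech2(z_j) * a_i
    return dE_dp, dE_dq

def hessian_terms(q_lj, a_i, z_j, d_l, t_l):
    g, gp = np.tanh(z_j), sech2(z_j)
    # (14): second derivatives of the cost with respect to the weights
    d2E_dp2 = a_i ** 2 * q_lj * (-2.0 * g * gp * (d_l - t_l) + gp ** 2 * q_lj)
    d2E_dq2 = g ** 2
    d2E_dpdq = a_i * gp * ((d_l - t_l) + g * q_lj)
    return d2E_dp2, d2E_dq2, d2E_dpdq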
In the next step, we evaluate the Hessian with the
Levenberg–Marquardt and Newton algorithms.
B. Newton Algorithm
The Newton algorithm constitutes the first alternative to
update the weights for the artificial neural network learning.
We represent the updating of the Newton algorithm as [1], [2]

\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - \alpha\left[H_k\right]^{-1} \begin{bmatrix} \partial E_k/\partial p_{ji,k} \\ \partial E_k/\partial q_{lj,k} \end{bmatrix}
H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \beta_{D,k} \end{bmatrix}
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}   (15)
where the elements \partial^2 E_k/\partial p_{ji,k}^2, \partial^2 E_k/\partial q_{lj,k}^2, and \partial^2 E_k/\partial p_{ji,k}\partial q_{lj,k} are in (14), the elements \partial E_k/\partial q_{lj,k} and \partial E_k/\partial p_{ji,k} are in (9) and (10), p_{ji,k} and q_{lj,k} are the weights, and \alpha is the learning factor. The Newton algorithm requires the existence of the inverse of the Hessian, [H_k]^{-1}.
Now, we will represent the Newton algorithm of (15) in the
scalar form. First, from (15), we obtain the inverse of Hk as
\left[H_k\right]^{-1} = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \beta_{D,k} \end{bmatrix}^{-1} = \frac{1}{\det\left[H_k\right]} \begin{bmatrix} \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \beta_{C,k} \end{bmatrix}
\det\left[H_k\right] = \beta_{C,k}\,\beta_{D,k} - \beta_{E,k}^2
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}.   (16)
We substitute [H_k]^{-1} of (16) into (15) as

\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - \frac{\alpha}{\det\left[H_k\right]} \begin{bmatrix} \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \beta_{C,k} \end{bmatrix} \begin{bmatrix} \partial E_k/\partial p_{ji,k} \\ \partial E_k/\partial q_{lj,k} \end{bmatrix}
\det\left[H_k\right] = \beta_{C,k}\,\beta_{D,k} - \beta_{E,k}^2.   (17)
Rewriting (17) in the scalar form gives

p_{ji,k+1} = p_{ji,k} - \beta_{Nji,k} \frac{\partial E_k}{\partial p_{ji,k}} + \gamma_{N,k} \frac{\partial E_k}{\partial q_{lj,k}}
q_{lj,k+1} = q_{lj,k} - \beta_{Nlj,k} \frac{\partial E_k}{\partial q_{lj,k}} + \gamma_{N,k} \frac{\partial E_k}{\partial p_{ji,k}}
\beta_{Nji,k} = \frac{\alpha\,\beta_{D,k}}{\det\left[H_k\right]}, \quad \beta_{Nlj,k} = \frac{\alpha\,\beta_{C,k}}{\det\left[H_k\right]}, \quad \gamma_{N,k} = \frac{\alpha\,\beta_{E,k}}{\det\left[H_k\right]}
\det\left[H_k\right]_N = \det\left[H_k\right] = \beta_{C,k}\,\beta_{D,k} - \beta_{E,k}^2
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}   (18)
where
\frac{\partial E_k}{\partial p_{ji,k}} = \left(d_{l,k} - t_{l,k}\right) q_{lj,k}\,g'\!\left(z_{j,k}\right) a_{i,k}
\frac{\partial E_k}{\partial q_{lj,k}} = g\!\left(z_{j,k}\right)\left(d_{l,k} - t_{l,k}\right)
\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2 q_{lj,k}\left[-2\,g\!\left(z_{j,k}\right)g'\!\left(z_{j,k}\right)\left(d_{l,k} - t_{l,k}\right) + g'\!\left(z_{j,k}\right)^2 q_{lj,k}\right]
\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = g\!\left(z_{j,k}\right)^2
\frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} = a_{i,k}\,g'\!\left(z_{j,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) + g\!\left(z_{j,k}\right) q_{lj,k}\right].   (19)

\beta_{Nji,k}, \beta_{Nlj,k}, and \gamma_{N,k} are the learning rates, p_{ji,k} and q_{lj,k} are the weights, \alpha is the learning factor, g(z_{j,k}) = \tanh(z_{j,k}) are the activation functions, and g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k}) are the derivatives of the activation functions. Equations (18) and (19) describe the Newton algorithm.
Remark 1: In the Newton algorithm of (18) and (19), we can observe that a value of zero in \beta_{C,k}\beta_{D,k} - \beta_{E,k}^2 of \det[H_k]_N is a singularity point in the learning rates \beta_{Nji,k}, \beta_{Nlj,k}, and \gamma_{N,k}. As a result, the Newton algorithm error is not assured to be stable. Hence, it would be interesting to consider an alternative algorithm for the artificial neural network learning.
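A minimal sketch of the Newton scalar update (18) is given below; it treats a single weight pair (p_{ji,k}, q_{lj,k}) and assumes the sums beta_C, beta_D, and beta_E of (15) have already been accumulated elsewhere, so the function name and signature are illustrative rather than the authors' implementation. The comment on det_N only points out the singularity of Remark 1; it does not remove it.

def newton_update(p, q, dE_dp, dE_dq, beta_C, beta_D, beta_E, alpha=0.9):
    """One scalar weight pair updated with the Newton scalar form (18)."""
    det_N = beta_C * beta_D - beta_E ** 2      # det[H_k]_N; a value of zero is the singularity point
    beta_Nji = alpha * beta_D / det_N
    beta_Nlj = alpha * beta_C / det_N
    gamma_N = alpha * beta_E / det_N
    p_next = p - beta_Nji * dE_dp + gamma_N * dE_dq
    q_next = q - beta_Nlj * dE_dq + gamma_N * dE_dp
    return p_next, q_next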
C. Levenberg–Marquardt Algorithm
The Levenberg–Marquardt algorithm constitutes the second
alternative to update the weights for the artificial neural
network learning. We represent the basic updating of the
Levenberg–Marquardt algorithm as [8]–[11]

\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - \left[H_k + \alpha I\right]^{-1} \begin{bmatrix} \partial E_k/\partial p_{ji,k} \\ \partial E_k/\partial q_{lj,k} \end{bmatrix}
H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \beta_{D,k} \end{bmatrix}
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}   (20)
where the elements \partial^2 E_k/\partial p_{ji,k}^2, \partial^2 E_k/\partial q_{lj,k}^2, and \partial^2 E_k/\partial p_{ji,k}\partial q_{lj,k} are in (14), the elements \partial E_k/\partial q_{lj,k} and \partial E_k/\partial p_{ji,k} are in (9) and (10), p_{ji,k} and q_{lj,k} are the weights, and \alpha is the learning factor. The Levenberg–Marquardt algorithm requires the existence of the inverse of the Hessian, [H_k + \alpha I]^{-1}.
Now, we will represent the Levenberg–Marquardt algorithm
of (20) in the scalar form. First, from (20), we obtain the
inverse of Hk + αI as
\left[H_k + \alpha I\right]^{-1} = \begin{bmatrix} \alpha + \beta_{C,k} & \beta_{E,k} \\ \beta_{E,k} & \alpha + \beta_{D,k} \end{bmatrix}^{-1} = \frac{1}{\det\left[H_k + \alpha I\right]} \begin{bmatrix} \alpha + \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \alpha + \beta_{C,k} \end{bmatrix}
\det\left[H_k + \alpha I\right] = \left(\alpha + \beta_{C,k}\right)\left(\alpha + \beta_{D,k}\right) - \beta_{E,k}^2
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}.   (21)
We substitute [H_k + \alpha I]^{-1} of (21) into (20) as

\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \end{bmatrix} - \frac{1}{\det\left[H_k + \alpha I\right]} \begin{bmatrix} \alpha + \beta_{D,k} & -\beta_{E,k} \\ -\beta_{E,k} & \alpha + \beta_{C,k} \end{bmatrix} \begin{bmatrix} \partial E_k/\partial p_{ji,k} \\ \partial E_k/\partial q_{lj,k} \end{bmatrix}
\det\left[H_k + \alpha I\right] = \left(\alpha + \beta_{C,k}\right)\left(\alpha + \beta_{D,k}\right) - \beta_{E,k}^2.   (22)
Rewriting (22) in the scalar form gives

p_{ji,k+1} = p_{ji,k} - \beta_{LMji,k} \frac{\partial E_k}{\partial p_{ji,k}} + \gamma_{LM,k} \frac{\partial E_k}{\partial q_{lj,k}}
q_{lj,k+1} = q_{lj,k} - \beta_{LMlj,k} \frac{\partial E_k}{\partial q_{lj,k}} + \gamma_{LM,k} \frac{\partial E_k}{\partial p_{ji,k}}
\beta_{LMji,k} = \frac{\alpha + \beta_{D,k}}{\det\left[H_k + \alpha I\right]}, \quad \beta_{LMlj,k} = \frac{\alpha + \beta_{C,k}}{\det\left[H_k + \alpha I\right]}, \quad \gamma_{LM,k} = \frac{\beta_{E,k}}{\det\left[H_k + \alpha I\right]}
\det\left[H_k\right]_{LM} = \det\left[H_k + \alpha I\right] = \left(\alpha + \beta_{C,k}\right)\left(\alpha + \beta_{D,k}\right) - \beta_{E,k}^2
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}   (23)
where
\frac{\partial E_k}{\partial p_{ji,k}} = \left(d_{l,k} - t_{l,k}\right) q_{lj,k}\,g'\!\left(z_{j,k}\right) a_{i,k}
\frac{\partial E_k}{\partial q_{lj,k}} = g\!\left(z_{j,k}\right)\left(d_{l,k} - t_{l,k}\right)
\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2 q_{lj,k}\left[-2\,g\!\left(z_{j,k}\right)g'\!\left(z_{j,k}\right)\left(d_{l,k} - t_{l,k}\right) + g'\!\left(z_{j,k}\right)^2 q_{lj,k}\right]
\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = g\!\left(z_{j,k}\right)^2
\frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} = a_{i,k}\,g'\!\left(z_{j,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) + g\!\left(z_{j,k}\right) q_{lj,k}\right].   (24)

\beta_{LMji,k}, \beta_{LMlj,k}, and \gamma_{LM,k} are the learning rates, p_{ji,k} and q_{lj,k} are the weights, \alpha is the learning factor, g(z_{j,k}) = \tanh(z_{j,k}) are the activation functions, and g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k}) are the derivatives of the activation functions. Equations (23) and (24) describe the Levenberg–Marquardt algorithm.
Remark 2: In the Levenberg–Marquardt algorithm of (23) and (24), we can observe that a value of zero in \left(\alpha + \beta_{C,k}\right)\left(\alpha + \beta_{D,k}\right) - \beta_{E,k}^2 of \det[H_k]_{LM} is a singularity point in the learning rates \beta_{LMji,k}, \beta_{LMlj,k}, and \gamma_{LM,k}. As a result, the Levenberg–Marquardt algorithm error is not assured to be stable. Hence, it should be interesting to find a way to modify the Levenberg–Marquardt algorithm to make its error stable.
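Under the same scalar-pair convention as the Newton sketch above, the Levenberg–Marquardt learning rates of (23) differ only in the regularized determinant; the following helper is an illustrative sketch, not the authors' reference implementation.

def lm_rates(beta_C, beta_D, beta_E, alpha=0.9):
    """Learning rates of (23); det_LM = 0 is the singularity point of Remark 2."""
    det_LM = (alpha + beta_C) * (alpha + beta_D) - beta_E ** 2
    beta_LMji = (alpha + beta_D) / det_LM
    beta_LMlj = (alpha + beta_C) / det_LM
    gamma_LM = beta_E / det_LM
    return beta_LMji, beta_LMlj, gamma_LM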
Fig. 2. Two-hidden-layer artificial neural network.
III. TWO-HIDDEN-LAYER LEVENBERG–MARQUARDT
AND NEWTON ALGORITHMS FOR THE ARTIFICIAL
NEURAL NETWORK LEARNING
In this section, the two-hidden-layer Levenberg–Marquardt
and Newton algorithms are presented as a comparison with the
Levenberg–Marquardt and Newton algorithms for the artificial
neural network learning.
A. Two-Hidden-Layer Hessian for the Artificial
Neural Network Learning
In this article, we use a two-hidden-layer artificial neural
network. This artificial neural network uses hyperbolic tangent
functions in the hidden layer and linear functions in the
output layer. We define the two-hidden-layer artificial neural
network as
d_{l,k} = \sum_{j} q_{lj,k}\, g\!\left(\sum_{i} p_{ji,k}\, g\!\left(\sum_{r} u_{ir,k} v_{r,k}\right)\right)   (25)
where pji,k and uir,k are the weights of the two hidden layers,
qlj,k are the weights of the output layer, g(·) are the activation
functions, vr,k are the artificial neural network inputs, dl,k
are the artificial neural network outputs, r is the input layer,
j and i are the hidden layers, l is the output layer, and k is
the iteration.
We consider the two-hidden-layer artificial neural network
shown in Fig. 2. We define pji,k and uir,k as the weights of
the hidden layer and qlj,k as the weights of the output layer.
We define the cost function Ek as
E_k = \frac{1}{2}\sum_{l=1}^{L_T}\left(d_{l,k} - t_{l,k}\right)^2   (26)
where dl,k is the artificial neural network output, tl,k is
the data set target, and LT is the total outputs number.
The second-order partial derivatives of the cost function Ek
with respect to the weights pji,k, uir,k , and qlj,k will be
used to obtain the two-hidden-layer Newton and Levenberg–
Marquardt algorithms.
We consider the forward propagation as
w_{i,k} = \sum_{r} u_{ir,k} v_{r,k}, \quad a_{i,k} = g\!\left(w_{i,k}\right)
z_{j,k} = \sum_{i} p_{ji,k} a_{i,k}, \quad c_{j,k} = g\!\left(z_{j,k}\right)
x_{l,k} = \sum_{j} q_{lj,k} c_{j,k}, \quad d_{l,k} = f\!\left(x_{l,k}\right) = x_{l,k}   (27)
where v_{r,k} are the artificial neural network inputs, d_{l,k} are the artificial neural network outputs, p_{ji,k} and u_{ir,k} are the hidden layer weights, and q_{lj,k} are the output layer weights.
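A sketch of the two-hidden-layer forward propagation (25) and (27) follows; the matrices U, P, Q and the input vector v are example names for u_{ir,k}, p_{ji,k}, q_{lj,k}, and v_{r,k} assumed for this illustration.

import numpy as np

def forward_two_hidden(U, P, Q, v):
    # U: first hidden layer weights u_{ir,k}, shape (I, R)
    # P: second hidden layer weights p_{ji,k}, shape (J, I)
    # Q: output layer weights q_{lj,k}, shape (L, J)
    w = U @ v          # w_{i,k} = sum_r u_{ir,k} v_{r,k}
    a = np.tanh(w)     # a_{i,k} = g(w_{i,k})
    z = P @ a          # z_{j,k} = sum_i p_{ji,k} a_{i,k}
    c = np.tanh(z)     # c_{j,k} = g(z_{j,k})
    d = Q @ c          # d_{l,k} = f(x_{l,k}) = x_{l,k}
    return d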
We consider the activation functions in the two hidden
layers as the hyperbolic tangent functions
g\!\left(w_{i,k}\right) = \frac{e^{w_{i,k}} - e^{-w_{i,k}}}{e^{w_{i,k}} + e^{-w_{i,k}}} = \tanh\!\left(w_{i,k}\right)
g\!\left(z_{j,k}\right) = \frac{e^{z_{j,k}} - e^{-z_{j,k}}}{e^{z_{j,k}} + e^{-z_{j,k}}} = \tanh\!\left(z_{j,k}\right).   (28)
We consider the activation functions of the output layer as the
linear functions
f\!\left(x_{l,k}\right) = x_{l,k}.   (29)
We define the second derivative of Ek as the two-hidden-
layer Hessian Hk [25]–[27]
H_k = \nabla\nabla E_k = \begin{bmatrix} \dfrac{\partial^2 E}{\partial p_{ji}^2} & \dfrac{\partial^2 E}{\partial p_{ji}\,\partial q_{lj}} & \dfrac{\partial^2 E}{\partial p_{ji}\,\partial u_{ir}} \\[2mm] \dfrac{\partial^2 E}{\partial p_{ji}\,\partial q_{lj}} & \dfrac{\partial^2 E}{\partial q_{lj}^2} & \dfrac{\partial^2 E}{\partial q_{lj}\,\partial u_{ir}} \\[2mm] \dfrac{\partial^2 E}{\partial p_{ji}\,\partial u_{ir}} & \dfrac{\partial^2 E}{\partial q_{lj}\,\partial u_{ir}} & \dfrac{\partial^2 E}{\partial u_{ir}^2} \end{bmatrix}.   (30)
In the next step, we evaluate the two-hidden-layer Hessian
with the two-hidden-layer Levenberg–Marquardt and Newton
algorithms.
B. Two-Hidden-Layer Newton Algorithm
The two-hidden-layer Newton algorithm constitutes one
alternative to update the weights for the two-hidden-layer
artificial neural network learning. We represent the updating
of the two-hidden-layer Newton algorithm as [1], [2]

\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \\ u_{ir,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \\ u_{ir,k} \end{bmatrix} - \alpha\left[H_k\right]^{-1} \begin{bmatrix} \partial E_k/\partial p_{ji,k} \\ \partial E_k/\partial q_{lj,k} \\ \partial E_k/\partial u_{ir,k} \end{bmatrix}
H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \beta_{F,k} \end{bmatrix}, \quad \sum_{ir} = \sum_{i}\sum_{r}
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}
\beta_{F,k} = \sum_{ir} \frac{\partial^2 E}{\partial u_{ir}^2}, \quad \beta_{G,k} = \sum_{jir} \frac{\partial^2 E}{\partial p_{ji}\,\partial u_{ir}}
\beta_{L,k} = \sum_{jir} \frac{\partial^2 E}{\partial q_{lj}\,\partial u_{ir}}, \quad \sum_{jir} = \sum_{j}\sum_{i}\sum_{r}   (31)
where p_{ji,k}, u_{ir,k}, and q_{lj,k} are the weights and \alpha is the learning factor. The two-hidden-layer Newton algorithm requires the existence of the inverse of the Hessian, [H_k]^{-1}.
From (31), we obtain the inverse of H_k as

\left[H_k\right]^{-1} = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \beta_{F,k} \end{bmatrix}^{-1}
= \frac{1}{\det\left[H_k\right]} \begin{bmatrix} \beta_{D,k}\beta_{F,k} - \beta_{L,k}^2 & -\beta_{E,k}\beta_{F,k} + \beta_{L,k}\beta_{G,k} & \beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k} \\ -\beta_{E,k}\beta_{F,k} + \beta_{L,k}\beta_{G,k} & \beta_{C,k}\beta_{F,k} - \beta_{G,k}^2 & -\beta_{C,k}\beta_{L,k} + \beta_{G,k}\beta_{E,k} \\ \beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k} & -\beta_{C,k}\beta_{L,k} + \beta_{G,k}\beta_{E,k} & \beta_{C,k}\beta_{D,k} - \beta_{E,k}^2 \end{bmatrix}
\det\left[H_k\right]_N = \det\left[H_k\right] = \beta_{C,k}\left(\beta_{D,k}\beta_{F,k} - \beta_{L,k}^2\right) - \beta_{E,k}\left(\beta_{E,k}\beta_{F,k} - \beta_{L,k}\beta_{G,k}\right) + \beta_{G,k}\left(\beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k}\right).   (32)
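The cofactor expressions in (32) can be cross-checked numerically against a generic matrix inverse, as in the following sketch; the numerical values assigned to the β terms are arbitrary example values chosen only so that the determinant is nonzero.

import numpy as np

bC, bD, bE, bF, bG, bL = 2.0, 3.0, 0.5, 4.0, 0.7, 0.2   # arbitrary example values
H = np.array([[bC, bE, bG],
              [bE, bD, bL],
              [bG, bL, bF]])
det_H = (bC * (bD * bF - bL ** 2)
         - bE * (bE * bF - bL * bG)
         + bG * (bE * bL - bD * bG))                      # determinant expansion of (32)
adj = np.array([[bD * bF - bL ** 2, -(bE * bF) + bL * bG, bE * bL - bD * bG],
                [-(bE * bF) + bL * bG, bC * bF - bG ** 2, -(bC * bL) + bG * bE],
                [bE * bL - bD * bG, -(bC * bL) + bG * bE, bC * bD - bE ** 2]])
print(np.allclose(adj / det_H, np.linalg.inv(H)))         # expected: True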
Remark 3: In the two-hidden-layer Newton algorithm of (32) and (31), we can observe that values of zero in \beta_{D,k}\beta_{F,k} - \beta_{L,k}^2, \beta_{E,k}\beta_{F,k} - \beta_{L,k}\beta_{G,k}, and \beta_{E,k}\beta_{L,k} - \beta_{D,k}\beta_{G,k} of \det[H_k]_N are three singularity points in the learning rates \beta_{Nji,k}, \beta_{Nlj,k}, and \gamma_{N,k}. The two-hidden-layer Newton algorithm of (32) and (31) is worse than the Newton algorithm of (18) and (19) because the Newton algorithm of (18) and (19) presents one singularity point, while the two-hidden-layer Newton algorithm of (32) and (31) presents three singularity points.
C. Two-Hidden-Layer Levenberg–Marquardt Algorithm
The two-hidden-layer Levenberg–Marquardt algorithm
constitutes one alternative to update the weights for the
two-hidden-layer artificial neural network learning. We rep-
resent the basic updating of the two-hidden-layer Levenberg–
Marquardt algorithm as [8]–[11]

\begin{bmatrix} p_{ji,k+1} \\ q_{lj,k+1} \\ u_{ir,k+1} \end{bmatrix} = \begin{bmatrix} p_{ji,k} \\ q_{lj,k} \\ u_{ir,k} \end{bmatrix} - \left[H_k + \alpha I\right]^{-1} \begin{bmatrix} \partial E_k/\partial p_{ji,k} \\ \partial E_k/\partial q_{lj,k} \\ \partial E_k/\partial u_{ir,k} \end{bmatrix}
H_k = \begin{bmatrix} \beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \beta_{F,k} \end{bmatrix}, \quad \sum_{ir} = \sum_{i}\sum_{r}
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \sum_{jil} = \sum_{j}\sum_{i}\sum_{l}
\beta_{F,k} = \sum_{ir} \frac{\partial^2 E}{\partial u_{ir}^2}, \quad \beta_{G,k} = \sum_{jir} \frac{\partial^2 E}{\partial p_{ji}\,\partial u_{ir}}
\beta_{L,k} = \sum_{jir} \frac{\partial^2 E}{\partial q_{lj}\,\partial u_{ir}}, \quad \sum_{jir} = \sum_{j}\sum_{i}\sum_{r}   (33)
where p_{ji,k}, u_{ir,k}, and q_{lj,k} are the weights and \alpha is the learning factor. The two-hidden-layer Levenberg–Marquardt algorithm requires the existence of the inverse of the Hessian, [H_k + \alpha I]^{-1}.
From (33), we obtain the inverse of H_k + \alpha I as

\left[H_k + \alpha I\right]^{-1} = \begin{bmatrix} \alpha + \beta_{C,k} & \beta_{E,k} & \beta_{G,k} \\ \beta_{E,k} & \alpha + \beta_{D,k} & \beta_{L,k} \\ \beta_{G,k} & \beta_{L,k} & \alpha + \beta_{F,k} \end{bmatrix}^{-1}
= \frac{1}{\det\left[H_k + \alpha I\right]} \begin{bmatrix} \left(\alpha + \beta_{D,k}\right)\left(\alpha + \beta_{F,k}\right) - \beta_{L,k}^2 & -\beta_{E,k}\left(\alpha + \beta_{F,k}\right) + \beta_{L,k}\beta_{G,k} & \beta_{E,k}\beta_{L,k} - \left(\alpha + \beta_{D,k}\right)\beta_{G,k} \\ -\beta_{E,k}\left(\alpha + \beta_{F,k}\right) + \beta_{L,k}\beta_{G,k} & \left(\alpha + \beta_{C,k}\right)\left(\alpha + \beta_{F,k}\right) - \beta_{G,k}^2 & -\left(\alpha + \beta_{C,k}\right)\beta_{L,k} + \beta_{G,k}\beta_{E,k} \\ \beta_{E,k}\beta_{L,k} - \left(\alpha + \beta_{D,k}\right)\beta_{G,k} & -\left(\alpha + \beta_{C,k}\right)\beta_{L,k} + \beta_{G,k}\beta_{E,k} & \left(\alpha + \beta_{C,k}\right)\left(\alpha + \beta_{D,k}\right) - \beta_{E,k}^2 \end{bmatrix}
\det\left[H_k\right]_{LM} = \det\left[H_k + \alpha I\right] = \left(\alpha + \beta_{C,k}\right)\left[\left(\alpha + \beta_{D,k}\right)\left(\alpha + \beta_{F,k}\right) - \beta_{L,k}^2\right] - \beta_{E,k}\left[\beta_{E,k}\left(\alpha + \beta_{F,k}\right) - \beta_{L,k}\beta_{G,k}\right] + \beta_{G,k}\left[\beta_{E,k}\beta_{L,k} - \left(\alpha + \beta_{D,k}\right)\beta_{G,k}\right].   (34)
Remark 4: In the two-hidden-layer Levenberg–Marquardt algorithm of (34) and (33), we can observe that values of zero in \left(\alpha + \beta_{D,k}\right)\left(\alpha + \beta_{F,k}\right) - \beta_{L,k}^2, \beta_{E,k}\left(\alpha + \beta_{F,k}\right) - \beta_{L,k}\beta_{G,k}, and \beta_{E,k}\beta_{L,k} - \left(\alpha + \beta_{D,k}\right)\beta_{G,k} of \det[H_k]_{LM} are three singularity points in the learning rates \beta_{LMji,k}, \beta_{LMlj,k}, and \gamma_{LM,k}. The two-hidden-layer Levenberg–Marquardt algorithm of (34) and (33) is worse than the Levenberg–Marquardt algorithm of (23) and (24) because the Levenberg–Marquardt algorithm of (23) and (24) presents one singularity point, while the two-hidden-layer Levenberg–Marquardt algorithm of (34) and (33) presents three singularity points.
IV. ERROR STABILITY AND WEIGHTS BOUNDEDNESS
ANALYSIS OF THE MODIFIED LEVENBERG–
MARQUARDT ALGORITHM
In this section, the modified Levenberg–Marquardt algo-
rithm is introduced for the artificial neural network learning,
and the error stability and weights boundedness are analyzed.
A. Modified Levenberg–Marquardt Algorithm
The modified Levenberg–Marquardt algorithm is defined as
p_{ji,k+1} = p_{ji,k} - \beta_{MLM,k} \frac{\partial E_k}{\partial p_{ji,k}} + \gamma_{MH,k} \frac{\partial E_k}{\partial q_{lj,k}}
q_{lj,k+1} = q_{lj,k} - \beta_{MLM,k} \frac{\partial E_k}{\partial q_{lj,k}} + \gamma_{MH,k} \frac{\partial E_k}{\partial p_{ji,k}}
\beta_{MLM,k} = \frac{\left[\alpha + \left(\beta_{C,k}\right)^2\right]\left[\alpha + \left(\beta_{D,k}\right)^2\right]}{\det\left[H_k\right]_{MLM}}
\det\left[H_k\right]_{MLM} = \left[\alpha + \left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]\left[\left(\alpha + \left(\beta_{C,k}\right)^2\right)\left(\alpha + \left(\beta_{D,k}\right)^2\right) + \left(\beta_{E,k}\right)^2\right]
\beta_{A,k} = \sum_{ji} \frac{\partial E_k/\partial p_{ji,k}}{d_{l,k} - t_{l,k}}, \quad \beta_{B,k} = \sum_{j} \frac{\partial E_k/\partial q_{lj,k}}{d_{l,k} - t_{l,k}}
\beta_{C,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}^2}, \quad \beta_{D,k} = \sum_{j} \frac{\partial^2 E_k}{\partial q_{lj,k}^2}
\beta_{E,k} = \sum_{jil} \frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}}, \quad \gamma_{MH,k} = 0
\sum_{jil} = \sum_{j}\sum_{i}\sum_{l}, \quad \sum_{ji} = \sum_{j}\sum_{i}   (35)
where
\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k} - t_{l,k}} = q_{lj,k}\,g'\!\left(z_{j,k}\right) a_{i,k}
\frac{\partial E_k/\partial q_{lj,k}}{d_{l,k} - t_{l,k}} = g\!\left(z_{j,k}\right)
\frac{\partial E_k}{\partial p_{ji,k}} = \left(d_{l,k} - t_{l,k}\right) q_{lj,k}\,g'\!\left(z_{j,k}\right) a_{i,k}
\frac{\partial E_k}{\partial q_{lj,k}} = g\!\left(z_{j,k}\right)\left(d_{l,k} - t_{l,k}\right)
\frac{\partial^2 E_k}{\partial p_{ji,k}^2} = a_{i,k}^2 q_{lj,k}\left[-2\,g\!\left(z_{j,k}\right)g'\!\left(z_{j,k}\right)\left(d_{l,k} - t_{l,k}\right) + g'\!\left(z_{j,k}\right)^2 q_{lj,k}\right]
\frac{\partial^2 E_k}{\partial q_{lj,k}^2} = g\!\left(z_{j,k}\right)^2
\frac{\partial^2 E_k}{\partial p_{ji,k}\,\partial q_{lj,k}} = a_{i,k}\,g'\!\left(z_{j,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) + g\!\left(z_{j,k}\right) q_{lj,k}\right].   (36)

\beta_{MLM,k} is the learning rate, p_{ji,k} and q_{lj,k} are the weights, \alpha is the learning factor, g(z_{j,k}) = \tanh(z_{j,k}) are the activation functions, and g'(z_{j,k}) = \operatorname{sech}^2(z_{j,k}) are the derivatives of the activation functions. Equations (35) and (36) describe the modified Levenberg–Marquardt algorithm.
Remark 5: The modified Levenberg–Marquardt algorithm
of (35) and (36) is based on the Levenberg–Marquardt algo-
rithm of (23) and (24) and on the Newton algorithm of
(18) and (19) but with the following two differences to assure
the error stability and weights boundedness.
1) A value of zero in \beta_{C,k}\beta_{D,k} - \beta_{E,k}^2 of \det[H_k]_N is a singularity point in the learning rates \beta_{Nji,k}, \beta_{Nlj,k}, and \gamma_{N,k} of the Newton algorithm, and a value of zero in \left(\alpha + \beta_{C,k}\right)\left(\alpha + \beta_{D,k}\right) - \beta_{E,k}^2 of \det[H_k]_{LM} is a singularity point in the learning rates \beta_{LMji,k}, \beta_{LMlj,k}, and \gamma_{LM,k} of the Levenberg–Marquardt algorithm, while there is no value of zero in \left[\alpha + \left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]\left[\left(\alpha + \left(\beta_{C,k}\right)^2\right)\left(\alpha + \left(\beta_{D,k}\right)^2\right) + \left(\beta_{E,k}\right)^2\right] of \det[H_k]_{MLM}, and there is no singularity point in the learning rate \beta_{MLM,k} of the modified Levenberg–Marquardt algorithm.
2) The Levenberg–Marquardt algorithm has three differ-
ent learning rates βLMji,k, βLMlj,k, and γLM,k, and
the Newton algorithm has three different learning
rates βN ji,k, βNlj,k, and γN,k, while the modified
Levenberg–Marquardt algorithm only has one learning
rate βMLM,k .
The mentioned differences make it possible to assure the error stability and weights boundedness of the modified Levenberg–Marquardt algorithm in Section IV-B.
Remark 6: The application of the modified Levenberg–
Marquardt algorithm for the artificial neural network learning
is based on the following steps: 1) obtain the artificial neural
network output dl,k of Fig. 1 with (1) and (3); 2) obtain the
backpropagation of the output layer (∂ Ek/∂qlj,k) with (9),
and the backpropagation of the hidden layer (∂ Ek/∂pji,k)
with (10); and 3) obtain the updating of the weights of the
hidden layer pji,k with (35) and (36) and the weights of the
output layer qlj,k with (35) and (36). Please note that step 3)
represents the artificial neural network learning.
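A minimal end-to-end sketch of the three steps of Remark 6 for one iteration is given below, assuming a single output (L = 1) so that the error d_{l,k} − t_{l,k} is a scalar; the array names, the factorization of the triple sums, and the use of the analytic ratios of (36) for β_{A,k} and β_{B,k} are choices made for this illustration, not the authors' reference implementation.

import numpy as np

def sech2(z):
    return 1.0 / np.cosh(z) ** 2

def mlm_step(P, Q, a, t, alpha=0.9):
    """One iteration of the modified Levenberg-Marquardt update (35), (36) for the
    single-hidden-layer network (1), (3) with one output.
    P: p_{ji,k} with shape (J, I); Q: q_{lj,k} with shape (J,);
    a: inputs a_{i,k} with shape (I,); t: scalar target t_{l,k}."""
    # Step 1: artificial neural network output, (1) and (3)
    z = P @ a
    g, gp = np.tanh(z), sech2(z)
    d = float(Q @ g)
    e = d - t                                    # error d_{l,k} - t_{l,k}
    # Step 2: backpropagation of the output and hidden layers, (9) and (10)
    dE_dQ = e * g
    dE_dP = e * np.outer(Q * gp, a)
    # Sums of (35) and (36); beta_A and beta_B use the analytic ratios of (36)
    beta_A = np.sum(Q * gp) * np.sum(a)
    beta_B = np.sum(g)
    beta_C = np.sum(a ** 2) * np.sum(Q * (-2.0 * g * gp * e + gp ** 2 * Q))
    beta_D = np.sum(g ** 2)
    beta_E = np.sum(a) * np.sum(gp * (e + g * Q))
    det_MLM = (alpha + beta_A ** 2 + beta_B ** 2) * \
              ((alpha + beta_C ** 2) * (alpha + beta_D ** 2) + beta_E ** 2)
    beta_MLM = (alpha + beta_C ** 2) * (alpha + beta_D ** 2) / det_MLM   # never divides by zero since alpha > 0
    # Step 3: weight updating (35) with gamma_MH = 0
    return P - beta_MLM * dE_dP, Q - beta_MLM * dE_dQ, d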
B. Error Stability and Weights Boundedness Analysis
We analyze the error stability of the modified Levenberg–
Marquardt algorithm by the Lyapunov algorithm detailed by
the following theorem.
Theorem 1: The errors of the modified Levenberg–
Marquardt algorithm (1), (3), (35), and (36) applied for the
learning of the data set targets tl,k are uniformly stable, and
the upper bound of the average errors o_{l,k}^2 satisfies

\limsup_{T\to\infty}\frac{1}{T}\sum_{k=2}^{T} o_{l,k}^2 \le \frac{2}{\alpha}\mu_l^2   (37)

where o_{l,k}^2 = \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2, 0 < \alpha \le 1 \in \Re and 0 < \beta_{MLM,k} \in \Re are in (35), \left(d_{l,k-1} - t_{l,k-1}\right) are the errors, \mu_l are the upper bounds of the uncertainties \mu_{l,k}, and \left|\mu_{l,k}\right| \le \mu_l.
Proof: Define the next positive function

\mathcal{L}_{l,k} = \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 + \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2   (38)

where \tilde{p}_{ji,k} and \tilde{q}_{lj,k} are in (35) and (36). Then, the first difference \Delta\mathcal{L}_{l,k} = \mathcal{L}_{l,k+1} - \mathcal{L}_{l,k} is

\Delta\mathcal{L}_{l,k} = \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + \sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 - \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 - \sum_{ji}\tilde{p}_{ji,k}^2 - \sum_{j}\tilde{q}_{lj,k}^2.   (39)
Now, the weights errors are

\sum_{ji}\tilde{p}_{ji,k+1}^2 = \sum_{ji}\tilde{p}_{ji,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2
\sum_{j}\tilde{q}_{lj,k+1}^2 = \sum_{j}\tilde{q}_{lj,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2.   (40)
Substituting (40) into (39) gives

\Delta\mathcal{L}_{l,k} = -2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2 + \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 - \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2.   (41)
Equation (41) is rewritten as

\Delta\mathcal{L}_{l,k} = \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 - \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 - 2\beta_{MLM,k}\left[\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k}\right] + \beta_{MLM,k}^2\left[\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2 + \left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2\right].   (42)
Using the closed-loop dynamics \frac{\partial E_k/\partial p_{ji,k}}{d_{l,k} - t_{l,k}}\sum_{ji}\tilde{p}_{ji,k} + \frac{\partial E_k/\partial q_{lj,k}}{d_{l,k} - t_{l,k}}\sum_{j}\tilde{q}_{lj,k} = \left(d_{l,k} - t_{l,k}\right) - \mu_{l,k} of [31] and [33] in the second element of (42), it can be seen that

\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k} = \left(d_{l,k} - t_{l,k}\right)\left[\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k} - t_{l,k}}\sum_{ji}\tilde{p}_{ji,k} + \frac{\partial E_k/\partial q_{lj,k}}{d_{l,k} - t_{l,k}}\sum_{j}\tilde{q}_{lj,k}\right] = \left(d_{l,k} - t_{l,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) - \mu_{l,k}\right]   (43)
where \mu_{l,k} are the uncertainties. Substituting (43) in the second element of (42) gives

\Delta\mathcal{L}_{l,k} = \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 - \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 - 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) - \mu_{l,k}\right] + \beta_{MLM,k}^2\left[\left(\sum_{ji}\frac{\partial E_k}{\partial p_{ji,k}}\right)^2 + \left(\sum_{j}\frac{\partial E_k}{\partial q_{lj,k}}\right)^2\right]
\Delta\mathcal{L}_{l,k} = \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 - \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 - 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)\mu_{l,k} + \beta_{MLM,k}^2\left(d_{l,k} - t_{l,k}\right)^2\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]   (44)
where \beta_{A,k} = \sum_{ji}\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k} - t_{l,k}} and \beta_{B,k} = \sum_{j}\frac{\partial E_k/\partial q_{lj,k}}{d_{l,k} - t_{l,k}}. Substituting \beta_{MLM,k} of (35) into the element \beta_{MLM,k}^2\left(d_{l,k} - t_{l,k}\right)^2\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right] and considering \alpha \le 1 gives

\beta_{MLM,k}^2\left(d_{l,k} - t_{l,k}\right)^2\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right] = \beta_{MLM,k}\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2
= \frac{\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]\left[\alpha + \left(\beta_{C,k}\right)^2\right]\left[\alpha + \left(\beta_{D,k}\right)^2\right]}{\left[\alpha + \left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]\left[\left(\alpha + \left(\beta_{C,k}\right)^2\right)\left(\alpha + \left(\beta_{D,k}\right)^2\right) + \left(\beta_{E,k}\right)^2\right]}\,\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2
\le \beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2.   (45)

Taking into account
that 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)\mu_{l,k} \le \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{MLM,k}\mu_{l,k}^2 and employing (45) in (44) gives

\Delta\mathcal{L}_{l,k} \le \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 - \frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 - 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{MLM,k}\mu_{l,k}^2 + \beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2
\Delta\mathcal{L}_{l,k} \le -\frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 + 2\beta_{MLM,k}\mu_{l,k}^2.   (46)
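The bound on the cross term used above follows from completing the square, as sketched in the following step:

\left(\frac{1}{\sqrt{2}}\left(d_{l,k} - t_{l,k}\right) - \sqrt{2}\,\mu_{l,k}\right)^2 \ge 0 \;\Rightarrow\; 2\left(d_{l,k} - t_{l,k}\right)\mu_{l,k} \le \frac{1}{2}\left(d_{l,k} - t_{l,k}\right)^2 + 2\,\mu_{l,k}^2

and multiplying both sides by \beta_{MLM,k} > 0 gives the inequality employed in (46).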
From (35), since \alpha + \left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2 \ge \alpha and \left(\alpha + \left(\beta_{C,k}\right)^2\right)\left(\alpha + \left(\beta_{D,k}\right)^2\right) + \left(\beta_{E,k}\right)^2 \ge \left(\alpha + \left(\beta_{C,k}\right)^2\right)\left(\alpha + \left(\beta_{D,k}\right)^2\right),

\beta_{MLM,k} = \frac{\left[\alpha + \left(\beta_{C,k}\right)^2\right]\left[\alpha + \left(\beta_{D,k}\right)^2\right]}{\left[\alpha + \left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]\left[\left(\alpha + \left(\beta_{C,k}\right)^2\right)\left(\alpha + \left(\beta_{D,k}\right)^2\right) + \left(\beta_{E,k}\right)^2\right]} \le \frac{1}{\alpha}.   (47)
Employing (47) and \left|\mu_{l,k}\right| \le \mu_l in (46) gives

\Delta\mathcal{L}_{l,k} \le -\frac{1}{2}\beta_{MLM,k-1}\left(d_{l,k-1} - t_{l,k-1}\right)^2 + \frac{2}{\alpha}\mu_l^2.   (48)
Employing (48), the errors of the modified Levenberg–Marquardt are uniformly stable. Hence, \mathcal{L}_{l,k} is bounded. Taking into account (48) and o_{l,k}^2 of (37), it is

\Delta\mathcal{L}_{l,k} \le -o_{l,k}^2 + \frac{2}{\alpha}\mu_l^2.   (49)

Summing (49) from 2 to T gives

\sum_{k=2}^{T}\left(o_{l,k}^2 - \frac{2}{\alpha}\mu_l^2\right) \le \mathcal{L}_{l,1} - \mathcal{L}_{l,T}.   (50)

Employing that \mathcal{L}_{l,T} > 0 is bounded,

\frac{1}{T}\sum_{k=2}^{T} o_{l,k}^2 \le \frac{2}{\alpha}\mu_l^2 + \frac{1}{T}\mathcal{L}_{l,1} \;\Rightarrow\; \limsup_{T\to\infty}\frac{1}{T}\sum_{k=2}^{T} o_{l,k}^2 \le \frac{2}{\alpha}\mu_l^2.   (51)

Equation (51) is equivalent to (37).
Remark 7: The result of Theorem 1, that the errors of the modified Levenberg–Marquardt algorithm for the artificial neural network learning are assured to be stable, implies that the artificial neural network outputs d_{l,k} of the modified Levenberg–Marquardt algorithm remain bounded during all the training and testing.
The following theorem proves the weights boundedness of
the modified Levenberg–Marquardt.
Theorem 2: When the average errors o_{l,k+1}^2 are bigger than the uncertainties \frac{2}{\alpha}\mu_l^2, the weights errors are bounded by the initial weights errors as

o_{l,k+1}^2 \ge \frac{2}{\alpha}\mu_l^2 \;\Rightarrow\; \sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 \le \sum_{ji}\tilde{p}_{ji,1}^2 + \sum_{j}\tilde{q}_{lj,1}^2   (52)

where \tilde{p}_{ji,k+1} and \tilde{q}_{lj,k+1} are the weights, \tilde{p}_{ji,1} and \tilde{q}_{lj,1} are the initial weights, o_{l,k+1}^2 = \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2, \left(d_{l,k} - t_{l,k}\right) are the errors, 0 < \alpha \le 1 \in \Re, 0 < \beta_{MLM,k} \in \Re, and \mu_l are the upper bounds of the uncertainties \mu_{l,k}, \left|\mu_{l,k}\right| \le \mu_l.
Proof: From (40), the weights are written as

\sum_{ji}\tilde{p}_{ji,k+1}^2 = \sum_{ji}\tilde{p}_{ji,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2
\sum_{j}\tilde{q}_{lj,k+1}^2 = \sum_{j}\tilde{q}_{lj,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2.   (53)
Adding \sum_{ji}\tilde{p}_{ji,k+1}^2 with \sum_{j}\tilde{q}_{lj,k+1}^2 of (53) gives

\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 = \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2 - 2\beta_{MLM,k}\frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k} + \beta_{MLM,k}^2\left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2.   (54)
Equation (54) is represented as

\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 = \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 - 2\beta_{MLM,k}\left[\frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k}\right] + \beta_{MLM,k}^2\left[\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2 + \left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2\right].   (55)
Substituting \frac{\partial E_k}{\partial p_{ji,k}}\sum_{ji}\tilde{p}_{ji,k} + \frac{\partial E_k}{\partial q_{lj,k}}\sum_{j}\tilde{q}_{lj,k} = \left(d_{l,k} - t_{l,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) - \mu_{l,k}\right] of (43) in the second element of (55) gives

\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 = \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 - 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)\left[\left(d_{l,k} - t_{l,k}\right) - \mu_{l,k}\right] + \beta_{MLM,k}^2\left[\left(\frac{\partial E_k}{\partial p_{ji,k}}\right)^2 + \left(\frac{\partial E_k}{\partial q_{lj,k}}\right)^2\right]
\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 = \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 - 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)\mu_{l,k} + \beta_{MLM,k}^2\left(d_{l,k} - t_{l,k}\right)^2\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right]   (56)
where \mu_{l,k} are the uncertainties, \beta_{A,k} = \sum_{ji}\frac{\partial E_k/\partial p_{ji,k}}{d_{l,k} - t_{l,k}}, and \beta_{B,k} = \sum_{j}\frac{\partial E_k/\partial q_{lj,k}}{d_{l,k} - t_{l,k}}. Substituting 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)\mu_{l,k} \le \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{MLM,k}\mu_{l,k}^2 into the third element of (56) and \beta_{MLM,k}^2\left(d_{l,k} - t_{l,k}\right)^2\left[\left(\beta_{A,k}\right)^2 + \left(\beta_{B,k}\right)^2\right] \le \beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 of (45) into the last element of (56) gives

\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 \le \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 - 2\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{MLM,k}\mu_{l,k}^2 + \beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2
\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 \le \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 - \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + 2\beta_{MLM,k}\mu_{l,k}^2.   (57)
From (47), \beta_{MLM,k} \le \frac{1}{\alpha}, and using \left|\mu_{l,k}\right| \le \mu_l in (57) gives

\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 \le \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 - \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 + \frac{2}{\alpha}\mu_l^2.   (58)
Taking into account o_{l,k+1}^2 = \frac{1}{2}\beta_{MLM,k}\left(d_{l,k} - t_{l,k}\right)^2 gives

o_{l,k+1}^2 \ge \frac{2}{\alpha}\mu_l^2 \;\Rightarrow\; \sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 \le \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2.   (59)
Taking into account that o_{l,k+1}^2 \ge \frac{2}{\alpha}\mu_l^2 holds for all iterations up to k, hence

\sum_{ji}\tilde{p}_{ji,k+1}^2 + \sum_{j}\tilde{q}_{lj,k+1}^2 \le \sum_{ji}\tilde{p}_{ji,k}^2 + \sum_{j}\tilde{q}_{lj,k}^2 \le \cdots \le \sum_{ji}\tilde{p}_{ji,1}^2 + \sum_{j}\tilde{q}_{lj,1}^2.   (60)
Then, (52) is proven.
Remark 8: The result of Theorem 2, that the weights of the modified Levenberg–Marquardt algorithm are bounded, implies that the hidden layer weights p_{ji,k} and output layer weights q_{lj,k} of the modified Levenberg–Marquardt algorithm for the artificial neural network learning remain bounded during all the training and testing.
V. RESULTS
In this section, we compare the Newton algorithm (N) of (1),
(3), (18), (19), and [1] and [2], the Levenberg–Marquardt
algorithm (LM) of (1), (3), (23), (24), and [8]–[11], and the
modified Levenberg–Marquardt algorithm (MLM) of (1), (3),
(35), and (36) for the artificial neural network learning of
electric signal data set because they are based on the Hessian,
and we compare the stable gradient algorithm in a neural
network (SGNN) of [31] and [32], the stable gradient algo-
rithm in a radial basis function neural network (SGRBFNN)
of [33], [34], and the modified Levenberg–Marquardt algo-
rithm (MLM) of (1), (3), (35), and (36) for the artificial
neural network learning of brain signal data set because they
are based on the stability. The objective of N, LM, SGNN,
SGRBFNN, and MLM is that the artificial neural network
outputs dl,k must follow the data set targets tl,k as near as
possible.
In this part of this article, the abovementioned algorithms
are applied for the artificial neural network learning con-
taining the training and testing stages. The root-mean-square
error (RMSE) is utilized to show the performance accuracy
for the comparisons, and it is represented as
E = \left[\frac{1}{T}\sum_{k=1}^{T}\sum_{l=1}^{L_T}\left(d_{l,k} - t_{l,k}\right)^2\right]^{\frac{1}{2}}   (61)
where d_{l,k} - t_{l,k} are the errors, d_{l,k} are the artificial neural network outputs, t_{l,k} are the data set targets, L_T is the total outputs number, and T is the final iteration.
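The RMSE of (61) can be computed directly from the stored outputs and targets, as in the following short sketch; the array names and shapes are assumptions of this example.

import numpy as np

def rmse(D, T_targets):
    """Root-mean-square error (61); D and T_targets have shape (T, L_T):
    one row per iteration, one column per output."""
    return np.sqrt(np.mean(np.sum((D - T_targets) ** 2, axis=1)))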
A. Electric Signals
The electric signal data set information is obtained from
Electricity Load and Price Forecasting with MATLAB where
the details are explained in [35]. The electric signal data
set is the history of electric energy usage at each hour and
temperature observations of the International Organization for
Standardization (ISO) of Great Britain. The meteorological
information includes the dry bulb temperature and the dew point. The hourly electric energy usage in the data set is called the electric signal.
In the electric signal data set, we consider eight inputs
described as follows: a1,k is the temperature of the dry bulb,
a2,k is the dew point, a3,k is the hour of the day, a4,k is the day of the week, a5,k is a flag indicating whether the day is a holiday or a weekend day, a6,k is the average load of the previous day, a7,k is the load of the same hour of the previous day, and a8,k is the load of the same hour and day of the previous week, and we consider one target described as follows: t1,k is the load of the same day.

Fig. 3. Training for the first electric signal data set.
In the artificial neural network learning, we consider eight
artificial neural network inputs denoted as a1,k, a2,k, a3,k, a4,k,
a5,k, a6,k, a7,k, and a8,k that are the same inputs of the electric
signal data set, and we consider one artificial neural network
output denoted as d1,k. We utilize 7000 iterations of the data
set for the artificial neural network training, and we utilize
1000 iterations of the data set for the artificial neural network
testing. The objective of N, LM, and MLM is that the artificial
neural network output d1,k must follow the target t1,k as near
as possible.
The N of [1] and [2] is detailed as (1), (3), (18), and (19)
with eight inputs, one output, and five neurons in the hidden
layer, α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a
random number between 0 and 1.
The LM of [8]–[11] is detailed as (1), (3), (23), and (24)
with eight inputs, one output, and five neurons in the hidden
layer, α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a
random number between 0 and 1.
The MLM is detailed as (1), (3), (35), and (36), with
eight inputs, one output, and five neurons in the hidden layer,
α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a random
number between 0 and 1.
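Putting the pieces together, the training and testing protocol described above might be sketched as follows; mlm_step and rmse refer to the illustrative functions given earlier in this article, and the arrays inputs and targets are random placeholders standing in for the electric signal data set, so this is only an assumed setup, not the authors' experiment code.

import numpy as np

J, I = 5, 8                          # five hidden neurons, eight inputs
rng = np.random.default_rng()
inputs = rng.random((8000, I))       # placeholder for the inputs a_{i,k}
targets = rng.random(8000)           # placeholder for the target t_{1,k}
P = rng.random((J, I))               # p_{ji,1} = rand in [0, 1]
Q = rng.random(J)                    # q_{lj,1} = rand in [0, 1]

outputs = np.zeros(len(targets))
for k in range(7000):                # training stage: 7000 iterations
    P, Q, outputs[k] = mlm_step(P, Q, inputs[k], targets[k], alpha=0.9)
for k in range(7000, 8000):          # testing stage: weights keep their last trained value
    outputs[k] = float(Q @ np.tanh(P @ inputs[k]))

print(rmse(outputs[:7000, None], targets[:7000, None]))    # training RMSE (61)
print(rmse(outputs[7000:, None], targets[7000:, None]))    # testing RMSE (61)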
The comparisons for the training and testing of the N, LM,
and MLM for the first electric signal data set are shown in
Figs. 3 and 4. The weights of the MLM for the first electric
signal data set are shown in Figs. 5 and 6. The comparisons
for the training and testing of the N, LM, and MLM for the
second electric signal data set are shown in Figs. 7 and 8. The
weights of the MLM for the second electric signal data set
are shown in Figs. 9 and 10. The training and testing RMSE
comparisons of the performance accuracy (61) for the first
electric signal data set are shown in Table I and, for the second
electric signal data set, are shown in Table II. Please note that
the most important data are related to the output d1,k.
Fig. 4. Testing for the first electric signal data set.
Fig. 5. Hidden layer weights for the first electric signal data set.
Fig. 6. Output layer weights for the first electric signal data set.

TABLE I
RMSE FOR THE FIRST ELECTRIC SIGNAL DATA SET

To improve the training and testing, more neurons in the hidden layer could be included; nevertheless, this decision could increase the computational cost. From Figs. 3, 4, 7, and 8, it is observed that the MLM improves on the LM and N because the signal of the MLM follows the electric signal data set more closely than the others.
Fig. 7. Training for the second electric signal data set.
Fig. 8. Testing for the second electric signal data set.
Fig. 9. Hidden layer weights for the second electric signal data set.
Fig. 10. Output layer weights for the second electric signal data set.

TABLE II
RMSE FOR THE SECOND ELECTRIC SIGNAL DATA SET

From Figs. 5, 6, 9, and 10, it is observed that the weights of the MLM remain bounded. From Tables I and II, it is observed that the MLM achieves better performance accuracy for training and testing compared with the LM and N because the RMSE is the smallest for the MLM. Thus, the MLM is the best option for learning in the electric signal data set.
B. Brain Signals
The brain signal data set information is obtained from our
laboratory where the details are explained in [36]. The brain
signal data set is the real data of brain signals. The alpha signal is considered in this study because it is the most likely to be found. The acquisition system is applied to a 28-year-old healthy man with his eyes closed. Four different signals are received from the brain.
In the brain signal data set, we consider three inputs
described as follows: a1,k is the brain signal of the focal
point 1, a2,k is the brain signal of the focal point 2, and a3,k
is the brain signal of the focal point 3, and we consider 1
target described as follows: t1,k is the brain signal of the focal
point 4.
In the artificial neural network learning, we consider three
artificial neural network inputs denoted as a1,k, a2,k, and a3,k
that are the same inputs of the brain signal data set, and we
consider one artificial neural network output denoted as d1,k.
We utilize 7000 iterations of the data set for the artificial
neural network training, and we utilize 1000 iterations of the
data set for the artificial neural network testing. The objective
of SGNN, SGRBFNN, and MLM is that the artificial neural
network output d1,k must follow the target t1,k as near as
possible.
The SGNN of [31] and [32] is detailed with three inputs,
one output, and five neurons in the hidden layer, α = 0.9,
pji,1 = rand, qlj,1 = rand, and rand is a random number
between 0 and 1.
The SGRBFNN of [33] and [34] is detailed with three
inputs, one output, and five neurons in the hidden layer,
α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a random
number between 0 and 1.
The MLM is detailed as (1), (3), (35), and (36) with three
inputs, one output, and five neurons in the hidden layer,
α = 0.9, pji,1 = rand, qlj,1 = rand, and rand is a random
number between 0 and 1.
The comparisons for the training and testing of the SGNN,
SGRBFNN, and MLM for the first brain signal data set are
Fig. 11. Training for the first brain signal data set.
Fig. 12. Testing for the first brain signal data set.
Fig. 13. Hidden layer weights for the first brain signal data set.

shown in Figs. 11 and 12. The weights of the MLM for the first brain signal data set are shown in Figs. 13 and 14. The comparisons for the training and testing of the SGNN, SGRBFNN, and MLM for the second brain signal data set are shown in Figs. 15 and 16. The weights of the MLM for the second brain signal data set are shown in Figs. 17 and 18. The training and testing RMSE comparisons of the performance accuracy (61) for the first brain signal data set are shown in Table III and, for the second brain signal data set, are shown in Table IV. Please note that the most important data are related to the output d1,k.
Fig. 14. Output layer weights for the first brain signal data set.
Fig. 15. Training for the second brain signal data set.
Fig. 16. Testing for the second brain signal data set.

TABLE III
RMSE FOR THE FIRST BRAIN SIGNAL DATA SET

To improve the training and testing, more neurons in the hidden layer could be included; nevertheless, this decision could increase the computational cost. From Figs. 11, 12, 15, and 16, it is observed that the MLM improves on the SGRBFNN and SGNN because the signal of the MLM follows the brain signal data set more closely than the others.
Fig. 17. Hidden layer weights for the second brain signal data set.
Fig. 18. Output layer weights for the second brain signal data set.

TABLE IV
RMSE FOR THE SECOND BRAIN SIGNAL DATA SET

From Figs. 13, 14, 17, and 18, it is observed that the weights of the MLM remain bounded. From Tables III and IV, it is observed that the MLM achieves better performance accuracy for training and testing compared with the SGRBFNN and SGNN because the RMSE is the smallest for the MLM. Thus, the MLM is the best option for learning in the brain signal data set.
Remark 9: The result of Theorem 1, that the error of the MLM is assured to be stable while the errors of some of the N, LM, SGNN, and SGRBFNN are not assured to be stable, can be observed mainly in the training of Figs. 3, 7, 11, and 15 and in the testing of Figs. 4, 8, 12, and 16, where the signals of the N, LM, and SGNN become unbounded during the training or testing, while the signal of the MLM remains bounded during all the training and testing.
Remark 10: The result of Theorem 2 that the weights of
the MLM are bounded can be observed mainly in the hidden
layer weights of Figs. 5, 9, 13, and 17 and in the output layer
weights of Figs. 6, 10, 14, and 18, where the weights of the
MLM remain bounded during all the training. The weights of
the MLM also remain bounded during all the testing because
they take the last value obtained during the training.
VI. CONCLUSION
The objective of this article is to introduce an algorithm
called modified Levenberg–Marquardt for the artificial neural
network learning. The modified Levenberg–Marquardt was compared with the Newton, Levenberg–Marquardt, and stable gradient algorithms for the learning of the electric and brain signal data sets; the best performance accuracy was obtained with the modified Levenberg–Marquardt because the artificial neural network output followed the data set target most closely and because the smallest RMSE values were obtained. In the forthcoming work, we will
propose other algorithms for the artificial neural network
learning to compare with our results, or we will apply our
algorithm for the learning of other robotic or mechatronic
systems.
ACKNOWLEDGMENT
The author is grateful for the Editor-in-Chief, Associate Edi-
tor, and Reviewers for their valuable comments and insightful
suggestions that helped to improve this research significantly.
He would also like to thank the Instituto Politécnico Nacional,
the Secretaría de Investigación y Posgrado, the Comisión de
Operación y Fomento de Actividades Académicas, and the
Consejo Nacional de Ciencia y Tecnología for their help in
this research.
REFERENCES
[1] S. Kostić and D. Vasović, “Prediction model for compressive strength of
basic concrete mixture using artificial neural networks,” Neural Comput.
Appl., vol. 26, no. 5, pp. 1005–1024, Jul. 2015.
[2] B. Sahoo and P. K. Bhaskaran, “Prediction of storm surge and inundation
using climatological datasets for the indian coast using soft computing
techniques,” Soft Comput., vol. 23, no. 23, pp. 12363–12383, Dec. 2019.
[3] T.-L. Le, “Intelligent fuzzy controller design for antilock braking sys-
tems,” J. Intell. Fuzzy Syst., vol. 36, no. 4, pp. 3303–3315, Apr. 2019.
[4] C. Yin, S. Wu, S. Zhou, J. Cao, X. Huang, and Y. Cheng, “Design
and stability analysis of multivariate extremum seeking with Newton
method,” J. Franklin Inst., vol. 355, no. 4, pp. 1559–1578, Mar. 2018.
[5] S. Chakia, B. Shanmugarajanb, S. Ghosalc, and G. Padmanabham,
“Application of integrated soft computing techniques for optimisation of
hybrid CO2 laser–MIG welding process,” Appl. Soft Comput., vol. 30,
pp. 365–374, May 2015.
[6] Y. Li, H. Zhang, J. Han, and Q. Sun, “Distributed multi-agent opti-
mization via event-triggered based continuous-time Newton–Raphson
algorithm,” Neurocomputing, vol. 275, pp. 1416–1425, Jan. 2018.
[7] M. S. Salim and A. I. Ahmed, “A quasi-Newton augmented lagrangian
algorithm for constrained optimization problems,” J. Intell. Fuzzy Syst.,
vol. 35, no. 2, pp. 2373–2382, Aug. 2018.
[8] C. Lv et al., “Levenberg–Marquardt backpropagation training of multilayer
neural networks for state estimation of a safety-critical cyber-physical
system,” IEEE Trans. Ind. Informat., vol. 14, no. 8, pp. 3436–3446,
Aug. 2018.
[9] M. J. Rana, M. S. Shahriar, and M. Shafiullah, “Levenberg–Marquardt
neural network to estimate UPFC-coordinated PSS parameters to
enhance power system stability,” Neural Comput. Appl., vol. 31,
pp. 1237–1248, Jul. 2019.
[10] A. Sarabakha, N. Imanberdiyev, E. Kayacan, M. A. Khanesar, and
H. Hagras, “Novel Levenberg–Marquardt based learning algorithm for
unmanned aerial vehicles,” Inf. Sci., vol. 417, pp. 361–380, Nov. 2017.
[11] J. S. Smith, B. Wu, and B. M. Wilamowski, “Neural network training
with Levenberg–Marquardt and adaptable weight compression,” IEEE
Trans. Neural Netw. Learn. Syst., vol. 30, no. 2, pp. 580–587, Feb. 2019.
[12] H. G. Han, Y. Li, Y. N. Guo, and J. F. Qiao, “A soft computing method to
predict sludge volume index based on a recurrent self-organizing neural
network,” Appl. Soft Comput., vol. 38, pp. 477–486, Jan. 2016.
[13] J. Qiao, L. Wang, C. Yang, and K. Gu, “Adaptive Levenberg-Marquardt
algorithm based echo state network for chaotic time series prediction,”
IEEE Access, vol. 6, pp. 10720–10732, 2018.
[14] A. Parsaie, A. H. Haghiabi, M. Saneie, and H. Torabi, “Applica-
tions of soft computing techniques for prediction of energy dissipa-
tion on stepped spillways,” Neural Comput. Appl., vol. 29, no. 12,
pp. 1393–1409, Jun. 2018.
[15] N. Zhang and D. Shetty, “An effective LS-SVM-based approach for
surface roughness prediction in machined surfaces,” Neurocomputing,
vol. 198, pp. 35–39, Jul. 2016.
[16] E. Esme and B. Karlik, “Fuzzy c-means based support vector machines
classifier for perfume recognition,” Appl. Soft Comput., vol. 46,
pp. 452–458, Sep. 2016.
[17] P. Fergus, I. Idowu, A. Hussain, and C. Dobbins, “Advanced artificial
neural network classification for detecting preterm births using EHG
records,” Neurocomputing, vol. 188, pp. 42–49, May 2016.
[18] A. Narang, B. Batra, A. Ahuja, J. Yadav, and N. Pachauri, “Classifica-
tion of EEG signals for epileptic seizures using Levenberg-Marquardt
algorithm based multilayer perceptron neural network,” J. Intell. Fuzzy
Syst., vol. 34, no. 3, pp. 1669–1677, Mar. 2018.
[19] J. Dong, K. Lu, J. Xue, S. Dai, R. Zhai, and W. Pan, “Accelerated non-
rigid image registration using improved Levenberg–Marquardt method,”
Inf. Sci., vol. 423, pp. 66–79, Jan. 2018.
[20] J. Li, W. X. Zheng, J. Gu, and L. Hua, “Parameter estimation algorithms
for Hammerstein output error systems using Levenberg–Marquardt opti-
mization method with varying interval measurements,” J. Franklin Inst.,
vol. 354, pp. 316–331, Jan. 2017.
[21] X. Yang, B. Huang, and H. Gao, “A direct maximum likelihood
optimization approach to identification of LPV time-delay systems,”
J. Franklin Inst., vol. 353, no. 8, pp. 1862–1881, May 2016.
[22] I. S. Baruch, V. A. Quintana, and E. P. Reynaud, “Complex-valued neural
network topology and learning applied for identification and control of
nonlinear systems,” Neurocomputing, vol. 233, pp. 104–115, Apr. 2017.
[23] M. Kaminski and T. Orlowska-Kowalska, “An on-line trained neural
controller with a fuzzy learning rate of the Levenberg–Marquardt
algorithm for speed control of an electrical drive with an elastic joint,”
Appl. Soft Comput., vol. 32, pp. 509–517, Jul. 2015.
[24] S. Roshan, Y. Miche, A. Akusok, and A. Lendasse, “Adaptive and online
network intrusion detection system using clustering and extreme learning
machines,” J. Franklin Inst., vol. 355, no. 4, pp. 1752–1779, Mar. 2018.
[25] C. Bishop, “Exact calculation of the Hessian matrix for the multilayer
perceptron,” Neural Comput., vol. 4, no. 4, pp. 494–501, Jul. 1992.
[26] C. M. Bishop, “A fast procedure for retraining the multilayer percep-
tron,” Int. J. Neural Syst., vol. 2, no. 3, pp. 229–236, 1991.
[27] C. M. Bishop, “Curvature-driven smoothing in feedforward networks,”
in Proc. Seattle Int. Joint Conf. Neural Netw. (IJCNN), 1990, p. 749.
[28] G. Cybenko, “Approximation by superpositions of a sigmoidal function,”
Math. Control, Signals, Syst., vol. 2, no. 4, pp. 303–314, Dec. 1989.
[29] R. B. Ash, Real Analysis and Probability. New York, NY, USA:
Academic, 1972.
[30] J. S. R. Jang and C. T. Sun, Neuro-Fuzzy and Soft Computing. Upper
Saddle River, NJ, USA: Prentice-Hall, 1996.
[31] J. de Jesús Rubio, P. Angelov, and J. Pacheco, “Uniformly stable
backpropagation algorithm to train a feedforward neural network,” IEEE
Trans. Neural Netw., vol. 22, no. 3, pp. 356–366, Mar. 2011.
[32] W. Yu and X. Li, “Discrete-time neuro identification without robust mod-
ification,” IEE Proc.-Control Theory Appl., vol. 150, no. 3, pp. 311–316,
May 2003.
[33] J. D. J. Rubio, I. Elias, D. R. Cruz, and J. Pacheco, “Uniform stable
radial basis function neural network for the prediction in two mecha-
tronic processes,” Neurocomputing, vol. 227, pp. 122–130, Mar. 2017.
[34] J. D. J. Rubio, “USNFIS: Uniform stable neuro fuzzy inference system,”
Neurocomputing, vol. 262, pp. 57–66, Nov. 2017.
[35] I. Elias et al., “Genetic algorithm with radial basis mapping network
for the electricity consumption modeling,” Appl. Sci., vol. 10, no. 12,
p. 4239, Jun. 2020.
[36] J. D. J. Rubio, D. M. Vázquez, and D. Mújica-Vargas, “Acquisition
system and approximation of brain signals,” IET Sci., Meas. Technol.,
vol. 7, no. 4, pp. 232–239, Jul. 2013.
José de Jesús Rubio (Member, IEEE) is currently
a full-time Professor with the Sección de Estudios
de Posgrado e Investigación, ESIME Azcapotzalco,
Instituto Politécnico Nacional, Ciudad de México,
Mexico. He has published over 142 international
journal articles, with 2214 citations in Scopus. He has
been the tutor of four Ph.D. students, 20 Ph.D.
students, 42 M.S. students, 4 S. students, and 17 B.S.
students.
Dr. Rubio was a Guest Editor of Neurocomputing,
Applied Soft Computing, Sensors, The Journal of
Supercomputing, Computational Intelligence and Neuroscience, Frontiers in
Psychology, and the Journal of Real-Time Image Processing. He also serves as
an Associate Editor for the IEEE TRANSACTIONS ON NEURAL NETWORKS
AND LEARNING SYSTEMS, the IEEE TRANSACTIONS ON FUZZY SYSTEMS,
Neural Computing and Applications, Frontiers in Neurorobotics, and Mathe-
matical Problems in Engineering.
Since the Levenberg–Marquardt and Newton algorithms have been con- sidered in several applications, they could be good alternatives for the artificial neural network learning. If the Hessian is positive definite at a point, then a con- vex function attains a minimum at that point, but the point must be a singular point [25]–[27]. In this article, we study this problem presented in Levenberg–Marquardt and Newton algorithms that use the Hessian for the artificial neural net- work learning by the following steps: 1) we represent the Levenberg–Marquardt and Newton algorithms in the scalar form and 2) we show that the Levenberg–Marquardt and Newton algorithms in the scalar form contain the main terms denoted as the learning rates. In the Levenberg–Marquardt and Newton algorithms, a value of zero in their determinants is a singularity point in their learning rates. It results that the Levenberg–Marquardt or Newton algorithms errors are not assured to be stable. It should be interesting to find a way to modify one of the Levenberg–Marquardt or Newton algorithms to make its error stable. In this article, we propose the modified Levenberg– Marquardt algorithm for the artificial neural network learning. The modified Levenberg–Marquardt algorithm is based on the Levenberg–Marquardt and Newton algorithms but with the following two differences to assure the error stability and weights boundedness: 1) there is a singularity point in the learning rates of the Levenberg–Marquardt and Newton algo- rithms, while there is not a singularity point in the learning rate of the modified Levenberg–Marquardt algorithm; therefore, the learning rate in the modified Levenberg–Marquardt algo- rithm obtains bounded values and 2) the Levenberg–Marquardt and Newton algorithms have three different learning rates, while the modified Levenberg–Marquardt algorithm only has one learning rate. It results that the error stability and weights 2162-237X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information. Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
  • 2. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 2 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS boundedness of the modified Levenberg–Marquardt algorithm can be assured based on the Lyapunov technique; therefore, the artificial neural network outputs and weights of the mod- ified Levenberg–Marquardt algorithm remain bounded during all the training and testing. In [25]–[27], there is an interesting procedure to compute the Levenberg–Marquardt and Newton algorithms for an arti- ficial neural network with multiple hidden layers that are useful in the deep learning. Different to the abovementioned work, this article computes the modified Levenberg–Marquardt algorithm for an artificial neural network with a single hidden layer because of the following four reasons: 1) we show that the two-hidden-layer Levenberg–Marquardt and Newton algorithms are worse than the Levenberg–Marquardt and New- ton algorithms because the Levenberg–Marquardt and Newton algorithms present one singularity point, while the two-hidden- layer Levenberg–Marquardt and Newton algorithms present three singularity points; 2) there is a computational concern that computing the inverse of the Levenberg–Marquardt and Newton algorithms for an artificial neural network with mul- tiple hidden layers would be very expensive; 3) in [28]–[30], they show based on the Stone–Weierstrass theorem that the targets can be arbitrarily well approximated by an artificial neural network with a single hidden layer and a hyperbolic tangent function; and 4) this article is mainly focused in assuring the stability of the modified Levenberg–Marquardt algorithm for an artificial neural network with a single hidden layer. Finally, we compare the artificial neural network learn- ing with the modified Levenberg–Marquardt, the Levenberg– Marquardt algorithm [8]–[11], the Newton algorithm [1], [2], the stable gradient algorithm in a neural network [31], [32], and the stable gradient algorithm in a radial basis function neural network [33], [34] for the learning of the electric and brain signals data set. The electric signal data set information is obtained from electricity load and price forecasting with MATLAB where the details are explained in [35]. The brain signal data set information is obtained from our laboratory where the details are explained in [36]. The remainder of this article is organized as follows. Section II presents the Levenberg–Marquardt and Newton algorithms for artificial neural network learning. Section III discusses the two-hidden-layer Levenberg–Marquardt and Newton algorithms for the two-hidden-layer artificial neural network learning. Section IV introduces the modified Levenberg–Marquardt for the artificial neural network learn- ing, and the error stability and weights boundedness are assured. Section V shows the comparison results of several algorithms for the learning of the electric and brain signals data set. In Section VI, conclusions and forthcoming work are detailed. II. LEVENBERG–MARQUARDT AND NEWTON ALGORITHMS FOR THE ARTIFICIAL NEURAL NETWORK LEARNING The algorithms for the artificial neural network learning frequently evaluate the first derivative of the cost function with Fig. 1. Artificial neural network. respect to the weights. Nevertheless, there are several cases where it is interesting to evaluate the second derivatives of the cost function with respect to the weights. 
The second-order partial derivatives of the cost function with respect to the weights are known as the Hessian. A. Hessian for the Artificial Neural Network Learning In this article, we use a special artificial neural network with one hidden layer. It could be extended to a general multilayer artificial neural network; nevertheless, this research is focused on a compact artificial neural network. This artificial neural network uses hyperbolic tangent functions in the hidden layer and linear functions in the output layer. We define the artificial neural network as dl,k = j qlj,k g i pji,kai,k (1) where pji,k are the weights of the hidden layer, qlj,k are the weights of the output layer, g(·) are the activation functions, ai,k are the artificial neural network inputs, dl,k are the artificial neural network outputs, i is the input layer, j is the hidden layer, l is the output layer, and k is the iteration. We consider the artificial neural network of Fig. 1. We define pji,k as the weights of the hidden layer and qlj,k as the weights of the output layer. We define the cost function Ek as Ek = 1 2 LT l=1 dl,k − tl,k 2 (2) where dl,k are the artificial neural network outputs, tl,k are the data set targets, and LT is the total outputs number. The second-order partial derivatives of the cost function Ek with respect to the weights pji,k and qlj,k will be used to obtain the Newton and Levenberg–Marquardt algorithms. We consider the forward propagation as z j,k = i pji,kai,k, cj,k = g z j,k xl,k = j qlj,kcj,k, dl,k = f xl,k = xl,k (3) Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
  • 3. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. RUBIO: STABILITY ANALYSIS OF THE MODIFIED LEVENBERG–MARQUARDT ALGORITHM 3 where ai,k are the artificial neural network inputs and dl,k are the artificial neural network outputs, pji,k are hidden layer weights, and qlj,k are output layer weights. We consider the activation functions in the hidden layer as the hyperbolic tangent functions g z j,k = ez j,k − e−z j,k ez j,k + e−z j,k = tanh z j,k . (4) The first and second derivatives of the hyperbolic tangent functions (4) are g/ z j,k = 4 ez j,k + e−z j,k 2 = sec h2 z j,k g// z j,k = −2 ez j,k − e−z j,k ez j,k + e−z j,k 4 ez j,k + e−z j,k 2 = −2 tanh z j,k sec h2 z j,k =−2g z j,k g/ z j,k . (5) We consider the activation functions of the output layer as the linear functions f xl,k = xl,k. (6) The first and second derivatives of the linear functions (6) are f / xl,k = 1, f // xl,k = 0. (7) The first and second derivatives of the cost function (2) are ∂ Ek ∂dl,k = dl,k − tl,k , ∂2 Ek ∂d2 l,k = 1. (8) Using the cost function (2), we obtain the backpropagation of the output layer as ∂ Ek ∂qlj,k = ∂ Ek ∂dl,k ∂dl,k ∂xl,k ∂xl,k ∂qlj,k = dl,k − tl,k ∂ f xl,k ∂xl,k cj,k = dl,k − tl,k ∂xl,k ∂xl,k cj,k = dl,k − tl,k (1)g z j,k = dl,k − tl,k g z j,k (9) where f (xl,k ) = xl,k of (6) and g(z j,k) = tanh(z j,k) of (4). Using the cost function (2), we obtain the backpropagation of the hidden layer as ∂ Ek ∂pji,k = ∂ Ek ∂dl,k ∂dl,k ∂xl,k ∂xl,k ∂cj,k ∂cj,k ∂z j,k ∂z j,k ∂pji,k = dl,k − tl,k (1)qlj,k g/ z j,k ai,k = dl,k − tl,k qlj,k g/ z j,k ai,k (10) where g/ (z j,k) = (∂cj,k/∂z j,k) = (∂g(z j,k)/∂z j,k) = sec h2 (z j,k) of (5). We define the second derivative of Ek as the Hessian Hk [25]–[27] Hk = ∇∇Ek = ⎡ ⎢ ⎢ ⎢ ⎣ ∂2 Ek ∂p2 ji,k ∂2 Ek ∂pji,k∂qlj,k ∂2 Ek ∂pji,k∂qlj,k ∂2 Ek ∂q2 lj,k ⎤ ⎥ ⎥ ⎥ ⎦ (11) where the Hessian is symmetrical ∂2 Ek ∂pji,k∂qjl,k = ∂2 Ek ∂qjl,k∂pji,k . (12) The Hessian elements are ∂2 Ek ∂p2 ji,k = a2 i,kqlj,k g// z j,k σi,k + g/ z j,k 2 qlj,k Si,k ∂2 Ek ∂pji,k∂qlj,k = ai,k g/ z j,k σi,k + cj,kqlj,k Si,k ∂2 Ek ∂q2 lj,k = c2 j,k f // xl,k σi,k + f / xl,k 2 Si,k (13) where Si,k = ∂2 Ek ∂d2 l,k = 1 g/ z j,k = sec h2 z j,k , f / xl,k = 1 g// z j,k = −2 tanh z j,k sec h2 z j,k , f // xl,k = 0 cj,k = ∂xl,k ∂qlj,k = g z j,k , ai,k = ∂z j,k ∂pji,k g z j,k = tanh z j,k , f xl,k = xl,k, σi,k = dl,k − tl,k . We substitute the elements of (13) and (11); then, the Hessian is Hk = ∇∇Ek = ⎡ ⎢ ⎢ ⎢ ⎣ ∂2 Ek ∂p2 ji,k ∂2 Ek ∂pji,k∂qlj,k ∂2 Ek ∂pji,k∂qlj,k ∂2 Ek ∂q2 lj,k ⎤ ⎥ ⎥ ⎥ ⎦ ∂2 Ek ∂p2 ji,k = a2 i,kqlj,k ∗ −2g z j,k g/ z j,k dl,k − tl,k +g/ z j,k 2 qlj,k ∂2 Ek ∂pji,k∂qlj,k = ai,k g/ z j,k dl,k − tl,k + g z j,k qlj,k ∂2 Ek ∂q2 lj,k = g z j,k 2 (14) where ai,k are the artificial neural network inputs, dl,k are the artificial neural network outputs, g(z j,k) = tanh(z j,k) are the activation functions, g/ (z j,k) = sec h2 (z j,k) are the deriva- tives of the activation functions, tl,k are the data set targets, z j,k = pji,kai,k are the hidden layer outputs, qlj,k are the weights of the hidden layer. In the next step, we evaluate the Hessian with the Levenberg–Marquardt and Newton algorithms. B. Newton Algorithm The Newton algorithm constitutes the first alternative to update the weights for the artificial neural network learning. 
We represent the updating of the Newton algorithm as [1], [2] pji,k+1 qlj,k+1 = pji,k qlj,k − α[Hk]−1 ⎡ ⎢ ⎢ ⎣ ∂ Ek ∂pji,k ∂ Ek ∂qlj,k ⎤ ⎥ ⎥ ⎦ Hk = βC,k βE,k βE,k βD,k βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l (15) Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
  • 4. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 4 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS where the elements (∂2 Ek/∂p2 ji,k), (∂2 Ek/∂q2 lj,k), and (∂2 Ek/∂pji,k∂qlj,k) are in (14), the elements (∂ Ek/∂qlj,k) and (∂ Ek/∂pji,k) are in (9) and (10), pji,k and qlj,k are the weights, and α is the learning factor. The Newton algorithm requires the existence of the inverse in the Hessian ([Hk]−1 ). Now, we will represent the Newton algorithm of (15) in the scalar form. First, from (15), we obtain the inverse of Hk as [Hk]−1 = βC,k βE,k βE,k βD,k −1 = 1 det[Hk] βD,k −βE,k −βE,k βC,k det[Hk] = βC,k βD,k − βE,k 2 βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l . (16) We substitute [Hk]−1 of (16) into (15) as pji,k+1 qlj,k+1 = pji,k qlj,k − α det[Hk] βD,k −βE,k −βE,k βC,k ⎡ ⎢ ⎢ ⎣ ∂ Ek ∂pji,k ∂ Ek ∂qlj,k ⎤ ⎥ ⎥ ⎦ det[Hk] = βC,k βD,k − βE,k 2 . (17) Rewriting (17) in the scalar form is pji,k+1 = pji,k − βN ji,k ∂ Ek ∂pji,k + γN,k ∂ Ek ∂qlj,k qlj,k+1 = qlj,k − βNlj,k ∂ Ek ∂qlj,k + γN,k ∂ Ek ∂pji,k βN ji,k = α βD,k det[Hk] , βNlj,k = α βC,k det[Hk] γN,k = α βE,k det[Hk] det[Hk]N = det[Hk] = βC,k βD,k − βE,k 2 βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l (18) where ∂ Ek ∂pji,k = dl,k − tl,k qlj,k g/ z j,k ai,k ∂ Ek ∂qlj,k = g z j,k dl,k − tl,k ∂2 Ek ∂p2 ji,k = a2 i,kqlj,k ∗ −2g z j,k g/ z j,k dl,k − tl,k +g/ z j,k 2 qlj,k ∂2 Ek ∂q2 lj,k = g z j,k 2 ∂2 Ek ∂pji,k∂qlj,k = ai,k g/ z j,k dl,k − tl,k + g z j,k qlj,k . (19) βN ji,k, βNlj,k, and γN,k are the learning rates, pji,k and qlj,k are the weights, α is the learning factor, g(z j,k) = tanh(z j,k) are the activation functions, and g/ (z j,k) = sec h2 (z j,k) are the derivative of the activation functions. Equations (23) and (24) describe the Newton algorithm. Remark 1: In the Newton algorithm of (18) and (19), we can observe that a value of zero in (βC,k)(βD,k) − (βE,k)2 of det[Hk]N is a singularity point in the learning rates βN ji,k, βNlj,k, and γN,k. It results that the Newton algorithm error is not assured to be stable. Hence, it would be interesting to consider other alternative algorithm for the artificial neural network learning. C. Levenberg–Marquardt Algorithm The Levenberg–Marquardt algorithm constitutes the second alternative to update the weights for the artificial neural network learning. We represent the basic updating of the Levenberg–Marquardt algorithm as [8]–[11] pji,k+1 qlj,k+1 = pji,k qlj,k − [Hk + αI]−1 ⎡ ⎢ ⎢ ⎣ ∂ Ek ∂pji,k ∂ Ek ∂qlj,k ⎤ ⎥ ⎥ ⎦ Hk = βC,k βE,k βE,k βD,k βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l (20) where the elements (∂2 Ek/∂p2 ji,k), (∂2 Ek/∂q2 lj,k), and (∂2 Ek/∂pji,k∂qlj,k) are in (14), the elements (∂ Ek/∂qlj,k) and (∂ Ek/∂pji,k) are in (9) and (10), pji,k and qlj,k are the weights, and α is the learning factor. The Levenberg– Marquardt algorithm requires the existence of the inverse in the Hessian [Hk + αI]−1 . Now, we will represent the Levenberg–Marquardt algorithm of (20) in the scalar form. First, from (20), we obtain the inverse of Hk + αI as [Hk + αI]−1 = α + βC,k βE,k βE,k α + βD,k −1 = 1 det[Hk + αI] α + βD,k −βE,k −βE,k α + βC,k det[Hk + αI] = α + βC,k α + βD,k − βE,k 2 βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l . (21) Authorized licensed use limited to: Cornell University Library. 
Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
  • 5. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. RUBIO: STABILITY ANALYSIS OF THE MODIFIED LEVENBERG–MARQUARDT ALGORITHM 5 We substitute [Hk + αI]−1 into (20) as pji,k+1 qlj,k+1 = pji,k qlj,k − α det[Hk +αI] α+βD,k −βE,k −βE,k α+βC,k ⎡ ⎢ ⎢ ⎣ ∂ Ek ∂pji,k ∂ Ek ∂qlj,k ⎤ ⎥ ⎥ ⎦ det[Hk +αI] = α + βC,k α + βD,k − βE,k 2 . (22) Rewriting (22) in the scalar form is pji,k+1 = pji,k − βLMji,k ∂ Ek ∂pji,k + γLM,k ∂ Ek ∂qlj,k qlj,k+1 = qlj,k − βLMlj,k ∂ Ek ∂qlj,k + γLM,k ∂ Ek ∂pji,k βLMji,k = α + βD,k det[Hk + αI] , βLMlj,k = α + βC,k det[Hk + αI] γLM,k = βE,k det[Hk + αI] det[Hk]LM = det[Hk + αI] = α + βC,k α + βD,k − βE,k 2 βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l (23) where ∂ Ek ∂pji,k = dl,k − tl,k qlj,k g/ z j,k ai,k ∂ Ek ∂qlj,k = g z j,k dl,k − tl,k ∂2 Ek ∂p2 ji,k = a2 i,kqlj,k ∗ −2g z j,k g/ z j,k dl,k − tl,k + g/ z j,k 2 qlj,k ∂2 Ek ∂q2 lj,k = g z j,k 2 ∂2 Ek ∂pji,k∂qlj,k = ai,k g/ z j,k dl,k − tl,k + g z j,k qlj,k . (24) βLMji,k, βLMlj,k , and γLM,k are the learning rates, pji,k and qlj,k are the weights, α is the learning factor, g(z j,k) = tanh(z j,k) are the activation functions, and g/ (z j,k) = sec h2 (z j,k) are the derivative of the activation functions. Equations (23) and (24) describe the Levenberg–Marquardt algorithm. Remark 2: In the Levenberg–Marquardt algorithm of (23) and (24), we can observe that a value of zero in (α + (βC,k))(α+(βD,k ))−(βE,k)2 of det[Hk]LM is a singularity point in the learning rates βLMji,k, βLMlj,k, and γLM,k. It results that the Levenberg–Marquardt algorithm error is not assured to be stable. Hence, it should be interesting to find a way to modify the Levenberg–Marquardt algorithm to make its error stable. Fig. 2. Two-hidden-layer artificial neural network. III. TWO-HIDDEN-LAYER LEVENBERG–MARQUARDT AND NEWTON ALGORITHMS FOR THE ARTIFICIAL NEURAL NETWORK LEARNING In this section, the two-hidden-layer Levenberg–Marquardt and Newton algorithms are presented as a comparison with the Levenberg–Marquardt and Newton algorithms for the artificial neural network learning. A. Two-Hidden-Layer Hessian for the Artificial Neural Network Learning In this article, we use a two-hidden-layer artificial neural network. This artificial neural network uses hyperbolic tangent functions in the hidden layer and linear functions in the output layer. We define the two-hidden-layer artificial neural network as dl,k = j qlj,k g i pji,k g r uir,k vr,k (25) where pji,k and uir,k are the weights of the two hidden layers, qlj,k are the weights of the output layer, g(·) are the activation functions, vr,k are the artificial neural network inputs, dl,k are the artificial neural network outputs, r is the input layer, j and i are the hidden layers, l is the output layer, and k is the iteration. We consider the two-hidden-layer artificial neural network shown in Fig. 2. We define pji,k and uir,k as the weights of the hidden layer and qlj,k as the weights of the output layer. We define the cost function Ek as Ek = 1 2 LT l=1 dl,k − tl,k 2 (26) where dl,k is the artificial neural network output, tl,k is the data set target, and LT is the total outputs number. The second-order partial derivatives of the cost function Ek with respect to the weights pji,k, uir,k , and qlj,k will be used to obtain the two-hidden-layer Newton and Levenberg– Marquardt algorithms. 
We consider the forward propagation as wj,k = i uir,k vr,k , ai,k = g wi,k z j,k = i pji,kai,k, cj,k = g z j,k xl,k = j qlj,kcj,k, dl,k = f xl,k = xl,k (27) Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
  • 6. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 6 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS where ai,k are the artificial neural network inputs and dl,k are the artificial neural network outputs, pji,k and uir,k are hidden layer weights, and qlj,k are output layer weights. We consider the activation functions in the two hidden layers as the hyperbolic tangent functions g wi,k = ewi,k − e−wi,k ewi,k + e−wi,k = tanh wi,k g z j,k = ez j,k − e−z j,k ez j,k + e−z j,k = tanh z j,k . (28) We consider the activation functions of the output layer as the linear functions f xl,k = xl,k. (29) We define the second derivative of Ek as the two-hidden- layer Hessian Hk [25]–[27] Hk = ∇∇Ek = ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ∂2 E ∂p2 ji ∂2 E ∂pji ∂qlj ∂2 E ∂pji ∂uir ∂2 E ∂pji∂qlj ∂2 E ∂q2 lj ∂2 E ∂qlj ∂uir ∂2 E ∂pji∂uir ∂2 E ∂qlj ∂uir ∂2 E ∂u2 ir ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ . (30) In the next step, we evaluate the two-hidden-layer Hessian with the two-hidden-layer Levenberg–Marquardt and Newton algorithms. B. Two-Hidden-Layer Newton Algorithm The two-hidden-layer Newton algorithm constitutes one alternative to update the weights for the two-hidden-layer artificial neural network learning. We represent the updating of the two-hidden-layer Newton algorithm as [1], [2] ⎡ ⎣ pji,k+1 qlj,k+1 uir,k+1 ⎤ ⎦ = ⎡ ⎣ pji,k qlj,k uir,k ⎤ ⎦ − α[Hk]−1 ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ∂ Ek ∂pji,k ∂ Ek ∂qlj,k ∂ Ek ∂uir,k ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ Hk = ⎡ ⎣ βC,k βE,k βG,k βE,k βD,k βL,k βG,k βL,k βF,k ⎤ ⎦, ir = i r βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l βF,k = ir ∂2 E ∂u2 ir , βG,k = jir ∂2 E ∂pji∂uir βL,k = jir ∂2 E ∂qlj ∂uir , jir = j i r (31) where pji,k, uir , and qlj,k are the weights and α is the learning factor. The two-hidden-layer Newton algorithm requires the existence of the inverse in the Hessian ([Hk]−1 ). From (31), we obtain the inverse of Hk as [Hk]−1 = ⎡ ⎣ βC,k βE,k βG,k βE,k βD,k βL,k βG,k βL,k βF,k ⎤ ⎦ −1 = 1 det[Hk] ⎡ ⎢ ⎣ βD,k βF,k − βL,k 2 − βE,k βF,k + βL,k βG,k βE,k βL,k − βD,k βG,k − βE,k βF,k + βL,k βG,k βC,k βF,k − βG,k 2 − βC,k βL,k + βG,k βE,k βE,k βL,k − βD,k βG,k − βC,k βL,k + βG,k βE,k βC,k βD,k − βE,k 2 ⎤ ⎥ ⎦ det[Hk]N = det[Hk] = βC,k βD,k βF,k − βL,k 2 − βE,k βE,k βF,k − βL,k βG,k + βG,k βE,k βL,k − βD,k βG,k . (32) Remark 3: In the two-hidden-layer Newton algorithm of (32) and (31), we can observe that values of zero in (βD,k )(βF,k) − (βL,k)2 , (βE,k)(βF,k) − (βL,k)(βG,k), and (βE,k)(βL,k) − (βD,k )(βG,k) of det[Hk]N are three singularity points in the learning rates βN ji,k, βNlj,k , and γN,k. The two- hidden-layer Newton algorithm of (32) and (31) is worse than the Newton algorithm of (18) and (19) because the Newton algorithm of (18) and (19) presents one singularity point, while the two-hidden-layer Newton algorithm of (32) and (31) presents three singularity points. C. Two-Hidden-Layer Levenberg–Marquardt Algorithm The two-hidden-layer Levenberg–Marquardt algorithm constitutes one alternative to update the weights for the two-hidden-layer artificial neural network learning. 
We rep- resent the basic updating of the two-hidden-layer Levenberg– Marquardt algorithm as [8]–[11] ⎡ ⎣ pji,k+1 qlj,k+1 uir,k+1 ⎤ ⎦ = ⎡ ⎣ pji,k qlj,k uir,k ⎤ ⎦ − [Hk + αI]−1 ⎡ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ ∂ Ek ∂pji,k ∂ Ek ∂qlj,k ∂ Ek ∂uir,k ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ Hk = ⎡ ⎣ βC,k βE,k βG,k βE,k βD,k βL,k βG,k βL,k βF,k ⎤ ⎦, ir = i r βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , jil = j i l βF,k = ir ∂2 E ∂u2 ir , βG,k = jir ∂2 E ∂pji∂uir βL,k = jir ∂2 E ∂qlj ∂uir , jir = j i r (33) Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
  • 7. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. RUBIO: STABILITY ANALYSIS OF THE MODIFIED LEVENBERG–MARQUARDT ALGORITHM 7 where pji,k, uir , and qlj,k are the weights and α is the learning factor. The two-hidden-layer Levenberg–Marquardt algorithm requires the existence of the inverse in the Hessian [Hk + αI]−1 . From (33), we obtain the inverse of Hk + αI as [Hk +αI]−1 = ⎡ ⎣ α + βC,k βE,k βG,k βE,k α + βD,k βL,k βG,k βL,k α + βF,k ⎤ ⎦ −1 = 1 det[Hk +αI] ⎡ ⎢ ⎣ α + βD,k α + βF,k − βL,k 2 − βE,k α+βF,k + βL,k βG,k βE,k βL,k − α + βD,k βG,k − βE,k α+βF,k + βL,k βG,k α+βC,k α+βF,k − βG,k 2 − α+βC,k βL,k + βG,k βE,k βE,k βL,k − α+βD,k βG,k − α+βC,k βL,k + βG,k βE,k α+βC,k βD,k − βE,k 2 ⎤ ⎥ ⎦ det[Hk]LM = det[Hk + αI] = α + βC,k α + βD,k α + βF,k − βL,k 2 − βE,k βE,k α + βF,k − βL,k βG,k + βG,k βE,k βL,k − α + βD,k βG,k . (34) Remark 4: In the two-hidden-layer Levenberg–Marquardt algorithm of (34) and (33), we can observe that values of zero in (α+βD,k)(α+βF,k)−(βL,k)2 , (βE,k)(α+βF,k)−(βL,k)(βG,k), and (βE,k)(βL,k) − (α + βD,k)(βG,k) of det[Hk]LM are three singularity points in the learning rates βLMji,k, βLMlj,k, and γLM,k. The two-hidden-layer Levenberg–Marquardt algorithm of (34) and (33) is worse than the Levenberg–Marquardt algorithm of (23) and (24) because Levenberg–Marquardt algorithm of (23) and (24) presents one singularity point, while the two-hidden-layer Levenberg–Marquardt algorithm of (34) and (33) presents three singularity points. IV. ERROR STABILITY AND WEIGHTS BOUNDEDNESS ANALYSIS OF THE MODIFIED LEVENBERG– MARQUARDT ALGORITHM In this section, the modified Levenberg–Marquardt algo- rithm is introduced for the artificial neural network learning, and the error stability and weights boundedness are analyzed. A. Modified Levenberg–Marquardt Algorithm The modified Levenberg–Marquardt algorithm is defined as pji,k+1 = pji,k − βMLM,k ∂ Ek ∂pji,k + γMH,k ∂ Ek ∂qlj,k qlj,k+1 = qlj,k − βMLM,k ∂ Ek ∂qlj,k + γMH,k ∂ Ek ∂pji,k βMLM,k = α + βC,k 2 α + βD,k 2 det[Hk]MLM det[Hk]MLM = α + βA,k 2 + βB,k 2 ∗ α+ βC,k 2 α + βD,k 2 + βE,k 2 βA,k = ji ∂ Ek ∂pji,k dl,k − tl,k , βB,k = j ∂ Ek ∂qlj,k dl,k − tl,k βC,k = jil ∂2 Ek ∂p2 ji,k , βD,k = j ∂2 Ek ∂q2 lj,k βE,k = jil ∂2 Ek ∂pji,k∂qlj,k , γMH,k = 0 jil = j i l , ji = j i (35) where ∂ Ek ∂pji,k (dl,k − tl,k ) = qlj,k g/ (z j,k)ai,k ∂ Ek ∂qlj,k (dl,k − tl,k ) = g(z j,k) ∂ Ek ∂pji,k = (dl,k − tl,k )qlj,k g/ (z j,k)ai,k ∂ Ek ∂qlj,k = g(z j,k)(dl,k − tl,k ) ∂2 Ek ∂p2 ji,k = a2 i,kqlj,k ∗ −2g(z j,k)g/ (z j,k)(dl,k − tl,k ) + g/ (z j,k)2 qlj,k ∂2 Ek ∂q2 lj,k = g(z j,k)2 ∂2 Ek ∂pji,k∂qlj,k = ai,k g/ (z j,k) (dl,k − tl,k ) + g(z j,k)qlj,k . (36) βMLM,k is the learning rate, pji,k and qlj,k are the weights, α is the learning factor, g(z j,k) = tanh(z j,k) are the activation functions, and g/ (z j,k) = sec h2 (z j,k) are the derivative of the activation functions. Equations (35) and (36) describe the modified Levenberg–Marquardt algorithm. Remark 5: The modified Levenberg–Marquardt algorithm of (35) and (36) is based on the Levenberg–Marquardt algo- rithm of (23) and (24) and on the Newton algorithm of (18) and (19) but with the following two differences to assure the error stability and weights boundedness. 
1) A value of zero in (βC,k )(βD,k) − (βE,k)2 of det[Hk]N is a singularity point in the learning rates βN ji,k, βNlj,k, and γN,k of the Newton algorithm, and a value of zero in (α + (βC,k ))(α + (βD,k)) − (βE,k)2 of det[Hk]LM is a singularity point in the learning rates βLMji,k, βLMlj,k, and γLM,k of the Levenberg–Marquardt algo- rithm, while there is not a value of zero in ([α + (βA,k)2 +(βB,k)2 ]∗[(α+(βC,k )2 )(α+(βD,k )2 )+(βE,k)2 ]) of det[Hk]MLM, and there is not a singularity point in the learning rate βMLM,k of the modified Levenberg– Marquardt algorithm. 2) The Levenberg–Marquardt algorithm has three differ- ent learning rates βLMji,k, βLMlj,k, and γLM,k, and the Newton algorithm has three different learning Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
  • 8. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 8 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS rates βN ji,k, βNlj,k, and γN,k, while the modified Levenberg–Marquardt algorithm only has one learning rate βMLM,k . The mentioned differences produce that the error stability and weights boundedness of the modified Levenberg–Marquardt algorithm will be assured in Section IV-B. Remark 6: The application of the modified Levenberg– Marquardt algorithm for the artificial neural network learning is based on the following steps: 1) obtain the artificial neural network output dl,k of Fig. 1 with (1) and (3); 2) obtain the backpropagation of the output layer (∂ Ek/∂qlj,k) with (9), and the backpropagation of the hidden layer (∂ Ek/∂pji,k) with (10); and 3) obtain the updating of the weights of the hidden layer pji,k with (35) and (36) and the weights of the output layer qlj,k with (35) and (36). Please note that step 3) represents the artificial neural network learning. B. Error Stability and Weights Boundedness Analysis We analyze the error stability of the modified Levenberg– Marquardt algorithm by the Lyapunov algorithm detailed by the following theorem. Theorem 1: The errors of the modified Levenberg– Marquardt algorithm (1), (3), (35), and (36) applied for the learning of the data set targets tl,k are uniformly stable, and the upper bound of the average errors o2 l,k satisfies lim sup T →∞ 1 T T k=2 o2 l,k ≤ 2 α μ2 l (37) where o2 l,k = (1/2)βMLM,k−1(dl,k−1 − tl,k−1)2 , 0 α ≤ 1 ∈ , and 0 βMLM,k ∈ are in (35), (dl,k−1 − tl,k−1) are the errors, μl are the upper bounds of the uncertainties μl,k, and |μl,k| μl. Proof: Define the next positive function l,k = 1 2 βMLM,k−1 dl,k−1 −tl,k−1 2 + ji p2 ji,k + j q2 lj,k (38) where pji,k and qlj,k are in (35), (36). Then, l,k is l,k = 1 2 βMLM,k dl,k − tl,k 2 + ji p2 ji,k+1 + j q2 lj,k+1 − 1 2 βMLM,k−1 dl,k−1 − tl,k−1 2 − ji p2 ji,k − j q2 lj,k. (39) Now, the weights errors are as ji p2 ji,k+1 = ji p2 ji,k − 2βMLM,k ∂ Ek ∂pji,k ji pji,k + β2 MLM,k ∂ Ek ∂pji,k 2 ji q2 lj,k+1 = ji q2 lj,k − 2βMLM,k ∂ Ek ∂qlj,k j qlj,k + β2 MLM,k ∂ Ek ∂qlj,k 2 . (40) Substituting (40) into (39) is l,k = −2βMLM,k ∂ Ek ∂pji,k ji pji,k + β2 MLM,k ∂ Ek ∂pji,k 2 − 2βMLM,k ∂ Ek ∂qlj,k j qlj,k + β2 MLM,k ∂ Ek ∂qlj,k 2 + 1 2 βMLM,k dl,k −tl,k 2 − 1 2 βMLM,k−1 dl,k−1 −tl,k−1 2 . (41) Equation (41) is rewritten as l,k = 1 2 βMLM,k dl,k − tl,k 2 − 1 2 βMLM,k−1 dl,k−1 − tl,k−1 2 − 2βMLM,k ⎡ ⎣ ∂ Ek ∂pji,k ji pji,k + ∂ Ek ∂qlj,k j qlj,k ⎤ ⎦ + β2 MLM,k ∂ Ek ∂pji,k 2 + ∂ Ek ∂qlj,k 2 . (42) Using the closed-loop dynamics ((∂ Ek/∂pji,k)/(dl,k − tl,k )) ji pji,k + ((∂ Ek/∂qlj,k)/(dl,k − tl,k )) j qlj,k = (dl,k − tl,k ) − μl,k of [31] and [33] in the second element of (42), it can be seen that ∂ Ek ∂pji,k ji pji,k + ∂ Ek ∂qlj,k j qlj,k = dl,k − tl,k ⎡ ⎣ ∂ Ek ∂pji,k dl,k − tl,k ji pji,k + ∂ Ek ∂qlj,k dl,k −tl,k j qlj,k ⎤ ⎦ = dl,k − tl,k dl,k − tl,k − μl,k (43) where μl,k are the uncertainties. Substituting (43) in the second element of (42) is l,k = 1 2 βMLM,k dl,k − tl,k 2 − 1 2 βMLM,k−1 dl,k−1 − tl,k−1 2 − 2βMLM,k dl,k − tl,k dl,k − tl,k − μl,k + β2 MLM,k ⎡ ⎣ ⎛ ⎝ ji ∂ Ek ∂pji,k ⎞ ⎠ 2 + ⎛ ⎝ j ∂ Ek ∂qlj,k ⎞ ⎠ 2⎤ ⎦ l,k = 1 2 βMLM,k dl,k − tl,k 2 − 1 2 βMLM,k−1 dl,k−1 − tl,k−1 2 − 2βMLM,k dl,k − tl,k 2 + 2βMLM,k dl,k − tl,k μl,k + β2 MLM,k dl,k − tl,k 2 βA,k 2 + βB,k 2 (44) where βA,k = ji ((∂ Ek/∂pji,k)/(dl,k − tl,k )) and βB,k = j ((∂ Ek/∂qlj,k)/(dl,k − tl,k )). 
Substituting βMLM,k of (35) into the element β2 MLM,k(dl,k − tl,k )2 [(βA,k)2 + (βB,k)2 ] and considering α ≤ 1 is given in (45), as shown at the bottom of the next page. In (45), βA,k = ji ((∂ Ek/∂pji,k)/(dl,k − tl,k )) and βB,k = j ((∂ Ek/∂qlj,k)/(dl,k −tl,k)). Taking in to account Authorized licensed use limited to: Cornell University Library. Downloaded on August 20,2020 at 17:32:14 UTC from IEEE Xplore. Restrictions apply.
Taking into account that 2 \beta_{MLM,k} (d_{l,k} - t_{l,k}) \mu_{l,k} \leq \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + 2 \beta_{MLM,k} \mu_{l,k}^{2} and employing (45) in (44) gives

    \Delta L_{l,k} \leq \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} - \frac{1}{2} \beta_{MLM,k-1} (d_{l,k-1} - t_{l,k-1})^{2} - 2 \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + 2 \beta_{MLM,k} \mu_{l,k}^{2} + \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2}
    \Delta L_{l,k} \leq -\frac{1}{2} \beta_{MLM,k-1} (d_{l,k-1} - t_{l,k-1})^{2} + 2 \beta_{MLM,k} \mu_{l,k}^{2}.    (46)

From (35),

    \beta_{MLM,k} = \frac{[\alpha + (\beta_{C,k})^{2}] [\alpha + (\beta_{D,k})^{2}]}{[\alpha + (\beta_{A,k})^{2} + (\beta_{B,k})^{2}] [\alpha + (\beta_{C,k})^{2}] [\alpha + (\beta_{D,k})^{2}] + (\beta_{E,k})^{2}} \leq \frac{1}{\alpha}.    (47)

Employing (47) and |\mu_{l,k}| \leq \bar{\mu}_{l} in (46) gives

    \Delta L_{l,k} \leq -\frac{1}{2} \beta_{MLM,k-1} (d_{l,k-1} - t_{l,k-1})^{2} + \frac{2}{\alpha} \bar{\mu}_{l}^{2}.    (48)

From (48), the errors of the modified Levenberg–Marquardt algorithm are uniformly stable; hence, L_{l,k} is bounded. Taking into account (48) and o_{l,k}^{2} of (37),

    \Delta L_{l,k} \leq -o_{l,k}^{2} + \frac{2}{\alpha} \bar{\mu}_{l}^{2}.    (49)

Summing (49) from 2 to T gives

    \sum_{k=2}^{T} \left[ o_{l,k}^{2} - \frac{2}{\alpha} \bar{\mu}_{l}^{2} \right] \leq L_{l,1} - L_{l,T}.    (50)

Since L_{l,T} \geq 0 is bounded,

    \frac{1}{T} \sum_{k=2}^{T} o_{l,k}^{2} \leq \frac{2}{\alpha} \bar{\mu}_{l}^{2} + \frac{1}{T} L_{l,1} \;\Rightarrow\; \limsup_{T \to \infty} \frac{1}{T} \sum_{k=2}^{T} o_{l,k}^{2} \leq \frac{2}{\alpha} \bar{\mu}_{l}^{2}.    (51)

Equation (51) is (37), which completes the proof.

Remark 7: Theorem 1 assures that the errors of the modified Levenberg–Marquardt algorithm for the artificial neural network learning are stable; consequently, the artificial neural network outputs d_{l,k} of the modified Levenberg–Marquardt algorithm remain bounded during all the training and testing.
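Equation (47) is what removes the singularity point: the additive α keeps the denominator strictly positive, so β_{MLM,k} always lies in (0, 1/α]. The short check below evaluates (47) for a few randomly drawn sample values of β_{A,k}, β_{B,k}, β_{C,k}, β_{D,k}, and β_{E,k} and verifies the bound numerically; the sample values are arbitrary placeholders, since in the algorithm these terms are computed from the gradients as defined with (35).

    import numpy as np

    def beta_mlm(alpha, bA, bB, bC, bD, bE):
        """Learning rate of (47): free of singularities and bounded by 1/alpha."""
        num = (alpha + bC ** 2) * (alpha + bD ** 2)
        den = (alpha + bA ** 2 + bB ** 2) * (alpha + bC ** 2) * (alpha + bD ** 2) + bE ** 2
        return num / den

    alpha = 0.9
    rng = np.random.default_rng(1)
    for _ in range(5):
        bA, bB, bC, bD, bE = rng.normal(scale=10.0, size=5)   # arbitrary sample values
        rate = beta_mlm(alpha, bA, bB, bC, bD, bE)
        assert 0.0 < rate <= 1.0 / alpha
        print(f"beta_MLM = {rate:.4f}  (bound 1/alpha = {1.0 / alpha:.4f})")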
The following theorem proves the weights boundedness of the modified Levenberg–Marquardt algorithm.

Theorem 2: While the average errors o_{l,k+1}^{2} are bigger than the uncertainty term \frac{2}{\alpha} \bar{\mu}_{l}^{2}, the weights errors are bounded by the initial weights errors as

    o_{l,k+1}^{2} \geq \frac{2}{\alpha} \bar{\mu}_{l}^{2} \;\Rightarrow\; \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} \leq \sum_{ji} p_{ji,1}^{2} + \sum_{j} q_{lj,1}^{2}    (52)

where p_{ji,k+1} and q_{lj,k+1} are the weights, p_{ji,1} and q_{lj,1} are the initial weights, o_{l,k+1}^{2} = \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2}, (d_{l,k} - t_{l,k}) are the errors, 0 < \alpha \leq 1, 0 < \beta_{MLM,k}, and \bar{\mu}_{l} are the upper bounds of the uncertainties \mu_{l,k}, i.e., |\mu_{l,k}| \leq \bar{\mu}_{l}.

Proof: From (40), the weights satisfy

    \sum_{ji} p_{ji,k+1}^{2} = \sum_{ji} p_{ji,k}^{2} - 2 \beta_{MLM,k} \frac{\partial E_{k}}{\partial p_{ji,k}} \sum_{ji} p_{ji,k} + \beta_{MLM,k}^{2} \left( \frac{\partial E_{k}}{\partial p_{ji,k}} \right)^{2}
    \sum_{j} q_{lj,k+1}^{2} = \sum_{j} q_{lj,k}^{2} - 2 \beta_{MLM,k} \frac{\partial E_{k}}{\partial q_{lj,k}} \sum_{j} q_{lj,k} + \beta_{MLM,k}^{2} \left( \frac{\partial E_{k}}{\partial q_{lj,k}} \right)^{2}.    (53)

Adding the two equalities of (53) gives

    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} = \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} - 2 \beta_{MLM,k} \frac{\partial E_{k}}{\partial p_{ji,k}} \sum_{ji} p_{ji,k} + \beta_{MLM,k}^{2} \left( \frac{\partial E_{k}}{\partial p_{ji,k}} \right)^{2} - 2 \beta_{MLM,k} \frac{\partial E_{k}}{\partial q_{lj,k}} \sum_{j} q_{lj,k} + \beta_{MLM,k}^{2} \left( \frac{\partial E_{k}}{\partial q_{lj,k}} \right)^{2}.    (54)

Equation (54) is represented as

    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} = \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} - 2 \beta_{MLM,k} \left[ \frac{\partial E_{k}}{\partial p_{ji,k}} \sum_{ji} p_{ji,k} + \frac{\partial E_{k}}{\partial q_{lj,k}} \sum_{j} q_{lj,k} \right] + \beta_{MLM,k}^{2} \left[ \left( \frac{\partial E_{k}}{\partial p_{ji,k}} \right)^{2} + \left( \frac{\partial E_{k}}{\partial q_{lj,k}} \right)^{2} \right].    (55)

Substituting (43), \frac{\partial E_{k}}{\partial p_{ji,k}} \sum_{ji} p_{ji,k} + \frac{\partial E_{k}}{\partial q_{lj,k}} \sum_{j} q_{lj,k} = (d_{l,k} - t_{l,k}) [(d_{l,k} - t_{l,k}) - \mu_{l,k}], into the second element of (55) gives

    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} = \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} - 2 \beta_{MLM,k} (d_{l,k} - t_{l,k}) [(d_{l,k} - t_{l,k}) - \mu_{l,k}] + \beta_{MLM,k}^{2} \left[ \left( \frac{\partial E_{k}}{\partial p_{ji,k}} \right)^{2} + \left( \frac{\partial E_{k}}{\partial q_{lj,k}} \right)^{2} \right]
    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} = \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} - 2 \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + 2 \beta_{MLM,k} (d_{l,k} - t_{l,k}) \mu_{l,k} + \beta_{MLM,k}^{2} (d_{l,k} - t_{l,k})^{2} [(\beta_{A,k})^{2} + (\beta_{B,k})^{2}]    (56)

where \mu_{l,k} are the uncertainties, \beta_{A,k} = \sum_{ji} \frac{\partial E_{k}/\partial p_{ji,k}}{d_{l,k} - t_{l,k}}, and \beta_{B,k} = \sum_{j} \frac{\partial E_{k}/\partial q_{lj,k}}{d_{l,k} - t_{l,k}}.

Substituting 2 \beta_{MLM,k} (d_{l,k} - t_{l,k}) \mu_{l,k} \leq \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + 2 \beta_{MLM,k} \mu_{l,k}^{2} into the third element of (56) and \beta_{MLM,k}^{2} (d_{l,k} - t_{l,k})^{2} [(\beta_{A,k})^{2} + (\beta_{B,k})^{2}] \leq \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} of (45) into the last element of (56) gives

    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} \leq \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} - 2 \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + 2 \beta_{MLM,k} \mu_{l,k}^{2} + \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2}
    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} \leq \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} - \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + 2 \beta_{MLM,k} \mu_{l,k}^{2}.    (57)

From (47), \beta_{MLM,k} \leq \frac{1}{\alpha}, and using |\mu_{l,k}| \leq \bar{\mu}_{l} in (57) gives

    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} \leq \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} - \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2} + \frac{2}{\alpha} \bar{\mu}_{l}^{2}.    (58)

Taking into account o_{l,k+1}^{2} = \frac{1}{2} \beta_{MLM,k} (d_{l,k} - t_{l,k})^{2},

    o_{l,k+1}^{2} \geq \frac{2}{\alpha} \bar{\mu}_{l}^{2} \;\Rightarrow\; \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} \leq \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2}.    (59)

If o_{l,k+1}^{2} \geq \frac{2}{\alpha} \bar{\mu}_{l}^{2} holds for all iterations in [1, k], then

    \sum_{ji} p_{ji,k+1}^{2} + \sum_{j} q_{lj,k+1}^{2} \leq \sum_{ji} p_{ji,k}^{2} + \sum_{j} q_{lj,k}^{2} \leq \cdots \leq \sum_{ji} p_{ji,1}^{2} + \sum_{j} q_{lj,1}^{2}.    (60)

Then, (52) is proven.

Remark 8: Theorem 2 assures that the weights of the modified Levenberg–Marquardt algorithm are bounded; consequently, the hidden layer weights p_{ji,k} and the output layer weights q_{lj,k} of the modified Levenberg–Marquardt algorithm for the artificial neural network learning remain bounded during all the training and testing.

V. RESULTS

In this section, we compare the Newton algorithm (N) of (1), (3), (18), (19), and [1], [2], the Levenberg–Marquardt algorithm (LM) of (1), (3), (23), (24), and [8]–[11], and the modified Levenberg–Marquardt algorithm (MLM) of (1), (3), (35), and (36) for the artificial neural network learning of the electric signal data set, because these three are based on the Hessian. We compare the stable gradient algorithm in a neural network (SGNN) of [31] and [32], the stable gradient algorithm in a radial basis function neural network (SGRBFNN) of [33] and [34], and the modified Levenberg–Marquardt algorithm (MLM) of (1), (3), (35), and (36) for the artificial neural network learning of the brain signal data set, because these three are based on the stability. The objective of the N, LM, SGNN, SGRBFNN, and MLM is that the artificial neural network outputs d_{l,k} follow the data set targets t_{l,k} as closely as possible.

In this part of the article, the abovementioned algorithms are applied for the artificial neural network learning containing the training and testing stages. The root-mean-square error (RMSE) is utilized to show the performance accuracy for the comparisons, and it is represented as

    E = \left[ \frac{1}{T} \sum_{k=1}^{T} \sum_{l=1}^{L_{T}} (d_{l,k} - t_{l,k})^{2} \right]^{\frac{1}{2}}    (61)

where (d_{l,k} - t_{l,k}) are the errors, d_{l,k} are the artificial neural network outputs, t_{l,k} are the data set targets, L_{T} is the total number of outputs, and T is the final iteration. A small numerical sketch of this metric is given below.
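As a quick illustration of (61), the snippet below computes the RMSE for a hypothetical set of outputs and targets; the array shapes and the numerical values are assumptions made only for the example.

    import numpy as np

    def rmse(d, t):
        """RMSE of (61); d and t have shape (T, L_T) = (iterations, outputs)."""
        T = d.shape[0]
        return np.sqrt(np.sum((d - t) ** 2) / T)

    # Illustrative values: four iterations of a single output.
    d = np.array([[1.0], [2.0], [3.0], [4.0]])   # artificial neural network outputs d_{l,k}
    t = np.array([[1.1], [1.9], [3.2], [3.8]])   # data set targets t_{l,k}
    print(f"RMSE = {rmse(d, t):.4f}")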
A. Electric Signals

The electric signal data set information is obtained from the Electricity Load and Price Forecasting with MATLAB example, where the details are explained in [35]. The electric signal data set is the history of the electric energy usage at each hour together with temperature observations of the International Organization for Standardization (ISO) of Great Britain. The meteorological information includes the dry bulb temperature and the dew point; the hourly electric energy usage is the signal referred to as the electric signal.

In the electric signal data set, we consider eight inputs described as follows: a_{1,k} is the dry bulb temperature, a_{2,k} is the dew point, a_{3,k} is the hour of the day, a_{4,k} is the day of the week, a_{5,k} is a flag indicating whether the day is a holiday or a weekend day, a_{6,k} is the average load of the previous day, a_{7,k} is the load of the same hour of the previous day, and a_{8,k} is the load of the same hour and day of the previous week; and we consider one target described as follows: t_{1,k} is the load of the same day. A sketch of how the lagged load inputs can be formed from an hourly load series is given below.
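To make the lagged inputs concrete, the following sketch builds a_{6,k}, a_{7,k}, and a_{8,k} from a hypothetical hourly load array; the array name, its length, and its values are assumptions made for illustration, since in this article the features come directly from the data set of [35].

    import numpy as np

    # Hypothetical hourly load series standing in for the electric signal data set.
    rng = np.random.default_rng(2)
    load = rng.uniform(2.0, 5.0, size=24 * 30)          # one month of hourly loads

    k = np.arange(24 * 7, load.size)                    # start after one week so every lag exists
    a6 = np.array([load[i - 24:i].mean() for i in k])   # a_{6,k}: average load of the previous day
    a7 = load[k - 24]                                   # a_{7,k}: load of the same hour, previous day
    a8 = load[k - 24 * 7]                               # a_{8,k}: load of the same hour and day, previous week
    t1 = load[k]                                        # t_{1,k}: load to be learned
    print(a6.shape, a7.shape, a8.shape, t1.shape)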
In the artificial neural network learning, we consider eight artificial neural network inputs denoted as a_{1,k}, a_{2,k}, a_{3,k}, a_{4,k}, a_{5,k}, a_{6,k}, a_{7,k}, and a_{8,k}, which are the same inputs of the electric signal data set, and one artificial neural network output denoted as d_{1,k}. We utilize 7000 iterations of the data set for the artificial neural network training and 1000 iterations of the data set for the artificial neural network testing. The objective of the N, LM, and MLM is that the artificial neural network output d_{1,k} follows the target t_{1,k} as closely as possible.

The N of [1] and [2] is detailed as (1), (3), (18), and (19) with eight inputs, one output, and five neurons in the hidden layer, α = 0.9, p_{ji,1} = rand, and q_{lj,1} = rand, where rand is a random number between 0 and 1. The LM of [8]–[11] is detailed as (1), (3), (23), and (24) with eight inputs, one output, and five neurons in the hidden layer, α = 0.9, p_{ji,1} = rand, and q_{lj,1} = rand, where rand is a random number between 0 and 1. The MLM is detailed as (1), (3), (35), and (36) with eight inputs, one output, and five neurons in the hidden layer, α = 0.9, p_{ji,1} = rand, and q_{lj,1} = rand, where rand is a random number between 0 and 1. A compact sketch of this experimental wiring is given below.
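The sketch below shows how the above configuration could be wired together: an eight-input data matrix, the 7000/1000 training and testing split, five hidden neurons, α = 0.9, and random initial weights between 0 and 1. The data are random placeholders, and train_mlm_sketch and rmse are the hypothetical helpers sketched earlier in this article, not the author's implementation.

    import numpy as np

    # Placeholder data standing in for the electric signal data set of [35].
    N = 8000
    rng = np.random.default_rng(3)
    A = rng.random((N, 8))                     # inputs a_{1,k}, ..., a_{8,k}
    t1 = rng.random(N)                         # target t_{1,k}

    A_train, t_train = A[:7000], t1[:7000]     # 7000 iterations for training
    A_test, t_test = A[7000:], t1[7000:]       # 1000 iterations for testing

    # train_mlm_sketch and rmse are defined in the earlier sketches.
    P, q = train_mlm_sketch(A_train, t_train, hidden=5, alpha=0.9)
    d_test = np.array([q @ (1.0 / (1.0 + np.exp(-(P @ a)))) for a in A_test])
    print(f"testing RMSE = {rmse(d_test[:, None], t_test[:, None]):.4f}")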
The comparisons for the training and testing of the N, LM, and MLM for the first electric signal data set are shown in Figs. 3 and 4. The weights of the MLM for the first electric signal data set are shown in Figs. 5 and 6. The comparisons for the training and testing of the N, LM, and MLM for the second electric signal data set are shown in Figs. 7 and 8. The weights of the MLM for the second electric signal data set are shown in Figs. 9 and 10. The training and testing RMSE comparisons of the performance accuracy (61) are shown in Table I for the first electric signal data set and in Table II for the second electric signal data set. Please note that the most important data are related to the output d_{1,k}. The training and testing could be improved by including more neurons in the hidden layer; nevertheless, this would increase the computational cost.

Fig. 3. Training for the first electric signal data set.
Fig. 4. Testing for the first electric signal data set.
Fig. 5. Hidden layer weights for the first electric signal data set.
Fig. 6. Output layer weights for the first electric signal data set.
Fig. 7. Training for the second electric signal data set.
Fig. 8. Testing for the second electric signal data set.
Fig. 9. Hidden layer weights for the second electric signal data set.
Fig. 10. Output layer weights for the second electric signal data set.
TABLE I. RMSE for the first electric signal data set.
TABLE II. RMSE for the second electric signal data set.

From Figs. 3, 4, 7, and 8, it is observed that the MLM improves on the LM and N because the signal of the MLM follows the electric signal data set more closely than the others. From Figs. 5, 6, 9, and 10, it is observed that the weights of the MLM remain bounded. From Tables I and II, it is observed that the MLM achieves better performance accuracy for training and testing than the LM and N because the RMSE is the smallest for the MLM. Thus, the MLM is the best option for learning in the electric signal data set.

B. Brain Signals

The brain signal data set information is obtained from our laboratory, where the details are explained in [36]. The brain signal data set consists of real brain signal recordings. The alpha signal is used in this study because it is the most likely to be found. The acquisition system is applied to a 28-year-old healthy man with his eyes closed. Four different signals are received from the brain.

In the brain signal data set, we consider three inputs described as follows: a_{1,k} is the brain signal of focal point 1, a_{2,k} is the brain signal of focal point 2, and a_{3,k} is the brain signal of focal point 3; and we consider one target described as follows: t_{1,k} is the brain signal of focal point 4.

In the artificial neural network learning, we consider three artificial neural network inputs denoted as a_{1,k}, a_{2,k}, and a_{3,k}, which are the same inputs of the brain signal data set, and one artificial neural network output denoted as d_{1,k}. We utilize 7000 iterations of the data set for the artificial neural network training and 1000 iterations of the data set for the artificial neural network testing. The objective of the SGNN, SGRBFNN, and MLM is that the artificial neural network output d_{1,k} follows the target t_{1,k} as closely as possible.

The SGNN of [31] and [32] is detailed with three inputs, one output, and five neurons in the hidden layer, α = 0.9, p_{ji,1} = rand, and q_{lj,1} = rand, where rand is a random number between 0 and 1. The SGRBFNN of [33] and [34] is detailed with three inputs, one output, and five neurons in the hidden layer, α = 0.9, p_{ji,1} = rand, and q_{lj,1} = rand, where rand is a random number between 0 and 1. The MLM is detailed as (1), (3), (35), and (36) with three inputs, one output, and five neurons in the hidden layer, α = 0.9, p_{ji,1} = rand, and q_{lj,1} = rand, where rand is a random number between 0 and 1.

The comparisons for the training and testing of the SGNN, SGRBFNN, and MLM for the first brain signal data set are shown in Figs. 11 and 12.
The weights of the MLM for the first brain signal data set are shown in Figs. 13 and 14. The comparisons for the training and testing of the SGNN, SGRBFNN, and MLM for the second brain signal data set are shown in Figs. 15 and 16. The weights of the MLM for the second brain signal data set are shown in Figs. 17 and 18. The training and testing RMSE comparisons of the performance accuracy (61) are shown in Table III for the first brain signal data set and in Table IV for the second brain signal data set. Please note that the most important data are related to the output d_{1,k}. The training and testing could be improved by including more neurons in the hidden layer; nevertheless, this would increase the computational cost.

Fig. 11. Training for the first brain signal data set.
Fig. 12. Testing for the first brain signal data set.
Fig. 13. Hidden layer weights for the first brain signal data set.
Fig. 14. Output layer weights for the first brain signal data set.
Fig. 15. Training for the second brain signal data set.
Fig. 16. Testing for the second brain signal data set.
Fig. 17. Hidden layer weights for the second brain signal data set.
Fig. 18. Output layer weights for the second brain signal data set.
TABLE III. RMSE for the first brain signal data set.
TABLE IV. RMSE for the second brain signal data set.

From Figs. 11, 12, 15, and 16, it is observed that the MLM improves on the SGRBFNN and SGNN because the signal of the MLM follows the brain signal data set more closely than the others. From Figs. 13, 14, 17, and 18, it is observed that the weights of the MLM remain bounded.
From Tables III and IV, it is observed that the MLM achieves better performance accuracy for training and testing than the SGRBFNN and SGNN because the RMSE is the smallest for the MLM. Thus, the MLM is the best option for learning in the brain signal data set.

Remark 9: The result of Theorem 1, that the errors of the MLM are assured to be stable while the errors of the N, LM, SGNN, and SGRBFNN are not assured to be stable, can be observed mainly in the training of Figs. 3, 7, 11, and 15 and in the testing of Figs. 4, 8, 12, and 16, where the signals of the N, LM, and SGNN become unbounded during the training or testing, while the signal of the MLM remains bounded during all the training and testing.

Remark 10: The result of Theorem 2, that the weights of the MLM are bounded, can be observed mainly in the hidden layer weights of Figs. 5, 9, 13, and 17 and in the output layer weights of Figs. 6, 10, 14, and 18, where the weights of the MLM remain bounded during all the training. The weights of the MLM also remain bounded during all the testing because they keep the last values obtained during the training.

VI. CONCLUSION

The objective of this article was to introduce an algorithm called the modified Levenberg–Marquardt for the artificial neural network learning. The modified Levenberg–Marquardt was compared with the Newton, Levenberg–Marquardt, and stable gradient algorithms for the learning of the electric and brain signal data sets; the modified Levenberg–Marquardt obtained the best performance accuracy because its artificial neural network output followed the data set target most closely and because it obtained the smallest RMSE. In forthcoming work, we will propose other algorithms for the artificial neural network learning to compare against these results, or we will apply the proposed algorithm for the learning of other robotic or mechatronic systems.

ACKNOWLEDGMENT

The author is grateful to the Editor-in-Chief, the Associate Editor, and the reviewers for their valuable comments and insightful suggestions, which helped to improve this research significantly. He would also like to thank the Instituto Politécnico Nacional, the Secretaría de Investigación y Posgrado, the Comisión de Operación y Fomento de Actividades Académicas, and the Consejo Nacional de Ciencia y Tecnología for their help in this research.

REFERENCES

[1] S. Kostić and D. Vasović, "Prediction model for compressive strength of basic concrete mixture using artificial neural networks," Neural Comput. Appl., vol. 26, no. 5, pp. 1005–1024, Jul. 2015.
[2] B. Sahoo and P. K. Bhaskaran, "Prediction of storm surge and inundation using climatological datasets for the Indian coast using soft computing techniques," Soft Comput., vol. 23, no. 23, pp. 12363–12383, Dec. 2019.
[3] T.-L. Le, "Intelligent fuzzy controller design for antilock braking systems," J. Intell. Fuzzy Syst., vol. 36, no. 4, pp. 3303–3315, Apr. 2019.
[4] C. Yin, S. Wu, S. Zhou, J. Cao, X. Huang, and Y. Cheng, "Design and stability analysis of multivariate extremum seeking with Newton method," J. Franklin Inst., vol. 355, no. 4, pp. 1559–1578, Mar. 2018.
[5] S. Chaki, B. Shanmugarajan, S. Ghosal, and G. Padmanabham, "Application of integrated soft computing techniques for optimisation of hybrid CO2 laser–MIG welding process," Appl. Soft Comput., vol. 30, pp. 365–374, May 2015.
[6] Y. Li, H. Zhang, J. Han, and Q. Sun, "Distributed multi-agent optimization via event-triggered based continuous-time Newton–Raphson algorithm," Neurocomputing, vol. 275, pp. 1416–1425, Jan. 2018.
[7] M. S. Salim and A. I. Ahmed, "A quasi-Newton augmented Lagrangian algorithm for constrained optimization problems," J. Intell. Fuzzy Syst., vol. 35, no. 2, pp. 2373–2382, Aug. 2018.
[8] C. Lv et al., "Levenberg–Marquardt backpropagation training of multilayer neural networks for state estimation of a safety-critical cyber-physical system," IEEE Trans. Ind. Informat., vol. 14, no. 8, pp. 3436–3446, Aug. 2018.
[9] M. J. Rana, M. S. Shahriar, and M. Shafiullah, "Levenberg–Marquardt neural network to estimate UPFC-coordinated PSS parameters to enhance power system stability," Neural Comput. Appl., vol. 31, pp. 1237–1248, Jul. 2019.
[10] A. Sarabakha, N. Imanberdiyev, E. Kayacan, M. A. Khanesar, and H. Hagras, "Novel Levenberg–Marquardt based learning algorithm for unmanned aerial vehicles," Inf. Sci., vol. 417, pp. 361–380, Nov. 2017.
[11] J. S. Smith, B. Wu, and B. M. Wilamowski, "Neural network training with Levenberg–Marquardt and adaptable weight compression," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 2, pp. 580–587, Feb. 2019.
[12] H. G. Han, Y. Li, Y. N. Guo, and J. F. Qiao, "A soft computing method to predict sludge volume index based on a recurrent self-organizing neural network," Appl. Soft Comput., vol. 38, pp. 477–486, Jan. 2016.
[13] J. Qiao, L. Wang, C. Yang, and K. Gu, "Adaptive Levenberg–Marquardt algorithm based echo state network for chaotic time series prediction," IEEE Access, vol. 6, pp. 10720–10732, 2018.
[14] A. Parsaie, A. H. Haghiabi, M. Saneie, and H. Torabi, "Applications of soft computing techniques for prediction of energy dissipation on stepped spillways," Neural Comput. Appl., vol. 29, no. 12, pp. 1393–1409, Jun. 2018.
[15] N. Zhang and D. Shetty, "An effective LS-SVM-based approach for surface roughness prediction in machined surfaces," Neurocomputing, vol. 198, pp. 35–39, Jul. 2016.
[16] E. Esme and B. Karlik, "Fuzzy c-means based support vector machines classifier for perfume recognition," Appl. Soft Comput., vol. 46, pp. 452–458, Sep. 2016.
[17] P. Fergus, I. Idowu, A. Hussain, and C. Dobbins, "Advanced artificial neural network classification for detecting preterm births using EHG records," Neurocomputing, vol. 188, pp. 42–49, May 2016.
[18] A. Narang, B. Batra, A. Ahuja, J. Yadav, and N. Pachauri, "Classification of EEG signals for epileptic seizures using Levenberg–Marquardt algorithm based multilayer perceptron neural network," J. Intell. Fuzzy Syst., vol. 34, no. 3, pp. 1669–1677, Mar. 2018.
[19] J. Dong, K. Lu, J. Xue, S. Dai, R. Zhai, and W. Pan, "Accelerated nonrigid image registration using improved Levenberg–Marquardt method," Inf. Sci., vol. 423, pp. 66–79, Jan. 2018.
[20] J. Li, W. X. Zheng, J. Gu, and L. Hua, "Parameter estimation algorithms for Hammerstein output error systems using Levenberg–Marquardt optimization method with varying interval measurements," J. Franklin Inst., vol. 354, pp. 316–331, Jan. 2017.
[21] X. Yang, B. Huang, and H. Gao, "A direct maximum likelihood optimization approach to identification of LPV time-delay systems," J. Franklin Inst., vol. 353, no. 8, pp. 1862–1881, May 2016.
[22] I. S. Baruch, V. A. Quintana, and E. P. Reynaud, "Complex-valued neural network topology and learning applied for identification and control of nonlinear systems," Neurocomputing, vol. 233, pp. 104–115, Apr. 2017.
[23] M. Kaminski and T. Orlowska-Kowalska, "An on-line trained neural controller with a fuzzy learning rate of the Levenberg–Marquardt algorithm for speed control of an electrical drive with an elastic joint," Appl. Soft Comput., vol. 32, pp. 509–517, Jul. 2015.
[24] S. Roshan, Y. Miche, A. Akusok, and A. Lendasse, "Adaptive and online network intrusion detection system using clustering and extreme learning machines," J. Franklin Inst., vol. 355, no. 4, pp. 1752–1779, Mar. 2018.
[25] C. Bishop, "Exact calculation of the Hessian matrix for the multilayer perceptron," Neural Comput., vol. 4, no. 4, pp. 494–501, Jul. 1992.
[26] C. M. Bishop, "A fast procedure for retraining the multilayer perceptron," Int. J. Neural Syst., vol. 2, no. 3, pp. 229–236, 1991.
[27] C. M. Bishop, "Curvature-driven smoothing in feedforward networks," in Proc. Seattle Int. Joint Conf. Neural Netw. (IJCNN), 1990, p. 749.
[28] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Control, Signals, Syst., vol. 2, no. 4, pp. 303–314, Dec. 1989.
[29] R. B. Ash, Real Analysis and Probability. New York, NY, USA: Academic, 1972.
[30] J. S. R. Jang and C. T. Sun, Neuro-Fuzzy and Soft Computing. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[31] J. de Jesús Rubio, P. Angelov, and J. Pacheco, "Uniformly stable backpropagation algorithm to train a feedforward neural network," IEEE Trans. Neural Netw., vol. 22, no. 3, pp. 356–366, Mar. 2011.
[32] W. Yu and X. Li, "Discrete-time neuro identification without robust modification," IEE Proc.-Control Theory Appl., vol. 150, no. 3, pp. 311–316, May 2003.
[33] J. D. J. Rubio, I. Elias, D. R. Cruz, and J. Pacheco, "Uniform stable radial basis function neural network for the prediction in two mechatronic processes," Neurocomputing, vol. 227, pp. 122–130, Mar. 2017.
[34] J. D. J. Rubio, "USNFIS: Uniform stable neuro fuzzy inference system," Neurocomputing, vol. 262, pp. 57–66, Nov. 2017.
[35] I. Elias et al., "Genetic algorithm with radial basis mapping network for the electricity consumption modeling," Appl. Sci., vol. 10, no. 12, p. 4239, Jun. 2020.
[36] J. D. J. Rubio, D. M. Vázquez, and D. Mújica-Vargas, "Acquisition system and approximation of brain signals," IET Sci., Meas. Technol., vol. 7, no. 4, pp. 232–239, Jul. 2013.

José de Jesús Rubio (Member, IEEE) is a full-time Professor with the Sección de Estudios de Posgrado e Investigación, ESIME Azcapotzalco, Instituto Politécnico Nacional, Ciudad de México, Mexico. He has published over 142 international journal articles with 2214 citations in Scopus. He has been the tutor of 4 P.D. students, 20 Ph.D. students, 42 M.S. students, 4 S. students, and 17 B.S. students.
Dr. Rubio was a Guest Editor of Neurocomputing, Applied Soft Computing, Sensors, The Journal of Supercomputing, Computational Intelligence and Neuroscience, Frontiers in Psychology, and the Journal of Real-Time Image Processing. He also serves as an Associate Editor for the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, the IEEE TRANSACTIONS ON FUZZY SYSTEMS, Neural Computing and Applications, Frontiers in Neurorobotics, and Mathematical Problems in Engineering.