An informative probability model enhancing real time
echobiometry to improve fetal weight estimation accuracy
G. Cevenini · F. M. Severi · C. Bocchi ·
F. Petraglia · P. Barbini
Med Bio Eng Comput (2008) 46:109–120
DOI 10.1007/s11517-007-0299-2
ORIGINAL ARTICLE
Received: 4 May 2007 / Accepted: 28 November 2007 / Published online: 10 January 2008
© International Federation for Medical and Biological Engineering 2007
Abstract A multinormal probability model is proposed to
correct human errors in fetal echobiometry and improve the
estimation of fetal weight (EFW). Model parameters were
designed to depend on major pregnancy data and were
estimated through feed-forward artificial neural networks
(ANNs). Data from 4075 women in labour were used for
training and testing ANNs. The model was implemented
numerically to provide EFW together with probabilities of
congruence among measured echobiometric parameters. It
enabled ultrasound measurement errors to be real-time
checked and corrected interactively. The software was useful for training medical staff and standardizing measurement
procedures. It provided multiple statistical data on fetal
morphometry and aid for clinical decisions. A clinical protocol for testing the system's ability to detect measurement
errors was conducted with 61 women in the last week of
pregnancy. It led to decisive improvements in EFW accuracy.
Keywords Probability model · Neural networks ·
Ultrasound · Echobiometry · Fetal weight estimation
G. Cevenini (corresponding author) · P. Barbini
Department of Surgery and Bioengineering, University of Siena,
Viale Mario Bracci 16, 53100 Siena, Italy
e-mail: cevenini@unisi.it

F. M. Severi · C. Bocchi · F. Petraglia
Department of Pediatrics, Obstetrics and Reproductive
Medicine, University of Siena,
Viale Mario Bracci 16, 53100 Siena, Italy

1 Introduction
Many decisions in obstetrics depend on gestational age
(GA) and fetal weight (FW). Accurate ultrasound
examination performed before 20 weeks of gestation
enables true GA to be estimated [42]. On the other hand,
estimation of FW (EFW) using standard biometric
parameters, usually related to geometric dimensions of the
fetal head, abdomen and long bones of extremities, is still
problematical [18].
Monitoring of fetal growth is fundamental in modern
perinatology because it is strictly related to fetal/neonatal
wellbeing [43]. Moreover, identification of abnormal
intrauterine growth patterns enables better pregnancy
management [10, 21, 43].
In the last 30 years, many methods have been developed
to improve EFW accuracy, most based on formulae derived
by regression analysis [3, 16, 22, 23, 25, 27, 33, 35, 36, 38,
41, 44], or on physical models [2, 14, 17, 29]. Artificial
neural networks (ANNs) and volumetric methods based on
three-dimensional (3D) ultrasonography were also recently
proposed [11, 20, 40].
Clinical use of these mathematical models led to the introduction of EFW in ultrasound reports. Although the models were effective
in the original papers, ultrasound operators know that every
estimation model loses efficacy when applied in clinical
practice [9, 17]. The differences between accuracies in the
literature and those obtained in local clinical institutions
are due to many factors, the main ones being significant statistical dissimilarity between original and local populations
and samples, diversities in echobiometric measurement
procedures and lack of model generalization. Little attention has usually been paid to generalization, which refers to
a model's ability to provide the same accuracy on data not
used for model identification [5]. Specifically, empirical
formulae do not guarantee a good compromise between
model flexibility to fit all useful information and robustness
to filter useless data variability. Too many model
parameters have been estimated from few ultrasound cases
near delivery. Sometimes fetuses with non-homogeneous
weight or GA intervals not representative of the whole
population are used. In other cases the clinical condition of
women in labour is neglected or incorrectly reported.
Although attempts to reduce statistical sample errors and
lack of generalization power by selecting the most accurate
and representative models have been made, a percentage
mean absolute error less than 7–8% of the true birth weight (BW) has
never been achieved in current clinical practice, with 25%
(or more) of estimates having an absolute error over 10%
[29]. Unfortunately, since most obstetricians take 10% as a
critical error threshold above which EFW cannot guarantee
correct clinical management, the method cannot yet be
considered reliable for clinical decision-making [7, 17].
Though many attempts have been made to reduce estimation errors by means of models specialized in particular
ranges of FW or GA [16, 23, 36], or derived from sophisticated 3D and ANN methods [11, 40], it has not proven
possible to significantly reduce the error, because it is presumably due to many different unpredictable factors (human,
environmental, instrumental, technological, etc.) associated
with digital processing of echobiometric values [17].
Since the 10% error limit for all populations of fetuses is
not so far away, there is great interest in finding solutions
that could improve EFW accuracy enough to reach the goal.
Actually, the only way to enhance fetal weight prediction accuracy seems to be reduction of operator
measurement error. Indeed, readings made by operators
with long experience in fetal ultrasound have significantly,
but still not sufficiently, lower errors.
This paper describes a computerized information system
to help ultrasound operators in the control and interactive
correction of measurement errors in two-dimensional fetal
biometry. It is based on a Gaussian multivariate (multinormal) probability model, the parameters of which are
identified by ANNs trained with sample data representing a
wide fetal population. Therefore, it properly belongs to
machine learning methods which are widely used in computing applications to support clinical decision making.
The effective level of real time improvement in the accuracy of EFW was tested clinically in a small sample of
pregnant women.
2 Methods
2.1 Population and samples
To design the model we used data of 4,075 fetuses in the
last week before birth, recorded in our clinics over the last
10 years. Only fetuses with evident malformations were
excluded from the database which was divided into three
samples equally representative of the fetal population:
a training set and a validation set of the same size from the
first 3,200 fetuses, the former by odd positions and the
latter by even positions of the chronologically ordered list;
the last 875 cases constituted a testing set. The training and
validation sets were used for model training, whereas the
testing set was used to check that model performance
remained statistically equivalent with new data (generalization ability). Finally, the system was applied in clinical
practice to 61 pregnant women in the last week before
delivery to verify its effective capacity to support interactive correction of real-time ultrasound measurements and
to improve EFW accuracy.
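The sampling scheme described above can be sketched as follows. This is only an illustration of the odd/even split: the function name `split_chronological` and the integer case identifiers are hypothetical, and positions are assumed to be counted from 1.

```python
def split_chronological(records, n_design=3200):
    """Split a chronologically ordered list into training, validation and testing sets."""
    design = records[:n_design]
    # Positions are counted from 1, so odd positions are list indices 0, 2, 4, ...
    training = design[0::2]
    validation = design[1::2]
    # Everything after the first n_design cases is kept aside for testing.
    testing = records[n_design:]
    return training, validation, testing


# Hypothetical stand-in for the 4,075 chronologically ordered cases.
cases = list(range(1, 4076))
training, validation, testing = split_chronological(cases)
print(len(training), len(validation), len(testing))
```

Interleaving the training and validation cases keeps both sets spread over the whole recording period, while the testing set contains only the most recent cases.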
2.2 Measurement variables
Fetal echobiometric data, including biparietal diameter
(BPD), head and abdominal circumferences (HC, AC), and
femur length (FL), were measured by transabdominal
ultrasound scan with a Siemens Sonoline Elegra Millenium
Edition ultrasound system or a MYLAB Family instrument
(ESAOTE spa, Genova, Italy). Gestational age (GA) in
weeks was established by accurate menstrual history confirmed by ultrasound examination before the 20th week of
gestation. True FW was determined by measuring birth
weight (BW) with a precision balance soon after the
delivery. BW was the dependent variable used to train our
model to estimate FW from ultrasound scans just before
delivery.
Essential pregnancy data, namely amniotic fluid volume
(AF), number of fetuses (FN) and number of days between
last ultrasound examination and delivery (US-D) were also
entered in the training process.
AF was conceived as a binary-coded qualitative variable
with four categories: normal, absent, reduced and augmented volume. US-D ranged from 0 (i.e. ultrasound
examination and delivery on the same day) to 6 (i.e.
ultrasound examination 6 days before delivery).
2.3 Multinormal probability model
To describe the probability space of the ultrasound measurements we used the multivariate Gaussian density
function:
\[
p(x \mid w) = \frac{1}{(2\pi)^{d/2}\,|\Sigma(w)|^{1/2}}
\exp\left\{ -\frac{1}{2}\,[x - \mu(w)]^{T}\,\Sigma^{-1}(w)\,[x - \mu(w)] \right\}
\qquad (1)
\]
where T is the vector transposition operator, d = 5 the
parameter space dimension, x = [BPD HC AC FL GA] the
vector of current echobiometric parameters, w = [BW AF
FN US-D] an information vector conditioning density
function (1), and $\mu(w)$ and $\Sigma(w)$ the mean vector and
covariance matrix, respectively, of parameters which
depend on w and have to be estimated to completely define
the probability model (1).
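As a rough numerical illustration of density function (1), the sketch below evaluates a multivariate Gaussian for a given mean vector and covariance matrix using only the Python standard library. It is not the authors' Matlab implementation; the helper names (`cholesky`, `quadratic_form`, `multinormal_density`) and the test values are assumptions made for this example.

```python
import math


def cholesky(a):
    """Lower-triangular factor L with a = L L^T, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def quadratic_form(x, mu, sigma):
    """Return (x - mu)^T sigma^{-1} (x - mu) and log|sigma| from the Cholesky factor."""
    low = cholesky(sigma)
    dev = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution solves low * z = dev, so the quadratic form equals z . z
    z = []
    for i, row in enumerate(low):
        z.append((dev[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))
    return sum(v * v for v in z), log_det


def multinormal_density(x, mu, sigma):
    """Evaluate the d-dimensional Gaussian density at x."""
    d = len(x)
    q, log_det = quadratic_form(x, mu, sigma)
    return math.exp(-0.5 * (d * math.log(2.0 * math.pi) + log_det + q))


# Sanity check with d = 5, zero mean and identity covariance (illustrative values only).
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
peak = multinormal_density([0.0] * 5, [0.0] * 5, identity)
```

Working with the Cholesky factor avoids forming the inverse matrix explicitly; in the model, the mean vector and covariance matrix would come from the trained networks rather than being fixed by hand.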
2.4 Artificial neural networks
Three feed-forward ANNs were designed to estimate the
parameters $\mu(w)$ and $\Sigma(w)$ of the multivariate normal
model. They were made sufficiently flexible (sufficient
number of hidden neurons and appropriate functions of
neuron activation) to encompass all deterministic data
patterns. Proceeding by trial and error, we selected an ANN
architecture having ten neurons in a single hidden layer. It
offered a good compromise between simplicity and
generalization ability through error minimisation. Hidden
neurons were equipped with biased tansig activation
functions. The output neurons had linear activation for
estimating model parameters. The input data were standardized before presentation to the network, so as to have
zero mean and unit standard deviation. Standardization has
been shown to increase the efficiency of ANN training [6].
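The architecture just described (standardized inputs, ten hidden tansig neurons, linear outputs) can be sketched as follows. The tansig function is mathematically the hyperbolic tangent, so `math.tanh` is used. Everything else is an assumption for illustration: the two-column input, the random untrained weights and the function names are hypothetical, and the choice of the population standard deviation is not stated in the text.

```python
import math
import random
import statistics


def standardize_columns(rows):
    """Rescale every input column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation is an assumption; the text does not name the estimator.
    stds = [statistics.pstdev(col) for col in columns]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """One hidden tanh layer with biases, followed by linear output neurons."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_w, hidden_b)
    ]
    return [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(out_w, out_b)
    ]


# Hypothetical inputs: birth weight (g) and days between scan and delivery.
rows = [[2900.0, 0.0], [3300.0, 2.0], [3700.0, 6.0], [3100.0, 4.0]]
scaled, means, stds = standardize_columns(rows)

# Untrained random weights, only to show the layer sizes: 2 inputs, 10 hidden, 5 outputs.
rng = random.Random(0)
hidden_w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(10)]
hidden_b = [rng.uniform(-1, 1) for _ in range(10)]
out_w = [[rng.uniform(-1, 1) for _ in range(10)] for _ in range(5)]
out_b = [rng.uniform(-1, 1) for _ in range(5)]
outputs = forward(scaled[0], hidden_w, hidden_b, out_w, out_b)
```

The means and standard deviations computed on the training data would have to be stored and reused to rescale any new case before prediction.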
The first ANN, ANN1, was designed to estimate the model mean vector, $\mu(w)$, for each combination of pregnancy information w, considered as input data. A block diagram of ANN1 is shown in Fig. 1, where the training (T) and prediction (P) phases are in the upper and lower left sides, respectively. Specifically, ANN1 is trained to recognize the set of echobiometric measurements x, i.e. BPD, HC, AC, FL and GA, from input data w, i.e. BW, AF, FN and US-D. Once trained, ANN1 predicts the corresponding most likely (expected) parameter values $\bar{x}$, i.e. $\overline{BPD}$, $\overline{HC}$, $\overline{AC}$, $\overline{FL}$ and $\overline{GA}$, for any given set of pregnancy information. These expected values are assumed as a reliable estimation of the mean parameter vector $\mu(w)$. The ANN1 prediction phase is reported in Fig. 1 because it is necessary to obtain parameter deviations, $[x_i - \mu_i(w)]$ (i = 1, 2, …, 5), namely the differences between an echobiometric measurement, $x_i$, and its corresponding mean value, $\mu_i$, estimated by ANN1 as a function of input data w. In the centre of Fig. 1 the calculation of deviations is illustrated, together with their squared values, i.e. deviances $\delta_i^2 = [x_i - \mu_i(w)]^2$, and all their paired products, i.e. codeviances $\delta_i \delta_j = [x_i - \mu_i(w)] \cdot [x_j - \mu_j(w)]$ ($i \ne j$ = 1, 2, …, 5).

Fig. 1 Block diagram of the feed-forward ANN training process. ANN1 takes BW, AF, FN and US-D as inputs and BPD, HC, AC, FL and GA as outputs, in both the training (T) and prediction (P) phases. ANN2 is trained (T) with BW as input and the five deviances $\delta_i^2$ as targets. ANN3 is trained (T) with BW as input and the ten codeviances $\delta_i \delta_j$ as targets.

The two remaining ANNs, ANN2 (upper right side of Fig. 1) and ANN3 (lower right side of Fig. 1), were then trained to recognize deviances and codeviances, respectively. Once trained, ANN2 and ANN3 could therefore estimate the expected values of deviances and codeviances, $E\{[x_i - \mu_i(w)]^2\}$ and $E\{[x_i - \mu_i(w)] \cdot [x_j - \mu_j(w)]\}$, respectively, which were taken as suitable estimations of variances $\sigma_i^2$ and covariances $\sigma_i \sigma_j$ of model covariance matrix $\Sigma(w)$. Of all the pregnancy information, only BW was assumed to affect the model covariance matrix. It is well known that the inferential process benefits from a reduction of data dimensions, especially when a large number of parameters (matrix elements) have to be estimated [6]. Significantly improved accuracy of estimates largely compensates for the lack of other pregnancy information. ANN2 and ANN3 were therefore equipped with a single BW input (see right of Fig. 1). Their prediction phase is not reported in Fig. 1, to avoid unnecessary detail.
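The bookkeeping behind ANN2 and ANN3 can be sketched as follows: for each training case the five deviances and ten codeviances are computed from the measurement and the ANN1 mean, and the networks' expected values are later arranged into a symmetric covariance matrix. The function names and the measurement values below are hypothetical, and the pair ordering (BPD-HC, BPD-AC, …, FL-GA) is assumed from the list in Fig. 1.

```python
from itertools import combinations

PARAMS = ["BPD", "HC", "AC", "FL", "GA"]


def deviance_targets(x, mu):
    """Training targets for ANN2 (squared deviations) and ANN3 (paired products)."""
    dev = [xi - mi for xi, mi in zip(x, mu)]
    deviances = [d * d for d in dev]
    codeviances = [dev[i] * dev[j] for i, j in combinations(range(len(dev)), 2)]
    return deviances, codeviances


def assemble_covariance(variances, covariances):
    """Arrange 5 expected deviances and 10 expected codeviances into a symmetric matrix."""
    d = len(variances)
    sigma = [[0.0] * d for _ in range(d)]
    for i in range(d):
        sigma[i][i] = variances[i]
    for (i, j), c in zip(combinations(range(d), 2), covariances):
        sigma[i][j] = sigma[j][i] = c
    return sigma


# Hypothetical single case: measured values and ANN1 expected values.
x = [92.0, 335.0, 340.0, 72.0, 39.5]
mu = [93.0, 333.0, 345.0, 73.0, 39.0]
deviances, codeviances = deviance_targets(x, mu)

# Layout demonstration only: a real covariance matrix is assembled from the
# expected values predicted by ANN2 and ANN3, not from a single case.
layout = assemble_covariance(deviances, codeviances)
```

A matrix built from one case alone is singular, which is one reason the expected deviances and codeviances, rather than raw products, are needed before Eq. (1) can be evaluated.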
All the ANNs were trained using a batch training
method which updates synaptic weights and neuron biases
only after all inputs and targets have been presented, i.e.
once per iteration. An iterative training algorithm using
gradient descent with momentum and an adaptive learning rate
was used to minimise the mean squared error between real
and predicted outputs.
To limit the influence of training algorithm initialization
on the solution, we performed 99 training sessions starting
from 99 different randomly-selected initial values of ANN
parameters (i.e. synaptic biases and weights), and chose the
session giving the median error value (50th sorted value).
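The selection rule can be sketched as follows, assuming each training session is summarized by a single final error value. The dictionary layout and the synthetic error values are hypothetical; only the rule itself (sort the 99 errors and take the 50th) comes from the text.

```python
def pick_median_session(sessions):
    """Return the session whose final error is the median of all sessions."""
    ranked = sorted(sessions, key=lambda s: s["error"])
    # With 99 sessions, index 49 is the 50th value in sorted order.
    return ranked[len(ranked) // 2]


# Synthetic errors: 99 distinct values in scrambled order, one per random seed.
sessions = [{"seed": seed, "error": ((seed * 37) % 99) / 100} for seed in range(99)]
chosen = pick_median_session(sessions)
```

Choosing the median rather than the minimum avoids reporting a session that happened to benefit from a lucky initialization.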
The early-stopping method was applied directly during
the training process to control ANN generalization power
and avoid the problem of overfitting [6, 24]. At each iteration, training and validation errors were calculated from
data used to train the ANN (training set) and to validate
generalization (validation set), respectively. Training was
stopped when the validation error did not decrease for ten
consecutive iterations. Testing data was then used to confirm generalization on a third set of cases that had not been
used during training.
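A minimal sketch of the stopping rule follows. It assumes that "did not decrease" means no improvement over the best validation error seen so far, which is one common reading; the function name and the synthetic error curve are hypothetical.

```python
def early_stopping_iteration(validation_errors, patience=10):
    """Return (stop_iteration, best_iteration) for a sequence of validation errors."""
    best_error = float("inf")
    best_iteration = -1
    for iteration, error in enumerate(validation_errors):
        if error < best_error:
            # New best validation error: remember where it occurred.
            best_error, best_iteration = error, iteration
        elif iteration - best_iteration >= patience:
            # No improvement for `patience` consecutive iterations.
            return iteration, best_iteration
    return len(validation_errors) - 1, best_iteration


# Synthetic curve: the error falls for 20 iterations and then rises slowly.
curve = [1.0 / (i + 1) for i in range(20)] + [0.06 + 0.001 * i for i in range(30)]
stop, best = early_stopping_iteration(curve)
```

In this synthetic case the best validation error occurs at iteration 19 (counting from 0) and training stops ten iterations later.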
2.5 Fetal weight estimation
The principal aim of this study was to predict FW, which
was strictly related to BW for training ANNs. BW is the
first component of pregnancy information vector w and
cannot be known for an unborn fetus.
In the case of a fetus, whose mathematical expressions will be denoted with a tilde (~), knowledge of the other three components of vector $\tilde{w}$, that is AF, FN and US-D, and its measured echobiometric parameters, $\tilde{x}$, allows ANN1 to identify the vector of expected parameters, $\tilde{\mu}(BW)$, as a function of unknown BW. It identifies five monotonic curves on which five expected values of BW can be found corresponding to actual measurements $\tilde{x}$; they are expressed by the five-dimensional vector BWexp.
The most probable value of BW, BWmp, corresponding to $\tilde{x}$, can be derived from model (1) by calculating the volume of the confidence region in parameter space, as follows. Once the available pregnancy data of information vector $\tilde{w}$ are known, volume depends only on its first unknown component, BW, and describes the cumulative conditional probability of $\tilde{x}$ representing the strength of association between true fetal weight and its just-measured ultrasound parameters. The higher the volume, the more measurements are expected to be mutually congruent and accurately related to the associated weight.
The confidence region can be described mathematically
by considering the scalar quantity in the exponential term
of model equation (1):

\[
Q = \delta^{T}\,\Sigma^{-1}\,\delta \qquad (2)
\]

where $\delta = x - \mu$ represents the vector of generic parameter deviations.
Q is a quadratic form which was demonstrated to be distributed as $\frac{d(n^2 - 1)}{n(n - d)}$ times a Fisher density function, F, with d and (n - d) degrees of freedom [28]. In our application, the number of fetuses n, used for model design, was much greater than the parameter space dimension d, so that the valid approximations $(n^2 - 1) \approx n^2$ and $(n - d) \approx n$, and therefore $\frac{d(n^2 - 1)}{n(n - d)} \cong d$, were used for simplification. Thus, the confidence region at probability level $\alpha$ can be defined as the locus of parameter deviations, $\delta$, which satisfy the following inequality:

\[
Q \le d\,F_c^{-1}(d, n; \alpha) \qquad (3)
\]

where $F_c^{-1}$ is the inverse of cumulative F distribution, $F_c$, with d and n degrees of freedom and evaluated at the probability level $\alpha$.
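The size of the approximation can be checked with a few lines of arithmetic. The text does not state which sample size n was used, so both the full database (4,075 fetuses) and the training set alone (1,600 fetuses) are tried here as assumptions; the function name is hypothetical.

```python
def scale_factor(n, d):
    """Exact multiplier d(n^2 - 1) / (n(n - d)) relating Q to the F distribution."""
    return d * (n * n - 1) / (n * (n - d))


d = 5
for n in (4075, 1600):
    exact = scale_factor(n, d)
    # Relative difference between the exact multiplier and the approximation d.
    print(n, round(exact, 4), f"{(exact - d) / d:.2%}")
```

In both cases the exact multiplier differs from d by well under 1%, which supports replacing it with d.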
Equation (3) describes a five-dimensional hyperellipsoidal region.
The probability, $\tilde{\alpha}$, defines the volume of the hyperellipsoid on whose surface the current measurements, $\tilde{x}$, lie. It can be derived by inverting Eq. (3):

\[
\tilde{\alpha} = F_c(d, n; \tilde{Q}/d) \qquad (4)
\]

where $F_c$ is evaluated at the value $\tilde{Q}/d$ and $\tilde{Q}$ is calculated from formula (2) using $\tilde{\delta} = \tilde{x} - \tilde{\mu}$.
The quadratic form of (3) implies a unique maximum, $\tilde{\alpha}_{max}$, for $\tilde{\alpha}$. It corresponds to a value of BW necessarily located in the interval between the minimum and the maximum value of vector BWexp. Though $\tilde{\alpha}_{max}$ could theoretically be evaluated analytically, for practical reasons we did a numerical search among all $\tilde{\alpha}$ values corresponding to the same number, N, of BW sampling values, spaced at steps, $\Delta BW$, of 10 g, that is

\[
\begin{aligned}
\tilde{\alpha}_{max} &= \max_{i = 1, \ldots, N} \{\tilde{\alpha}(BW_i)\} \\
BW_1 &= \min\{BW_{exp}\} \\
BW_N &= \max\{BW_{exp}\} \\
BW_{i+1} &= BW_i + \Delta BW, \quad \Delta BW = 10\ \mathrm{g}
\end{aligned}
\qquad (5)
\]
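The numerical search in Eq. (5) amounts to a one-dimensional grid scan. In the sketch below, `alpha` is a hypothetical callable standing in for the model probability as a function of BW; the toy bell-shaped function and the five BWexp values are invented for illustration and are not derived from the model.

```python
import math


def most_probable_weight(alpha, bw_exp, step=10.0):
    """Scan BW from min(bw_exp) to max(bw_exp) in fixed steps and keep the best value."""
    low, high = min(bw_exp), max(bw_exp)
    # Number of grid points; the last one does not exceed max(bw_exp).
    n = int((high - low) // step) + 1
    grid = [low + i * step for i in range(n)]
    best_bw = max(grid, key=alpha)
    return best_bw, alpha(best_bw)


def toy_alpha(bw):
    """Placeholder probability curve with a single peak near 3,334 g."""
    return math.exp(-(((bw - 3334.0) / 200.0) ** 2))


# Hypothetical univariate expected weights (g), one per echobiometric parameter.
bw_exp = [3100, 3250, 3400, 3500, 3620]
bw_mp, alpha_max = most_probable_weight(toy_alpha, bw_exp)
```

With a 10 g step the estimate can be at most 5 g away from the true maximum of a single-peaked curve, which is negligible compared with the weights involved.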
BWmp was chosen to correspond with the region of maximum probability volume, $\tilde{\alpha}_{max}$, and was assumed as the current EFW, even long before birth. It represents the most plausible value of FW associated with the available pregnancy information and the current echobiometric measurements, taken together.
The vector, $\tilde{\mu} = \mu(BW_{mp})$, of expected parameter values evaluated at BWmp, provides model deviations, $\tilde{\delta}_m = \tilde{x} - \tilde{\mu}$, from actual measurements, and their probabilities, $\tilde{\alpha}_m$, which account for measurement errors and morphological characteristics of fetal physiopathology.
$\tilde{\alpha}_m$ can be derived by projecting the multivariate normal model (1) along any generic parameter axes, $x_k$ (k = 1, 2, …, 5), as follows:

\[
p(x_k \mid \tilde{w}) = \frac{1}{\sqrt{2\pi\tilde{\sigma}_k^{2}}}
\exp\left\{ -\frac{1}{2}\,\frac{(x_k - \tilde{\mu}_k)^{2}}{\tilde{\sigma}_k^{2}} \right\}
\qquad (6)
\]

where $\tilde{\mu}_k$ is of course the kth component of $\tilde{\mu}$ and $\tilde{\sigma}_k^{2}$ is the corresponding variance from the principal diagonal of covariance matrix $\tilde{\Sigma} = \Sigma(BW_{mp})$.
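Any component $\tilde{\alpha}_{m,k}$ of vector $\tilde{\alpha}_m$ can therefore be calculated from (6):

\[
\tilde{\alpha}_{m,k} =
\begin{cases}
1 - 2\displaystyle\int_{-\infty}^{\tilde{x}_k} p(x_k \mid \tilde{w})\,dx_k & \text{if } \tilde{x}_k \le \tilde{\mu}_k \\[2ex]
1 - 2\displaystyle\int_{\tilde{x}_k}^{+\infty} p(x_k \mid \tilde{w})\,dx_k & \text{if } \tilde{x}_k > \tilde{\mu}_k
\end{cases}
\qquad (7)
\]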
Accuracy of EFW was evaluated by computing the mean absolute percentage error, MAE%:

\[
\mathrm{MAE\%} = \frac{\sum_{i=1}^{N} AE_i}{N} \times 100, \qquad
AE_i = \frac{|EFW_i - BW_i|}{BW_i}
\qquad (8)
\]

where $AE_i$ is the relative absolute error of the model in predicting the i-th fetal weight.

2.6 Clinical evaluation of model performance
Our method for real-time control of fetal echobiometry was then tested for its effective ability to detect and correct measurement errors and therefore improve accuracy in EFW.
Ultrasound parameters of 61 fetuses were evaluated within 5 days of delivery in the Department of Pediatrics, Obstetrics and Reproductive Medicine, University of Siena, by real-time interaction with our multinormal model, implemented numerically by software developed in Matlab language [19].
To investigate whether the system was able to appropriately correct measurement errors that are difficult to detect and to significantly improve the accuracy of EFW, an obstetrician with good experience in ultrasound (at least 2 years' experience) was chosen to perform fetal biometry. Ultrasound data were entered in the model to evaluate the probability of agreement among measured fetal biometric parameters and actual EFW.
On the basis of clinical evidence, model-estimated maximum probability, $\tilde{\alpha}_{max}$, corresponding to the most probable EFW (i.e. BWmp) and congruence probabilities of the parameters, $\tilde{\alpha}_m$, the operators decided autonomously whether or not to correct the first set of measurements and to proceed with further refined measurements. Specifically, for each set, $\tilde{x}$, of measured echobiometric parameters, the operator was advised to consider possible measurement errors when at least one of the $\tilde{\alpha}_m$ parameter probabilities was less than 50% or when the EFW probability, $\tilde{\alpha}_{max}$, was less than 50%. In this case, the operator decided to make new ultrasound measurements or to keep the current measurements, depending on his/her clinical experience and on case-specific clinical information.
Improvements of accuracy in EFW were assessed by applying our interactive method on-line to the 61 above-mentioned pregnant women in the last week before delivery. We calculated mean and maximum AE% (MAE% and AEmax%) and the percentage of FW having AE% greater than 10% (AEgt10%).
The effectiveness of measurement error correction was also evaluated using some mathematical models from the literature [3, 14, 22, 25, 33, 35, 44] proven to give performance equivalent to our model by error comparison using the non-parametric statistical test of Wilcoxon [1].

3 Results
3.1 Model estimation of fetal weight
Model performance was statistically equivalent for the training, validation and testing data sets (Wilcoxon test, p > 0.05). We therefore report the results for the entire data set used for model design. Figure 2 shows the distribution of percentage error in relation to birth weight for the multinormal probability model and the seven models which gave statistically equivalent performance on the 61 data items used for evaluating our model in real-time clinical practice. Table 1 gives the MAE% and the percentage of cases with AE% greater than 10% (AEgt10%) for each model. As we can see (Fig. 2), only our proposed multinormal model, by virtue of its probabilistic nature, has uniform non-biased behaviour over the whole range of BW. On the contrary, all the other models based on regression techniques have an error distribution strongly influenced by training data density in BW space, with the only exception being the Hadlock model, which has moderate bias because it was trained on a data set having a quite uniform BW distribution [22]. Table 1 shows that this model had errors very similar (MAE% = 7.81, AEgt10% = 30.8%) to our model (MAE% = 7.86, AEgt10% = 31.3%). In particular, Fig. 2 shows that the Ott [33], Combs [14], Woo [44] and Robson [35] models overestimate low BWs and underestimate high BWs, whereas the Hill [25] and Benson [3] models have different biases, underestimating low and high BWs and overestimating intermediate BWs. The lowest performances in Table 1 are shown by models particularly biased at high BWs. Cases with high errors generally also had low probabilities associated with our model EFW, presumably due to ultrasound measurement errors. Probability region boundaries with low probability values are therefore an inspection area in which measurement errors should be checked and where the accuracy of EFW could improve.

Fig. 2 Distribution of percentage error in relation to birth weight in our multinormal model and the other seven models selected to give statistically equivalent performance with our clinical data
Table 1 Model performance evaluated on the whole set of data (training, validation and testing sets) used to design the multinormal model

Model          MAE%    AEgt10%
Multinormal    7.86    31.3
Ott            7.45    27.2
Combs          8.43    33.1
Hill           8.00    29.7
Woo            7.53    28.7
Benson         8.43    32.6
Hadlock        7.81    30.8
Robson         7.74    30.0

Mean absolute percentage error MAE%; percentage of fetuses estimated to have an AE% greater than 10% AEgt10%
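The accuracy measures used in the tables can be computed as in the following sketch. The first weight pair (true weight 3,640 g, EFW 3,250 g) is the case reported in Sect. 3.2; the other two pairs and the function names are hypothetical and serve only to complete the example.

```python
def absolute_percentage_errors(efw, bw):
    """AE% for each case: |EFW - BW| / BW, expressed as a percentage."""
    return [abs(e - b) / b * 100.0 for e, b in zip(efw, bw)]


def summarize(efw, bw, threshold=10.0):
    """Return MAE%, AEmax% and the percentage of cases with AE% above the threshold."""
    ae = absolute_percentage_errors(efw, bw)
    mae = sum(ae) / len(ae)
    ae_max = max(ae)
    ae_gt = 100.0 * sum(a > threshold for a in ae) / len(ae)
    return mae, ae_max, ae_gt


# First pair from Sect. 3.2; the remaining two pairs are invented for illustration.
bw = [3640.0, 3000.0, 4000.0]
efw = [3250.0, 3150.0, 3900.0]
mae, ae_max, ae_gt10 = summarize(efw, bw)
```

For the reported case the error is 390 g out of 3,640 g, i.e. about 10.7%, which matches the maximum error quoted for the multinormal model after correction.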
A prototypical numerical implementation of our model is shown in Fig. 3, which reproduces a screenshot of the graphical user interface of the underlying software. On the right side of Fig. 3 we have gestational information ($\tilde{w}$), actual measurements ($\tilde{x}$), probabilities of congruence among them ($\tilde{\alpha}_m$) and their model-estimated expected values ($\tilde{\mu}$). The lower the probability of parameter congruence, the more suspect that parameter has to be considered. High deviation ($\tilde{\delta}_m$) from expected values may be due to measurement errors. Excessively low probability values or low values of more than one parameter suggest that the ultrasound session should be repeated. Figure 3 (left side) shows the five plot windows of most probable parameter values (black lines) and standard deviations (light blue lines) in relation to BW, as estimated from ANNs. Dots around curves represent training data. At the top of the graphic windows are the EFW (BWmp) and its multivariate probability ($\tilde{\alpha}_{max}$). Again, the lower this probability, the more likely it is that large measurement errors, an unusual body conformation, or both are present. When $\tilde{\alpha}_{max}$ is particularly low, at least one of the congruency probabilities $\tilde{\alpha}_m$ is low as well. Dashed blue lines mark both EFW (BWmp, vertical lines) and its corresponding model-estimated expected parameter values ($\tilde{\mu}$, horizontal lines). At the bottom of each plotting area, the univariate expected EFWs (BWexp, vertical dashed red lines) are reported with the measured parameter values ($\tilde{x}$, horizontal dashed red lines). The multivariate most probable EFW, BWmp, is of course between the minimum and maximum of the five univariate BWexp values.
Figure 3 shows an example of EFW by our system. It
concerned a fetus at 40 weeks. The system indicates that
measured head circumference (HC = 350 mm) has a low
probability (10%) of being congruent with the other
fetal biometric parameters and an EFW of 3,331 g (probability 13%). This could mean: (1) that the HC
measurement is incorrect and that it needs to be measured
again; (2) that fetal HC is correct but is bigger than
expected because of hereditary predisposition; (3) that HC
is bigger for pathological reasons. Only the operator's
experience, if necessary with other clinical information,
can answer this question.
Fig. 3 Graphic user-interface
of interactive software for fetal
echobiometry control and
correction, to improve EFW
accuracy
3.2 Clinical evaluation of model performance
In 16 out of 61 cases (26.2%) fetal biometry was measured once and in 45 cases it was repeated two or more times, to a total of 153 measurements. System performance was assessed by comparing its 61 initial FW estimates with those obtained without (16 cases) or with one (3 cases) or more (42 cases) re-measurements of ultrasound parameters associated with low (less than 50%) congruence probabilities. For comparison we used EFW derived from 182 formulas (from 59 published papers) [17]. Considering the 61 initial estimates, seven formulas [3, 14, 22, 25, 33, 35, 44] showed a performance statistically equivalent to our system (Wilcoxon test, P > 0.05). All other formulas gave significantly higher errors. Table 2 shows the performances of all models. It is evident that correction of detected errors yielded statistically significant improvements not only in our model EFW (MAE% from 6.5% to 2.6%) but also when the new biometry was tested by the seven best models (e.g. Hadlock formula MAE% from 6.7% to 3.5%), thus confirming that the system is able to correct measurement errors that affect model performance, worsening their accuracy.
In particular, although the Hadlock model showed the second best decrease in MAE% after our system, we found a drastic reduction in error variability, with a maximum error of 9.0% (in the same fetus), lower than that made by our system (maximum error of 10.7%). Nevertheless, this maximum error of 10.7% is acceptable, because it concerns a normal weight fetus (real weight 3,640 g) that was underestimated by the system (EFW equal to 3,250 g). Other models also showed very good performance with few errors above 10%.
In the cases we analyzed, MAE% was low at initial estimations because the measurements were made by an experienced operator. After correction (excepting two models), the percentage of cases with an error above 10% reduced to zero, as shown in Table 2. Maximum error was lower or just a little higher than 10%.
4 Discussion
Accurate prediction of BW by ultrasonographic measurement of classical fetal morphometric parameters plus other
related pregnancy data, such as gestational age, amniotic
fluid volume and number of fetuses, is of considerable
interest in obstetrics, enabling clinicians to more accurately
predict infant morbidity and mortality [17]. Moreover,
EFW in utero is of great clinical interest for monitoring
fetal growth [31, 34] and may have a central role in major
medical decisions in critical conditions of preterm delivery
and fetal macrosomy [15, 20, 35, 36].
Although a lot of sophisticated mathematical formulas
and models have been developed in the last 30 years [3, 11,
14–17, 20, 22, 23, 25, 27, 29, 33, 35, 36, 38, 41, 44],
estimates still typically have too high an error variance,
preventing reliable clinical use [13, 15, 17, 29]. Even
operators with proven ability in ultrasound examination
provide remarkably high percentages (15–25%) of fetuses
whose BW is estimated with an AE% greater than 10%.
This problem seems difficult to overcome because the
many errors of fetal ultrasound evaluation are presumably
due to technological, environmental, intra- and interobserver variability in fetal measurement and so forth [17,
29]. There are currently unlikely to be major revolutions in
technology, ultrasonographic practice and other methods
that could significantly improve accuracy of measurements
and/or their ability to predict BW more reliably. At the
moment, it is not at all easy to quantify errors, and particularly to discriminate errors due to intra- and interobserver variability in ultrasound measurements. Efforts must be made to minimise this variability if EFW is to be considered clinically useful [17].

Table 2 Model performance evaluated in 61 pregnancies before (initial measurements) and after (ultimate measurements) zero (16 cases), one (3 cases), or more corrections (42 cases) of the initial ultrasound measurements

               Initial measurements          Ultimate measurements
Model          MAE%   AEmax%   AEgt10%       MAE%   AEmax%   AEgt10%
Multinormal    6.5    19.3     13.1          2.6    10.7     1.6
Ott            5.7    16.9     9.8           4.3    9.6      0.0
Combs          5.8    19.1     11.5          4.2    12.2     1.6
Hill           6.2    18.6     13.1          4.6    10.2     1.6
Woo            6.6    20.1     18.0          4.6    9.8      0.0
Benson         6.7    19.1     16.4          4.9    13.6     3.3
Hadlock        6.7    18.1     16.4          3.5    9.0      0.0
Robson         6.7    16.8     16.4          5.4    14.7     9.8

Corrections were decided autonomously by the operator using an interactive system based on the proposed multinormal model for fetal weight estimation: absolute percentage error AE%; mean absolute percentage error MAE%; maximum absolute percentage error AEmax%; percentage of fetuses estimated to have AE% greater than 10% AEgt10%
Many recent attempts have been made to reduce the
estimation error on lower and higher FWs, where the
clinical interest is of course focused. In general, clinicians
distinguish these two critical intervals of weight from an
intermediate one that typically ranges from 2,500 to
4,000 g [16, 20, 23]. Almost all models for EFW exhibit a
worsening of accuracy in critical weight classes (below
2,500 g and above 4,000 g) where lower/higher weights
are usually over/under-estimated [13, 16, 29]. Most mathematical models are derived from statistical regressions
and account nonlinearly for ultrasound measurements by
fitting experimental data. They are therefore most accurate
for intermediate weights, where experimental data has
higher density, and produce increasing biases going from
median to lower or higher FWs where data density progressively decreases. Concerning this problem it is really
important to underline that it is in the critical weight
classes that weight estimation becomes fundamental from a
clinical point of view. A dangerous increase of the rate of
false normal weights arises. In other words, such biased
models tend to reassure excessively about a normal FW,
correctly identifying only very critical conditions that can
be detected by simple qualitative investigations.
Models specialized in critical weight ranges have also
been constructed and tested: they are sometimes much
more accurate in the range where they have been fitted and
dramatically less accurate elsewhere, as would be expected
[15, 17, 23, 29, 35, 36, 38, 41]. The use of these specialized
models therefore requires prior knowledge about the
weight range in which to classify the fetus, leading to
dangerous amplification of errors in borderline areas which
are of critical clinical interest. This also has legal implications for ultrasonographers, who may make gross errors
with severe consequences for maternal and fetal health.
Moreover, there have been several studies to evaluate
the efficacy of mathematical models related to specific GA
intervals [32, 41]. Although GA intervals are better defined
than weight intervals, they nevertheless depend on the
precision of GA estimation, which decreases
as pregnancy progresses, and GA is only partially
related to fetal microsomia and macrosomia.
In our opinion, the use of mathematical models specialized for specific FW and/or GA ranges can therefore be
dangerous, of little clinical interest and not significantly
better than those applicable to the entire fetal population. In
other words, they are of no help.
All other efforts to decrease AE% by introducing correction factors in the algorithms and new information, such
as amniotic fluid volume, number of fetuses and maternal
pathologies, or non-routine echobiometric parameters, have
failed to bring effective improvements [8]. Moreover, more
recent mathematical models, besides the above mentioned
limits, are sometimes based on echobiometric parameters
difficult to obtain, particularly by unskilled operators [8,
37, 40]. Specifically, three-dimensional (3D) ultrasound
enables volumetric parameters such as fetal thigh, upper
arm and abdomen to be measured for EFW. Although
preliminary studies seem to indicate improvements [40],
doubts remain about the utility of 3D for a substantial
improvement in the accuracy of EFW [17]. Moreover, 3D
ultrasound systems are expensive, not as widespread as 2D
systems, and unfamiliar to operators doing routine fetal
biometry. In any case, if the superiority of 3D ultrasound
systems were established, our model could be easily
extended to volumetric measurements.
Today, about ten models are considered to give the best performance,
with no significant differences among them, and none gives an
MAE% below 7–8% [15, 17, 29].
We chose to tackle the problem of human error
in the use of ultrasound devices for fetal biometry, with the aim of significantly improving the accuracy of EFW. An interesting
attempt to control ultrasound measurement errors by
enhancing the fetal border and reducing noise was recently
proposed for evaluation of nuchal translucency thickness
[30]. Its impact on fetal echobiometry for improving the
accuracy of EFW should be investigated.
We designed a weight-dependent Gaussian probability
model [1, 28] over the whole range of BWs, which avoids
the above-mentioned biases and provides detailed information about the reliability of measurements through
interactive software, allowing redefinition of measurements
and real-time correction. Model parameters were estimated
from a large database of 3,000 fetuses, collected by ultrasound operators of proven experience, though presumably
containing measurement errors. Our hypothesis was that by
correcting or limiting these errors, we could obtain an EFW
of acceptable accuracy to protect fetal and maternal health
and reduce wrong medical decisions, which sometimes also
have legal implications.
In line with Dudley [17], we consider that insufficient
accuracy in EFW depends on excessive intra- and interobserver variability of measurements. The great advantage
of using a multivariate Gaussian model is that it assigns
probability values to the different ultrasound measurements
and to EFW. The model is designed and trained on ultrasound data measured by experienced ultrasound operators
who carefully followed the standardised protocols for
correct echobiometry [4]. It can therefore guide operators
to follow its reliable statistical representation suggesting
repetition of divergent readings to reduce errors. We
assumed that human errors occur more frequently in the
space of ultrasound measurements where the model
indicates lower probabilities of congruence among biometric parameters. However, low probabilities can also
arise from fetal pathology or peculiar morphology, such as
maternal diabetes, unusual parental build and abnormal
fetal growth. Although these two sources of low probability (measurement error and genuine anomaly) may not be distinguishable
by ultrasound examination alone, both are of great
clinical interest. Thus, when operators encounter low
model probabilities, they are alerted to investigate more
thoroughly than usual and to repeat the suggested biometric
measurements. Two distinct outcomes are then possible: the
new measurements can be (1) the same as before and/or
still associated with low probabilities; or (2) substantially
different, but in the direction of the values expected by the model,
increasing the probability of congruence with the other fetal
parameters. In the first case, there may be abnormalities
suggesting the need to review other clinical data, such as
maternal/paternal build and pathologies. In the second
case, measurement errors may be detected and corrected. In
both situations, at least a third session of measurements is
recommended for confirmation. If any disagreement still
remains between measurement sessions, operators should
decide on the basis of other clinical information and/or
experience.
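One way to turn a multinormal model into a probability of congruence is sketched below. The formula actually used by the system is not given in this section, so the sketch rests on assumptions: congruence is taken to be the chi-square tail probability of the squared Mahalanobis distance, four biometric parameters are used (labelled BPD, HC, AC and FL here for illustration), and the mean vector and covariance matrix are fixed hypothetical numbers, whereas in the proposed model they depend on pregnancy data and FW through the ANNs.

```python
import math

import numpy as np


def chi2_tail_even_dof(x: float, dof: int) -> float:
    """P(chi-square with `dof` degrees of freedom >= x), closed form for even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("the closed form used here needs a positive even dof")
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(dof // 2))


def congruence(x, mean, cov):
    """Return (tail probability, conditional z-scores) for one measurement vector."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    precision = np.linalg.inv(cov)
    weighted = precision @ (x - mean)
    d2 = float((x - mean) @ weighted)            # squared Mahalanobis distance
    # z-score of each measurement given all the others (Gaussian conditioning)
    z = weighted / np.sqrt(np.diag(precision))
    return chi2_tail_even_dof(d2, x.size), z


# Hypothetical model values in mm (assumed order: BPD, HC, AC, FL).
mean = np.array([93.0, 335.0, 345.0, 73.0])
sd = np.array([3.5, 12.0, 20.0, 3.0])
corr = np.full((4, 4), 0.5) + 0.5 * np.eye(4)    # uniform correlation of 0.5
cov = np.outer(sd, sd) * corr

p_ok, z_ok = congruence(mean, mean, cov)         # readings equal to expected values
x_bad = mean.copy()
x_bad[3] -= 3.0 * sd[3]                          # FL read three SDs too short
p_bad, z_bad = congruence(x_bad, mean, cov)
suspect = int(np.argmax(np.abs(z_bad)))          # index of the most divergent reading
```

A low tail probability flags the set of readings as poorly congruent, and the largest conditional z-score points to the measurement worth repeating first. As discussed above, a low value cannot by itself separate a measurement error from a real anomaly.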
Since our method incorporated certain clinical information about pregnancy, it was convenient to use an ANN
approach [24] to estimate multinormal model parameters
(i.e. mean vectors and covariance matrices), which were
made to depend on pregnancy data and FW. The model
dependence on pregnancy information gives a more accurate probability but makes the problem of estimating its
parameters from sample data unfeasible with common
statistical methods, such as multivariate regression, which
would be inaccurate. For example, means of the parameter
vector could be estimated by entering pregnancy variables
in multivariate linear regression models where echobiometric measurements are assumed as dependent variables.
Unfortunately, all regression techniques are very sensitive
to empty regions in observation space and to outliers [1, 5,
6, 12, 28], and are most accurate where observations are
densest. Since in clinical application there is great interest
in regions with low data density, e.g. macrosomic and
microsomic fetuses, we chose an ANN approach to
overcome the many limitations of regression techniques [6, 24,
26]. ANNs are sophisticated machine learning methods
which make it possible to express the knowledge contained
in experimental data with great flexibility and precision,
and provide a uniform description, without discontinuities,
of the input-output relationship. They can therefore determine expected output values with satisfactory accuracy, by
interpolating missing data even in multivariate space with
few sparse observations [6]. Other important advantages of
ANNs over statistical regression models are that
the model structure need not be specified in advance, no hypotheses
about the statistical distribution of the data are required, and they
describe nonlinearities, naturally take correlation among
input variables into account and can be trained by
example, as humans are [24, 26, 39]. ANNs have recently
been successfully applied in many fields of medicine. All
that is required is a sufficiently large, representative set of
training examples. The main difficulty with ANNs is their
training which must be done with care to avoid overfitting,
a tendency of ANNs to learn even training data variability
which cannot be generalized to the whole phenomenon.
There are many methods of ensuring ANN generalization
power, for example regularization techniques, growing and
pruning algorithms, genetic algorithms and early-stopping
(ES) procedures [6, 26]. We applied ES, which is widely
used to train ANNs by virtue of its short computation time
[6, 24]. It divides the available data into training and validation sets. Generalization is ensured by stopping the
training process at the iteration at which the ANN begins to
overfit, that is, when the error computed on the validation
set starts to increase. However, since the validation set
takes part in the training process, it must not be
used to estimate the generalization error. We therefore
tested the ANNs with a third set of data (testing set),
which had not been used during training [6].
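The early-stopping procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the training code used in the study: the network is a small one-hidden-layer tanh network trained by plain gradient descent, the data are synthetic, the 60/20/20 split is arbitrary, and training stops after a hypothetical `patience` number of iterations without improvement on the validation set rather than at the very first increase.

```python
import numpy as np


def init_params(n_in, n_hidden, rng):
    """Random initial weights for a one-hidden-layer network with one output."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(p, X):
    H = np.tanh(X @ p["W1"] + p["b1"])
    return H, (H @ p["W2"] + p["b2"]).ravel()


def mse(p, X, y):
    return float(np.mean((forward(p, X)[1] - y) ** 2))


def gradient_step(p, X, y, lr):
    """One full-batch gradient descent step; returns a new parameter dict."""
    H, pred = forward(p, X)
    err = (pred - y)[:, None] * (2.0 / len(y))    # dMSE/dpred
    dH = (err @ p["W2"].T) * (1.0 - H**2)         # back through tanh
    grads = {"W2": H.T @ err, "b2": err.sum(0), "W1": X.T @ dH, "b1": dH.sum(0)}
    return {k: p[k] - lr * grads[k] for k in p}


def train_early_stopping(X_tr, y_tr, X_va, y_va, n_hidden=20, lr=0.05,
                         max_iter=5000, patience=50, seed=0):
    """Train on the training set; keep the weights with lowest validation error."""
    p = init_params(X_tr.shape[1], n_hidden, np.random.default_rng(seed))
    best_p, best_err, best_it = p, mse(p, X_va, y_va), 0
    for it in range(1, max_iter + 1):
        p = gradient_step(p, X_tr, y_tr, lr)
        va = mse(p, X_va, y_va)
        if va < best_err:
            best_p, best_err, best_it = p, va, it
        elif it - best_it >= patience:            # validation error stopped improving
            break
    return best_p, best_err, best_it


# Synthetic data for illustration only: a noisy nonlinear relationship.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (90, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.2, 90)
X_tr, X_va, X_te = X[:54], X[54:72], X[72:]
y_tr, y_va, y_te = y[:54], y[54:72], y[72:]

best_p, val_err, best_it = train_early_stopping(X_tr, y_tr, X_va, y_va)
test_err = mse(best_p, X_te, y_te)                # generalization estimate
```

The testing set plays no part in choosing the stopping iteration, which is why `test_err`, and not `val_err`, is the figure to report as the generalization error.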
When we tested our model in clinical practice to correct
operator measurement errors in real time, we obtained very
encouraging results. Fetal biometric measurements were
performed by an experienced operator because we wanted
to understand whether under optimum conditions, it was
possible to obtain errors below 10%. We were successful in
this endeavour.
The fact that we obtained a significant lowering of
MAE% when we entered the corrected parameters into the best
estimation models in the literature confirms that our system can in fact help operators to correct measurement
errors. The system also promises to be useful for training
less experienced sonographers and could be used as a
quality control system for fetal biometry. By reducing
human error, it enhances EFW and clinical obstetric
management.
5 Conclusions
A multinormal probability model for the estimation of fetal
weight was implemented numerically to provide clinical
indications about the type and size of measurement errors
in real-time fetal echobiometry. The model compared
actual measures with expected values and associated
probability values with EFW, indicating the reliability of
EFW in terms of congruence with ultrasound measurements. Low probabilities suggest more accurate repetition
of suspect measurements and help ultrasound operators to
interpret fetal morphology by distinguishing between
measurement errors and real pathophysiological
conditions.
Compared to other EFW models of equivalent accuracy,
probability models also have the major clinical advantage
of avoiding over- and under-estimation of micro- and
macrosomic fetal weights.
Clinical testing of the model on a sample of 61 fetuses
revealed its good performance in correcting measurement
errors and showed a remarkable improvement in accuracy of
EFW, confirmed by other mathematical models of proven
accuracy. Our proposed interactive software therefore offers
valid support for training operators in fetal echobiometry.
Although system capacity clearly needs to be tested on a
wider scale, its clinical utility and simplicity, as well as the
sharp improvement in accuracy of EFW, suggest that it
could be used as a reliable auxiliary for clinical decision
making in pregnancy. This is also an advance in the direction of standardization of measuring procedures, which are
often a severe limiting factor in ultrasonographic practice.
Acknowledgments This work was financed by the Italian Ministry of
Education, University and Research (MIUR). Special thanks to ESAOTE S.p.A., Genoa, Italy, for its precious and prompt technical support.
References
1. Armitage P, Berry G (1987) Statistical methods in medical
research. Blackwell, Oxford
2. Ben-Haroush A, Yogev Y, Hod M (2004) Fetal weight estimation
in diabetic pregnancies and suspected fetal macrosomia. J Perinat
Med 32(2):113–121
3. Benson CB, Doubilet PM, Saltzman DH (1987) Sonographic
determination of fetal weights in diabetic pregnancies. Am J
Obstet Gynecol 156(2):441–444
4. Bettelheim D, Deutinger J, Bernaschek (1997) Fetal sonographic
biometry: a guide to normal and abnormal measurements. The
Parthenon Publishing Group
5. Biagioli B, Scolletta S, Cevenini G, Barbini E, Giomarelli P,
Barbini P (2006) A multivariate Bayesian model for assessing
morbidity after coronary artery surgery. Crit Care 10(3):R94. doi:
10.1186/cc4951
6. Bishop CM (1995) Neural networks for pattern recognition.
Clarendon, Oxford
7. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Kenney
SP, Devoe LD (1998) Limitations of clinical and sonographic
estimates of birth weight: experience with 1034 parturients.
Obstet Gynecol 91(1):72–77
8. Chauhan SP, West DJ, Scardo JA, Boyd JM, Joiner J, Hendrix
NW (2000) Antepartum detection of macrosomic fetus: clinical
versus sonographic, including soft-tissue measurements. Obstet
Gynecol 95(5):639–642
9. Chauhan SP, Hendrix NW, Magann EF, Morrison JC, Scardo JA,
Berghella V (2005) A review of sonographic estimate of fetal
weight: vagaries of accuracy. J Matern Fetal Neonatal Med
18(4):211–220
10. Chauhan SP, Cole J, Sanderson M, Magann EF, Scardo JA (2006)
Suspicion of intrauterine growth restriction: use of abdominal
circumference alone or estimated fetal weight below 10%. J Matern Fetal Neonatal Med 19(9):557–562
11. Chuang L, Hwang JY, Chang CH, Yu CH, Chang FM (2002)
Ultrasound estimation of fetal weight with the use of computerized artificial neural network model. Ultrasound Med Biol
28(8):991–996
12. Cohen J, Cohen P, West SG, Aiken LS (2003) Applied multiple
regression: correlation analysis for the behavioral sciences. Erlbaum, London
13. Colman A, Maharaj D, Hutton J, Tuohy J (2006) Reliability of
ultrasound estimation of fetal weight in term singleton pregnancies. New Zeal Med J 119(1241):U2146
14. Combs CA, Jaekle RK, Rosenn B, Pope M, Miodovnik M, Siddiqi TA (1993) Sonographic estimation of fetal weight based on a
model of fetal volume. Obstet Gynecol 82(3):365–370
15. Coomarasamy A, Connock M, Thornton J, Khan KS (2005)
Accuracy of ultrasound biometry in the prediction of macrosomia: a systematic quantitative review. Brit J Obstet Gynaec
112(11):1461–1466
16. Dudley NJ (1995) Selection of appropriate ultrasound methods
for the estimation of fetal weight. Brit J Radiol 68:385–388
17. Dudley NJ (2005) A systematic review of the ultrasound estimation of fetal weight. Ultrasound Obstet Gynecol 25(1):80–89
18. Edwards A, Goff J, Baker L (2001) Accuracy and modifying
factors of the sonographic estimation of fetal weight in a high-risk population. Aust NZ J Obstet Gyn 41(2):187–190
19. Etter DM, Kuncicky DC, Moore H (2005) Introduction to
MATLAB 7. Prentice Hall, Englewood Cliffs
20. Farmer RM, Medearis AL, Hirata GI, Platt LD (1992) The use of
a neural network for the ultrasonographic estimation of fetal
weight in the macrosomic fetus. Am J Obstet Gynecol
166(5):1467–1472
21. Goldberg JD (2004) Routine screening for fetal anomalies:
expectations. Obstet Gynecol Clin North Am 31(1):35–50
22. Hadlock FP, Harrist RB, Sharman RS, Deter RL, Park SK (1985)
Estimation of fetal weight with the use of head, body, and femur
measurements - a prospective study. Am J Obstet Gynecol
151:333–337
23. Hadlock FP (1990) Sonographic estimation of fetal age and
weight. Fetal Ultrasound 28(1):39–51
24. Haykin S (1994) Neural networks: a comprehensive foundation.
Maxwell Macmillan, Canada
25. Hill LM, Breckle R, Gehrking WC, O’Brien PC (1985) Use of
femur length in estimation of fetal weight. Am J Obstet Gynecol
152:847–852
26. Jamshidi M (2003) Tools for intelligent control: fuzzy controllers, neural networks and genetic algorithms. Philos Transact A
Math Phys Eng Sci 361(1809):1781–1808
27. Jordaan HV (1983) Estimation of fetal weight by ultrasound.
J Clin Ultrasound 11(2):59–66
28. Krzanowski WJ (1988) Principles of multivariate analysis: a
user’s perspective. Clarendon, Oxford
29. Kurmanavicius J, Burkhardt T, Wisser J, Huch R (2004) Ultrasonographic fetal weight estimation: accuracy of formulas and
accuracy of examiners by birth weight from 500 to 5000 g.
J Perinat Med 32(2):155–161
30. Lee YB, Kim MJ, Kim MH (2007) Robust border enhancement
and detection for measurement of fetal nuchal translucency in
ultrasound images. Med Biol Eng Comput (Spec issue). doi:
10.1007/s11517-007-0225-7
31. Lockwood CJ, Weiner S (1986) Assessment of fetal growth. Clin
Perinatol 13(1):3–35
32. Mongelli M, Biswas A (2002) Menstrual age-dependent
systematic error in sonographic fetal weight estimation: a mathematical model. J Clin Ultrasound 30(3):139–144
33. Ott WJ, Doyle S, Flamm S, Wittman J (1986) Accurate ultrasonic
estimation of fetal weight. Prospective analysis of a new ultrasonic formula. Am J Perinatol 3(4):307–310
34. Ott WJ (2006) Sonographic diagnosis of fetal growth restriction.
Clin Obstet Gynecol 49(2):295–307
35. Robson SC, Gallivan S, Walkinshaw SA, Vaughan J, Rodeck CH
(1993) Ultrasonic estimation of fetal weight: use of targeted
formulas in small for gestational age fetuses. Obstet Gynecol
82(3):359–364
36. Rosati P, Exacoustos C, Caruso A, Mancuso S (1992)
Ultrasound diagnosis of fetal macrosomia. Ultrasound Obstet
Gynecol 2(1):23–29
37. Rotmensch S, Celentano C, Liberati M, Malinger G, Sadan O,
Bellati U, Glezerman M (1999) Screening efficacy of the subcutaneous tissue width/femur length ratio for fetal macrosomia in
the non-diabetic pregnancy. Ultrasound Obstet Gynecol
13(5):340–344
38. Sabbagha RE, Minogue J, Tamura RK, Hungerford SA (1989)
Estimation of birth weight by use of ultrasonographic formulas
targeted to LGA, AGA, and SGA fetuses. Am J Obstet Gynecol
160:854–862
39. Sargent DJ (2001) Comparison of artificial neural networks with
other statistical approaches: results from medical data sets.
Cancer 91(S8):1636–1642
40. Schild RL, Fimmers R, Hansmann M (2000) Fetal weight estimation by three-dimensional ultrasound. Ultrasound Obstet
Gynecol 16(5):445–452
41. Secher NJ, Djursing H, Hansen PK, Lenstrup C, Sindberg-Eriksen P, Thomsen BL, Keiding N (1987) Estimation of fetal weight
in the third trimester by ultrasound. Eur J Obstet Gynecol Reprod
Biol 24:1–11
42. Sladkevicius P, Saltvedt S, Almstrom H, Kublickas M, Grunewald C, Valentin L (2005) Ultrasound dating at 12–14 weeks of
gestation. A prospective cross-validation of established dating
formulae in in vitro fertilized pregnancies. Ultrasound Obstet
Gynecol 26(5):504–511
43. Thornton JG, Hornbuckle J, Vail A, Spiegelhalter DJ, Levene M,
GRIT study group (2004) Infant wellbeing at 2 years of age in the
growth restriction intervention trial (GRIT): multicentred randomised controlled trial. Lancet 364(9433):513–520
44. Woo JS, Wan MC (1986) An evaluation of fetal weight prediction using a simple equation containing the fetal femur length.
J Ultrasound Med 5(8):453–457