3. Objectives
Hypothesis
• Doctors interpret an ECG (electrocardiogram) by evaluating the characteristics of the P, Q, R, S and T waves.
• A machine learning model can be trained to predict heart disease accurately.
Goal
• Develop an ML model with unique parameters that predicts rhythm-related heart diseases more accurately.
• Achieve better results than previous studies.
• Develop easy-to-use software.
4. Introduction - What is an ECG?
• The figure on the left shows the electrical signal produced by one heartbeat of a healthy person.
• Changes in the P, Q, R, S and T peaks, in particular the height of a peak and the duration of a peak, are signs of different heart diseases.
5. Introduction – New Approach
Previous studies for ECG interpretation
• Neural networks
• Fourier transform
• Gradient boosting trees
• Genetic algorithms
• Polynomial regression
have been used to predict heart disease from the ECG.
Our study
• Polynomial simulation (a method we developed to force a polynomial function to pass through given points)
• Random forest
are used to predict heart disease from the ECG.
With our own method, ‘Polynomial Simulation’, basic characteristics of the ECG such as the height of the R peak and the QRS interval are reflected better than with other methods.
6. Introduction - DataSet
• 12-lead ECGs of 10,646 patients
• 500 Hz sampling rate
• Each recording is 10 seconds long
• All diagnoses labeled by professional experts
• Created under the auspices of Chapman University and Shaoxing People’s Hospital
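With a 500 Hz sampling rate and 10-second recordings, every timing parameter is ultimately a count of samples. A minimal sketch of the conversions implied by the numbers above (the constant and function names are ours, for illustration only):

```python
SAMPLING_RATE_HZ = 500   # from the dataset description
DURATION_S = 10          # length of each recording
LEADS = 12               # 12-lead ECG


def samples_per_lead(rate_hz: int = SAMPLING_RATE_HZ, duration_s: int = DURATION_S) -> int:
    """Number of samples stored for one lead of one recording."""
    return rate_hz * duration_s


def sample_to_seconds(index: int, rate_hz: int = SAMPLING_RATE_HZ) -> float:
    """Time (in seconds) of a sample index, measured from the start of the recording."""
    return index / rate_hz


def samples_to_ms(n_samples: int, rate_hz: int = SAMPLING_RATE_HZ) -> float:
    """Duration of n_samples consecutive sample steps, in milliseconds."""
    return n_samples * 1000 / rate_hz


print(samples_per_lead())          # 5000 samples per lead
print(samples_per_lead() * LEADS)  # 60000 samples per 12-lead ECG
print(samples_to_ms(1))            # 2.0 ms between consecutive samples
```

So an interval measured as 50 samples, for example, corresponds to 100 ms.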
Number of patients in the dataset
Abbr.    Rhythm                                         Patients
SB       Sinus Bradycardia                              3889
SR       Sinus Rhythm*                                  1826
AFIB     Atrial Fibrillation                            1780
ST       Sinus Tachycardia                              1568
SVT      Supraventricular Tachycardia                   587
AF       Atrial Flutter                                 445
SA       Sinus Atrium                                   399
AT       Atrial Tachycardia                             121
AVNRT    Atrioventricular Node Reentrant Tachycardia    16
AVRT     Atrioventricular Reentrant Tachycardia         8
SAAWR    Sinus Atrium to Atrial Wandering Rhythm        7
* SR (Sinus Rhythm) means a healthy person.
7. Introduction – How to detect disease from an ECG?
Two example ECGs are shown with the disease and its characteristics.
An AFIB example
• The P peak is not clearly identifiable
• The R-R intervals are inconsistent
An SB example
• The heart rate is 40-60 beats per minute
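The two examples translate into simple numeric cues. The sketch below is only an illustration: because every recording lasts 10 seconds, the heart rate in beats per minute is six times the number of beats, and R-R irregularity is measured here with the coefficient of variation. The 10% irregularity threshold and all function names are hypothetical assumptions, not values from this study.

```python
from statistics import mean, pstdev

RECORD_SECONDS = 10  # every recording in the dataset lasts 10 s


def heart_rate_bpm(n_beats: int, record_seconds: float = RECORD_SECONDS) -> float:
    """Approximate heart rate (beats per minute) from the beats in one recording."""
    return n_beats * 60 / record_seconds


def looks_like_sb(n_beats: int) -> bool:
    """SB cue from the example above: 40-60 beats per minute."""
    return 40 <= heart_rate_bpm(n_beats) <= 60


def rr_irregularity(rr_ms: list[float]) -> float:
    """Coefficient of variation of the R-R intervals (standard deviation / mean)."""
    return pstdev(rr_ms) / mean(rr_ms)


def rr_is_irregular(rr_ms: list[float], threshold: float = 0.1) -> bool:
    """Hypothetical AFIB cue: R-R intervals vary by more than `threshold` (10 %)."""
    return rr_irregularity(rr_ms) > threshold


print(looks_like_sb(8))                        # 8 beats in 10 s = 48 bpm -> True
print(rr_is_irregular([1000, 1000, 1000]))     # constant R-R -> False
print(rr_is_irregular([600, 950, 700, 1200]))  # erratic R-R -> True
```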
8. Introduction – Selected Diseases
• AVNRT and AVRT were omitted from the evaluation because they have very few patients.
• SA and SAAWR were evaluated together as SA/SAAWR because there are very few SAAWR patients.
Figure: rhythm classes in the dataset (Atrioventricular Node Reentrant Tachycardia, Atrial Fibrillation, Atrial Tachycardia, Atrioventricular Reentrant Tachycardia, Sinus Tachycardia, Atrial Flutter, Supraventricular Tachycardia, Sinus Bradycardia, Sinus Atrium, Sinus Atrium to Atrial Wandering Rhythm)
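A minimal sketch of this label preparation, assuming that patients with omitted rhythms are simply dropped and that SAAWR labels are renamed to the combined class; the function names are hypothetical. Under that assumption, the counts from the dataset slide leave 8 evaluated classes and 10,622 patients.

```python
from collections import Counter
from typing import Optional

# Patients per rhythm, from the dataset slide.
COUNTS = {"SB": 3889, "SR": 1826, "AFIB": 1780, "ST": 1568, "SVT": 587, "AF": 445,
          "SA": 399, "AT": 121, "AVNRT": 16, "AVRT": 8, "SAAWR": 7}

OMITTED = {"AVNRT", "AVRT"}                        # too few patients to evaluate
MERGED = {"SA": "SA/SAAWR", "SAAWR": "SA/SAAWR"}   # evaluated as one class


def map_label(label: str) -> Optional[str]:
    """Evaluation class for a raw rhythm label, or None if the label is omitted."""
    if label in OMITTED:
        return None
    return MERGED.get(label, label)


def evaluation_counts(counts: dict[str, int]) -> Counter:
    """Aggregate the raw per-rhythm counts into the evaluated classes."""
    result = Counter()
    for label, n in counts.items():
        target = map_label(label)
        if target is not None:
            result[target] += n
    return result


classes = evaluation_counts(COUNTS)
print(len(classes))           # 8 classes
print(classes["SA/SAAWR"])    # 399 + 7 = 406
print(sum(classes.values()))  # 10646 - 16 - 8 = 10622
```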
9. Method
Patient Number Rep. ECG P1 P2 P3 P4 P5 P6 P7 Disease
Patient #1 11 44.83 467.61 1.33 130.17 660.67 1 SR
Patient #2 9 45.29 37.27 1.79 96.57 577.43 0 AFIB
Patient #3 0.941 8 52.43 341.70 1.24 112.71 578.14 1 SB
Patient #10646 0.941 26 30.29 1287.8 1.24 89.86 454.57 0 AF
Unique Parameters
Other Parameters
• Seven parameters are calculated for each ECG, and the random forest algorithm is trained and tested with the labeled diagnoses. The resulting ruleset predicts the correct disease with 98.7% accuracy.
• Software was developed that uses this ruleset to predict the heart disease for a given ECG.
10. Method- About Polynomial Simulation
We developed our own method for finding a polynomial function that passes through given points.
Consider a 4th-degree polynomial function $p(x) = ax^4 + bx^3 + cx^2 + dx + e$.
• Evaluate the function at x = 1, 2, 3, 4, 5, 6.
• Write these 6 expressions at the base of the triangle (figure on the left).
• Calculate the difference of each two consecutive expressions.
• Write the differences in the row above, and continue building the triangle until reaching 0.
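The triangle for the general quartic can be built in a few lines of Python. Because taking differences is linear, the sketch below (function names are ours, not from the project code) builds a separate triangle for each term x^4, x^3, x^2, x, 1 and reads off the left column; the multipliers it prints are the left-hand sides of the equations used on slide 12.

```python
def leading_differences(values):
    """Left column of the difference triangle built on `values`:
    [f(1), Δf(1), Δ²f(1), ...], where Δf(x) = f(x + 1) - f(x)."""
    column, row = [], list(values)
    while row:
        column.append(row[0])                          # leftmost entry of this row
        row = [b - a for a, b in zip(row, row[1:])]    # next row of the triangle
    return column


def quartic_left_column(n_points=6):
    """Left column for p(x) = ax^4 + bx^3 + cx^2 + dx + e evaluated at x = 1..n_points.
    Entry k maps each coefficient name to its multiplier in Δ^k p(1)."""
    xs = range(1, n_points + 1)
    # Differences are linear, so each term x^power can be handled on its own.
    per_term = {name: leading_differences([x ** power for x in xs])
                for name, power in zip("abcde", (4, 3, 2, 1, 0))}
    return [{name: col[k] for name, col in per_term.items() if col[k] != 0}
            for k in range(n_points)]


for k, terms in enumerate(quartic_left_column()):
    print(f"Δ^{k} p(1) =", " + ".join(f"{m}{name}" for name, m in terms.items()) or "0")
```

The last row (Δ^5) is 0 for every quartic, which is why the triangle ends at 0.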
11. Method- About Polynomial Simulation
Let’s find the 4th-degree polynomial function that passes through the points
(1,1) (2,8) (3,27) (4,64) (5,125)
• Write these 5 numbers (the y values) at the base of the triangle (figure on the right).
• Calculate the difference of each two consecutive numbers.
• Write the differences in the row above, and continue building the triangle until reaching 0.
12. Method- About Polynomial Simulation
Combine these 2 triangles and use only the left-column values:
24a = 0
60a + 6b = 6
50a + 12b + 2c = 12
15a + 7b + 3c + d = 7
a + b + c + d + e = 1
So a = 0, b = 1, c = 0, d = 0, e = 0, which means the polynomial $p(x) = x^3$ certainly passes through the requested points.
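The whole procedure can be reproduced end to end. This sketch is our reconstruction, assuming the x values are 1, 2, ..., n as in the example; it solves the left-column equations starting from the top of the triangle (24a = 0) and moving down, using exact fractions to avoid rounding. Since only one polynomial of degree at most n - 1 passes through n given points, the result is the same polynomial that Newton's forward-difference interpolation formula gives.

```python
from fractions import Fraction


def leading_differences(values):
    """Left column of the difference triangle: [f(1), Δf(1), Δ²f(1), ...]."""
    column, row = [], list(values)
    while row:
        column.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return column


def polynomial_simulation(ys):
    """Coefficients (highest power first) of the polynomial of degree len(ys) - 1
    passing through (1, ys[0]), (2, ys[1]), ..."""
    degree = len(ys) - 1
    xs = range(1, degree + 2)
    # Triangle for each term x^degree, ..., x^0 (the symbolic left triangle).
    basis = [leading_differences([x ** p for x in xs]) for p in range(degree, -1, -1)]
    target = [Fraction(v) for v in leading_differences(ys)]  # the numeric right triangle
    coeffs = [Fraction(0)] * (degree + 1)
    # Row k only involves terms of power >= k, so each row introduces exactly one
    # new unknown: solve from the top row (k = degree) down to k = 0.
    for k in range(degree, -1, -1):
        j = degree - k  # index of the coefficient of x^k
        known = sum(basis[i][k] * coeffs[i] for i in range(j))
        coeffs[j] = (target[k] - known) / basis[j][k]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x with Horner's rule."""
    result = Fraction(0)
    for c in coeffs:
        result = result * x + c
    return result


ys = [1, 8, 27, 64, 125]
coeffs = polynomial_simulation(ys)
print([int(c) for c in coeffs])  # [0, 1, 0, 0, 0] -> p(x) = x^3
print(all(evaluate(coeffs, x) == y for x, y in zip(range(1, 6), ys)))  # True
```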
13. Method- About Entropy
Entropy calculation:
Entropy is a mathematical quantity that measures the irregularity and uncertainty in a system. For the polynomial coefficients of each heartbeat, we calculated the entropy with the Shannon entropy formula, $H(X) = -\sum_{i=1}^{n} p_i \log p_i$, where $p_i$ is the probability of the $i$-th of $n$ possible values. The normalized entropy is $H(X)$ divided by $\log n$.
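The slides do not say how the polynomial coefficients are turned into probabilities. The sketch below assumes p_i = |c_i| / Σ|c_j| (our assumption, labeled in the code) and then applies the Shannon formula and the division by log n described above; the function name is hypothetical.

```python
import math


def normalized_entropy(coefficients):
    """Normalized Shannon entropy H(X) / log(n) of a list of n coefficients.

    Assumption (not stated in the slides): the coefficients are turned into
    probabilities with p_i = |c_i| / sum_j |c_j|.
    """
    weights = [abs(c) for c in coefficients]
    total = sum(weights)
    n = len(weights)
    if n < 2 or total == 0:
        return 0.0  # no uncertainty to measure (single or all-zero coefficients)
    probs = [w / total for w in weights]
    # H(X) = -sum p_i log p_i = sum p_i log(1 / p_i); terms with p_i = 0 contribute 0.
    h = sum(p * math.log(1 / p) for p in probs if p > 0)
    return h / math.log(n)


print(normalized_entropy([1, 1, 1, 1]))           # ≈ 1.0: all coefficients equally large
print(normalized_entropy([5, 0, 0, 0]))           # 0.0: one coefficient dominates
print(normalized_entropy([0.8, -0.3, 0.1, 2.4]))  # somewhere in between
```

Dividing by log n keeps the value between 0 and 1, so heartbeats whose polynomials have different numbers of coefficients remain comparable.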
14. Method- Calculated Parameters
Unique Parameters
• Average QRS length
• First polynomial-actual difference: polynomial passing through the Q peak, the midpoint of Q-R, the R peak, the midpoint of R-S and the S peak
• Entropy value for the coefficients of the polynomials: polynomial passing through each peak
• P wave direction calculated by polynomial simulation: 2nd-degree polynomial passing through the P peak, used to detect the direction of the P wave
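The direction of the P wave can be read from the sign of the x² coefficient of the parabola through the P wave: negative means it opens downward (an upright P wave), positive means an inverted one. The sketch below assumes the three points used are P onset, P peak and P offset, and that the result is encoded as 1/0; both choices and the function names are hypothetical. It computes the x² coefficient directly as a second divided difference instead of building the full polynomial.

```python
def quadratic_leading_coefficient(points):
    """Coefficient of x^2 of the parabola through three (x, y) points,
    computed as the second divided difference f[x0, x1, x2]."""
    (x0, y0), (x1, y1), (x2, y2) = sorted(points)
    slope_left = (y1 - y0) / (x1 - x0)
    slope_right = (y2 - y1) / (x2 - x1)
    return (slope_right - slope_left) / (x2 - x0)


def p_wave_direction(onset, peak, offset):
    """Hypothetical encoding: 1 if the P wave is upright (the parabola opens
    downward, x^2 coefficient < 0), otherwise 0 (inverted or flat)."""
    return 1 if quadratic_leading_coefficient([onset, peak, offset]) < 0 else 0


# (sample index, amplitude in mV) of P onset, P peak and P offset; made-up values.
print(p_wave_direction((100, 0.0), (120, 0.15), (140, 0.0)))   # upright  -> 1
print(p_wave_direction((100, 0.0), (120, -0.10), (140, 0.0)))  # inverted -> 0
```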
• Average QRS length mainly helps to identify ST, SB, AFIB and SVT
• First polynomial-actual difference mainly helps to identify SA
• Entropy value mainly helps to identify SVT and SA
• P wave direction mainly helps to identify AF, AFIB and AT
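A sketch of the first polynomial-actual difference, under stated assumptions: the five anchor points are Q, the midpoint between Q and R, R, the midpoint between R and S, and S (as sample indices), and the difference is summarized as the mean absolute gap between the polynomial and the actual samples from Q to S. The summary statistic and function names are our assumptions. For brevity the polynomial is evaluated in Lagrange form, a swapped-in technique; because the polynomial through five points is unique, it is the same polynomial the polynomial simulation would produce.

```python
def lagrange_eval(points, x):
    """Value at x of the unique polynomial through the given (x_i, y_i) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = float(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def polynomial_actual_difference(signal, q, r, s):
    """Mean absolute difference between the actual samples from Q to S and the
    4th-degree polynomial through Q, mid(Q, R), R, mid(R, S) and S.
    q, r, s are sample indices with r - q >= 2 and s - r >= 2 (distinct anchors)."""
    anchors = [q, (q + r) // 2, r, (r + s) // 2, s]
    points = [(i, signal[i]) for i in anchors]
    gaps = [abs(signal[i] - lagrange_eval(points, i)) for i in range(q, s + 1)]
    return sum(gaps) / len(gaps)


smooth = [(i - 5) ** 2 for i in range(11)]   # a parabola: reproduced exactly
spiky = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0, 0]    # a narrow, sharp R peak
print(polynomial_actual_difference(smooth, 0, 5, 10))  # close to 0
print(polynomial_actual_difference(spiky, 0, 5, 10))   # clearly larger
```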
15. Method- Calculated Parameters
Parameters used in previous studies
• Number of heartbeats
• PR interval length
• R-R interval length
• The number of heartbeats and the R-R interval length mainly help to identify SB, ST and SVT
• PR interval length mainly helps to identify AFIB
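Given the positions of the P and R peaks, these three parameters reduce to counting and subtracting sample indices. In the project the peak positions presumably come from a delineation step (NeuroKit2 is among the listed libraries), which is not reproduced here. In this sketch the function names are hypothetical, and PR is simplified to the distance from the P peak to the R peak of the same beat; the clinical PR interval runs from P-wave onset to QRS onset.

```python
SAMPLING_RATE_HZ = 500  # from the dataset description


def to_ms(n_samples: int) -> float:
    """Convert a distance in samples into milliseconds."""
    return n_samples * 1000 / SAMPLING_RATE_HZ


def rhythm_parameters(p_peaks: list[int], r_peaks: list[int]) -> tuple[int, float, float]:
    """Number of heartbeats, mean PR interval (ms) and mean R-R interval (ms).

    Assumes p_peaks[i] and r_peaks[i] belong to the same beat and that PR is
    measured from P peak to R peak (a simplification; the clinical PR interval
    runs from P-wave onset to QRS onset).
    """
    n_beats = len(r_peaks)
    rr = [to_ms(b - a) for a, b in zip(r_peaks, r_peaks[1:])]
    pr = [to_ms(r - p) for p, r in zip(p_peaks, r_peaks)]
    return n_beats, sum(pr) / len(pr), sum(rr) / len(rr)


# Made-up peak positions (sample indices) for four regular beats.
print(rhythm_parameters([170, 670, 1170, 1670], [250, 750, 1250, 1750]))
# (4, 160.0, 1000.0)
```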
16. Method- About Random Forest
The Random Forest algorithm is built from decision trees. It combines several decision trees, each trained on a random sample of the data, to obtain a more accurate model. The final prediction is made by combining the estimates of the individual trees (by majority vote for classification).
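The combining step can be sketched with a plain majority vote. The three "trees" below are hand-written, hypothetical stand-ins; a real random forest learns many trees from bootstrap samples of the data using random subsets of the features. Note that scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities instead of counting hard votes, which usually, but not always, gives the same answer.

```python
from collections import Counter


# Hypothetical hand-written trees over a feature dict (for illustration only).
def tree_1(f):
    return "SB" if f["heart_rate"] < 60 else "SR"


def tree_2(f):
    if f["p_wave_upright"] == 0:
        return "AFIB"
    return "SB" if f["rr_ms"] > 1000 else "SR"


def tree_3(f):
    return "AFIB" if f["rr_irregularity"] > 0.1 else "SR"


def forest_predict(trees, features):
    """Majority vote over the individual trees' predicted classes."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]


patient = {"heart_rate": 48, "p_wave_upright": 1, "rr_ms": 1250, "rr_irregularity": 0.03}
print(forest_predict([tree_1, tree_2, tree_3], patient))  # SB (2 votes to 1)
```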
17. Method- Random Forest
Why Random Forest?
• After calculating all parameters for every ECG, 6 different ML algorithms were tested for AF using Weka with default parameters. Random Forest gave the best results, so it was applied to all diseases.
(10-fold cross-validation with 20% test data)
Algorithm Name F1 Score Precision Recall Duration (sec)
Random Forest 0.98 0.98 0.98 0.33
SVM 0.96 0.97 0.96 0.05
ZeroR 0.83 0.83 0.83 0.01
BayesNet 0.97 0.96 0.96 0.03
Logistic Regression 0.97 0.96 0.97 0.05
AdaBoost 0.95 0.94 0.95 0.08
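The F1 score is the harmonic mean of precision and recall. The sketch below, with made-up numbers, also shows why a weighted-average F1 need not equal the F1 of the averaged precision and recall: the per-class F1 scores are averaged, not recomputed from the averages. It assumes the averages are weighted by the number of instances in each class (as in Weka's "Weighted Avg." row); the function names are ours.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(per_class: dict[str, float], support: dict[str, int]) -> float:
    """Average of a per-class metric, weighted by the number of instances of each class."""
    total = sum(support[c] for c in per_class)
    return sum(per_class[c] * support[c] for c in per_class) / total


# Two made-up classes with one instance each, to show the effect of averaging.
precision = {"A": 1.0, "B": 0.5}
recall = {"A": 0.5, "B": 1.0}
support = {"A": 1, "B": 1}

per_class_f1 = {c: f1(precision[c], recall[c]) for c in precision}
print(round(weighted_average(per_class_f1, support), 3))  # 0.667
print(f1(weighted_average(precision, support),
         weighted_average(recall, support)))              # 0.75
```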
18. Results- Comparison
Our Algorithm Zheng & others (2020)
F1 Score Precision Recall F1 Score Precision Recall
Diseases
AFIB
SB
SA/SAAWR
SR
AT
ST
SVT
AF
Weighted Avg.
0.979 0.979 0.979
0.948 0.950 0.947
0.996 0.996 0.996
0.881 0.882 0.881
0.985 0.984 0.985
0.989 0.990 0.989
0.993 0.993 0.993
0.991 0.988 0.995
0.987 0.986 0.987
0.941 0.938 0.944
0.993 0.99 0.996
0.977 0.972
0.982
0.949 0.953 0.944
0.97 0.971 0.97
Zheng et al. (2020) carried out a similar study on the same dataset. Compared with their study, our results are better for all diseases except SA. In their study, SR and SA were combined into a single class, which makes the classification easier. Our weighted average F1 score is 0.987.
19. Results- Benefits of parameters
EXCLUDED PARAMETER F1 SCORE F1 DIFF. PRECISION RECALL
ALL PARAMETERS INCLUDED 0.979 0 0.979 0.979
# of HEART BEAT 0.952 0.025 0.952 0.952
AVERAGE QRS LENGTH 0.946 0.033 0.948 0.945
FIRST POLYNOMIAL-ACTUAL DIF. 0.966 0.013 0.967 0.965
ENTROPY VALUE 0.956 0.023 0.957 0.955
PR INTERVAL LENGTH 0.952 0.027 0.952 0.95
R-R INTERVAL LENGTH 0.952 0.027 0.952 0.95
P WAVE DIRECTION 0.935 0.044 0.937 0.934
ALL UNIQUE PARAMETERS 0.931 0.048 0.930 0.932
ALL OTHER PARAMETERS 0.95 0.029 0.95 0.95
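• We removed each parameter in turn and checked the F1 score for AF. The biggest single contribution comes from the P wave direction (the F1 score drops by 0.044 without it), and removing all unique parameters causes the largest drop (0.048), so our unique parameters appear very effective.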
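The ablation procedure above can be sketched as a loop that removes one parameter, or a whole group, at a time and records the score drop. In a real run, `evaluate` would retrain the random forest on the kept parameters and return its F1 score; here a hypothetical toy scorer stands in for it, so the numbers are illustrative only and are not the study's results.

```python
from typing import Callable


def ablation(features: list[str],
             groups: dict[str, list[str]],
             evaluate: Callable[[list[str]], float]) -> list[tuple[str, float]]:
    """Score drop when each group of features is excluded, largest drop first."""
    baseline = evaluate(features)  # score with all features included
    drops = {}
    for name, removed in groups.items():
        kept = [f for f in features if f not in removed]
        drops[name] = baseline - evaluate(kept)
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


# Toy scorer for illustration only: each feature adds a fixed amount to the score.
TOY_WEIGHTS = {"p_wave_direction": 0.04, "qrs_length": 0.03, "rr_interval": 0.01}


def toy_evaluate(kept):
    return 0.9 + sum(TOY_WEIGHTS[f] for f in kept)


features = list(TOY_WEIGHTS)
groups = {f: [f] for f in features}                    # one feature at a time
groups["unique"] = ["p_wave_direction", "qrs_length"]  # a whole group at once
for name, drop in ablation(features, groups, toy_evaluate):
    print(f"{name:18s} {drop:.3f}")
```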
20. Results- User Interface
1. Load the ECG as a CSV file.
2. View the ECG; you can zoom in to see details.
3. View each heartbeat and its peaks (zooming in and out is also possible).
4. Check the polynomial function passing through each heartbeat.
21. Results- User Interface
5. Press this button to see the prediction.
6. A window then shows the possible disease and proposals.
22. Results- Codes
All code is written in Python. The libraries used in this project are listed below.
NumPy          NeuroKit2
math           os
csv            sklearn
PyQt           matplotlib.pyplot
pandas
The decision trees for each disease were combined and coded into a single decision tree (to predict a single disease for each ECG).
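The slides do not show how the per-disease trees were merged. One simple way to obtain exactly one prediction per ECG is to check per-disease rules in a fixed priority order and return the first disease whose rule fires. The sketch below assumes that approach; the rules, thresholds, feature names and the SR default are hypothetical and are not the learned ruleset.

```python
from typing import Callable

Rule = Callable[[dict], bool]

# Hypothetical one-vs-rest rules, checked from top to bottom (NOT the learned ruleset).
RULES: list[tuple[str, Rule]] = [
    ("AFIB", lambda f: f["p_wave_upright"] == 0 and f["rr_irregularity"] > 0.1),
    ("SB", lambda f: f["heart_rate"] < 60),
    ("ST", lambda f: f["heart_rate"] > 100),
]


def predict(features: dict, rules: list[tuple[str, Rule]] = RULES, default: str = "SR") -> str:
    """Return the first disease whose rule fires; fall back to `default`."""
    for disease, rule in rules:
        if rule(features):
            return disease
    return default


print(predict({"heart_rate": 48, "p_wave_upright": 1, "rr_irregularity": 0.02}))  # SB
print(predict({"heart_rate": 75, "p_wave_upright": 1, "rr_irregularity": 0.02}))  # SR
```

The order of the rules matters: an ECG that satisfies several rules receives the disease listed first.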
23. Discussion- Future Work
• Our software will help users interpret ECGs more accurately.
• To improve SA performance, a new parameter can be studied.
• Only 1 channel was used in our study; the 12-channel data can be used for a better analysis.
• Our polynomial simulation method can be used not only for predicting rhythm-related diseases but also for other heart diseases, such as heart attack estimation.
24. References
1) McNamara K., Alzubaidi H., Jackson J.K. Cardiovascular disease as a leading cause of death: how are pharmacists getting involved? Integr. Pharm. Res. Pract., 9 (2021), pp. 1-12.
2) Gaidai O., Cao Y., Loginov S. Global cardiovascular diseases death rate prediction. Curr. Problems Cardiol. (2023), Article 101622.
3) World Health Organization. Cardiovascular diseases. https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1
4) World Health Organization. Cardiovascular diseases. https://www.who.int/news-room/fact-sheets/detail/cancer
5) What is an electrocardiogram (ECG)? InformedHealth.org, NCBI Bookshelf. https://www.ncbi.nlm.nih.gov/books/NBK536878/
6) Sinus bradycardia. StatPearls, NCBI Bookshelf. https://www.ncbi.nlm.nih.gov/books/NBK493201
7) Centers for Disease Control and Prevention. (2022, October 14). Atrial fibrillation. https://www.cdc.gov/heartdisease/atrial_fibrillation.htm
8) Preetam, T. V. N. (2020). ECG signal analysis and prediction of heart attack with the help of optimized neural network.
9) Zheng, J., Zhang, J., Danioko, S., Yao, H., Guo, H., & Rakovski, C. (2020). A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients. Scientific Data, 7(1). doi:10.1038/s41597-020-0386-x
10) Hangyuan, G. (2019, November 29). A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients. figshare. https://figshare.com/collections/ChapmanECG/4560497
11) Şahin, M. (2018). Çokgensel sayılarla 3 boyutlu toplamsal yapılar, genelleştirilmeleri ve özellikleri [Three-dimensional additive structures with polygonal numbers, their generalizations and properties]. 2018 TÜBİTAK 2204-A High School Students Research Projects Competition.
12) Gray, R. M. (2011). Entropy and Information Theory. Springer, pp. 61-65.
13) Breiman, L. (2001). Random Forests. Machine Learning 45, 5–32. https://doi.org/10.1023/A:1010933404324