CONFIDENTIAL, FINSEC PROJECT 786727
Integrated Framework for Predictive and Collaborative Security of Financial Infrastructures
Machine and Deep Learning for Cybersecurity and Finance
Ernesto Lopez Fune
ORT-France: 24 Rue Erlanger, Paris 75016, France
28/07/2020
Outline:
1) Introduction to Predictive Analytics
• Business models needing Predictive Analytics
• Data challenges, exploration and cleaning
• Data analytics
• Preparing the data for further analysis
2) How Machine Learning can help your business
• Common Machine Learning models used in finance
• Supervised, Unsupervised
• Most common algorithms
• Deep Learning
3) Application examples of PA to Financial Institutions
• Cyber-security
• Credit Card Fraud
• FINSEC Use Case: HDI
Predictive Analytics
Predictive Analytics (PA) is a powerful new approach that uses Data Mining, Statistics and Machine Learning to
identify, based on historical data, the likelihood of future incidents before they impact customers and end users.
By using PA, IT and financial organizations can deliver seamless customer experiences that meet changing
customer behavior and business demands.
Large spectrum of applications:
• Automotive: uses PA to analyze sensor data from connected vehicles and to build driver assistance algorithms
• Aerospace: uses PA to predict subsystem performance for oil, fuel, liftoff, mechanical health, and controls
• Energy: forecasts electricity price and demand based on historical trends, seasonality and weather
• Finance: uses machine learning techniques and quantitative tools to predict credit risk and fraud
• Medical: used to spot asthma and COPD in patients' breathing sounds to provide instant feedback
Business models needing Predictive Analytics
Detecting fraud: combine multiple analytics methods to improve pattern detection and prevent criminal
behavior
Cybersecurity: use high-performance behavioral analytics to examine all actions on a network in real time and
spot abnormalities that may indicate fraud, zero-day vulnerabilities and advanced persistent threats
Optimizing marketing campaigns: determine customer responses or purchases, promote cross-sell
opportunities and help businesses attract, retain and grow their most profitable customers
Improving operations: forecast inventory and manage resources. Airlines use PA to set ticket prices. Hotels try
to predict the number of guests for any given night, to maximize occupancy and increase revenue
Reducing risk: credit scores are used to assess a buyer's likelihood of default on purchases; other risk-related
uses include insurance claims and collections
Data challenges
Four big data challenges in finance (DATA IS EVERYWHERE):
Regulatory requirements: personal data is subject to stringent regulatory requirements
Data security: with hackers and advanced threats, data governance measures are crucial to mitigate the risks
associated with the financial services industry
Data quality: finance companies want to do more than just store their data, they want to use it
Data silos: financial data comes from many sources such as employee documents, emails, enterprise
applications, and more
Data challenges
DATA IS EVERYWHERE: texts, images, audio, videos
Financial services companies need to capture it all:
• Customer information
• Financial transactions
• Product and service purchase histories
• Customer journeys
• Marketing campaigns
• Service inquiries
• Market feeds
• Social media + IoT streams
• Software logs
• Emails + SMS + newer sources
Combined with business models, this data provides enterprises with opportunities to gain additional insight and value
Machine Learning
• A branch of AI based on the idea that systems can learn from data, identify
patterns and make decisions with minimal human intervention
• A method of data analysis that automates analytical model building
Applications of Machine Learning
• Web search: ranking pages based on what you are most likely to click on
• Finance: risk evaluation of credit offers, decision making, credit card fraud
• E-commerce: predicting customer churn
• Space exploration: space probes and radio astronomy
• Robotics: how to handle uncertainty in new environments, self-driving cars
• Information extraction: ask questions over databases across the web
Key elements
• Representation: how to represent knowledge, e.g. decision trees, sets of rules, instances, etc.
• Evaluation or metrics: the way to evaluate candidate programs, e.g. accuracy, precision and recall, etc.
• Optimization: the way candidate programs are generated, e.g. combinatorial optimization, convex optimization, etc.
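The evaluation metrics named above can be sketched from the four confusion-matrix counts. This is a minimal illustration only: the function name and the counts are hypothetical and are not taken from the presentation's experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                       # share of correct predictions
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # how many alarms were real
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # how many real cases were caught
    return accuracy, precision, recall


# Hypothetical counts, chosen only for illustration.
accuracy, precision, recall = classification_metrics(tp=80, fp=20, fn=10, tn=890)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Accuracy alone can hide errors on a rare class, which is why precision and recall are usually reported next to it.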
Machine Learning
Types of Learning:
• Supervised learning: the training data includes the desired outputs; the
algorithms can recognize what is what since they are trained with this
information
• Unsupervised learning: the training data does not include the desired
outputs; it is hard to tell what is good learning and what is not, as we have
no clue of what is what
• Semi-supervised learning: the training data includes only a few
desired outputs; you train, predict, add the new predictions to the
training set, then predict the rest, and continue like this until you reach the degree
of accuracy you need
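The semi-supervised loop described above (train, predict, add the confident predictions, repeat) could look roughly like the sketch below. It is an assumption-laden toy: it uses a 1-nearest-neighbour rule on one-dimensional points, and the labels, the data and the `max_distance` confidence rule are all hypothetical.

```python
def nearest_label(x, labeled):
    """Return (label, distance) of the labeled point closest to x."""
    value, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label, abs(value - x)


def self_train(labeled, unlabeled, max_distance=1.0):
    """Repeatedly pseudo-label the points close to known ones and reuse them."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        confident = []
        for x in remaining:
            label, distance = nearest_label(x, labeled)
            if distance <= max_distance:      # hypothetical confidence rule
                confident.append((x, label))
        if not confident:                     # nothing new learned: stop
            break
        labeled.extend(confident)             # predictions join the training set
        accepted = {x for x, _ in confident}
        remaining = [x for x in remaining if x not in accepted]
    return labeled, remaining


# Two labeled seeds and six unlabeled points (hypothetical data).
seed = [(0.0, "normal"), (10.0, "fraud")]
points = [0.8, 1.6, 2.4, 9.1, 8.3, 5.0]
labeled, leftover = self_train(seed, points, max_distance=1.0)
print(dict(labeled), leftover)
```

Points that never come close enough to a labeled example stay unlabeled, so the loop stops instead of guessing.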
Machine Learning
Most common Machine Learning algorithms:
• Supervised learning:
• KNN
• Logistic Regression
• Support Vector Machines
• Random Forest
Machine Learning
Most common Machine Learning algorithms:
• Unsupervised learning:
• K-Means
• Mean Shift
• PCA
• Gaussian Mixture
Machine Learning
Deep Learning: it is Machine Learning on "steroids"
• It is part of a broader family of ML methods based on
Artificial Neural Networks
• It can be used for supervised, semi-supervised or
unsupervised learning
• Most common DL architectures:
• Deep Neural Networks (ANN, LSTM)
• Deep Belief Networks (DBN)
• Recurrent Neural Networks (RNN)
• Convolutional Neural Networks (CNN)
Machine Learning
Deep Learning: it is Machine Learning on "steroids"
• Applications to:
• Computer & Machine Vision
• Speech & Audio Recognition
• Natural Language Processing
• Social Network Filtering
• Machine Translation
• Bioinformatics and Drug Design
• Medical Image Analysis
• The produced results are comparable to,
and in some cases surpass, human
expert performance.
Amazon.com - Employee Access
Use case description: When an employee at a company starts to work, they first need to obtain the
computer access necessary to fulfill their role. This access may allow an employee to read/manipulate
resources through various applications or web portals. It is assumed that employees fulfilling the functions
of a given role will access the same or similar resources.
Use case requirements: To build a model, learned using historical data, that will determine an employee's
access needs, such that manual access transactions (grants and revokes) are minimized as the employee's
attributes change over time. The model will take an employee's role information and a resource code and
will return whether access should be granted.
Amazon.com - Employee Access
Exploratory analysis:
• Loading the data
• No Missing Values
• No unformatted data
• Unbalanced data
• Scale and transform the data for ML:
• Standard Scaler
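A minimal sketch of the scaling step, assuming "Standard Scaler" refers to the usual standardization to zero mean and unit variance. The function name and the sample column are hypothetical.

```python
from statistics import fmean, pstdev


def standard_scale(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:                      # constant column: no spread to normalize
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical feature column.
scaled = standard_scale([2.0, 4.0, 6.0, 8.0])
print(scaled)
```

After scaling, every feature contributes on a comparable scale, which matters for distance-based models such as kNN and SVM.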
Amazon.com - Employee Access
Exploratory analysis:
• Detecting outliers (Anomalies) with the
statistical method of the IQR:
• Outliers = 1
• Bad for training ML
• 51.33% of the total
• Inliers = 0
• Good for training ML
• 48.67% of the total
• Principal Component Analysis
• No clear decision boundary
• SVM and kNN will fail
• Too many outliers
• Correlations:
• No visible correlations in features
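The IQR-based outlier labeling in the exploratory analysis above can be sketched for a single feature as follows. The factor `k=1.5` is the conventional Tukey fence and is an assumption here, since the slides do not state the factor used; the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_outlier_flags(values, k=1.5):
    """Flag each value as 1 (outlier) or 0 (inlier) using the IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [1 if (v < lower or v > upper) else 0 for v in values]


# Hypothetical feature values with one obvious anomaly.
flags = iqr_outlier_flags([10, 12, 11, 13, 12, 11, 95])
print(flags)
```

The 1/0 convention matches the one used on the slide (outliers = 1, inliers = 0).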
Amazon.com - Employee Access
Predictions using Outlier/Anomaly Detection:
• Outliers help to spot abnormal behavior
• Outliers could be attacks, provided we know what attacks look like
Results from the ML_DL_Toolbox:
• Toolbox trained on the inliers only, then used to predict all samples
• Algorithm: CatBoostClassifier, RMSE/Accuracy=0.0001
Outlier/Anomaly Detection results:
• Accuracy goal: 88.46%
• Predicts Granted Access better than Revoked
• 15000 False Positives
• 1100 False Negatives
Supervised Learning results:
• Accuracy goal: 94.36%
• Predicts Granted and Revoked Access equally well
• 64 False Positives
• 1600 False Negatives
Fingerprint recognition
Use case description: MNIST ("Modified National Institute of Standards and Technology") is the de facto
“hello world” dataset of computer vision. Since its release in 1999, this classic dataset of images has served
as the basis for benchmarking classification algorithms. As new Machine Learning techniques emerge,
MNIST remains a reliable resource for researchers and learners alike.
Due to privacy policies, this use case will be implemented with hand-written text images, although the
extension to real fingerprints is straightforward.
Use case requirements: To build a Machine Learning model using historical data that correctly identifies
digits from a dataset of tens of thousands of handwritten images.
MNIST – Handwritten Digit Recognizer
Exploratory analysis:
• Loading the data: pixel values
• Multi-class labels: 0, 1, 2, 3, ..., 9
• Clean data
• No missing values
• Data Visualization
• Balanced data
Parallel with fingerprints:
• Label = User credentials
• Pixel values = Pixel values
MNIST – Handwritten Digit Recognizer
CatBoost classifier:
• Accuracy goal: 98.70%
• Precision: nearly equal precision for all digits
• False Positives and False Negatives present, which is not
good when talking about security
• Predictions: on average quite good, but can be improved
MNIST – Handwritten Digit Recognizer
Convolutional Neural Networks:
• One hidden layer with 150 artificial neurons
• The output layer has 10 neurons, one for each digit
• Convolution is used to detect edges
• Pooling is used to reduce the dimensions of the data
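The two operations named above, convolution for edge detection and pooling for dimension reduction, can be illustrated on a toy image. This is only a sketch in plain Python: the 5x5 image and the hand-picked vertical-edge kernel are hypothetical, whereas a real CNN learns its kernels during training.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as applied in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 5x5 image: dark left part (0), bright right part (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

edges = convolve2d(image, kernel)    # 4x4 map, non-zero only at the edge
pooled = max_pool2d(edges, size=2)   # 2x2 map, edge response preserved
print(edges, pooled)
```

The pooled map is a quarter of the size of the convolved one but still records where the edge was found.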
MNIST – Handwritten Digit Recognizer
Convolutional Neural Networks:
• Accuracy goal: 100%
• Precision: perfect precision for all digits
• No False Positives and no False Negatives
• Predictions: perfect classification
Credit Card Fraud
Use case description: the aim is to predict whether an online transaction
is fraudulent or not, given a set of data linked to each transaction
• The dataset is taken from a Kaggle project called IEEE-CIS Fraud
Detection
• The target variable is binary and given by a column named "isFraud"
• There are numerical and categorical features
• The data is highly unbalanced: 96.5% are normal transactions
• There are 3 duplicate rows
• There are missing values to be filled
Credit Card Fraud
Statistical information from the categorical features:
• Overall, the most bought products were W, C and R; among
fraudulent transactions, however, the most bought products were C, S
and H
• The most used type of payment was debit card; fraudulent
transactions, however, were mostly made with credit cards
Credit Card Fraud
Statistical information from the categorical features:
• Overall, the most used cards were Visa and Mastercard, but
most of the frauds took place with Discover, followed equally by
Mastercard and Visa
• The most used verification email accounts were Gmail and Yahoo;
most of the fraudulent transactions, however, have verification emails at
protonmail.com, mail.com and outlook.es
Credit Card Fraud
Preparation of the data for Machine Learning:
Strategy for filling missing values:
1. If the share of missing values in a column exceeds 50%,
the column is deleted
2. Only columns with less than 50% missing data are kept:
• Separate fraudulent transactions from normal ones:
• If the feature is categorical, the missing values are filled
with the median
• If the feature is numerical, the missing values are
filled with the mean value
A label encoder turns the categorical features into
numeric ones
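The preparation steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the toolbox code: the column names and values are hypothetical, a column with exactly 50% missing values is kept, and because a median needs an ordering, categorical values are ordered lexicographically and `median_low` is used so that the fill value is an existing category.

```python
from statistics import fmean, median_low

MISSING = None  # marker for a missing value in this sketch


def drop_sparse_columns(columns, max_missing=0.5):
    """Keep only columns whose share of missing values does not exceed max_missing."""
    return {
        name: values for name, values in columns.items()
        if sum(v is MISSING for v in values) / len(values) <= max_missing
    }


def fill_by_class(values, target, categorical):
    """Fill gaps separately per target class: median (categorical) or mean (numerical)."""
    filled = list(values)
    for cls in set(target):
        idx = [i for i, t in enumerate(target) if t == cls]
        present = [values[i] for i in idx if values[i] is not MISSING]
        if not present:
            continue  # no observed value to derive a fill from
        fill = median_low(present) if categorical else fmean(present)
        for i in idx:
            if filled[i] is MISSING:
                filled[i] = fill
    return filled


def label_encode(values):
    """Map each category to an integer code (codes follow sorted category order)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical mini-dataset; only "isFraud" is a name taken from the slides.
is_fraud = [0, 0, 0, 1, 1, 1]
columns = {
    "amount": [10.0, None, 30.0, 200.0, 400.0, None],
    "card": ["visa", "visa", None, "discover", None, "discover"],
    "mostly_empty": [None, None, None, None, 1.0, 2.0],
}

kept = drop_sparse_columns(columns)
amount = fill_by_class(kept["amount"], is_fraud, categorical=False)
card = fill_by_class(kept["card"], is_fraud, categorical=True)
card_codes, card_mapping = label_encode(card)
print(amount, card_codes, card_mapping)
```

Because the fill values depend on the "isFraud" label, this step applies to labeled training data only.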
Credit Card Fraud
Predictions using Outlier/Anomaly Detection:
• Outliers help to spot abnormal behavior
• Outliers could be frauds, provided we know what frauds look like
Results from the ML_DL_Toolbox:
• Toolbox trained on the inliers only, then used to predict all samples
• Algorithm: AdaBoostClassifier, RMSE/Accuracy=0.0001
Outlier/Anomaly Detection results:
• Accuracy goal: 93.24%
• Predicts frauds better than normal transactions
• 33 False Positives
• 0 False Negatives
Supervised Learning results:
• Accuracy goal: 99.26%
• Predicts frauds and normal transactions nearly equally well
• 3 False Positives
• 0 False Negatives
FINSEC Use Case: HDI
Use case description: to predict which users perform the
same operations when logged in to the service
• Each user "id_simulation" performs several operations
in each interval of time
• All these operations are condensed into a frequency
table for a given "id_simulation"
• The goal is to find, in this frequency
table, which id_simulation values followed a
similar procedure
• This is an active topic in time series
analysis called Motif Discovery,
not fully solved yet as it needs more
optimization
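The condensation of operations into a frequency table might look like the sketch below. It assumes the raw data is a list of (id_simulation, operation) pairs and ignores the time intervals mentioned above; the operation names and the events are hypothetical.

```python
from collections import Counter


def frequency_table(events, operations):
    """Count, per id_simulation, how often each operation occurs.

    Returns {id_simulation: [count for each operation, in the given order]}.
    """
    counts = {}
    for sim_id, operation in events:
        counts.setdefault(sim_id, Counter())[operation] += 1
    return {
        sim_id: [counter[op] for op in operations]
        for sim_id, counter in counts.items()
    }


# Hypothetical operation names and event log.
operations = ["login", "transfer", "logout"]
events = [
    (1, "login"), (1, "transfer"), (1, "transfer"),
    (2, "login"), (2, "logout"),
]
table = frequency_table(events, operations)
print(table)
```

Each row of the table then describes one id_simulation by how often it performed each operation.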
FINSEC Use Case: HDI
• As a first approximation, we propose to interpret
all the information of every "id_simulation" as a
point in an n-dimensional space, with n the
number of columns in the frequency table
• Then, for each "id_simulation", compute
the Euclidean distance between the given
user and the rest of them
• Then select only those users whose
distance from the given one is less than
a certain threshold, for example 0.1
• If this threshold is set to 0.0, then we are
selecting those users doing exactly the
same operations
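The distance-based selection described above can be sketched as follows. The user ids and frequency rows are hypothetical, and the comparison uses "less than or equal" so that a threshold of 0.0 returns users with identical rows, as the last bullet states.

```python
from math import dist


def similar_users(table, user_id, threshold=0.1):
    """Return the ids whose frequency row lies within `threshold` of user_id's row."""
    reference = table[user_id]
    return sorted(
        other for other, row in table.items()
        if other != user_id and dist(reference, row) <= threshold
    )


# Hypothetical frequency table: one row of relative frequencies per id_simulation.
table = {
    "A": [0.50, 0.50, 0.00],
    "B": [0.50, 0.50, 0.00],
    "C": [0.45, 0.50, 0.05],
    "D": [0.00, 0.20, 0.80],
}

print(similar_users(table, "A", threshold=0.1))   # close neighbours of A
print(similar_users(table, "A", threshold=0.0))   # exact matches only
```

Comparing every user with every other one costs a number of distance computations that grows quadratically with the number of users, so this first approximation suits moderately sized tables.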
FINSEC Use Case: HDI
• For example, within the whole database:
• Users 12175, 12482 and 13363 are doing similar
operations as 12166
• Users 12168, … , 121172 are doing similar operations
as 12167
• Users 18149, 22672, 23649 and 23679 are doing
similar operations as 24375
• And so on...
For More Information:
Please Register with: Finsecurity.eu
Thank you – Questions?