Methods of Optimization in Machine Learning
Presented By: Aayush Srivastava & Divyank Saxena
KnolX Etiquettes
Lack of etiquette and manners is a huge turn off.
Punctuality
Join the session 5 minutes prior to the session start time. We start on time and conclude on time!
Feedback
Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter.
Silent Mode
Keep your mobile devices in silent mode. Feel free to step out of the session if you need to attend an urgent call.
Avoid Disturbance
Avoid unwanted chit chat during the session.
Our Agenda
01 What is Optimization in Machine Learning
02 What is Gradient Descent
03 What is Stochastic Gradient Descent
04 What is Minibatch Stochastic Gradient
05 What is Adam optimization
06 Demo
What is Optimization in ML
● Optimization in Machine Learning is a technique used to find the best set of parameters for a given
model to minimize a loss function and improve its performance. It is an essential step in the training
process of a machine learning model.
● The goal of optimization is to find the best weights and biases for the model, so that it can make
accurate predictions.
● Optimization is used in machine learning because models typically have many parameters, and finding
the best values for those parameters can be a challenging task.
● With optimization techniques, the model can automatically search for the best parameters, rather than
relying on manual tuning by the user.
What is Cost Function
● A cost function is a function which measures the error between predictions and their actual values
across the whole dataset.
● Minimizing the cost function helps the learning algorithm find the optimal set of parameters, such as
weights and biases, that produce the best predictions.
● The cost function is a measure of how wrong the model is in estimating the relationship between X (input) and Y (output):
J(θ) = (1 / 2m) · Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i))^2
- m is the number of samples.
- The sum runs over the samples i = 1 to m.
- Each term is the hypothesis value h_θ(x^(i)) minus the actual value y^(i), squared.
● Let's run through the calculation for best_fit_1.
1. The hypothesis is 0.50. This is the h_θ(x^(i)) part: what we think is the correct value.
2. The actual value for the sample data is 1.00, so we are left with (0.50 - 1.00)^2, which is 0.25.
3. Let's add this result to an array called results and do the same for all three points.
4. results = [0.25, 2.25, 4.00]
5. Finally, we add them all up (6.50) and multiply by ⅙, that is 1/(2m) with m = 3. We get the cost for best_fit_1 = 1.083.
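The calculation above can be sketched in a few lines of Python. The squared errors are taken directly from the slide; the function name `cost` and the 1/(2m) scaling (implied by multiplying the sum of three squared errors by ⅙) are assumptions of this sketch.

```python
def cost(squared_errors):
    """Sum the squared errors and scale by 1 / (2m), m being the number of samples."""
    m = len(squared_errors)
    return sum(squared_errors) / (2 * m)


# First sample from the slide: hypothesis 0.50, actual value 1.00.
first_squared_error = (0.50 - 1.00) ** 2

# Squared errors for all three points of best_fit_1, as listed on the slide.
results = [0.25, 2.25, 4.00]

best_fit_1_cost = cost(results)
print(first_squared_error, round(best_fit_1_cost, 3))  # 0.25 1.083
```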
● Cost:
best_fit_1: 1.083
best_fit_2: 0.083
best_fit_3: 0.25
● A lower cost represents a smaller difference between the predictions and the actual values, so best_fit_2 is the best fit of the three.
What is Loss Function
● A loss function, also known as an objective function, is a mathematical measure of how well a model's predictions match the true values.
● A loss function measures the error between a single prediction and the corresponding actual value.
● Loss and cost functions are methods of measuring the error in machine learning predictions. Loss
functions measure the error per observation, whilst cost functions measure the error over all
observations.
Types:
1.Mean Squared Error (MSE): This loss function measures the average squared difference between the
predicted values and the true values.
2.Mean Absolute Error (MAE): This loss function measures the average absolute difference between the
predicted values and the true values.
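As a minimal sketch of the two loss types listed above, the following plain-Python functions compute MSE and MAE. The function names and the sample values are hypothetical and chosen only for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values, not taken from the slides.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 1.0]

mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.0 + 4.0) / 3
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.0 + 2.0) / 3
print(mse, mae)
```

Because the differences are squared, the single large error (2.0) dominates the MSE much more than it dominates the MAE.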
What is Gradient Descent
● Gradient, in plain terms, means the slope or slant of a surface. So gradient descent literally means descending a slope to reach the lowest point on that surface.
● Gradient descent enables a model to learn the direction it should take in order to reduce errors (differences between actual y and predicted y).
● It is an iterative algorithm that tries to find a minimum of a function.
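A minimal sketch of gradient descent, assuming a one-dimensional toy function f(w) = (w - 3)^2 that is not part of the slides. Each iteration moves w a small step against the gradient; the names `gradient_descent`, `f` and `grad_f` are illustrative.

```python
def f(w):
    """Toy function with its minimum at w = 3 (an assumption for illustration)."""
    return (w - 3.0) ** 2


def grad_f(w):
    """Derivative of f with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


w_min = gradient_descent(grad_f, start=0.0)
print(w_min, f(w_min))  # w_min ends up very close to 3
```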
What is Learning Rate
● The learning rate is a hyperparameter that determines the step size at which the optimization algorithm updates the model's parameters. It controls the speed at which the model learns: a rate that is too small makes learning slow, while a rate that is too large can overshoot the minimum.
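To illustrate how the step size changes the behaviour of gradient descent, here is a small sketch on the toy function f(w) = w^2 (an assumption, not from the slides). The three learning rates are hypothetical values picked to show slow progress, good progress, and divergence.

```python
def run(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w


too_small = run(0.01)   # each step multiplies w by 0.98: slow progress
reasonable = run(0.1)   # each step multiplies w by 0.8: steady progress
too_large = run(1.1)    # each step multiplies w by -1.2: overshoots and diverges

print(abs(too_small), abs(reasonable), abs(too_large))
```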
Limitation of Gradient Descent
● Gradient Descent has some limitations and drawbacks that can affect its performance and efficiency:
● Local Minima: Gradient Descent can get stuck in a local minimum, which may not be the global
minimum, and therefore, the optimization will not produce the best result.
● Vanishing gradient: When training deep neural networks, the gradients can become very small,
leading to the vanishing gradient problem, which can slow down or prevent convergence.
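The local-minimum limitation can be sketched on a toy function with two valleys. The function f(x) = x^4 - 3x^2 + x, the starting points, and the learning rate are all assumptions chosen for illustration; depending on where it starts, plain gradient descent settles in a different minimum.

```python
def f(x):
    """Toy function with a global minimum near x = -1.30 and a local one near x = 1.13."""
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=1000):
    x = start
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_left = gradient_descent(start=-2.0)   # reaches the global minimum
x_right = gradient_descent(start=2.0)   # gets stuck in the local minimum

print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop where the gradient is zero, but only the run started on the left ends at the lower of the two values.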
What is Stochastic Gradient Descent
● Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent optimization algorithm that updates the model's parameters more often, using far less computation per update.
● "Stochastic", in plain terms, means "random".
● In SGD, at each step, the algorithm calculates the gradient for one observation picked at random, instead of calculating the gradient for the entire dataset.
● For example, on a dataset that contains 1000 rows, SGD updates the model parameters 1000 times in one complete pass over the dataset, instead of once as in Gradient Descent.
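A minimal sketch of SGD, assuming a one-feature linear model y = w * x + b and synthetic, noise-free data (neither is from the slides). Observations are visited in a shuffled order, so each one is used once per pass and a 1000-row dataset produces 1000 updates per pass. All names are illustrative.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.1, epochs=10, seed=0):
    """Fit y = w * x + b, updating the parameters after every single observation."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # random order of observations for this pass
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
            updates += 1
    return w, b, updates


# Synthetic data generated from y = 2x + 1 (hypothetical, for illustration).
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(1000)]
ys = [2.0 * x + 1.0 for x in xs]

_, _, updates_one_epoch = sgd_linear_regression(xs, ys, epochs=1)
w_fit, b_fit, _ = sgd_linear_regression(xs, ys, epochs=10)
print(updates_one_epoch, w_fit, b_fit)
```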
● In the left diagram, SGD takes one gradient descent step per training example; in the right diagram, GD takes one step per pass over the entire training set.
● This represents a significant performance improvement when the dataset contains millions of observations.
Advantages of Stochastic Gradient Descent
● It is easier to fit into memory, because the network processes a single training sample at a time.
● For larger datasets it can converge faster, as it updates the parameters more frequently.
● Due to the frequent updates, the steps taken towards the minimum of the loss function oscillate, which can help in escaping local minima of the loss function.
What is Minibatch Stochastic Gradient
● So far we have encountered two extremes in the approach to gradient-based learning.
● Gradient Descent uses the full dataset to compute gradients and to update parameters, one pass at a time. Conversely, Stochastic Gradient Descent processes one training example at a time to make progress. Each of them has its own drawbacks.
● Gradient Descent is not particularly data efficient whenever the data is very similar. Stochastic Gradient Descent is not particularly computationally efficient, since CPUs and GPUs cannot exploit the full power of vectorization.
● This suggests that there might be something in between: minibatch stochastic gradient descent.
● Mini-Batch Gradient Descent is considered to be the cross-over between GD and SGD. In this approach, instead of iterating through the entire dataset or one observation, we split the dataset into small subsets (batches) and compute the gradients for each batch.
● Steps involved in mini-batch stochastic gradient descent:
1. Pick a mini-batch
2. Feed it to the neural network
3. Calculate the mean gradient of the mini-batch
4. Use the mean gradient calculated in step 3 to update the weights
5. Repeat steps 1–4 for the remaining mini-batches
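The five steps above can be sketched as follows. This is only an illustration: a one-feature linear model stands in for the neural network, the data is synthetic, and every name and default value is an assumption.

```python
import random


def minibatch_sgd(xs, ys, batch_size=10, learning_rate=0.1, epochs=50, seed=0):
    """Fit y = w * x + b using the mean gradient of each mini-batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        # Step 5: repeat for every mini-batch in this pass over the data.
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]          # Step 1: pick a mini-batch
            errors = [(w * xs[i] + b) - ys[i] for i in batch]  # Step 2: feed it to the model
            # Step 3: mean gradient of 0.5 * error**2 over the mini-batch.
            grad_w = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            # Step 4: update the weights with the mean gradient.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Synthetic data generated from y = 2x + 1 (hypothetical, for illustration).
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(1000)]
ys = [2.0 * x + 1.0 for x in xs]

w_fit, b_fit = minibatch_sgd(xs, ys)
print(w_fit, b_fit)
```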
● Minibatch stochastic gradient descent trades off convergence speed against computational efficiency. For example, a minibatch size of 10 can be more efficient than stochastic gradient descent, and a minibatch size of 100 can even outperform GD in terms of runtime.
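One way to see the trade-off is to count parameter updates per pass over the data for different batch sizes. The sketch below assumes a 1000-row dataset and a hypothetical helper `updates_per_epoch`; it does not measure runtime, which depends on hardware and vectorization.

```python
import math


def updates_per_epoch(n_rows, batch_size):
    """Number of parameter updates in one pass when the data is split into batches."""
    return math.ceil(n_rows / batch_size)


n_rows = 1000
# Batch size 1 corresponds to SGD; batch size n_rows corresponds to full-batch GD.
counts = {size: updates_per_epoch(n_rows, size) for size in (1, 10, 100, 1000)}
print(counts)  # {1: 1000, 10: 100, 100: 10, 1000: 1}
```

Smaller batches give more frequent updates, while larger batches give fewer updates that each process more rows at once.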
Advantages and Disadvantages
Advantages of Mini-Batch Gradient Descent:
● Reduces the variance of the parameter updates, which leads to more stable convergence
● Speeds up learning
● Helps to estimate the approximate location of the actual minimum
Disadvantages of Mini-Batch Gradient Descent:
● The loss is computed for each mini-batch, so the total loss needs to be accumulated across all mini-batches
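Accumulating the loss across mini-batches can be sketched as a weighted average, where each batch's mean loss is weighted by its size so that a smaller final batch does not distort the result. The helper name and the numbers are hypothetical.

```python
def epoch_loss(batch_losses, batch_sizes):
    """Combine per-batch mean losses into the mean loss over the whole pass."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical per-batch mean losses; the last batch is smaller than the others.
batch_losses = [0.50, 0.30, 0.10]
batch_sizes = [4, 4, 2]

loss = epoch_loss(batch_losses, batch_sizes)  # (2.0 + 1.2 + 0.2) / 10
print(loss)
```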
What is Adam Optimizer
The Adam optimization algorithm is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing.
The method is efficient when working with large problems involving a lot of data or parameters.
Adam is an adaptive learning rate method, which means it computes individual learning rates for different parameters. Its name is derived from adaptive moment estimation.
How Adam Optimizer Works
The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
Adam combines two gradient descent methodologies:
1. Momentum:
This algorithm accelerates gradient descent by taking into consideration the exponentially weighted average of the gradients. Using averages makes the algorithm converge towards the minima at a faster pace.
2. Root Mean Square Propagation (RMSProp):
It maintains per-parameter learning rates that are adapted based on the average of recent magnitudes of the gradients for the weight (i.e. how quickly it is changing). This means the algorithm does well on online and non-stationary problems (e.g. noisy ones).
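The combination of the two ideas can be sketched for a single parameter as below. This is a minimal illustration, not a production optimizer: the toy gradient, the function name `adam_minimize`, and the learning rate used in the example call are assumptions, while the default values for beta1, beta2 and eps follow the commonly used Adam defaults.

```python
import math


def adam_minimize(grad, start, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a one-parameter function with the Adam update rule."""
    w = start
    m = 0.0  # first moment: exponentially weighted average of gradients (momentum)
    v = 0.0  # second moment: exponentially weighted average of squared gradients (RMSProp)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction, because m and v start at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy problem (an assumption): minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = adam_minimize(lambda w: 2.0 * (w - 3.0), start=0.0,
                      learning_rate=0.1, steps=500)
print(w_min)
```

With several parameters, m and v are kept per parameter, which is what gives each parameter its own effective learning rate.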
Benefits of Adam Optimizer
● Straightforward to implement.
● Computationally efficient.
● Low memory requirements.
● Well suited for problems that are large in terms of data and/or parameters.
● Appropriate for problems with very noisy and/or sparse gradients.
● Hyper-parameters have intuitive interpretations and typically require little tuning.
Demo
Thank You!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Methods of Optimization in Machine Learning

  • 1. Presented By: Aayush Srivastava & Divyank Saxena Methods of Optimization in Machine Learning
  • 2. KnolX Etiquettes: Lack of etiquette and manners is a huge turn off.
      ● Punctuality: Join the session 5 minutes prior to the session start time. We start on time and conclude on time!
      ● Feedback: Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter.
      ● Silent Mode: Keep your mobile devices in silent mode; feel free to move out of the session in case you need to attend an urgent call.
      ● Avoid Disturbance: Avoid unwanted chit chat during the session.
  • 3. Our Agenda
      01 What is Optimization in Machine Learning
      02 What is Gradient Descent
      03 What is Stochastic Gradient Descent
      04 What is Minibatch Stochastic Gradient
      05 What is Adam optimization
      06 Demo
  • 4. What is Optimization in ML
      ● Optimization in Machine Learning is a technique used to find the best set of parameters for a given model to minimize a loss function and improve its performance. It is an essential step in the training process of a machine learning model.
      ● The goal of optimization is to find the best weights and biases for the model, so that it can make accurate predictions.
      ● Optimization is used in machine learning because models typically have many parameters, and finding the best values for those parameters can be a challenging task.
      ● With optimization techniques, the model can automatically search for the best parameters, rather than relying on manual tuning by the user.
  • 5. What is Cost Function
      ● A cost function measures the error between predictions and their actual values across the whole dataset.
      ● Minimizing the cost function helps the learning algorithm find the optimal set of parameters, such as weights and biases, that produce the best predictions.
      ● The cost function is a measure of how wrong the model is in estimating the relationship between X (input) and Y (output): $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
      ● Parameters:
        - $m$ is the number of samples.
        - The sum runs from $i = 1$ to $m$.
        - Each term is the hypothesis value $h_\theta(x^{(i)})$ minus the actual value $y^{(i)}$, squared.
  • 6. What is Cost Function
      ● Let’s run through the calculation for best_fit_1.
        1. The hypothesis is 0.50. This is the $h_\theta(x^{(i)})$ part: what we think is the correct value.
        2. The actual value for the sample data is 1.00, so we are left with (0.50 - 1.00)^2, which is 0.25.
        3. Let’s add this result to an array called results and do the same for all three points.
        4. results = [0.25, 2.25, 4.00]
        5. Finally, we add them all up and multiply by ⅙ (that is, 1/(2m) with m = 3). We get the cost for best_fit_1 = 6.50 / 6 ≈ 1.083.
  • 7. What is Cost Function
      ● COST: best_fit_1: 1.083, best_fit_2: 0.083, best_fit_3: 0.25
      ● A lower cost represents a smaller difference between the predictions and the actual values.
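The cost calculation walked through above can be sketched in a few lines of Python. This is a minimal illustration, assuming the halved mean-squared-error form implied by the ⅙ factor for three points; the function names `cost` and `cost_from_squared_errors` are hypothetical and not part of the slides. Only the squared errors listed for best_fit_1 are used, since the underlying data points are not given.

```python
def cost(predictions, actuals):
    """Halved mean squared error: (1 / 2m) * sum((h - y)^2)."""
    m = len(actuals)
    squared_errors = [(h - y) ** 2 for h, y in zip(predictions, actuals)]
    return sum(squared_errors) / (2 * m)


def cost_from_squared_errors(squared_errors):
    """Same cost, starting from per-sample squared errors that are already known."""
    return sum(squared_errors) / (2 * len(squared_errors))


# Squared errors listed on the slide for best_fit_1.
results = [0.25, 2.25, 4.00]
best_fit_1_cost = cost_from_squared_errors(results)
print(round(best_fit_1_cost, 3))  # 1.083
```

The single-sample case from step 2, `cost([0.50], [1.00])`, gives 0.125, which is the squared error 0.25 divided by 2m with m = 1.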
  • 8. What is Loss Function
      ● A loss function, also known as an objective function, is a mathematical measure of how well a model is able to make predictions that match the true values.
      ● A loss function measures the error between a single prediction and the corresponding actual value.
      ● Loss and cost functions are methods of measuring the error in machine learning predictions. Loss functions measure the error per observation, whilst cost functions measure the error over all observations.
      ● Types:
        1. Mean Squared Error (MSE): This loss function measures the average squared difference between the predicted values and the true values.
        2. Mean Absolute Error (MAE): This loss function measures the average absolute difference between the predicted values and the true values.
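As a rough illustration of the two loss types named above, the sketch below computes MSE and MAE in plain Python. The sample values in `y_true` and `y_pred` are hypothetical and chosen only to show how a single large error affects each measure.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values: two perfect predictions and one that is off by 2.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]

mse = mean_squared_error(y_true, y_pred)   # (0 + 0 + 4) / 3
mae = mean_absolute_error(y_true, y_pred)  # (0 + 0 + 2) / 3
print(mse, mae)
```

Because MSE squares each difference, the single error of 2 contributes 4 to the sum, so MSE penalizes large errors more heavily than MAE does.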
  • 9. What is Gradient Descent
      ● Gradient, in plain terms, means the slope or slant of a surface. So gradient descent literally means descending a slope to reach the lowest point on that surface.
      ● Gradient descent enables a model to learn the gradient, or direction, that the model should take in order to reduce errors (differences between actual y and predicted y).
      ● It is an algorithm that iteratively tries to find a minimum of a function.
  • 10. What is Learning Rate
      ● The learning rate is a hyperparameter in machine learning that determines the step size at which the optimization algorithm updates the model's parameters. It is used to control the speed at which the model learns.
      ● A rate that is too small makes training slow, while a rate that is too large can overshoot the minimum.
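To make the role of the learning rate concrete, here is a minimal sketch of gradient descent on a one-dimensional function. The objective f(w) = (w - 3)^2, the starting point, and the three learning rates are all assumptions chosen for illustration; they do not come from the slides.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def grad_f(w):
    """Gradient of the illustrative objective f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


w_good = gradient_descent(grad_f, start=0.0, learning_rate=0.1, steps=100)
w_slow = gradient_descent(grad_f, start=0.0, learning_rate=0.001, steps=100)
w_large = gradient_descent(grad_f, start=0.0, learning_rate=1.1, steps=100)

print(w_good)   # very close to 3
print(w_slow)   # still far from 3 after the same number of steps
print(w_large)  # has moved far away from 3: each step overshoots
```

With the same number of steps, the moderate rate reaches the minimum, the tiny rate has barely moved, and the oversized rate diverges because every update jumps past the minimum by more than the previous error.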
  • 11. Limitations of Gradient Descent
      ● Gradient Descent has some limitations and drawbacks that can affect its performance and efficiency.
      ● Local Minima: Gradient Descent can get stuck in a local minimum, which may not be the global minimum, and therefore the optimization will not produce the best result.
      ● Vanishing gradient: When training deep neural networks, the gradients can become very small, leading to the vanishing gradient problem, which can slow down or prevent convergence.
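The local-minimum limitation can be seen on a small example. The sketch below assumes an illustrative function f(x) = x^4 - 3x^2 + x, which has a local minimum near x ≈ 1.13 and a lower, global minimum near x ≈ -1.30; the function, the learning rate, and the starting points are hypothetical choices, not taken from the slides.

```python
def f(x):
    """Illustrative non-convex function with two minima."""
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent from a given starting point."""
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


x_from_right = descend(2.0)   # settles in the local minimum near x = 1.13
x_from_left = descend(-2.0)   # settles in the global minimum near x = -1.30

print(x_from_right, f(x_from_right))
print(x_from_left, f(x_from_left))
```

Both runs stop where the gradient is zero, but only the run started on the left reaches the lower of the two values, so the result of gradient descent here depends on the starting point.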
  • 12. What is Stochastic Gradient Descent
      ● Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent optimization algorithm that is used to update the parameters of a model in a more efficient and faster way.
      ● “Stochastic” in plain terms means “random”.
      ● In SGD, at each step, the algorithm calculates the gradient for one observation picked at random, instead of calculating the gradient for the entire dataset.
      ● So, for a dataset that contains 1000 rows, SGD will update the model parameters 1000 times in one complete cycle through the dataset, instead of one time as in Gradient Descent.
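The per-observation update described above can be sketched as follows. This is a minimal illustration, assuming a one-weight linear model y ≈ w·x with a squared-error loss and a hypothetical 1000-row dataset generated from y = 2x; visiting the rows in shuffled order is one common way to realize "picked at random".

```python
import random


def sgd_epoch(data, w, learning_rate, rng):
    """One pass over the data, updating w after every single sample."""
    samples = list(data)
    rng.shuffle(samples)  # visit the observations in random order
    updates = 0
    for x, y in samples:
        grad = (w * x - y) * x  # gradient of 0.5 * (w*x - y)^2 with respect to w
        w -= learning_rate * grad
        updates += 1
    return w, updates


# Hypothetical dataset: 1000 rows generated from y = 2x.
data = [(i / 1000, 2 * i / 1000) for i in range(1, 1001)]

w, updates = sgd_epoch(data, w=0.0, learning_rate=0.1, rng=random.Random(0))
print(updates)  # 1000 parameter updates in one pass over 1000 rows
print(w)        # close to the true slope of 2
```

Full-batch gradient descent would make a single update in the same pass, which is the contrast the 1000-row example draws.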
  • 13. What is Stochastic Gradient Descent
      ● In the left diagram of the picture, SGD takes one gradient descent step per training example; in the right diagram, GD takes one step per pass over the entire training set.
      ● This represents a significant performance improvement when the dataset contains millions of observations.
  • 14. Advantages of Stochastic Gradient Descent
      ● It is easier to fit into memory, because the network processes a single training sample at a time.
      ● For larger datasets it can converge faster, as it updates the parameters more frequently.
      ● Due to the frequent updates, the steps taken towards the minimum of the loss function oscillate, which can help the algorithm escape local minima of the loss function.
  • 15. What is Minibatch Stochastic Gradient
      ● So far we have encountered two extremes in the approach to gradient-based learning.
      ● Gradient Descent uses the full dataset to compute gradients and to update parameters, one pass at a time. Conversely, Stochastic Gradient Descent processes one training example at a time to make progress. Each of them has its own drawbacks.
      ● Gradient descent is not particularly data efficient whenever data is very similar. Stochastic gradient descent is not particularly computationally efficient, since CPUs and GPUs cannot exploit the full power of vectorization.
      ● This suggests that there might be something in between, and in fact, that is what we have been using so far in the examples we discussed.
  • 16. What is Minibatch Stochastic Gradient
      ● Mini Batch Gradient Descent is considered to be the cross-over between GD and SGD. In this approach, instead of iterating through the entire dataset or one observation, we split the dataset into small subsets (batches) and compute the gradients for each batch.
      ● Steps involved in Mini-batch stochastic gradient:
        1. Pick a mini-batch
        2. Feed it to the Neural Network
        3. Calculate the mean gradient of the mini-batch
        4. Use the mean gradient we calculated in step 3 to update the weights
        5. Repeat steps 1–4 for the mini-batches we created
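The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: a one-weight linear model y ≈ w·x stands in for the neural network, the dataset is a hypothetical 1000 rows generated from y = 2x, and the batch size, learning rate, and epoch count are arbitrary choices.

```python
import random


def minibatch_sgd(data, w, learning_rate, batch_size, epochs, rng):
    """Mini-batch SGD for a one-weight linear model with squared-error loss."""
    samples = list(data)
    updates = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]    # 1. pick a mini-batch
            grads = [(w * x - y) * x for x, y in batch]  # 2. run the model, get per-sample gradients
            mean_grad = sum(grads) / len(batch)          # 3. mean gradient of the mini-batch
            w -= learning_rate * mean_grad               # 4. update the weight
            updates += 1                                 # 5. continue with the next mini-batch
    return w, updates


# Hypothetical dataset: 1000 rows generated from y = 2x.
data = [(i / 1000, 2 * i / 1000) for i in range(1, 1001)]

w, updates = minibatch_sgd(
    data, w=0.0, learning_rate=0.5, batch_size=100, epochs=20, rng=random.Random(0)
)
print(updates)  # 10 updates per epoch * 20 epochs = 200
print(w)        # close to the true slope of 2
```

With a batch size of 100, each pass over the 1000 rows makes 10 updates, which sits between the 1 update of full-batch gradient descent and the 1000 updates of per-sample SGD.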
  • 17. What is Minibatch Stochastic Gradient
      ● Minibatch stochastic gradient descent is able to trade off convergence speed and computational efficiency. A minibatch size of 10 is more efficient than stochastic gradient descent; a minibatch size of 100 even outperforms GD in terms of runtime.
  • 18. Advantages and Disadvantages
      ● Advantages of Mini-Batch Gradient Descent:
        - Reduces the variance of the parameter updates and hence leads to more stable convergence
        - Speeds up learning
        - Helps to estimate the approximate location of the actual minimum
      ● Disadvantages of Mini-Batch Gradient Descent:
        - The loss is computed for each mini-batch, and hence the total loss needs to be accumulated across all mini-batches
  • 19. What is Adam Optimizer
      ● The Adam optimization algorithm is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing.
      ● The method is efficient when working with large problems involving a lot of data or parameters.
      ● Adam is an adaptive learning rate method, which means it computes individual learning rates for different parameters. Its name is derived from adaptive moment estimation.
  • 20. How the Adam Optimizer Works
      ● The method computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
      ● The Adam optimizer combines two gradient descent methodologies:
        1. Momentum: This algorithm accelerates gradient descent by taking into consideration the exponentially weighted average of the gradients. Using averages makes the algorithm converge towards the minimum at a faster pace.
        2. Root Mean Square Propagation (RMSP): It maintains per-parameter learning rates that are adapted based on the average of recent magnitudes of the gradients for the weight (e.g. how quickly it is changing). This means the algorithm does well on online and non-stationary problems (e.g. noisy).
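The combination of momentum and RMSP described above can be sketched for a single scalar parameter. This is a minimal illustration of the standard Adam update with bias-corrected moment estimates; the defaults shown (0.001, 0.9, 0.999, 1e-8) are the commonly used ones, while the objective f(w) = (w - 3)^2, the function name `adam_minimize`, and the larger learning rate in the usage line are assumptions made for the example.

```python
import math


def adam_minimize(grad, w, steps, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-parameter function with the Adam update rule."""
    m = 0.0  # first moment: exponentially weighted average of gradients (momentum part)
    v = 0.0  # second moment: exponentially weighted average of squared gradients (RMSP part)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Illustrative objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0, steps=500, learning_rate=0.1)
print(w)  # close to 3
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective step size: a parameter with consistently large gradients takes proportionally smaller steps.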
  • 21. Benefits of Adam Optimizer
      ● Straightforward to implement.
      ● Computationally efficient.
      ● Low memory requirements.
      ● Well suited for problems that are large in terms of data and/or parameters.
      ● Appropriate for problems with very noisy and/or sparse gradients.
      ● Hyper-parameters have an intuitive interpretation and typically require little tuning.
  • 22. Demo
  • 23. Thank You ! Get in touch with us: Lorem Studio, Lord Building D4456, LA, USA