Scientific Computing on JRuby
github.com/prasunanand
Objective
● A Scientific library is memory intensive and speed counts. How to use JRuby
effectively to create a great tool/gem?
● A General Purpose GPU library for Ruby that can be used by industry in
production and academia for research.
● Ruby Science Foundation
● SciRuby has been trying to push Ruby for scientific computing.
● Popular Rubygems:
1. NMatrix
2. Daru
3. Mixed_models
NMatrix
● NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as
well as two types of sparse (linked-list-based and Yale/CSR).
● It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for
several of its linear algebra operations.
Daru
Mixed_models
Nyaplot
SciRuby vs SciPy
● We love Ruby.
● We love Rails.
● Expressiveness of Ruby.
● JRuby is known for its performance: about 10 times faster than CRuby.
● With Truffle it is around 40 times faster than CRuby. Truffle is supported by
Oracle.
Say Hello!
NMatrix for JRuby
● Parallelism => no Global Interpreter Lock, unlike MRI
● Easy deployment (Warbler gem)
● Automatic garbage collection
● Speed
● NMatrix for JRuby relies on Apache Commons Math
MDArray
● There is no unified interface for SciRuby gems => why not build a wrapper around
MDArray?
● MDArray is a great gem for linear algebra.
● MDArray used Parallel Colt, which has been deprecated.
● However, every gem that used NMatrix as a dependency would need to be
reimplemented with MDArray.
How does NMatrix work?
● N-Dimensional
● 2-Dimensional NMatrix
N-dimensional matrices are stored as a one-dimensional Array!
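The flat-storage idea can be sketched in plain Ruby. This is a minimal illustration that assumes row-major order (the last dimension varies fastest); `strides_for` and `flat_index` are hypothetical helper names, not part of the NMatrix API.

```ruby
# Hypothetical helpers for illustration; not NMatrix's actual API.

# Row-major strides: the last dimension varies fastest.
def strides_for(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) do |i|
    strides[i] = strides[i + 1] * shape[i + 1]
  end
  strides
end

# Flat position of the element at the given coordinates.
def flat_index(coords, strides)
  coords.zip(strides).sum { |c, s| c * s }
end

shape   = [2, 3, 4]                      # a 2 x 3 x 4 matrix
storage = (0...shape.inject(:*)).to_a    # 24 elements in one flat Array
strides = strides_for(shape)             # one stride per dimension

pos = flat_index([1, 2, 3], strides)     # 1*12 + 2*4 + 3*1
puts storage[pos]
```

Keeping the data in a single array means every element access is one multiply-add chain over the strides, regardless of how many dimensions the matrix has.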
NMatrix Architecture
MRI JRuby
N - dimensional Matrix
Elementwise Operation
● [:add, :subtract, :sin, :gamma]
● Iterate through the elements.
● Access the element; do the operation, return it
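The iterate, operate, return pattern described above might look like this in plain Ruby. It is only a sketch over flat Ruby arrays; `elementwise` is a hypothetical helper, and the real NMatrix code works on typed storage instead.

```ruby
# Hypothetical helper: apply a binary operation element by element to two
# flat storage arrays of equal length and return a new flat array.
def elementwise(left, right)
  raise ArgumentError, "size mismatch" unless left.size == right.size
  Array.new(left.size) { |i| yield(left[i], right[i]) }
end

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]

sum    = elementwise(a, b) { |x, y| x + y }   # :add
diff   = elementwise(a, b) { |x, y| x - y }   # :subtract
sines  = a.map { |x| Math.sin(x) }            # unary operations only need map
gammas = a.map { |x| Math.gamma(x) }

p sum, diff, sines, gammas
```

Because the storage is flat, the same loop serves matrices of any dimensionality; the shape only matters when coordinates are needed.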
Challenges
● Autoboxing and Multiple data type
● Minimise copying of data
Errors that can’t be reproduced :p
[ 0.11, 0.05, 0.34, 0.14 ]
+ [ 0.21, 0.05, 0.14, 0.14 ]
= [ 0, 0, 0, 0 ]
([ 0.11, 0.05, 0.34, 0.14 ] + 5)
+ ([ 0.21, 0.05, 0.14, 0.14 ] + 5)
- 10
= [ 0.32, 0.1, 0.48, 0.28 ]
Autoboxing
● :float64 => double only
● Strict dtypes => creating data type in Java. Can’t Rely on Reflection
● @s = Array.new()
● @s = Java::double[rows*cols].new()
Autoboxing and Enumerators
def each_with_indices
  nmatrix = create_dummy_nmatrix
  stride = get_stride(self)
  offset = 0
  coords = Array.new(dim) { 0 }
  shape_copy = Array.new(dim)

  (0...size).each do |k|
    # Convert the flat position k into N-dimensional coordinates.
    dense_storage_coords(nmatrix, k, coords, stride, offset)
    slice_index = dense_storage_pos(coords, stride)

    ary = Array.new
    if @dtype == :object
      ary << self.s[slice_index]
    else
      # toArray.to_a converts the Java storage to a Ruby Array on every iteration.
      ary << self.s.toArray.to_a[slice_index]
    end

    # Append the coordinates after the value: [value, i, j, ...].
    (0...dim).each do |p|
      ary << coords[p]
    end
    yield(ary)
  end if block_given?

  return nmatrix
end
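For readers without a JRuby setup, the same enumeration can be sketched in plain Ruby. This stand-alone version is an assumption-laden illustration: it takes the flat storage and shape as arguments, assumes row-major order, and does not use the `dense_storage_coords` or `dense_storage_pos` helpers from the slide.

```ruby
# Hypothetical stand-alone version for illustration: flat row-major storage
# plus a shape, yielding [value, i, j, ...] for each element.
def each_with_indices(storage, shape)
  return enum_for(:each_with_indices, storage, shape) unless block_given?

  storage.each_with_index do |value, k|
    coords = Array.new(shape.size)
    rem = k
    # Peel off coordinates starting from the fastest-varying (last) dimension.
    (shape.size - 1).downto(0) do |d|
      coords[d] = rem % shape[d]
      rem /= shape[d]
    end
    yield [value, *coords]
  end
end

each_with_indices([10, 20, 30, 40, 50, 60], [2, 3]) do |value, i, j|
  puts "(#{i}, #{j}) => #{value}"
end
```

Unlike the slide's version, this sketch never converts the whole storage on each step, which is the cost the autoboxing discussion is concerned with.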
Minimise copying of data
● Make sure you don’t make copies of data.
● Pass-by-Reference in action:
○ Use static methods as helpers.
2 - dimensional Matrix
2 - dimensional Matrix Operations
● [:dot, :det, :factorize_lu]
● In NMatrix-MRI, BLAS-III and LAPACK routines are implemented using their
respective libraries.
● NMatrix-JRuby depends on Java functions.
Challenges
● Converting a 1-D array to 2-D array
● Array Size and Accessing elements
● Speed and Memory Required
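The first two challenges can be illustrated with a small Ruby sketch. It assumes row-major flat storage; `to_2d` and `at` are hypothetical names chosen for this example.

```ruby
# Hypothetical helpers; names are illustrative only.

# Split flat row-major storage into an Array of rows (this copies the data).
def to_2d(flat, rows, cols)
  raise ArgumentError, "shape mismatch" unless flat.size == rows * cols
  flat.each_slice(cols).to_a
end

# Read element (i, j) straight from the flat storage, without building rows.
def at(flat, cols, i, j)
  flat[i * cols + j]
end

flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p to_2d(flat, 2, 3)
p at(flat, 3, 1, 2)
```

Building the nested array is convenient for libraries that expect a 2-D input, but it doubles the memory in use for the duration of the call, whereas index arithmetic on the flat storage avoids the copy.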
Ruby Code
require 'benchmark'

# b and c are the 15,000 x 15,000 Java double arrays allocated in the next snippet.
index = 0
puts Benchmark.measure {
  (0...15000).each do |i|
    (0...15000).each do |j|
      c[i][j] = b[i][j]
      index += 1
    end
  end
}
#67.790000 0.070000 67.860000 ( 65.126546)
#RAM consumed => 5.4GB

b = Java::double[15_000,15_000].new
c = Java::double[15_000,15_000].new
index = 0
puts Benchmark.measure {
  (0...15000).each do |i|
    (0...15000).each do |j|
      b[i][j] = index
      index += 1
    end
  end
}
#43.260000 3.250000 46.510000 ( 39.606356)
Java Code
public class MatrixGenerator {
  // row, col, b and c are assumed to be static fields declared elsewhere in the class.
  public static void test2() {
    for (int index = 0, i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        c[i][j] = b[i][j];
        index++;
      }
    }
  }
}
puts Benchmark.measure{MatrixGenerator.test2}
#0.034000 0.001000 00.034000 ( 00.03300)
#RAM consumed => 300MB
public class MatrixGenerator {
  // row and col are assumed to be static fields declared elsewhere in the class.
  public static void test1() {
    double[][] b = new double[15000][15000];
    double[][] c = new double[15000][15000];
    for (int index = 0, i = 0; i < row; i++) {
      for (int j = 0; j < col; j++) {
        b[i][j] = index;
        index++;
      }
    }
  }
}
puts Benchmark.measure{MatrixGenerator.test1}
#0.032000 0.001000 00.032000 ( 00.03100)
Results
Improves:
● Speed: about 1000 times faster
● Memory: about 10 times less
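The ratios behind these rounded figures can be worked out from the numbers quoted in the benchmark comments for the copy loop. The sketch below only does that arithmetic; it assumes 1 GB = 1000 MB, and the variable names are illustrative.

```ruby
# Figures quoted in the benchmark comments for the copy loop.
ruby_copy_seconds = 65.126546   # JRuby loop copying b into c (wall-clock)
java_copy_seconds = 0.033       # MatrixGenerator.test2 (wall-clock)
ruby_copy_mb      = 5400.0      # 5.4 GB, taking 1 GB = 1000 MB
java_copy_mb      = 300.0

# Ratio of the "before" figure to the "after" figure, to one decimal place.
def ratio(before, after)
  (before / after).round(1)
end

puts "speed-up:      #{ratio(ruby_copy_seconds, java_copy_seconds)}x"
puts "memory saving: #{ratio(ruby_copy_mb, java_copy_mb)}x"
```

Both ratios land in the same order of magnitude as the slide's rounded claims, which is all a single-run measurement like this can support.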
Mixed models
● After NMatrix for doubles was ready, I tested it with mixed_models.
Benchmarking NMatrix functionalities
System Specifications
● CPU: AMD FX-8350 octa-core 4.2GHz
● RAM: 16GB
Addition
Subtraction
Gamma
Matrix Multiplication
Determinant
Factorization
Benchmark conclusion
● NMatrix-JRuby is much faster than NMatrix-MRI for elementwise operations
on N-dimensional matrices.
● NMatrix-MRI is faster for 2-dimensional matrices when calculating matrix
multiplication, determinants and factorization.
Improvements
● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and
LAPACK routines.
● How?
● Why not JBlas?
MRI
JRuby
Future Work
● Add support for complex dtype.
● Convert NMatrix-JRuby Enumerators to Java code.
● Add sparse support.
Am I done?
Nope!
Enter GPU
A General-Purpose GPU library
● Combine the beauty of Ruby with transparent GPU processing
● This will work both on client computers and on servers that use
NVIDIA Tesla GPUs and Intel Xeon Phi coprocessors.
● Developer activity and support for the current projects is mixed at best, and
they are tough to use as they involve writing kernels and require a lot of effort
to be put in buffer/RAM optimisation.
ArrayFire-rb
● Wraps ArrayFire library
ArrayFire
● ArrayFire is an open-source GPGPU library written in C++ that uses JIT compilation.
● ArrayFire supports CUDA-capable NVIDIA GPUs, OpenCL devices, and a CPU
backend.
● It abstracts away from the difficult task of writing kernels for multiple
architectures; handling memory management, and performing tuning and
optimisation.
Using ArrayFire
MRI
● C extension
● Architecture is inspired by NMatrix and NArray
● The C++ function is placed in a namespace (e.g., namespace arf { }) or is
declared static if possible. The C function receives the prefix arf_, e.g.,
arf_multiply() (this function also happens to be static).
● C macros are capitalized and generally have the prefix ARF_, as with
ARF_DTYPE().
● C functions (and macros, for consistency) are placed within extern "C" { }
blocks to turn off C++ mangling.
● C macros (in extern blocks) may represent C++ constants (which are always
#include <ruby.h>

typedef struct AF_STRUCT
{
  size_t ndims;       // number of dimensions
  size_t count;       // total number of elements
  size_t* dimension;  // extent of each dimension
  double* array;      // element data on the host
} afstruct;

// ArrayFire, Blas, METHOD, arf_matmul and arf_free are assumed to be declared
// earlier in the extension; the slide does not show those declarations.
void Init_arrayfire() {
  ArrayFire = rb_define_module("ArrayFire");
  Blas = rb_define_class_under(ArrayFire, "BLAS", rb_cObject);
  rb_define_singleton_method(Blas, "matmul", (METHOD)arf_matmul, 2);
}

static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val) {
  afstruct* left;
  afstruct* right;
  afstruct* result = ALLOC(afstruct);

  Data_Get_Struct(left_val, afstruct, left);
  Data_Get_Struct(right_val, afstruct, right);

  result->ndims = left->ndims;
  // Heap-allocate the shape: a stack array would dangle once this function returns.
  result->dimension = ALLOC_N(size_t, 2);
  result->dimension[0] = left->dimension[0];
  result->dimension[1] = right->dimension[1];
  result->count = result->dimension[0] * result->dimension[1];

  arf::matmul(result, left, right);
  return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
}
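#include <arrayfire.h>

namespace arf {
  using namespace af;

  // Copies both operands into ArrayFire arrays, multiplies them on the active
  // backend and copies the product back to host memory.
  static void matmul(afstruct *result, afstruct *left, afstruct *right)
  {
    array l = array(left->dimension[0], left->dimension[1], left->array);
    array r = array(right->dimension[0], right->dimension[1], right->array);
    array res = af::matmul(l, r);
    // host<double>() allocates a new host buffer that the caller must release.
    result->array = res.host<double>();
  }
}

extern "C" {
  #include "arrayfire.c"
}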
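When testing a wrapper like this, a slow reference implementation is handy for checking small results. The following plain-Ruby sketch is an assumption-based illustration, not part of ArrayFire-rb: it assumes flat arrays in column-major order (the layout ArrayFire uses for its arrays), and `matmul_colmajor` is a hypothetical name.

```ruby
# Hypothetical reference implementation for checking results on small inputs.
# Matrices are flat Arrays in column-major order: element (i, j) of an
# m x n matrix lives at index i + j * m.
def matmul_colmajor(left, left_dims, right, right_dims)
  m, k  = left_dims
  k2, n = right_dims
  raise ArgumentError, "inner dimensions differ" unless k == k2

  result = Array.new(m * n, 0.0)
  n.times do |j|
    k.times do |p|
      r = right[p + j * k]
      m.times do |i|
        result[i + j * m] += left[i + p * m] * r
      end
    end
  end
  # As in arf_matmul, the result shape is [rows of left, columns of right].
  [result, [m, n]]
end

# [[1, 2], [3, 4]] times [[5, 6], [7, 8]], both stored column by column.
left  = [1.0, 3.0, 2.0, 4.0]
right = [5.0, 7.0, 6.0, 8.0]
product, dims = matmul_colmajor(left, [2, 2], right, [2, 2])
p product, dims
```

Comparing the GPU result against such a reference on a handful of small matrices catches layout mistakes (row-major data passed as column-major) before any benchmarking starts.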
JRuby
● The approach is the same as for NMatrix-JRuby.
● Java Native Interface (JNI)
● Work on ArrayFire-Java.
● Place 'libaf.so' in the load path.
require 'ext/vendor/ArrayFire.jar'

class Af_Array
  attr_accessor :dims, :elements

  def matmul(other)
    Blas.matmul(self.arr, other)
  end
end
Benchmarking ArrayFire
System Specification
CPU: AMD FX octa-core 4.2GHz
RAM: 16GB
GPU: Nvidia GTX 750Ti
GPU RAM: 4GB GDDR5
Matrix Addition
Matrix Multiplication
Matrix Determinant
Factorization
Transparency
● Integrate with NArray
● Integrate with NMatrix
● Integrate with Rails
Applications
● Endless possibilities ;)
● Bioinformatics
● Integrate Tensorflow
● Image Processing
● Computational Fluid Dynamics
Conclusion
Useful Links
● https://github.com/sciruby/nmatrix
● https://github.com/arrayfire/arrayfire-rb
● https://github.com/prasunanand/arrayfire-rb/tree/temp
Acknowledgements
1. Pjotr Prins
2. Charles Nutter
3. John Woods
4. Alexej Gossmann
5. Sameer Deshmukh
6. Pradeep Garigipati
Thank You
Github: prasunanand
Twitter: @prasun_anand
Blog: prasunanand.com

Fosdem2017 Scientific computing on Jruby

  • 1. Scientific Computing on JRuby github.com/prasunanand
  • 2. Objective ● A Scientific library is memory intensive and speed counts. How to use JRuby effectively to create a great tool/gem? ● A General Purpose GPU library for Ruby that can be used by industry in production and academia for research.
  • 3. ● Ruby Science Foundation ● SciRuby has been trying to push Ruby for scientific computing. ● Popular Rubygems: 1. NMatrix 2. Daru 3. Mixed_models
  • 4. NMatrix ● NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as two types of sparse (linked-list-based and Yale/CSR). ● It currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations.
  • 9. SciRuby vs SciPy ● We love Ruby. ● We love Rails. ● Expressiveness of Ruby.
  • 10. ● JRuby is known for performance: it is 10 times faster than CRuby. ● With Truffle, it is around 40 times faster than CRuby. Truffle is supported by Oracle.
  • 12. NMatrix for JRuby ● Parallelism => no Global Interpreter Lock, unlike MRI ● Easy deployment (Warbler gem) ● Automatic garbage collection. ● Speed ● NMatrix for JRuby relies on Apache Commons Math
  • 13. MDArray ● There is no unified interface for SciRuby gems => Why not build a wrapper around MDArray? ● MDArray is a great gem for linear algebra. ● MDArray used Parallel Colt, which was deprecated. ● However, every gem that used NMatrix as a dependency would need to be reimplemented with MDArray.
  • 14. How does NMatrix work? ● N-Dimensional ● 2-Dimensional NMatrix
  • 15. N-dimensional matrices are stored as a one-dimensional Array!
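The flat-storage idea can be sketched in plain Ruby. This is only an illustration, assuming row-major (C-style) ordering; the helper names `strides_for` and `flat_index` are hypothetical and are not NMatrix's API.

```ruby
# Hypothetical helpers; row-major layout is an assumption of this sketch.

# Number of flat-array slots to skip when a coordinate grows by one.
def strides_for(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) do |d|
    strides[d] = strides[d + 1] * shape[d + 1]
  end
  strides
end

# Position of an N-dimensional coordinate inside the flat array.
def flat_index(coords, strides)
  coords.each_with_index.sum { |c, d| c * strides[d] }
end

shape   = [2, 3, 4]
storage = (0...shape.inject(:*)).to_a   # 24 elements in one flat Array
strides = strides_for(shape)            # [12, 4, 1]
element = storage[flat_index([1, 2, 3], strides)]
```

With this layout, looking up an element is one multiply-and-add per dimension followed by a single array access.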
  • 17. N - dimensional Matrix
  • 18. Elementwise Operation ● [:add, :subtract, :sin, :gamma] ● Iterate through the elements. ● Access the element; do the operation, return it
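The three steps above (iterate, access, operate and return) can be sketched on flat storage as follows. The `elementwise` helper is hypothetical and uses plain Ruby arrays, not the Java-backed storage of NMatrix-JRuby.

```ruby
# Hypothetical helper: applies a unary or binary operation to flat storage
# and returns a new flat array with the results.
def elementwise(op, lhs, rhs = nil)
  case op
  when :add      then lhs.each_with_index.map { |x, i| x + rhs[i] }
  when :subtract then lhs.each_with_index.map { |x, i| x - rhs[i] }
  when :sin      then lhs.map { |x| Math.sin(x) }
  when :gamma    then lhs.map { |x| Math.gamma(x) }
  else raise ArgumentError, "unsupported operation: #{op}"
  end
end

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]

sum = elementwise(:add, a, b)   # [1.5, 2.5, 3.5, 4.5]
gam = elementwise(:gamma, a)    # [1.0, 1.0, 2.0, 6.0]
```

Because the storage is flat, the same loop works for a matrix of any dimensionality; the shape only matters when the result is indexed.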
  • 20. Challenges ● Autoboxing and Multiple data type ● Minimise copying of data
  • 21. Errors that can’t be reproduced :p
      [ 0.11, 0.05, 0.34, 0.14 ] + [ 0.21, 0.05, 0.14, 0.14 ] = [ 0, 0, 0, 0 ]
      ([ 0.11, 0.05, 0.34, 0.14 ] + 5) + ([ 0.21, 0.05, 0.14, 0.14 ] + 5) - 10 = [ 0.32, 0.1, 0.48, 0.28 ]
  • 22. Autoboxing ● :float64 => double only ● Strict dtypes => creating data type in Java. Can’t Rely on Reflection ● @s = Array.new() ● @s = Java::double[rows*cols].new()
  • 23. Autoboxing and Enumerators
      def each_with_indices
        nmatrix = create_dummy_nmatrix
        stride = get_stride(self)
        offset = 0
        coords = Array.new(dim){ 0 }
        shape_copy = Array.new(dim)
        (0...size).each do |k|
          dense_storage_coords(nmatrix, k, coords, stride, offset)
          slice_index = dense_storage_pos(coords,stride)
          ary = Array.new
          if (@dtype == :object)
            ary << self.s[slice_index]
          else
            ary << self.s.toArray.to_a[slice_index]
          end
          (0...dim).each do |p|
            ary << coords[p]
          end
          yield(ary)
        end if block_given?
        return nmatrix
      end
  • 24. Minimise copying of data ● Make sure you don’t make copies of data. ● Pass-by-Reference in action: ○ Use static methods as helpers.
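A small sketch of the idea in plain Ruby: methods receive references to objects, so a helper can update the caller's storage in place instead of returning a copy. The `StorageHelper` module and its `scale!` method are hypothetical names used only for illustration.

```ruby
# Hypothetical helper module. module_function makes scale! callable as
# StorageHelper.scale! without creating an instance.
module StorageHelper
  module_function

  # Multiplies every element of `storage` in place; no new array is allocated.
  def scale!(storage, factor)
    storage.each_index { |i| storage[i] *= factor }
    storage
  end
end

data   = [1.0, 2.0, 3.0]
result = StorageHelper.scale!(data, 2.0)

same_object = result.equal?(data)   # true: the helper returned the same Array
```

The design choice here is that the helper owns no state; it only operates on the storage handed to it, which avoids a second array of the same size.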
  • 25. 2 - dimensional Matrix
  • 26. 2 - dimensional Matrix Operations ● [:dot, :det, :factorize_lu] ● In NMatrix-MRI, BLAS-III and LAPACK routines are implemented using their respective libraries. ● NMatrix-JRuby depends on Java functions.
  • 27. Challenges ● Converting a 1-D array to 2-D array ● Array Size and Accessing elements ● Speed and Memory Required
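The first two challenges can be sketched in plain Ruby. Both helpers below are hypothetical: `to_2d` builds a nested array (which copies the data), while `at` reads element (i, j) straight from the flat array without any conversion, assuming row-major order.

```ruby
# Hypothetical helpers; row-major ordering is an assumption.

# Converts flat storage into an array of rows. This allocates new arrays.
def to_2d(flat, rows, cols)
  raise ArgumentError, "size mismatch" unless flat.size == rows * cols
  flat.each_slice(cols).to_a
end

# Reads element (i, j) directly from the flat storage, with no copy.
def at(flat, cols, i, j)
  flat[i * cols + j]
end

flat   = (0...6).to_a
matrix = to_2d(flat, 2, 3)   # [[0, 1, 2], [3, 4, 5]]
value  = at(flat, 3, 1, 2)   # 5
```

When speed and memory matter, index arithmetic like `at` is the cheaper option, since the conversion doubles the memory in use for the duration of the operation.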
  • 29. Ruby Code
      index = 0
      puts Benchmark.measure{
        (0...15000).each do |i|
          (0...15000).each do |j|
            c[i][j] = b[i][j]
            index+=1
          end
        end
      }
      #67.790000 0.070000 67.860000 ( 65.126546)
      #RAM consumed => 5.4GB

      b = Java::double[15_000,15_000].new
      c = Java::double[15_000,15_000].new
      index=0
      puts Benchmark.measure{
        (0...15000).each do |i|
          (0...15000).each do |j|
            b[i][j] = index
            index+=1
          end
        end
      }
      #43.260000 3.250000 46.510000 ( 39.606356)
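For readers without JRuby at hand, the measurement pattern can be tried with plain Ruby arrays. This sketch assumes nested Ruby arrays in place of `Java::double` and a much smaller, hypothetical size `n`, so its timings are not comparable with the figures on the slide.

```ruby
require 'benchmark'

n = 200  # small, hypothetical size; the slide uses 15_000

# Source matrix filled with a running index, destination filled with zeros.
b = Array.new(n) { |i| Array.new(n) { |j| (i * n + j).to_f } }
c = Array.new(n) { Array.new(n, 0.0) }

# Benchmark.measure returns a Benchmark::Tms with user, system, total
# and real (wall-clock) times in seconds.
timing = Benchmark.measure do
  (0...n).each do |i|
    (0...n).each do |j|
      c[i][j] = b[i][j]
    end
  end
end

puts timing
```

The four numbers printed correspond to the four columns in the commented results on the slide: user, system, total and (real).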
  • 31. Java Code
      public class MatrixGenerator{
        public static void test2(){
          for (int index=0, i=0; i < row ; i++){
            for (int j=0; j < col; j++){
              c[i][j]= b[i][j];
              index++;
            }
          }
        }
      }
      puts Benchmark.measure{MatrixGenerator.test2}
      #0.034000 0.001000 00.034000 ( 00.03300)
      #RAM consumed => 300MB

      public class MatrixGenerator{
        public static void test1(){
          double[][] b = new double[15000][15000];
          double[][] c = new double[15000][15000];
          for (int index=0, i=0; i < row ; i++){
            for (int j=0; j < col; j++){
              b[i][j]= index;
              index++;
            }
          }
        }
      }
      puts Benchmark.measure{MatrixGenerator.test1}
      #0.032000 0.001000 00.032000 ( 00.03100)
  • 32. Results ● Speed improves by about 1000 times ● Memory use improves by about 10 times
  • 33. Mixed models ● After NMatrix for doubles was ready, I tested it with mixed_models.
  • 35. System Specifications ● CPU: AMD FX8350 Octa-core 4.2GHz ● RAM: 16GB
  • 38. Gamma
  • 42. Benchmark conclusion ● NMatrix-JRuby is much faster than NMatrix-MRI for elementwise operations on N-dimensional matrices. ● NMatrix-MRI is faster for 2-dimensional matrices when computing matrix multiplication, determinants and factorization.
  • 43. Improvements ● Make NMatrix-JRuby faster than NMatrix-MRI using BLAS level-3 and LAPACK routines. ● How? ● Why not JBlas?
  • 45. Future Work ● Add support for complex dtype. ● Convert NMatrix-JRuby Enumerators to Java code. ● Add sparse support.
  • 47. Nope!
  • 49. A General-Purpose GPU library ● Combine the beauty of Ruby with transparent GPU processing ● This will work both on client computers and on servers that make use of Tesla and Intel Xeon Phi solutions. ● Developer activity and support for the current projects are mixed at best, and the projects are tough to use because they involve writing kernels and require a lot of effort on buffer/RAM optimisation.
  • 51. ArrayFire ● ArrayFire is an open-source GPGPU library written in C++ that uses JIT compilation. ● ArrayFire supports CUDA-capable NVIDIA GPUs, OpenCL devices, and a C-programming backend. ● It abstracts away the difficult tasks of writing kernels for multiple architectures, handling memory management, and performing tuning and optimisation.
  • 53. MRI ● C extension ● Architecture is inspired by NMatrix and NArray ● The C++ function is placed in a namespace (e.g., namespace arf { }) or is declared static if possible. The C function receives the prefix arf_, e.g., arf_multiply() (this function also happens to be static). ● C macros are capitalized and generally have the prefix ARF_, as with ARF_DTYPE(). ● C functions (and macros, for consistency) are placed within extern "C" { } blocks to turn off C++ mangling. ● C macros (in extern blocks) may represent C++ constants (which are always
  • 54.
      #include <ruby.h>

      typedef struct AF_STRUCT {
        size_t ndims;
        size_t count;
        size_t* dimension;
        double* array;
      } afstruct;

      /* Module and class handles, filled in by Init_arrayfire(). */
      VALUE ArrayFire = Qnil;
      VALUE Blas = Qnil;

      /* Forward declaration so Init_arrayfire() can register the function. */
      static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val);

      /* METHOD (a function-pointer cast) and arf_free are not shown on the slide. */
      void Init_arrayfire() {
        ArrayFire = rb_define_module("ArrayFire");
        Blas = rb_define_class_under(ArrayFire, "BLAS", rb_cObject);
        rb_define_singleton_method(Blas, "matmul", (METHOD)arf_matmul, 2);
      }

      static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val){
        afstruct* left;
        afstruct* right;
        afstruct* result = ALLOC(afstruct);
        Data_Get_Struct(left_val, afstruct, left);
        Data_Get_Struct(right_val, afstruct, right);
        result->ndims = left->ndims;
        /* Heap-allocate the shape so it outlives this call. */
        size_t* dimension = ALLOC_N(size_t, 2);
        dimension[0] = left->dimension[0];
        dimension[1] = right->dimension[1];
        size_t count = dimension[0]*dimension[1];
        result->dimension = dimension;
        result->count = count;
        arf::matmul(result, left, right);
        return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
      }
  • 55.
      #include <arrayfire.h>

      namespace arf {
        using namespace af;

        static void matmul(afstruct *result, afstruct *left, afstruct *right)
        {
          array l = array(left->dimension[0], left->dimension[1], left->array);
          array r = array(right->dimension[0], right->dimension[1], right->array);
          array res = matmul(l,r);
          result->array = res.host<double>();
        }
      }

      extern "C" {
        #include "arrayfire.c"
      }
  • 56. JRuby ● The approach is the same as for NMatrix-JRuby. ● Java Native Interface (JNI) ● Work on ArrayFire-Java.
  • 57. ● Place 'libaf.so' in the load path.
      require 'ext/vendor/ArrayFire.jar'

      class Af_Array
        attr_accessor :dims, :elements

        def matmul(other)
          Blas.matmul(self.arr, other)
        end
      end
  • 59. System Specification ● CPU: AMD FX Octa-core 4.2GHz ● RAM: 16GB ● GPU: Nvidia GTX 750Ti ● GPU RAM: 4GB GDDR5
  • 64. Transparency ● Integrate with NArray ● Integrate with NMatrix ● Integrate with Rails
  • 65. Applications ● Endless possibilities ;) ● Bioinformatics ● Integrate TensorFlow ● Image Processing ● Computational Fluid Dynamics
  • 67. Useful Links ● https://github.com/sciruby/nmatrix ● https://github.com/arrayfire/arrayfire-rb ● https://github.com/prasunanand/arrayfire-rb/tree/temp
  • 68. Acknowledgements 1. Pjotr Prins 2. Charles Nutter 3. John Woods 4. Alexej Gossmann 5. Sameer Deshmukh 6. Pradeep Garigipati
  • 70. Thank You Github: prasunanand Twitter: @prasun_anand Blog: prasunanand.com