Survey on High Productivity Computing Systems
(HPCS) Languages
SALIYA EKANAYAKE
3/11/2013 PART OF QUALIFIER PRESENTATION 1
School of Informatics and Computing
Indiana University
Outline
Parallel Programs
Parallel Programming Memory Models
Idioms of Parallel Computing
◦ Data Parallel Computation
◦ Data Distribution
◦ Asynchronous Remote Tasks
◦ Nested Parallelism
◦ Remote Transactions
Parallel Programs
Steps in Creating a Parallel Program
[Figure: Sequential Computation → (Decomposition) → Tasks → (Assignment) → Abstract Computing Units (ACU), e.g. processes → (Orchestration) → Parallel Program → (Mapping) → Physical Computing Units (PCU), e.g. processor, core]
Constructs to Create ACUs
◦ Explicit
  ◦ Java threads, Parallel.ForEach in TPL
◦ Implicit
  ◦ for loops and also do blocks in Fortress
◦ Compiler Directives
  ◦ #pragma omp parallel for in OpenMP
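The four steps in the figure can be sketched in Python. This is only an illustration under assumptions: the function names (`decompose`, `assign`, `acu_work`, `run_parallel`), the chunk size, and the choice of `ThreadPoolExecutor` threads as the abstract computing units are hypothetical and are not taken from the slides. Mapping onto physical cores is left to the operating system.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(data, chunk_size):
    """Decomposition: split the sequential computation into independent tasks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def assign(tasks, num_acus):
    """Assignment: distribute tasks round-robin over abstract computing units."""
    buckets = [[] for _ in range(num_acus)]
    for idx, task in enumerate(tasks):
        buckets[idx % num_acus].append(task)
    return buckets

def acu_work(bucket):
    """Work done by one ACU: sum of squares over every task it was given."""
    return sum(x * x for task in bucket for x in task)

def run_parallel(data, chunk_size=4, num_acus=3):
    tasks = decompose(data, chunk_size)
    buckets = assign(tasks, num_acus)
    # Orchestration: start the ACUs (threads here) and combine partial results.
    # Mapping of threads to physical cores is left to the operating system.
    with ThreadPoolExecutor(max_workers=num_acus) as pool:
        partials = list(pool.map(acu_work, buckets))
    return sum(partials)

data = list(range(1, 21))
parallel_result = run_parallel(data)
sequential_result = sum(x * x for x in data)
```

The parallel and sequential results agree because the tasks are independent and the final combination is a plain sum.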
Parallel Programming Memory Models
[Figure: tasks running over shared global address spaces, over local address spaces, and over combinations of both, implemented on processors with CPUs and memory connected by a network]
[Figure: Partitioned Shared Address Space. Task 1, Task 2, and Task 3 each have a local address space holding X; Task 1 also holds Y. Z and Array [ ] are placed in the partitioned shared address space.]
◦ Each task has declared a private variable X
◦ Task 1 has declared another private variable Y
◦ Task 3 has declared a shared variable Z
◦ An array is declared as shared across the shared address space
◦ Every task can access variable Z
◦ Every task can access each element of the array
◦ Only Task 1 can access variable Y
◦ Each copy of X is local to the task declaring it and does not necessarily contain the same value
◦ Accessing array elements that are local to a task is faster than accessing other elements
◦ Task 3 may access Z faster than Task 1 and Task 2
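The access rules above can be modelled with a toy Python sketch. Everything here is an assumption made for illustration: the class names `PGASArray` and `Task`, the block partitioning rule, the 0-based task numbering, and the `remote_accesses` counter that stands in for the higher cost of a non-local access.

```python
class PGASArray:
    """A shared array whose elements are partitioned in blocks across tasks."""

    def __init__(self, size, num_tasks):
        self.size = size
        self.num_tasks = num_tasks
        self.block = -(-size // num_tasks)  # ceiling division
        self.data = [0] * size
        self.remote_accesses = 0

    def owner(self, index):
        """Task whose partition holds this element."""
        return index // self.block

    def read(self, task_id, index):
        # Any task may read any element; only the cost differs.
        if self.owner(index) != task_id:
            self.remote_accesses += 1  # stands in for a slower remote access
        return self.data[index]

    def write(self, task_id, index, value):
        if self.owner(index) != task_id:
            self.remote_accesses += 1
        self.data[index] = value


class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.private = {}  # local address space: not visible to other tasks


array = PGASArray(size=6, num_tasks=3)
tasks = [Task(t) for t in range(3)]

# Each task declares its own private X; the copies are independent.
for t in tasks:
    t.private["X"] = t.task_id * 10

# Task 0 writes one local element and one element owned by task 2.
array.write(0, 0, 42)   # local to task 0
array.write(0, 5, 99)   # owned by task 2, so counted as remote
value_seen_by_task_1 = array.read(1, 0)  # remote read of task 0's element
```

Every task can reach every array element, but only accesses outside the caller's own partition are counted as remote.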
Memory models: Shared, Distributed, Partitioned Global Address Space, Hybrid
Implementations: Shared Memory Implementation, Distributed Memory Implementation
Idioms of Parallel Computing
Common Task                  Language
                             Chapel                   X10                      Fortress
Data parallel computation    forall                   finish … for … async     for
Data distribution            dmapped                  DistArray                arrays, vectors, matrices
Asynchronous remote tasks    on … begin               at … async               spawn … at
Nested parallelism           cobegin … forall         for … async              for … spawn
Remote transactions          on … atomic              at … atomic              at … atomic
                             (not implemented yet)
Data Parallel Computation
Chapel
◦ Zipper iteration
    forall (a,b,c) in zip(A,B,C) do
      a = b + alpha * c;
◦ Arithmetic domain
    forall i in 1..N do
      a(i) = b(i);
◦ Short forms
    [i in 1..N] a(i) = b(i);
    A = B + alpha * C;
◦ Expression context (the forms above are in statement context)
    writeln(+ reduce [i in 1..10] i**2);

X10
◦ Sequential, over an array
    for (p in A)
      A(p) = 2 * A(p);
◦ Sequential, over a number range
    for ([i] in 1..N)
      sum += i;
◦ Parallel
    finish for (p in A)
      async A(p) = 2 * A(p);

Fortress
◦ Parallel, over a number range
    for i <- 1:10 do
      A[i] := i end
◦ Parallel, over array indices
    A:ZZ32[3,3]=[1 2 3;4 5 6;7 8 9]
    for (i,j) <- A.indices() do
      A[i,j] := i end
◦ Parallel, over array elements
    for a <- A do
      println(a) end
◦ Parallel, over a set
    for a <- {[ZZ32] 1,3,5,7,9} do
      println(a) end
◦ Sequential, over a number range
    for i <- sequential(1:10) do
      A[i] := i end
◦ Sequential, over a set
    for a <- sequential({[ZZ32] 1,3,10,8,6}) do
      println(a) end
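For readers without a Chapel, X10, or Fortress toolchain, the zipper loop and the reduction can be approximated in Python. This is a rough analogue and not a translation: the thread pool, the helper `axpy`, and the sample values of `alpha`, `B`, and `C` are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

alpha = 2.0
B = [1.0, 2.0, 3.0, 4.0]
C = [10.0, 20.0, 30.0, 40.0]

def axpy(pair):
    """Body of one loop iteration: a = b + alpha * c."""
    b, c = pair
    return b + alpha * c

with ThreadPoolExecutor() as pool:
    # Zipper iteration: corresponding elements of B and C are combined,
    # and each iteration is independent of the others.
    A = list(pool.map(axpy, zip(B, C)))

    # Reduction in expression context: sum of i**2 for i in 1..10.
    sum_of_squares = sum(pool.map(lambda i: i ** 2, range(1, 11)))
```

`pool.map` preserves input order, so `A` lines up element by element with `B` and `C` even though the iterations may run concurrently.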
Data Distribution
Chapel
◦ Domain and Array
    var D: domain(2) = [1..m, 1..n];
    var A: [D] real;
◦ Box Distribution of Domain
    const D = [1..n, 1..n];
    const BD = D dmapped Block(boundingBox=D);
    var BA: [BD] real;

X10
◦ Region and Array
    val R = (0..5) * (1..3);
    val arr = new Array[Int](R,10);
◦ Box Distribution of Array
    val blk = Dist.makeBlock((1..9)*(1..9));
    val data : DistArray[Int] = DistArray.make[Int](blk, ([i,j]:Point(2)) => i*j);

Fortress
◦ Intended distributions: blocked, blockCyclic, columnMajor, rowMajor, Default
◦ No Working Implementation
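The idea behind a block distribution can be sketched in one dimension with plain Python. The helpers `block_bounds` and `owner_of` are hypothetical, and the rule used for uneven block sizes is an assumption: Chapel's `Block` and X10's `Dist.makeBlock` may split the index space differently.

```python
def block_bounds(n, num_locales, locale_id):
    """Half-open index range [lo, hi) of 0..n-1 owned by one locale."""
    lo = (locale_id * n) // num_locales
    hi = ((locale_id + 1) * n) // num_locales
    return lo, hi

def owner_of(index, n, num_locales):
    """Locale that owns a given index under this block distribution."""
    for locale_id in range(num_locales):
        lo, hi = block_bounds(n, num_locales, locale_id)
        if lo <= index < hi:
            return locale_id
    raise IndexError(index)

n, num_locales = 9, 4
blocks = [block_bounds(n, num_locales, p) for p in range(num_locales)]
owners = [owner_of(i, n, num_locales) for i in range(n)]
```

Each index has exactly one owner, and neighbouring indices mostly share an owner, which is what makes element-wise loops over a distributed array cheap when each locale works on its own block.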
Asynchronous Remote Tasks
Chapel
◦ Asynchronous
    begin writeln("Hello");
    writeln("Hi");
◦ Remote and Asynchronous
    on A[i] do begin
      A[i] = 2 * A[i];
    writeln("Hello");
    writeln("Hi");

X10
◦ Asynchronous
    { // activity T
      async {S1;} // spawns T1
      async {S2;} // spawns T2
    }
◦ Remote and Asynchronous
    • at (p) async S
      migrates the computation to p, spawns a new activity at p to evaluate S, and returns control
    • async at (p) S
      spawns a new activity at the current place and returns control, while the spawned activity migrates the computation to p and evaluates S there
    • async at (p) async S
      spawns a new activity at the current place and returns control, while the spawned activity migrates the computation to p and spawns another activity at p to evaluate S there

Fortress
◦ Remote and Asynchronous
    spawn at a.region(i) do exp end
◦ Implicit Multiple Threads and Region Shift
    (v,w) := (exp1,
              at a.region(i) do exp2 end)
◦ Implicit Thread Group and Region Shift
    do
      v := exp1
      at a.region(i) do
        w := exp2
      end
      x := v+w
    end
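The pattern of sending work to the place that owns the data, and continuing without waiting, can be imitated in Python. This is a loose model under stated assumptions: each "place" is represented by a single-threaded executor, `place_data` and `double_element` are hypothetical names, and `wait` plays the role that `finish` plays in X10.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical model: each "place" is a single worker thread with its own data.
NUM_PLACES = 2
places = [ThreadPoolExecutor(max_workers=1) for _ in range(NUM_PLACES)]
place_data = [[1, 2, 3], [4, 5, 6]]

def double_element(place_id, index):
    """Runs at the place that owns the element, like `on A[i] do begin ...`."""
    place_data[place_id][index] *= 2
    return place_id

# Spawn remote asynchronous tasks; submit() returns immediately.
futures = [places[p].submit(double_element, p, i)
           for p in range(NUM_PLACES) for i in range(3)]

caller_continued = True        # the caller is not blocked by the remote work
wait(futures)                  # block here until every spawned task is done

for executor in places:
    executor.shutdown()
```

Because each place has one worker, updates to a place's data are serialized at that place, while different places proceed independently.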
Nested Parallelism
Chapel
◦ Data Parallelism Inside Task Parallelism
    cobegin {
      forall (a,b,c) in zip(A,B,C) do
        a = b + alpha * c;
      forall (d,e,f) in zip(D,E,F) do
        d = e + beta * f;
    }
◦ Task Parallelism Inside Data Parallelism
    sync forall (a) in (A) do
      if (a % 5 == 0) then
        begin f(a);
      else
        a = g(a);

X10
◦ Data Parallelism Inside Task Parallelism
    finish { async S1; async S2; }
◦ Note on Task Parallelism Inside Data Parallelism
    X10 has no built-in data parallel construct. Data parallel code is written with finish, for, and async, and its body can spawn further activities that are evaluated in parallel. A scenario that needs this kind of nesting can therefore be implemented directly with these constructs, instead of first writing data parallel code and then embedding task parallelism in it.

Fortress
◦ Data Parallelism Inside Task Parallelism
    Explicit thread:
      T:Thread[Any] = spawn do exp end
      T.wait()
    Structural construct:
      do exp1 also do exp2 end
◦ Note on Task Parallelism Inside Data Parallelism
    arr:Array[ZZ32,ZZ32]=array[ZZ32](4).fill(id)
    for i <- arr.indices() do
      t = spawn do arr[i]:= factorial(i) end
      t.wait()
    end
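Data parallelism inside task parallelism, as in the Chapel `cobegin` example, can be approximated in Python with one executor nested inside another. The helper `forall_axpy`, the pool sizes, and the sample arrays are assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

alpha, beta = 2, 3
B, C = [1, 2, 3], [4, 5, 6]
E, F = [7, 8, 9], [1, 1, 1]

def forall_axpy(scale, xs, ys):
    """Inner data-parallel loop: one independent iteration per element pair."""
    with ThreadPoolExecutor(max_workers=2) as inner:
        return list(inner.map(lambda p: p[0] + scale * p[1], zip(xs, ys)))

# Outer task parallelism: both loops are started together, and the results
# are collected only after both have finished, like `cobegin { ... }`.
with ThreadPoolExecutor(max_workers=2) as outer:
    future_a = outer.submit(forall_axpy, alpha, B, C)
    future_d = outer.submit(forall_axpy, beta, E, F)
    A = future_a.result()
    D = future_d.result()
```

Each inner loop owns a separate pool, so the outer tasks never wait on workers that they themselves occupy.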
Remote Transactions
X10
◦ Conditional Local
    def pop() : T {
      var ret : T;
      when (size > 0) {
        ret = list.removeAt(0);
        size--;
      }
      return ret;
    }
◦ Unconditional Local
    var n : Int = 0;
    finish {
      async atomic n = n + 1; //(a)
      async atomic n = n + 2; //(b)
    }

    var n : Int = 0;
    finish {
      async n = n + 1; //(a) -- BAD
      async atomic n = n + 2; //(b)
    }
◦ Unconditional Remote
    val blk = Dist.makeBlock((1..1)*(1..1),0);
    val data = DistArray.make[Int](blk, ([i,j]:Point(2)) => 0);
    val pt : Point = [1,1];
    finish for (pl in Place.places()) {
      async {
        val dataloc = blk(pt);
        if (dataloc != pl) {
          Console.OUT.println("Point " + pt + " is in place " + dataloc);
          at (dataloc) atomic {
            data(pt) = data(pt) + 1;
          }
        }
        else {
          Console.OUT.println("Point " + pt + " is in place " + pl);
          atomic data(pt) = data(pt) + 2;
        }
      }
    }
    Console.OUT.println("Final value of point " + pt + " is " + data(pt));
◦ The atomicity is weak: an atomic block appears atomic only to other atomic blocks running at the same place. Unless care is taken, atomic code running at remote places, or non-atomic code running at local or remote places, may interfere with local atomic code.

Fortress
◦ Local
    do
      x:ZZ32 := 0
      y:ZZ32 := 0
      z:ZZ32 := 0
      atomic do
        x += 1
        y += 1
      also atomic do
        z := x + y
      end
      z
    end
◦ Remote (true if distributions were implemented)
    f(y:ZZ32):ZZ32=y y
    D:Array[ZZ32,ZZ32]=array[ZZ32](4).fill(f)
    q:ZZ32=0
    at D.region(2) atomic do
      println("at D.region(2)")
      q:=D[2]
      println("q in first atomic: " q)
    also at D.region(1) atomic do
      println("at D.region(1)")
      q+=1
      println("q in second atomic: " q)
    end
    println("Final q: " q)
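The unconditional and conditional atomic blocks can be imitated in Python with a lock and a condition variable. This is a sketch under assumptions, not how X10 or Fortress implement atomicity: the `Place` class, its single place-wide lock, and the method names are hypothetical. The lock protects only code that also takes it, which mirrors the weak atomicity described above.

```python
import threading

class Place:
    """Hypothetical stand-in for a place: one lock guards its atomic blocks."""

    def __init__(self):
        self.lock = threading.Lock()
        self.cond = threading.Condition(self.lock)
        self.n = 0
        self.items = []

    def atomic_add(self, amount):
        # Unconditional atomic: read-modify-write under the place-wide lock.
        with self.lock:
            self.n = self.n + amount

    def push(self, item):
        with self.cond:
            self.items.append(item)
            self.cond.notify_all()

    def pop(self):
        # Conditional atomic, like `when (size > 0)`: block until the guard holds.
        with self.cond:
            self.cond.wait_for(lambda: len(self.items) > 0)
            return self.items.pop(0)

place = Place()

# Analogue of: finish { async atomic n = n + 1; async atomic n = n + 2; }
workers = [threading.Thread(target=place.atomic_add, args=(k,)) for k in (1, 2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# A consumer that waits until a producer has pushed something.
popped = []
consumer = threading.Thread(target=lambda: popped.append(place.pop()))
consumer.start()
place.push("job")
consumer.join()
```

An update made without taking `place.lock` would not be excluded by these blocks, which corresponds to the statement marked BAD in the X10 example.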
K-Means Implementation
Why K-Means?
◦ Simple to Comprehend
◦ Broad Enough to Exploit Most of the Idioms
Distributed Parallel Implementations
◦ Chapel and X10
Parallel Non Distributed Implementation
◦ Fortress
Complete Working Code in Appendix of Paper
Thank you!

More Related Content

What's hot

Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeksBeginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeksJinTaek Seo
 
Dynamic Memory allocation
Dynamic Memory allocationDynamic Memory allocation
Dynamic Memory allocationGrishma Rajput
 
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...zaidinvisible
 
Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())Sameer Rathoud
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 
Computer Science Programming Assignment Help
Computer Science Programming Assignment HelpComputer Science Programming Assignment Help
Computer Science Programming Assignment HelpProgramming Homework Help
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsHolger Gruen
 
4 dynamic memory allocation
4 dynamic memory allocation4 dynamic memory allocation
4 dynamic memory allocationFrijo Francis
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Sparksamthemonad
 
Hpx runtime system
Hpx runtime systemHpx runtime system
Hpx runtime systemCOMAQA.BY
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Spark Summit
 

What's hot (20)

Skyline queries
Skyline queriesSkyline queries
Skyline queries
 
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeksBeginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
Beginning direct3d gameprogramming10_shaderdetail_20160506_jintaeks
 
Dma
DmaDma
Dma
 
Dynamic memory allocation
Dynamic memory allocationDynamic memory allocation
Dynamic memory allocation
 
Dynamic Memory allocation
Dynamic Memory allocationDynamic Memory allocation
Dynamic Memory allocation
 
Operating System Engineering
Operating System EngineeringOperating System Engineering
Operating System Engineering
 
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
 
Lec09 nbody-optimization
Lec09 nbody-optimizationLec09 nbody-optimization
Lec09 nbody-optimization
 
Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())Memory Management C++ (Peeling operator new() and delete())
Memory Management C++ (Peeling operator new() and delete())
 
Computational Assignment Help
Computational Assignment HelpComputational Assignment Help
Computational Assignment Help
 
Linked list
Linked listLinked list
Linked list
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Computer Science Programming Assignment Help
Computer Science Programming Assignment HelpComputer Science Programming Assignment Help
Computer Science Programming Assignment Help
 
Oit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked ListsOit And Indirect Illumination Using Dx11 Linked Lists
Oit And Indirect Illumination Using Dx11 Linked Lists
 
4 dynamic memory allocation
4 dynamic memory allocation4 dynamic memory allocation
4 dynamic memory allocation
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Hpx runtime system
Hpx runtime systemHpx runtime system
Hpx runtime system
 
Memory Management In C++
Memory Management In C++Memory Management In C++
Memory Management In C++
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 

Similar to Survey onhpcs languages

NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
Subtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondSubtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondPatrick Diehl
 
Productionizing your Streaming Jobs
Productionizing your Streaming JobsProductionizing your Streaming Jobs
Productionizing your Streaming JobsDatabricks
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPIntel IT Center
 
Asynchronous programming with java script and node.js
Asynchronous programming with java script and node.jsAsynchronous programming with java script and node.js
Asynchronous programming with java script and node.jsTimur Shemsedinov
 
Programming the cloud with Skywriting
Programming the cloud with SkywritingProgramming the cloud with Skywriting
Programming the cloud with SkywritingDerek Murray
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
2 Years of Real World FP at REA
2 Years of Real World FP at REA2 Years of Real World FP at REA
2 Years of Real World FP at REAkenbot
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingDatabricks
 
Столпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойСтолпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойSigma Software
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Databricks
 
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsQuark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsJohn De Goes
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph ProcessingVasia Kalavri
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Data Con LA
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustSpark Summit
 
Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)David de Boer
 
Celery - A Distributed Task Queue
Celery - A Distributed Task QueueCelery - A Distributed Task Queue
Celery - A Distributed Task QueueDuy Do
 

Similar to Survey onhpcs languages (20)

NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Subtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff HammondSubtle Asynchrony by Jeff Hammond
Subtle Asynchrony by Jeff Hammond
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Productionizing your Streaming Jobs
Productionizing your Streaming JobsProductionizing your Streaming Jobs
Productionizing your Streaming Jobs
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMP
 
Asynchronous programming with java script and node.js
Asynchronous programming with java script and node.jsAsynchronous programming with java script and node.js
Asynchronous programming with java script and node.js
 
Programming the cloud with Skywriting
Programming the cloud with SkywritingProgramming the cloud with Skywriting
Programming the cloud with Skywriting
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
2 Years of Real World FP at REA
2 Years of Real World FP at REA2 Years of Real World FP at REA
2 Years of Real World FP at REA
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and Streaming
 
Столпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойСтолпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай Мозговой
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsQuark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 
Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)Being functional in PHP (PHPDay Italy 2016)
Being functional in PHP (PHPDay Italy 2016)
 
Cpp tutorial
Cpp tutorialCpp tutorial
Cpp tutorial
 
Celery - A Distributed Task Queue
Celery - A Distributed Task QueueCelery - A Distributed Task Queue
Celery - A Distributed Task Queue
 

More from Saliya Ekanayake

The Art and Craft of Woodworking
The Art and Craft of WoodworkingThe Art and Craft of Woodworking
The Art and Craft of WoodworkingSaliya Ekanayake
 
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Saliya Ekanayake
 
Towards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and BenchmarkingTowards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and BenchmarkingSaliya Ekanayake
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersSaliya Ekanayake
 

More from Saliya Ekanayake (6)

The Art and Craft of Woodworking
The Art and Craft of WoodworkingThe Art and Craft of Woodworking
The Art and Craft of Woodworking
 
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
 
Towards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and BenchmarkingTowards a Systematic Study of Big Data Performance and Benchmarking
Towards a Systematic Study of Big Data Performance and Benchmarking
 
Sandhi Wimarshana
Sandhi WimarshanaSandhi Wimarshana
Sandhi Wimarshana
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
MapReduce in Simple Terms
MapReduce in Simple TermsMapReduce in Simple Terms
MapReduce in Simple Terms
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Survey on HPCS Languages

  • 1. Survey on High Productivity Computing Systems (HPCS) Languages
      Saliya Ekanayake
      School of Informatics and Computing, Indiana University
      3/11/2013, part of qualifier presentation
  • 2. Outline
      Parallel Programs
      Parallel Programming Memory Models
      Idioms of Parallel Computing
      ◦ Data Parallel Computation
      ◦ Data Distribution
      ◦ Asynchronous Remote Tasks
      ◦ Nested Parallelism
      ◦ Remote Transactions
  • 3. Parallel Programs
      Steps in Creating a Parallel Program
      [Figure: Sequential Computation → Decomposition → Tasks → Assignment → Abstract Computing Units (ACU), e.g. processes → Orchestration → Parallel Program → Mapping → Physical Computing Units (PCU), e.g. processor, core]
      Constructs to Create ACUs
      ◦ Explicit: Java threads, Parallel.ForEach in TPL
      ◦ Implicit: for loops, also do blocks in Fortress
      ◦ Compiler Directives: #pragma omp parallel for in OpenMP
  • 4. Parallel Programming Memory Models
      [Figure: the Shared, Distributed, Partitioned Global Address Space, and Hybrid models, drawn as tasks over shared global and local address spaces, with a shared memory implementation and a distributed memory implementation]
      Partitioned Shared Address Space example: Task 1, Task 2, and Task 3 each have a local address space and share a partitioned address space.
      ◦ Each task has declared a private variable X
      ◦ Task 1 has declared another private variable Y
      ◦ Task 3 has declared a shared variable Z
      ◦ An array is declared as shared across the shared address space
      ◦ Every task can access variable Z
      ◦ Every task can access each element of the array
      ◦ Only Task 1 can access variable Y
      ◦ Each copy of X is local to the task declaring it and may not necessarily contain the same value
      ◦ Access to array elements local to a task is faster than access to other elements
      ◦ Task 3 may access Z faster than Task 1 and Task 2
  • 5. Idioms of Parallel Computing
      Common Task                 Chapel                                X10                      Fortress
      Data parallel computation   forall                                finish … for … async     for
      Data distribution           dmapped                               DistArray                arrays, vectors, matrices
      Asynchronous remote tasks   on … begin                            at … async               spawn … at
      Nested parallelism          cobegin … forall                      for … async              for … spawn
      Remote transactions         on … atomic (not implemented yet)     at … atomic              at … atomic
  • 6. Data Parallel Computation
      Chapel
      ◦ Zipper (statement context):
          forall (a,b,c) in zip (A,B,C) do a = b + alpha * c;
      ◦ Arithmetic domain (statement context):
          forall i in 1..N do a(i) = b(i);
      ◦ Short forms (statement context):
          [i in 1..N] a(i) = b(i);
          A = B + alpha * C;
      ◦ Expression context:
          writeln(+ reduce [i in 1..10] i**2);
      X10
      ◦ Sequential, array:
          for (p in A) A(p) = 2 * A(p);
      ◦ Sequential, number range:
          for ([i] in 1 .. N) sum += i;
      ◦ Parallel:
          finish for (p in A) async A(p) = 2 * A(p);
      Fortress
      ◦ Parallel, number range:
          for i <- 1:10 do A[i] := i end
      ◦ Parallel, array indices:
          A:ZZ32[3,3]=[1 2 3;4 5 6;7 8 9]
          for (i,j) <- A.indices() do A[i,j] := i end
      ◦ Parallel, array elements:
          for a <- A do println(a) end
      ◦ Parallel, set:
          for a <- {[ZZ32] 1,3,5,7,9} do println(a) end end
      ◦ Sequential, number range:
          for i <- sequential(1:10) do A[i] := i end
      ◦ Sequential, set:
          for a <- sequential({[ZZ32] 1,3,10,8,6}) do println(a) end end
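The zippered and reduction forms above can be mimicked in a language-neutral way. The following is a minimal Python sketch, not taken from the deck: the function names `axpy_parallel` and `sum_of_squares` and the `workers` parameter are hypothetical, and a thread pool only stands in for the runtime-managed parallelism that Chapel, X10 and Fortress provide.

```python
from concurrent.futures import ThreadPoolExecutor


def axpy_parallel(B, C, alpha, workers=4):
    """Element-wise a = b + alpha * c, one task per (b, c) pair.

    Loosely mirrors `forall (a,b,c) in zip(A,B,C) do a = b + alpha * c;`.
    `workers` is an illustrative parameter, not part of any HPCS language.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in input order, like a zippered iteration.
        return list(pool.map(lambda bc: bc[0] + alpha * bc[1], zip(B, C)))


def sum_of_squares(n):
    """Reduction in expression context, like `+ reduce [i in 1..n] i**2`."""
    return sum(i ** 2 for i in range(1, n + 1))
```

For example, `axpy_parallel([1, 2, 3], [10, 20, 30], 2)` returns `[21, 42, 63]` and `sum_of_squares(10)` returns `385`.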
  • 7. Data Distribution
      Chapel
      ◦ Domain and Array:
          var D: domain(2) = [1 .. m, 1 .. n];
          var A: [D] real;
      ◦ Box Distribution of Domain:
          const D = [1..n, 1..n];
          const BD = D dmapped Block(boundingBox=D);
          var BA: [BD] real;
      X10
      ◦ Region and Array:
          val R = (0..5) * (1..3);
          val arr = new Array[Int](R,10);
      ◦ Box Distribution of Array:
          val blk = Dist.makeBlock((1..9)*(1..9));
          val data : DistArray[Int] = DistArray.make[Int](blk, ([i,j]:Point(2)) => i*j);
      Fortress
      ◦ Intended distributions: blocked, blockCyclic, columnMajor, rowMajor, Default
      ◦ No working implementation
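To make the idea of a block distribution concrete, here is a small Python sketch of how a one-dimensional index range might be cut into contiguous blocks, one per place or locale. It is an illustration under stated assumptions: the helper `block_partition` is hypothetical, and the rule of giving the first `n % places` blocks one extra index is just one common convention, not necessarily the exact rule used by Chapel's `Block` or X10's `Dist.makeBlock`.

```python
def block_partition(n, places):
    """Split indices 0..n-1 into `places` contiguous blocks.

    Assumption: the first n % places blocks receive one extra index.
    Returns a list of range objects, one per place.
    """
    base, extra = divmod(n, places)
    blocks = []
    start = 0
    for p in range(places):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def owner(index, blocks):
    """Return the place number whose block contains `index`."""
    for place, block in enumerate(blocks):
        if index in block:
            return place
    raise IndexError(index)
```

With 9 indices over 4 places, the blocks hold indices 0 to 2, 3 to 4, 5 to 6 and 7 to 8, so `owner(4, block_partition(9, 4))` is `1`.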
  • 8. Asynchronous Remote Tasks
      Chapel
      ◦ Asynchronous:
          begin writeln("Hello");
          writeln("Hi");
      ◦ Remote and Asynchronous:
          on A[i] do begin A[i] = 2 * A[i];
          writeln("Hello");
          writeln("Hi");
      X10
      ◦ Asynchronous:
          { // activity T
            async {S1;} // spawns T1
            async {S2;} // spawns T2
          }
      ◦ Remote and Asynchronous:
          • at (p) async S migrates the computation to p, spawns a new activity in p to evaluate S, and returns control
          • async at (p) S spawns a new activity in the current place and returns control, while the spawned activity migrates the computation to p and evaluates S there
          • async at (p) async S spawns a new activity in the current place and returns control, while the spawned activity migrates the computation to p and spawns another activity in p to evaluate S there
      Fortress
      ◦ Remote and Asynchronous:
          spawn at a.region(i) do exp end
      ◦ Implicit Multiple Threads and Region Shift:
          (v,w) := (exp1, at a.region(i) do exp2 end)
      ◦ Implicit Thread Group and Region Shift:
          do
            v := exp1
            at a.region(i) do
              w := exp2
            end
            x := v+w
          end
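The spawn-and-wait shape of the asynchronous constructs above can be sketched in Python. This is only an analogy under stated assumptions: `finish_async` is a hypothetical helper, threads stand in for activities, and there is no notion of places or locales here, so the remote part (`at`, `on`) is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def finish_async(statements):
    """Run each zero-argument callable as its own activity and wait for all.

    Roughly the shape of X10's `finish { async S1; async S2; }`.
    Returns the results in the order the statements were given.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(s) for s in statements]  # like `async S`
        wait(futures)  # like reaching the end of the `finish` block
    return [f.result() for f in futures]
```

For example, `finish_async([lambda: 2 * 21, lambda: "Hi"])` returns `[42, "Hi"]` once both activities have completed.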
  • 9. Nested Parallelism
      Chapel
      ◦ Data Parallelism Inside Task Parallelism:
          cobegin {
            forall (a,b,c) in (A,B,C) do a = b + alpha * c;
            forall (d,e,f) in (D,E,F) do d = e + beta * f;
          }
      ◦ Task Parallelism Inside Data Parallelism:
          sync forall (a) in (A) do
            if (a % 5 == 0) then begin f(a);
            else a = g(a);
      X10
      ◦ Data Parallelism Inside Task Parallelism:
          finish { async S1; async S2; }
      ◦ Note on Task Parallelism Inside Data Parallelism: Given data parallel code in X10, it is possible to spawn new activities inside the body that get evaluated in parallel. However, in the absence of a built-in data parallel construct, a scenario that requires such nesting may be custom implemented with constructs like finish, for, and async, instead of first writing data parallel code and then embedding task parallelism.
      Fortress
      ◦ Data Parallelism Inside Task Parallelism, explicit thread:
          T:Thread[Any] = spawn do exp end
          T.wait()
      ◦ Data Parallelism Inside Task Parallelism, structural construct:
          do exp1 also do exp2 end
      ◦ Note on Task Parallelism Inside Data Parallelism:
          arr:Array[ZZ32,ZZ32]=array[ZZ32](4).fill(id)
          for i <- arr.indices() do
            t = spawn do arr[i]:= factorial(i) end
            t.wait()
          end
  • 10. Remote Transactions
      X10
      ◦ Conditional Local:
          def pop() : T {
            var ret : T;
            when(size>0) {
              ret = list.removeAt(0);
              size --;
            }
            return ret;
          }
      ◦ Unconditional Local:
          var n : Int = 0;
          finish {
            async atomic n = n + 1; //(a)
            async atomic n = n + 2; //(b)
          }

          var n : Int = 0;
          finish {
            async n = n + 1; //(a) -- BAD
            async atomic n = n + 2; //(b)
          }
      ◦ Unconditional Remote:
          val blk = Dist.makeBlock((1..1)*(1..1),0);
          val data = DistArray.make[Int](blk, ([i,j]:Point(2)) => 0);
          val pt : Point = [1,1];
          finish for (pl in Place.places()) {
            async {
              val dataloc = blk(pt);
              if (dataloc != pl) {
                Console.OUT.println("Point " + pt + " is in place " + dataloc);
                at (dataloc) atomic {
                  data(pt) = data(pt) + 1;
                }
              } else {
                Console.OUT.println("Point " + pt + " is in place " + pl);
                atomic data(pt) = data(pt) + 2;
              }
            }
          }
          Console.OUT.println("Final value of point " + pt + " is " + data(pt));
      ◦ The atomicity is weak in the sense that an atomic block appears atomic only to other atomic blocks running at the same place. Atomic code running at remote places, or non-atomic code running at local or remote places, may interfere with local atomic code if care is not taken.
      Fortress
      ◦ Local:
          do
            x:ZZ32 := 0
            y:ZZ32 := 0
            z:ZZ32 := 0
            atomic do
              x += 1
              y += 1
            also atomic do
              z := x + y
            end
            z
          end
      ◦ Remote (true if distributions were implemented):
          f(y:ZZ32):ZZ32=y y
          D:Array[ZZ32,ZZ32]=array[ZZ32](4).fill(f)
          q:ZZ32=0
          at D.region(2) atomic do
            println("at D.region(2)")
            q:=D[2]
            println("q in first atomic: " q)
          also at D.region(1) atomic do
            println("at D.region(1)")
            q+=1
            println("q in second atomic: " q)
          end
          println("Final q: " q)
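The unconditional local case, where every update to `n` sits inside an atomic block, can be approximated in Python with a lock. This is a sketch under stated assumptions: `atomic_counter` is a hypothetical name, and a single `threading.Lock` only imitates the mutual exclusion among atomic blocks at one place; it does not model places or the weak atomicity noted above.

```python
import threading


def atomic_counter(increments):
    """Apply each increment to a shared counter from its own thread.

    Every update holds the same lock, so updates cannot interleave.
    This loosely mirrors `finish { async atomic n = n + 1; ... }`.
    """
    n = 0
    lock = threading.Lock()

    def add(delta):
        nonlocal n
        with lock:  # plays the role of `atomic`
            n = n + delta

    threads = [threading.Thread(target=add, args=(d,)) for d in increments]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # plays the role of `finish`
    return n
```

Calling `atomic_counter([1, 2])` returns `3`. Dropping the lock from one of the updates would correspond to the case marked BAD on the slide, where an unprotected update may race with the atomic one.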
  • 11. K-Means Implementation
      Why K-Means?
      ◦ Simple to Comprehend
      ◦ Broad Enough to Exploit Most of the Idioms
      Distributed Parallel Implementations
      ◦ Chapel and X10
      Parallel Non Distributed Implementation
      ◦ Fortress
      Complete Working Code in Appendix of Paper
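For readers unfamiliar with the algorithm, a plain sequential K-Means can be sketched in a few lines of Python. This is not the Chapel, X10 or Fortress code from the paper's appendix; the function name `kmeans`, the fixed iteration count, and the rule of leaving an empty cluster's center unchanged are assumptions made for illustration. The assignment loop over points is the part that maps naturally onto the data parallel idioms.

```python
def kmeans(points, centers, iterations=10):
    """Plain K-Means on tuples of numbers (illustrative sketch).

    points:  list of equal-length tuples
    centers: initial centers, same dimensionality as the points
    Returns the list of centers after the given number of iterations.
    """
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        # This loop is independent per point, hence data parallel.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # Assumption: an empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[k]
            for k, cluster in enumerate(clusters)
        ]
    return centers
```

With points `[(0, 0), (0, 2), (10, 10), (10, 12)]` and initial centers `[(0, 0), (10, 10)]`, the centers settle at `[(0.0, 1.0), (10.0, 11.0)]`.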
  • 12. Thank you!