6. Caution: Performance benefits visible only around several thousand elements in the collection
Depends on:
Machine architecture
JVM vendor and version
Per-element workload
Specific collection – ParArray, ParTrieMap
Specific operation – transformer (filter), accessor (foreach)
Memory management
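Because the break-even point depends on all of the factors above, the only reliable answer is to measure on your own machine. A minimal timing sketch follows; `TimingSketch` and `time` are hypothetical names made up for this example (not part of Scala), and a single timed run after a few warm-ups is a rough indicator, not a proper benchmark.

```scala
object TimingSketch {
  // Hypothetical helper (not part of Scala): runs `body` a few times so the
  // JIT can warm up, then times one more run.
  // Returns (result, elapsed milliseconds).
  def time[A](warmups: Int)(body: => A): (A, Long) = {
    (1 to warmups).foreach(_ => body)
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 1000000).toArray
    // Sequential baseline; time the same expression on data.par to compare.
    val (total, ms) = time(warmups = 5)(data.map(_ % 7).sum)
    println(s"sum = $total, took $ms ms")
  }
}
```

Timing the same expression on a sequential and a parallel version of the collection, with different sizes, shows where (or whether) the parallel version starts to win on a given setup.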
7. map, fold and filter
scala> val parArray = (1 to 1000000).toArray.par
scala> parArray.fold(0)(_+_)
res3: Int = 1784293664
scala> val narArray = (1 to 1000000).toArray
scala> narArray.fold(0)(_+_)
res5: Int = 1784293664
scala> parArray.fold(0)(_+_)
res6: Int = 1784293664
I did not notice a difference on my laptop.
Note: the true sum, 500000500000, overflows Int, so both versions print the same wrapped value.
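The slide title also mentions map and filter, which work the same way as fold on a parallel collection. A small sketch, assuming Scala 2.12 or earlier where `.par` is in the standard library (on 2.13+ the parallel collections live in the separate scala-parallel-collections module and need the import shown in the comment). `MapFilterSketch` is a hypothetical name for this example.

```scala
// Assumption: Scala 2.12 or earlier. On 2.13+ add the
// scala-parallel-collections module and uncomment the next line:
// import scala.collection.parallel.CollectionConverters._

object MapFilterSketch {
  // Defined as methods so that no parallel work runs while the object
  // itself is still being initialized.
  def small = (1 to 10).toArray.par

  def doubled = small.map(_ * 2)        // elements 2, 4, ..., 20
  def evens = small.filter(_ % 2 == 0)  // elements 2, 4, 6, 8, 10
  def total = small.fold(0)(_ + _)      // 55
}
```

The element order of the result is the same as for the sequential array; only the order in which the function is applied to the elements is unspecified.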
8. creating a parallel collection
With new:
import scala.collection.parallel.immutable.ParVector
val pv = new ParVector[Int]
Taking a sequential collection and converting it:
val pv = Vector(1,2,3,4,5,6,7,8,9).par
Parallel collections can be converted back to sequential collections with seq
9. Some collections are inherently sequential
Their elements must be accessed one after the other (lists, queues, streams)
They are converted to parallel by copying the elements into a similar parallel collection
An example is List – it’s converted into a standard immutable parallel sequence, which is a ParVector.
Overhead!
Array, Vector, HashMap do not have this overhead
10. how does it work?
Map reduce?
By recursively “splitting” a given collection, applying an operation on each partition of the collection in parallel, and re-“combining” all of the results that were completed in parallel.
Two things to watch out for: side-effecting operations and non-associative operations
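The split / apply / combine idea can be sketched with plain Futures from the standard library. This is not how the parallel collections are implemented internally, only an illustration of the same shape; `SplitCombineSketch`, `parallelFold` and the `threshold` value are hypothetical names and numbers chosen for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitCombineSketch {
  // Hypothetical cutoff: below this size a partition is processed sequentially.
  val threshold = 1000

  // Recursively split the vector, fold each small partition in its own
  // Future, then combine the partial results in left-to-right order.
  def parallelFold(xs: Vector[Int], zero: Int)(op: (Int, Int) => Int): Future[Int] =
    if (xs.length <= threshold) Future(xs.fold(zero)(op))
    else {
      val (left, right) = xs.splitAt(xs.length / 2)
      val l = parallelFold(left, zero)(op)
      val r = parallelFold(right, zero)(op)
      for (a <- l; b <- r) yield op(a, b)
    }

  // Sum of 1..10000, small enough to fit in an Int.
  def run(): Int =
    Await.result(parallelFold((1 to 10000).toVector, 0)(_ + _), 10.seconds)
}
```

The result only matches the sequential fold because `+` is associative and `0` is its neutral element; the grouping of the partial results depends on where the splits happen.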
11. side-effecting operation
scala> var sum = 0
sum: Int = 0
scala> val list = (1 to 1000).toList.par
scala> list.foreach(sum += _); sum
res7: Int = 452474
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res8: Int = 497761
scala> var sum = 0
sum: Int = 0
scala> list.foreach(sum += _); sum
res9: Int = 422508
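The three runs above disagree because several threads update the shared `sum` at the same time and some updates are lost; the correct total is 500500. A sketch of deterministic alternatives, assuming Scala 2.12 or earlier (on 2.13+ the import in the comment is needed); `SafeSumSketch` is a hypothetical name for this example.

```scala
import java.util.concurrent.atomic.AtomicInteger
// Assumption: Scala 2.12 or earlier. On 2.13+ add the
// scala-parallel-collections module and uncomment the next line:
// import scala.collection.parallel.CollectionConverters._

object SafeSumSketch {
  // Methods rather than vals, so no parallel work runs during object
  // initialization.
  def list = (1 to 1000).toList.par

  // Preferred: let the collection do the reduction, no shared state.
  def viaSum: Int = list.sum
  def viaFold: Int = list.fold(0)(_ + _)

  // If a side effect is unavoidable, make the update atomic.
  def viaAtomic: Int = {
    val counter = new AtomicInteger(0)
    list.foreach(x => counter.addAndGet(x))
    counter.get
  }
}
```

All three return 500500 on every run. The atomic counter is correct but serializes the updates, so the fold or sum version is usually the better choice.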
12. non-associative operations
The order in which the function is applied to the elements of the collection can
be arbitrary, so a non-associative function gives a different result on each run
scala> val list = (1 to 1000).toList.par
scala> list.reduce(_-_)
res01: Int = -228888
scala> list.reduce(_-_)
res02: Int = -61000
scala> list.reduce(_-_)
res03: Int = -331818
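Why subtraction misbehaves can be seen without any parallelism: the answer depends on how the elements are grouped, and a parallel reduce picks the grouping at run time. A small sequential sketch (`GroupingDemo` is a hypothetical name; the split in the middle is just one of many groupings a parallel run might choose):

```scala
object GroupingDemo {
  val xs = Vector(1, 2, 3, 4)

  // Sequential, left to right: ((1 - 2) - 3) - 4 = -8
  val leftToRight = xs.reduceLeft(_ - _)

  // One possible parallel grouping: (1 - 2) - (3 - 4) = 0
  val (left, right) = xs.splitAt(2)
  val splitInHalf = left.reduce(_ - _) - right.reduce(_ - _)
}
```

With an associative operation such as `+`, both groupings would give the same value.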
13. associative but non-commutative
scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)
scala> val alphabet = strings.reduce(_++_)
alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz
14. out of order?
Operations may run out of order
BUT
Recombination of results is done in order
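[Figure: collection split into partitions A, B and C; the partitions may be processed in any order, and the results are recombined as A B C]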
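String concatenation shows why associativity is enough and commutativity is not required: any grouping of neighbouring parts gives the same string, as long as the parts stay in their original positions. A sequential sketch (`ConcatDemo` is a hypothetical name for this example):

```scala
object ConcatDemo {
  val parts = Vector("abc", "def", "ghi")

  // Different groupings, same order: both give "abcdefghi".
  val groupedLeft = (parts(0) + parts(1)) + parts(2)
  val groupedRight = parts(0) + (parts(1) + parts(2))

  // Swapping two parts changes the result: "defabcghi".
  // This is what recombining in order prevents.
  val swapped = parts(1) + parts(0) + parts(2)
}
```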
15. performance
In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where
the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node;
instead, its position in the tree defines the key with which it is associated.
16. conversions
List is converted to a ParVector (by copying its elements)
Converting parallel to sequential takes constant time
17. architecture
splitters: Split the collection into non-trivial partitions so that they can be accessed in sequence.
combiners: A combiner is a Builder. Combines split lists together.
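The two roles can be sketched with simplified stand-ins. `VecSplitter` and `VecCombiner` below are hypothetical classes written for this example; the real Splitter and Combiner traits in the library are more general. The partitions here are processed one after the other, so the sketch shows only the shape of the abstractions, not actual parallel execution.

```scala
object ArchitectureSketch {
  // Stand-in for a splitter: an iterator that can divide its remaining
  // elements into partitions, each traversable on its own.
  final class VecSplitter(xs: Vector[Int]) extends Iterator[Int] {
    private var i = 0
    def hasNext: Boolean = i < xs.length
    def next(): Int = { val x = xs(i); i += 1; x }
    def split: Seq[VecSplitter] = {
      val rest = xs.drop(i)
      val (l, r) = rest.splitAt(rest.length / 2)
      Seq(new VecSplitter(l), new VecSplitter(r))
    }
  }

  // Stand-in for a combiner: a builder that can also merge with another one.
  final case class VecCombiner(elems: Vector[Int] = Vector.empty) {
    def add(x: Int): VecCombiner = VecCombiner(elems :+ x)
    def combine(that: VecCombiner): VecCombiner = VecCombiner(elems ++ that.elems)
    def result: Vector[Int] = elems
  }

  // Split, build a partial result per partition, combine in order.
  def doubled(xs: Vector[Int]): Vector[Int] = {
    val parts = new VecSplitter(xs).split
    val partials = parts.map(p => p.foldLeft(VecCombiner())((c, x) => c.add(x * 2)))
    partials.reduce(_ combine _).result
  }
}
```

For example, `ArchitectureSketch.doubled(Vector(1, 2, 3, 4, 5))` gives `Vector(2, 4, 6, 8, 10)`.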
18. brickbats
Absence of configuration
Not all algorithms are parallel friendly
Unproven
Now, if you want your code to not care whether it receives a
parallel or sequential collection, declare it with one of the
Gen-prefixed types: GenTraversable, GenIterable, GenSeq, etc.
These can be either parallel or sequential.
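A minimal sketch of such a method follows. It assumes Scala 2.12 or earlier: in 2.13 the Gen types were deprecated (GenSeq became an alias for the sequential Seq), so there the method still compiles but no longer accepts parallel collections. `GenSketch` and `total` are hypothetical names for this example.

```scala
import scala.collection.GenSeq

object GenSketch {
  // Accepts any GenSeq. On Scala 2.12 that covers both sequential
  // sequences (Vector, List) and parallel ones (ParVector, ParArray).
  def total(xs: GenSeq[Int]): Int = xs.fold(0)(_ + _)
}
```

On 2.12, both `GenSketch.total(Vector(1, 2, 3))` and `GenSketch.total(Vector(1, 2, 3).par)` compile and return 6. Because the argument may be parallel, the method body has to follow the same rules as above: no side effects on shared state and only associative operations.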