Compilers Are Databases
JVM Languages Summit
Martin Odersky
TypeSafe and EPFL
Compilers...
Compilers and Data Bases
Compilers are Data Bases?
Put a square peg in a round hole?
This Talk ...
... reports on a new compiler architecture for dsc,
the Dotty Scala Compiler.
• It has a mostly functional architecture,
but uses a lot of low-level tricks for speed.
• Some of its concepts are inspired by
functional databases.
My Early Involvement in Compilers
80s Pascal, Modula-2
single pass, following the school of Niklaus Wirth.
95-96 Espresso, the 2nd Java compiler
-> E Compiler
-> Borland’s JBuilder
used an OO AST with one class per node
and all processing distributed between
methods on these nodes.
96-99 Pizza -> GJ -> javac (1.3+) -> scalac (1.x)
replaced OO AST with pattern matching.
Current Scala Compiler
2004-12 nsc compiler for Scala (2.0-2.10)
Made (some) use of functional capabilities of Scala
Added:
– REPL
– presentation compiler for IDEs (Eclipse, Ensime)
– run-time meta programming with toolboxes
It’s the codebase for the official scalac compiler for 2.11,
2.12 and beyond.
Next Generation Scala Compiler
2012 – now: Dotty
• Rethink compiler architecture
from the ground up.
• Introduce some language
changes with the aim of better
regularity.
• Status:
– Close to bootstrap
– But still rough around the edges
Compilers – Traditional View
Compilers – Traditional View
Add Separate Compilation
Challenges
A compiler for a language like Scala faces quite a few
challenges.
Among the most important are:
» Complexity
» Speed
» Latency
» Reusability
Challenge: Complex Transformations
• Input language (Scala) is complicated.
• Output language (JVM) is also complicated.
• Semantic gap between the two is large.
Compare with compilers that target simple low-level languages
such as System F or SSA.
Deep Transformation Pipeline
Parser
Typer
FirstTransform
ValueClasses
Mixin
LazyVals
Memoize
CapturedVars
Constructors
LambdaLift
Flatten
ElimStaticThis
RestoreScopes
GenBCode
Source
Bytecode
RefChecks
ElimRepeated
NormalizeFlags
ExtensionMethods
TailRec
PatternMatcher
ExplicitOuter
ExpandSAMs
Splitter
SeqLiterals
InterceptedMeths
Literalize
Getters
ClassTags
ElimByName
AugmentS2Traits
ResolveSuper
Erasure
To achieve reliability, we need
– excellent modularity
– minimized side effects
-> Functional code rules!
Challenge: Speed
• Current scalac achieves 500-700 loc/sec on idiomatic
Scala code.
• Can be much lower, depending on input.
• Everyone would like it to be faster.
• But this is very hard to achieve.
- FP does have costs.
- Optimizations are ineffective.
- No hotspots, costs are smeared out widely.
Challenge: Latency
• Some applications require fast turnaround for small
changes more than high throughput.
• Examples:
– REPL
– Worksheet
– IDE Presentation Compiler
-> Need to keep things loaded (program + data)
Challenge: Reusability
• A compiler has many clients:
– Command line
– Build tools
– IDEs
– REPL
– Meta-programming
-> Abstractions must not leak. (FP helps)
A Question
Every compiler has to answer questions like this:
Say I have a class
class C[T] {
def f(x: T): T = ...
}
At some point I change it to:
class C[T] {
def f(x: T)(y: T): T = ...
}
What is the type signature of C.f?
Clearly, it depends on the time when the question is asked!
Time-Varying Answers
Initially: (x: T): T
After erasure: (x: Any): Any
After the edit: (x: T)(y: T): T
After uncurry: (x: T, y: T): T
After erasure: (x: Any, y: Any): Any
Naive Functional Approach
World_1 -> IR_1,1 -> ... -> IR_n,1 -> Output_1
World_2 -> IR_1,2 -> ... -> IR_n,2 -> Output_2
...
World_k -> IR_1,k -> ... -> IR_n,k -> Output_k
How big is the world?
A More Practical Strategy
Taking Inspiration from FRP and Functional Databases:
• Treat every value as a time-varying function.
• So the question is not:
“What is the signature of C.f?”
but:
“What is the signature of C.f at a given point in time?”
-> Need to index every piece of information with the time where it holds.
Time in dsc
Period = (RunID, PhaseID)
• RunID is incremented for each compiler run
• PhaseID ranges from 1 (parser) to about 50 (backend)
[Figure: timeline of periods across Run 1, Run 2, Run 3]
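A minimal sketch of how such a period could be represented, assuming a packed Int encoding. The names PeriodSketch and precedes and the 7-bit phase field are illustrative assumptions, not dsc's actual layout:

```scala
object PeriodSketch {
  // Assumption: 7 bits are enough for the phase id (the slides mention about 50 phases).
  val PhaseBits = 7
  val PhaseMask = (1 << PhaseBits) - 1

  // A period is a (RunID, PhaseID) pair packed into one Int:
  // the run id sits in the high bits, the phase id in the low bits.
  final case class Period(code: Int) {
    def runId: Int = code >>> PhaseBits
    def phaseId: Int = code & PhaseMask
    // Because the run id occupies the high bits, comparing codes
    // orders periods first by run and then by phase.
    def precedes(that: Period): Boolean = code < that.code
  }

  object Period {
    def apply(runId: Int, phaseId: Int): Period = {
      require(phaseId >= 0 && phaseId <= PhaseMask, "phase id out of range")
      require(runId >= 0 && runId < (1 << (31 - PhaseBits)), "run id out of range")
      new Period((runId << PhaseBits) | phaseId)
    }
  }
}
```

Packing both components into a single primitive keeps period comparisons cheap, which fits the "functional architecture, low-level tricks" theme of the talk.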
Time-Indexed Values
sig(C.f, (Run 1, parser)) = (x: T): T
sig(C.f, (Run 1, erasure)) = (x: Any): Any
sig(C.f, (Run 2, parser)) = (x: T)(y: T): T
sig(C.f, (Run 2, uncurry)) = (x: T, y: T): T
sig(C.f, (Run 2, erasure)) = (x: Any, y: Any): Any
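The time-indexed view can be sketched as a plain lookup keyed by name and period. This is only an illustration: phases are identified by name, signatures are kept as strings, and TimeIndexedSketch and sigAt are hypothetical names:

```scala
object TimeIndexedSketch {
  // Assumption: a period is a run number plus a phase name.
  final case class Period(run: Int, phase: String)

  // Every fact about C.f is stored together with the period at which it holds.
  val sig: Map[(String, Period), String] = Map(
    ("C.f", Period(1, "parser"))  -> "(x: T): T",
    ("C.f", Period(1, "erasure")) -> "(x: Any): Any",
    ("C.f", Period(2, "parser"))  -> "(x: T)(y: T): T",
    ("C.f", Period(2, "uncurry")) -> "(x: T, y: T): T",
    ("C.f", Period(2, "erasure")) -> "(x: Any, y: Any): Any"
  )

  // The question is always asked at a given point in time.
  def sigAt(name: String, period: Period): Option[String] =
    sig.get((name, period))
}
```

The same name yields different answers at different periods, and no entry ever has to be overwritten.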
Task of the Compiler
• Compute all values needed for analysis and code
generation over all periods where they are relevant.
• Problem: The graph of this function is humongous!
• More work is needed to make it efficiently explorable.
• But for a start it looks like the right model.
Core Data Types
Abstract Syntax Trees
Types
References
Denotations
Symbols
Abstract Syntax Trees
• For instance, for x * 2:
Tree Attributes
What about tree attributes?
In dsc, we simplified as much as we could.
We were left with just two attributes:
– Position (intrinsic)
– Type
The job of the type checker is to transform untyped trees into typed trees.
Typed Abstract Syntax Trees
For instance, for x * 2:
The distinction between typed and untyped trees is important enough
to merit being reflected in the type of the AST itself.
From Untyped to Typed Trees
Idea: parameterize the type Tree of ASTs with the attribute
info it carries.
Typed tree: tpd.Tree = Tree[Type]
Untyped tree: untpd.Tree = Tree[Nothing]
This leads to the following class:
class Tree[T] {
def tpe: T
def withType(t: Type): Tree[Type]
}
Question of Variance
• Question: Which of the following two subtype relationships should hold?
tpd.Tree <: untpd.Tree
untpd.Tree <: tpd.Tree ?
• What is the more useful relationship?
(the first: a typed tree can then be used wherever an untyped tree is expected)
• What relationship do the variance rules imply?
(the second: tpe returns T, so T occurs in covariant position)
class Tree[? T] {
def tpe: T ...
}
Fixing class Tree
class Tree[-T] {
def tpe: T @uncheckedVariance
def withType(t: Type): Tree[Type]
}
Interesting exception to the variance rules related to the
bottom type Nothing.
What can go “wrong” here? Given an untpd.Tree, I expect tpe to be Nothing,
but I might get a Type.
This shows that it is good to have an escape hatch in the form of
@uncheckedVariance.
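A self-contained sketch of the contravariant Tree, under simplifying assumptions: Type is a stand-in trait, the attribute is stored as Any, and the names TreeVarianceSketch, TypedTree, UntypedTree and describe are hypothetical. It is not the compiler's real class:

```scala
import scala.annotation.unchecked.uncheckedVariance

object TreeVarianceSketch {
  // Stand-in for the compiler's Type hierarchy (assumption for illustration).
  sealed trait Type
  case object IntType extends Type

  // Contravariant in T, so Tree[Type] <: Tree[Nothing].
  class Tree[-T](private val myTpe: Any, val label: String) {
    // T appears in covariant position here; the annotation silences the variance check.
    def tpe: T @uncheckedVariance = myTpe.asInstanceOf[T]
    def withType(t: Type): Tree[Type] = new Tree[Type](t, label)
  }

  type TypedTree = Tree[Type]
  type UntypedTree = Tree[Nothing]

  // Code written against untyped trees ...
  def describe(tree: UntypedTree): String = tree.label

  val untyped: UntypedTree = new Tree[Nothing](null, "x * 2")
  val typed: TypedTree = untyped.withType(IntType)

  // ... also accepts typed trees, thanks to contravariance.
  val description: String = describe(typed)
}
```

The sketch never calls tpe on an untyped tree; that is exactly the unsound corner the annotation leaves open.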
Types
• Types carry most of the essential information of trees
and symbols.
• Two kinds of types.
– Value types: Int, Int => Int, (Boolean, String)
– Types of definitions: (x: Int)Int, Lo..Hi, Class(...)
• Represented as subtypes of the same type “Type” for
convenience.
References
case class Select(qual: Tree, name: Name) {
// what is its tpe?
}
case class Ident(name: Name) {
// what is its tpe?
}
• Normally, these tree nodes would carry a “symbol”,
which acts as a reference to some definition.
• But there are no symbol attributes in dsc, for good
reason.
Traditional Scheme
That’s not very functional!
A Question of Meaning
Question: What is the meaning of
obj.fun
?
It depends on the period!
Does that mean that obj.fun has different types,
depending on period?
No, trees are immutable!
References
• A reference is a type
• It contains (only)
– a name
– potentially a prefix
• References are immutable; they exist forever.
What about Overloads?
The name of a TermRef may be shared by several
overloaded members of a class.
How do we determine which member is meant?
(In a nutshell, that’s why overloading is so universally hated
by compiler writers)
Trick: Allow “signature” as part of term names.
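One way to picture the signature trick, as a hedged sketch: a name optionally carries a signature, and lookup filters the overloaded alternatives by it. SignedName, Signature, Member and resolve are hypothetical names, and types are represented as plain strings:

```scala
object SignedNameSketch {
  final case class Signature(paramTypes: List[String], resultType: String)
  final case class Member(name: String, sig: Signature)

  // A term name that may carry a signature to pick one overloaded alternative.
  final case class SignedName(name: String, sig: Option[Signature])

  // Without a signature every alternative with that name matches;
  // with a signature at most the alternative with that exact signature does.
  def resolve(members: List[Member], ref: SignedName): List[Member] =
    members.filter(m => m.name == ref.name && ref.sig.forall(_ == m.sig))

  val intF: Member = Member("f", Signature(List("Int"), "Int"))
  val stringF: Member = Member("f", Signature(List("String"), "Int"))
  val members: List[Member] = List(intF, stringF, Member("g", Signature(Nil, "Unit")))
}
```

With the signature folded into the name, a reference stays a pure (prefix, name) pair and still identifies a single alternative.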
What Does A Reference Reference?
Surely, a symbol?
No!
References capture more than a symbol.
And sometimes they do not refer to a unique symbol at all.
References capture more than a symbol.
Consider:
class C[T] {
def f(x: T): T
}
val prefix = new C[Int]
Then prefix.f:
resolves to C’s f
but at type (Int)Int, not (T)T
Both pieces of information are part of the meaning of prefix.f.
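The effect can be observed in ordinary Scala. In this small sketch f is given an identity body so that it runs; PrefixSketch, g and result are illustrative names:

```scala
object PrefixSketch {
  class C[T] {
    def f(x: T): T = x // identity body, added only to make the example runnable
  }

  val prefix = new C[Int]

  // This type-checks only because, seen from the prefix C[Int],
  // f has the signature (x: Int): Int rather than (x: T): T.
  val g: Int => Int = prefix.f

  val result: Int = prefix.f(41) + 1
}
```

The declaration in C supplies the symbol, while the prefix supplies the instantiation of T; the reference needs both.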
References
Sometimes references point to no symbol at all.
We have already seen overloading.
Here’s another example using union types, which are newly
supported by dsc:
class A { def f: Int }
class B { def f: Int }
val prefix: A | B = if (...) new A else new B
prefix.f
What symbol is referenced by prefix.f ?
Denotations
The meaning of a reference is a denotation.
Non-overloaded denotations carry symbols (maybe) and types (always).
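A possible data model for this, as a sketch only. The class names and the use of strings for types are assumptions made for illustration:

```scala
object DenotationSketch {
  // A symbol stands for one declaration in a source file.
  final case class Symbol(owner: String, name: String)

  sealed trait Denotation
  // A non-overloaded denotation: maybe a symbol, always a type.
  final case class SingleDenotation(symbol: Option[Symbol], info: String) extends Denotation
  // An overloaded denotation groups several alternatives.
  final case class MultiDenotation(alternatives: List[SingleDenotation]) extends Denotation

  // prefix.f with prefix: C[Int] refers to C's f, seen at type (x: Int): Int.
  val memberOfC: SingleDenotation =
    SingleDenotation(Some(Symbol("C", "f")), "(x: Int): Int")

  // prefix.f with prefix: A | B has a type but no unique symbol.
  val memberOfUnion: SingleDenotation =
    SingleDenotation(None, "Int")
}
```

Making the symbol optional lets overloaded and union cases share the same lookup path instead of needing special handling.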
What Then Is A Symbol?
A symbol represents a declaration in some source file.
It “lives” as long as the source file is unchanged.
It has a denotation depending on the period.
Denotation Transformers
• How do we compute new denotations from old ones?
• For references pre.f: we can recompute the member at the
new phase.
• For symbols?
uncurry.transDenot(<(x: A)(y: B): C>) = <(x: A, y: B): C>
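As a rough illustration of such a transformer, the sketch below flattens curried parameter lists on a toy method signature. MethodSig, Param and uncurry are hypothetical names and types are plain strings; the real transDenot operates on compiler denotations:

```scala
object UncurrySketch {
  final case class Param(name: String, tpe: String)

  // A method signature with possibly several parameter lists.
  final case class MethodSig(paramLists: List[List[Param]], result: String) {
    def show: String = {
      val lists = paramLists.map { ps =>
        ps.map(p => s"${p.name}: ${p.tpe}").mkString("(", ", ", ")")
      }
      lists.mkString + ": " + result
    }
  }

  // The transformer is a pure function from the old signature to the new one:
  // all parameter lists are merged into a single list.
  def uncurry(sig: MethodSig): MethodSig =
    if (sig.paramLists.isEmpty) sig
    else MethodSig(List(sig.paramLists.flatten), sig.result)

  val curried: MethodSig =
    MethodSig(List(List(Param("x", "A")), List(Param("y", "B"))), "C")
}
```

Because the input is left untouched, the denotation that held before the phase remains available for earlier periods.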
Caching Denotations
Symbols are memoized functions: Period -> Denotation
Keep all denotations of a symbol at different phases as a ring.
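The memoization idea can be sketched as follows. For brevity this uses a hash map instead of the ring mentioned above, and Period, Denotation, Symbol and the phase numbers are simplified stand-ins, not dsc's classes:

```scala
import scala.collection.mutable

object MemoizedSymbolSketch {
  final case class Period(runId: Int, phaseId: Int)
  final case class Denotation(info: String)

  // A symbol behaves like a pure function Period => Denotation,
  // but each answer is computed at most once and then remembered.
  final class Symbol(val name: String, compute: Period => Denotation) {
    private val cache = mutable.HashMap.empty[Period, Denotation]
    var computations = 0 // exposed only to make the caching observable

    def denotAt(period: Period): Denotation =
      cache.getOrElseUpdate(period, { computations += 1; compute(period) })
  }

  // Assumption for the example: erasure runs at phase 30.
  val f: Symbol = new Symbol("C.f", p =>
    Denotation(if (p.phaseId < 30) "(x: T): T" else "(x: Any): Any"))
}
```

Callers see only the functional interface; the mutable map is an implementation detail hidden behind it.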
Putting it all Together
[Figure: ER diagram of the core compiler architecture]
Lessons Learned
(Not done yet, still learning)
• Think databases for modeling.
• Think FP for transformations.
• Get efficiency through low-level techniques
(caching)
• But take care not to compromise the high-level
semantics.
To Find Out More
How to make it Fast
• Caching
– Symbols cache last denotation
– NamedTypes do the same
– Caches are stamped with validity interval (current period until the
next denotation transformer kicks in).
– Need to update only if outside of validity period
– Member lookup caches denotation
Not yet tried: Parallelization.
- Could be hard (similar to chess programs)
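The validity-interval caching described above might look like the following sketch. It is simplified to a single run, and the phase numbers 20 (uncurry) and 30 (erasure) as well as all names are assumptions made for the example:

```scala
object ValidityCacheSketch {
  // A denotation is stamped with the phase interval in which it holds.
  final case class Denotation(info: String, validFrom: Int, validUntil: Int) {
    def isValidAt(phaseId: Int): Boolean =
      validFrom <= phaseId && phaseId <= validUntil
  }

  final class Symbol(lookup: Int => Denotation) {
    private var last: Option[Denotation] = None
    var misses = 0 // exposed only to make cache behaviour observable

    // Recompute only when the requested phase lies outside the cached interval.
    def denotAt(phaseId: Int): Denotation = last match {
      case Some(d) if d.isValidAt(phaseId) => d
      case _ =>
        misses += 1
        val d = lookup(phaseId)
        last = Some(d)
        d
    }
  }

  // Assumption: denotation transformers kick in at phases 20 and 30.
  def lookup(phaseId: Int): Denotation =
    if (phaseId < 20) Denotation("(x: T)(y: T): T", 1, 19)
    else if (phaseId < 30) Denotation("(x: T, y: T): T", 20, 29)
    else Denotation("(x: Any, y: Any): Any", 30, 50)
}
```

Since most phases do not change a given denotation, a single cached entry already avoids most recomputation as the compiler advances through phases.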
Many forms of Caches
• Lazy vals
• Memoization
• LRU Caches
• Rely on
– Purely functional semantics
– Access to low-level imperative implementation code.
– Important to keep the levels of abstractions apart!
Optimization: Phase Fusion
• For modularity reasons, phases should be small. Each
phase should do one self-contained transform.
• But that means we end up with many phases.
• Problem: Repeated tree rewriting is a performance killer.
• Solution: Automatically fuse phases into one tree
traversal.
– Relies on design pattern and some small amount of
introspection.
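A toy illustration of phase fusion, assuming each phase is a node-local rewrite applied bottom-up. The tree type, the two example phases and all names are hypothetical, and the sketch composes phases by hand rather than through introspection:

```scala
object PhaseFusionSketch {
  sealed trait Tree
  final case class Num(value: Int) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree
  final case class Mul(lhs: Tree, rhs: Tree) extends Tree

  // A mini-phase rewrites a single node whose children are already transformed.
  type MiniPhase = Tree => Tree

  // One bottom-up traversal applying `phase` at every node.
  def transform(tree: Tree, phase: MiniPhase): Tree = tree match {
    case Add(l, r) => phase(Add(transform(l, phase), transform(r, phase)))
    case Mul(l, r) => phase(Mul(transform(l, phase), transform(r, phase)))
    case leaf      => phase(leaf)
  }

  val elimMulByOne: MiniPhase = {
    case Mul(x, Num(1)) => x
    case other          => other
  }

  val elimAddZero: MiniPhase = {
    case Add(x, Num(0)) => x
    case other          => other
  }

  // Unfused: one full traversal per phase.
  def runSeparately(tree: Tree, phases: List[MiniPhase]): Tree =
    phases.foldLeft(tree)((t, p) => transform(t, p))

  // Fused: compose the node-level rewrites, then traverse once.
  def runFused(tree: Tree, phases: List[MiniPhase]): Tree = {
    val fused: MiniPhase = phases.foldLeft((t: Tree) => t)(_ andThen _)
    transform(tree, fused)
  }

  val example: Tree = Add(Mul(Num(3), Num(1)), Num(0))
}
```

Fusing in this way gives the same result as separate traversals only when a later phase does not need the whole tree to have been rewritten by earlier phases; the two example rewrites satisfy that condition.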
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Compilers Are Databases

  • 1. Compilers Are Databases JVM Languages Summit Martin Odersky TypeSafe and EPFL
  • 4. Compilers are Data Bases? Put a square peg in a round hole?
  • 5. This Talk ... ... reports on a new compiler architecture for dsc, the Dotty Scala Compiler. • It has a mostly functional architecture, but uses a lot of low-level tricks for speed. • Some of its concepts are inspired by functional databases.
  • 6. My Early Involvement in Compilers 80s: Pascal, Modula-2; single pass, following the school of Niklaus Wirth. 95-96: Espresso, the 2nd Java compiler -> E Compiler -> Borland’s JBuilder; used an OO AST with one class per node and all processing distributed between methods on these nodes. 96-99: Pizza -> GJ -> javac (1.3+) -> scalac (1.x); replaced OO AST with pattern matching.
  • 7. Current Scala Compiler 2004-12 nsc compiler for Scala (2.0-2.10) Made (some) use of functional capabilities of Scala Added: – REPL – presentation compiler for IDEs (Eclipse, Ensime) – run-time meta programming with toolboxes It’s the codebase for the official scalac compiler for 2.11, 2.12 and beyond.
  • 8. Next Generation Scala Compiler 2012 – now: Dotty • Rethink compiler architecture from the ground up. • Introduce some language changes with the aim of better regularity. • Status: – Close to bootstrap – But still rough around the edges
  • 12. Challenges A compiler for a language like Scala faces quite a few challenges. Among the most important are: » Complexity » Speed » Latency » Reusability
  • 13. Challenge: Complex Transformations • Input language (Scala) is complicated. • Output language (JVM) is also complicated. • Semantic gap between the two is large. Compare with compilers to simple low-level languages such as System F or SSA.
  • 15. Challenge: Speed • Current scalac achieves 500-700 loc/sec on idiomatic Scala code. • Can be much lower, depending on input. • Everyone would like it to be faster. • But this is very hard to achieve. - FP does have costs. - Optimizations are ineffective. - No hotspots, costs are smeared out widely.
  • 16. Challenge: Latency • Some applications require fast turnaround for small changes more than high throughput. • Examples: – REPL – Worksheet – IDE Presentation Compiler -> Need to keep things loaded (program + data)
  • 17. Challenge: Reusability • A compiler has many clients: – Command line – Build tools – IDEs – REPL – Meta-programming -> Abstractions must not leak. (FP helps)
  • 18. A Question Every compiler has to answer questions like this: Say I have a class: class C[T] { def f(x: T): T = ... } At some point I change it to: class C[T] { def f(x: T)(y: T): T = ... } What is the type signature of C.f? Clearly, it depends on the time when the question is asked!
  • 19. Time-Varying Answers Initially: (x: T): T; After erasure: (x: Any): Any; After the edit: (x: T)(y: T): T; After uncurry: (x: T, y: T): T; After erasure: (x: Any, y: Any): Any
  • 20. Naive Functional Approach World_1 -> IR_{1,1} -> ... -> IR_{n,1} -> Output_1; World_2 -> IR_{1,2} -> ... -> IR_{n,2} -> Output_2; ...; World_k -> IR_{1,k} -> ... -> IR_{n,k} -> Output_k. How big is the world?
  • 21. A More Practical Strategy Taking inspiration from FRP and functional databases: • Treat every value as a time-varying function. • So the question is not “What is the signature of C.f?” but “What is the signature of C.f at a given point in time?” -> Need to index every piece of information with the time at which it holds.
  • 22. Time in dsc Period = (RunID, PhaseID) • RunID is incremented for each compiler run • PhaseID ranges from 1 (parser) to ~ 50 (backend) [Figure labels: Run1, Run2, Run3]
  • 23. Time-Indexed Values sig(C.f, (Run 1, parser)) = (x: T): T; sig(C.f, (Run 1, erasure)) = (x: Any): Any; sig(C.f, (Run 2, parser)) = (x: T)(y: T): T; sig(C.f, (Run 2, uncurry)) = (x: T, y: T): T; sig(C.f, (Run 2, erasure)) = (x: Any, y: Any): Any
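The time-indexed lookup can be sketched as a map keyed by period. This is only an illustration, not dsc's actual representation: `Period`, `sigOfCf`, and `sig` are hypothetical names, phases are modelled as plain strings, and the signature after the edit is assumed to belong to the parser phase of Run 2.

```scala
object PeriodSketch {
  // Hypothetical stand-in for dsc's (RunID, PhaseID) pair; phases are strings here.
  final case class Period(runId: Int, phase: String)

  // Signature of C.f, indexed by the period in which it holds.
  val sigOfCf: Map[Period, String] = Map(
    Period(1, "parser")  -> "(x: T): T",
    Period(1, "erasure") -> "(x: Any): Any",
    Period(2, "parser")  -> "(x: T)(y: T): T", // assumed phase for the edited source
    Period(2, "uncurry") -> "(x: T, y: T): T",
    Period(2, "erasure") -> "(x: Any, y: Any): Any"
  )

  // The question "what is the signature of C.f?" always takes a period.
  def sig(at: Period): Option[String] = sigOfCf.get(at)
}
```

A real compiler cannot enumerate this table up front, which is the "humongous graph" problem raised on the next slide; the map only shows the shape of the question.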
  • 24. Task of the Compiler • Compute all values needed for analysis and code generation over all periods where they are relevant. • Problem: The graph of this function is humongous! • More work is needed to make it efficiently explorable. • But for a start it looks like the right model.
  • 25. Core Data Types Abstract Syntax Trees Types References Denotations Symbols
  • 26. Abstract Syntax Trees • For instance, for x * 2:
  • 27. Tree Attributes What about tree attributes? In dsc, we simplified as much as we could. We were left with just two attributes: – Position (intrinsic) – Type The job of the type checker is to transform untyped to typed trees.
  • 28. Typed Abstract Syntax Trees For instance, for x * 2: The distinction whether a tree is typed or untyped is pretty important and merits being reflected in the type of the AST itself.
  • 29. From Untyped to Typed Trees Idea: parameterize the type Tree of ASTs with the attribute info it carries. Typed tree: tpd.Tree = Tree[Type] Untyped tree: untpd.Tree = Tree[Nothing] This leads to the following class: class Tree[T] { def tpe: T; def withType(t: Type): Tree[Type] }
  • 30. Question of Variance class Tree[? T] { def tpe: T ... } • Question: Which of the following two subtype relationships should hold? tpd.Tree <: untpd.Tree or untpd.Tree <: tpd.Tree? • What is the more useful relationship? (the first) • What relationship do the variance rules imply? (the second)
  • 31. Fixing class Tree: class Tree[-T] { def tpe: T @uncheckedVariance; def withType(t: Type): Tree[Type] } Interesting exception to the variance rules related to the bottom type Nothing. What can go “wrong” here? Given an untpd.Tree, I expect Nothing, but I might get a Type. Shows that it’s good to have an escape hatch in the form of @uncheckedVariance.
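A minimal compilable sketch of the contravariant tree class follows. It is an assumption-laden illustration, not dsc's code: the `label` field, the `Option`-based attribute storage, the `untyped` helper, and the tiny `Type` hierarchy are all invented here to keep the example self-contained.

```scala
import scala.annotation.unchecked.uncheckedVariance

object TreeSketch {
  sealed trait Type
  case object IntType extends Type

  // Contravariant in T, so Tree[Type] <: Tree[Nothing], i.e. tpd.Tree <: untpd.Tree.
  final class Tree[-T](val label: String, attr: Option[Type]) {
    // T appears in covariant position; the annotation switches off the variance check.
    def tpe: T @uncheckedVariance =
      attr.getOrElse(throw new UnsupportedOperationException(s"$label is untyped"))
        .asInstanceOf[T]

    // Typing a tree yields a new, typed tree; the original stays unchanged.
    def withType(t: Type): Tree[Type] = new Tree[Type](label, Some(t))
  }

  type Typed   = Tree[Type]    // plays the role of tpd.Tree
  type Untyped = Tree[Nothing] // plays the role of untpd.Tree

  // Hypothetical constructor for untyped trees.
  def untyped(label: String): Untyped = new Tree[Nothing](label, None)
}
```

With these definitions a `Typed` value can be passed wherever an `Untyped` one is expected, which is the useful direction. The price is the hole the slide describes: calling `tpe` through an `Untyped` reference has static type `Nothing`, although the object may actually hold a `Type`.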
  • 32. Types • Types carry most of the essential information of trees and symbols. • Two kinds of types. – Value types: Int, Int => Int, (Boolean, String) – Types of definitions: (x: Int)Int, Lo..Hi, Class(...) • Represented as subtypes of the same type “Type” for convenience.
  • 33. References case class Select(qual: Tree, name: Name) { // what is its tpe? } case class Ident(name: Name) { // what is its tpe? } • Normally, these tree nodes would carry a “symbol”, which acts as a reference to some definition. • But there are no symbol attributes in dsc, for good reason.
  • 35. A Question of Meaning Question: What is the meaning of obj.fun ? It depends on the period! Does that mean that obj.fun has different types, depending on period? No, trees are immutable!
  • 36. References • A reference is a type • It contains (only) – a name – potentially a prefix • References are immutable, they exist forever.
  • 37. What about Overloads? The name of a TermRef may be shared by several overloaded members of a class. How do we determine which member is meant? (In a nutshell, that’s why overloading is so universally hated by compiler writers.) Trick: Allow “signature” as part of term names.
  • 38. What Does A Reference Reference? Surely, a symbol? No! References capture more than a symbol, and sometimes they do not refer to a unique symbol at all.
  • 39. References capture more than a symbol. Consider: class C[T] { def f(x: T): T } val prefix = new C[Int] Then prefix.f: resolves to C’s f but at type (Int)Int, not (T)T Both pieces of information are part of the meaning of prefix.f.
  • 40. References Sometimes references point to no symbol at all. We have already seen overloading. Here’s another example using union types, which are newly supported by dsc: class A { def f: Int } class B { def f: Int } val prefix: A | B = if (...) new A else new B prefix.f What symbol is referenced by prefix.f ?
  • 41. Denotations The meaning of a reference is a denotation. Non-overloaded denotations carry symbols (maybe) and types (always).
  • 42. What Then Is A Symbol? A symbol represents a declaration in some source file. It “lives” as long as the source file is unchanged. It has a denotation depending on the period.
  • 43. Denotation Transformers • How do we compute new denotations from old ones? • For references pre.f: Can recompute the member at new phase. • For symbols? uncurry.transDenot(<(x: A)(y: B): C>) = <(x: A, y: B): C>
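The uncurry example can be sketched as a pure function from an old signature to a new one. This is a simplified model under stated assumptions: `Param`, `MethodSig`, and `show` are hypothetical, and only the name `transDenot` is taken from the slide; the real transformer works on denotations, not on this toy signature type.

```scala
object UncurrySketch {
  final case class Param(name: String, tpe: String)

  // Toy stand-in for the info carried by a method denotation.
  final case class MethodSig(paramLists: List[List[Param]], result: String) {
    // Render as (a: A)(b: B): R, one parenthesized group per parameter list.
    def show: String =
      paramLists
        .map(_.map(p => s"${p.name}: ${p.tpe}").mkString("(", ", ", ")"))
        .mkString + s": $result"
  }

  // Merge all parameter lists into a single list; the input is left untouched.
  def transDenot(sig: MethodSig): MethodSig =
    sig.copy(paramLists = List(sig.paramLists.flatten))
}
```

Because the transformer returns a new value instead of mutating the old one, the denotations before and after uncurry can both be kept and looked up by period.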
  • 44. Caching Denotations Symbols are memoized functions: Period -> Denotation Keep all denotations of a symbol at different phases as a ring.
  • 45. Putting it all Together • ER diagram of core compiler architecture:
  • 46. Lessons Learned (Not done yet, still learning) • Think databases for modeling. • Think FP for transformations. • Get efficiency through low-level techniques (caching). • But take care not to compromise the high-level semantics.
  • 47. To Find Out More
  • 48. How to make it Fast • Caching – Symbols cache last denotation – NamedTypes do the same – Caches are stamped with validity interval (current period until the next denotation transformer kicks in). – Need to update only if outside of validity period – Member lookup caches denotation Not yet tried: Parallelization. - Could be hard (similar to chess programs)
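The validity-interval idea can be sketched as follows. This is a simplification with invented names (`Denotation`, `Sym`, `compute`, `recomputations`): it tracks only phase numbers and ignores run IDs, and `compute` stands in for whatever work produces a denotation at a given phase.

```scala
object DenotCacheSketch {
  // A denotation stamped with the inclusive phase range in which it is valid.
  final case class Denotation(info: String, validFrom: Int, validUntil: Int) {
    def isValidAt(phase: Int): Boolean = validFrom <= phase && phase <= validUntil
  }

  // A symbol that remembers only its most recently computed denotation.
  final class Sym(compute: Int => Denotation) {
    private var last: Option[Denotation] = None
    var recomputations = 0 // counter kept only to make the caching observable

    def denot(phase: Int): Denotation = last match {
      case Some(d) if d.isValidAt(phase) => d // cache hit: inside validity interval
      case _ =>
        recomputations += 1
        val d = compute(phase)
        last = Some(d)
        d
    }
  }
}
```

The mutable cache is invisible to callers as long as `compute` is a pure function of the phase, which is the slide's point about keeping the functional semantics and the imperative implementation apart.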
  • 49. Many forms of Caches • Lazy vals • Memoization • LRU Caches • Rely on – Purely functional semantics – Access to low-level imperative implementation code. – Important to keep the levels of abstractions apart!
  • 50. Optimization: Phase Fusion • For modularity reasons, phases should be small. Each phase should do one self-contained transform. • But that means we end up with many phases. • Problem: Repeated tree rewriting is a performance killer. • Solution: Automatically fuse phases into one tree traversal. – Relies on design pattern and some small amount of introspection.
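One way to picture phase fusion is a single bottom-up traversal that applies every small phase at each node. The sketch below is an assumption, not dsc's mechanism: the tree type, `MiniPhase`, `fuse`, and the two example phases are invented, and no introspection is used. Fusing in this way gives the same result as running the phases one after another only when each phase needs nothing more than already-transformed children.

```scala
object FusionSketch {
  sealed trait Tree
  final case class Lit(value: Int) extends Tree
  final case class Var(name: String) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree

  // A mini-phase rewrites one node whose children have already been transformed.
  type MiniPhase = Tree => Tree

  // Compose the per-node rewrites, then apply them in a single bottom-up traversal.
  def fuse(phases: List[MiniPhase]): Tree => Tree = {
    val perNode: Tree => Tree = phases.foldLeft((t: Tree) => t)(_ andThen _)
    def traverse(t: Tree): Tree = t match {
      case Add(l, r) => perNode(Add(traverse(l), traverse(r)))
      case leaf      => perNode(leaf)
    }
    traverse
  }

  // Example phase: fold additions of two literals.
  val foldConstants: MiniPhase = {
    case Add(Lit(a), Lit(b)) => Lit(a + b)
    case t                   => t
  }

  // Example phase: remove additions of zero.
  val dropZero: MiniPhase = {
    case Add(Lit(0), t) => t
    case Add(t, Lit(0)) => t
    case t              => t
  }
}
```

Each phase stays a small, separately testable function, while the tree is rebuilt once instead of once per phase.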