4. Shoulders of Giants
JVM
J. Rose
Hiro
Marcin
Nahi
Subbu
Douglas
Christian
Dmitry
Tom
Charlie
JRuby
5. All the stuff!
JVM
J. Rose
• Garbage Collection
• Native JIT
• Profiled Optimizations
• Native Threading
• Tooling
• Cross Platform
12.
require 'benchmark'

class Simple
  attr_accessor :next
end

top = Simple.new

puts Benchmark.measure {
  outer = 10
  total = 100000
  per = 100
  outer.times do
    total.times do
      per.times { Simple.new } # short-lived garbage
      s = Simple.new
      top.next = s             # link each new object to the previous one
      top = s
    end
  end
}
29. Egg Madness
class EggMadnessPlugin
  include Purugin::Plugin
  description 'EggMadness', 0.1
  def on_enable
    event(:player_egg_throw) do |e|
      e.hatching = true
      e.num_hatches = 50
      e.hatching_type = :chicken
    end
  end
end
36. Block Jitting
• JRuby 1.7 only jitted methods
• Not free-standing procs/lambdas
• Not define_method blocks
• Easier to do now with 9000's IR
• Blocks JIT now in 9.0.4.0
37. Jitting is Winning
[Chart: performance of define_method in a loaded file. Y-axis: iterations per second, 0k to 3000k. Implementations: MRI, JRuby 9.0.1.0, JRuby 9.0.4.0. Series: normal method, define_method method.]
ruby -e 'load "bench_define_method.rb"'
38. define_method
Convenient for metaprogramming,
but blocks have more overhead than methods.
define_method(:add) do |a, b|
  a + b
end
names.each do |name|
  define_method(name) { send :"do_#{name}" }
end
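One rough way to see the overhead is to time the same addition through a normal method and through a define_method block. The sketch below is an assumption about what such a measurement might look like; it is not the bench_define_method.rb file named on the previous slide, and the class name Adder and the iteration count are made up for illustration.

```ruby
require 'benchmark'

# Hypothetical class: the same addition defined two ways.
class Adder
  def add_normal(a, b)
    a + b
  end

  define_method(:add_defined) do |a, b|
    a + b
  end
end

adder = Adder.new
iterations = 200_000 # arbitrary; raise it for steadier numbers

normal_time = Benchmark.realtime do
  iterations.times { adder.add_normal(1, 2) }
end

defined_time = Benchmark.realtime do
  iterations.times { adder.add_defined(1, 2) }
end

puts format('normal method:        %.4fs', normal_time)
puts format('define_method method: %.4fs', defined_time)
```

Timings depend heavily on the Ruby implementation, its version, and warm-up, so treat a single run as indicative only.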
42. Reduced-cost Exceptions
• Backtrace cost is VERY high on JVM
• Heavily optimized, lots of work to build
• Exceptions frequently ignored
• ...or used as flow control (shame!)
• If ignored, backtrace is not needed!
43. Postfix Antipattern
foo rescue nil
Exception raised
StandardError rescued
Exception ignored
Result is simple expression, so exception is never visible.
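A small sketch of the pattern, assuming a hypothetical method foo that always raises. It shows the postfix form discarding the exception, and then uses an explicit rescue only to make visible the backtrace that gets built and thrown away.

```ruby
# Hypothetical method that always fails.
def foo
  raise ArgumentError, 'boom'
end

# Postfix rescue: the StandardError is swallowed and nil is the result.
result = (foo rescue nil)

# The same failure with an explicit rescue, to show that a backtrace
# exists even though the postfix form never looks at it.
captured = begin
  foo
rescue StandardError => e
  e
end

puts result.inspect            # prints "nil"
puts captured.backtrace.length # number of frames recorded for the exception
```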
44. csv.rb Converters
Converters = {
  integer: lambda { |f|
    Integer(f.encode(ConverterEncoding)) rescue f
  },
  float: lambda { |f|
    Float(f.encode(ConverterEncoding)) rescue f
  },
  ...
All trivial rescues, no traces needed.
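To see why these rescues are trivial, here is a simplified sketch of the same converters in isolation. It leaves out the encode(ConverterEncoding) step from csv.rb, and the constant name SimpleConverters is made up for illustration. On failure, each lambda just hands the original field back and the exception is never inspected.

```ruby
# Simplified stand-ins for the csv.rb converters (no encoding step).
SimpleConverters = {
  integer: lambda { |f| Integer(f) rescue f },
  float:   lambda { |f| Float(f) rescue f }
}

puts SimpleConverters[:integer].call('42').inspect  # => 42
puts SimpleConverters[:integer].call('abc').inspect # => "abc"
puts SimpleConverters[:float].call('1.5').inspect   # => 1.5
puts SimpleConverters[:float].call('n/a').inspect   # => "n/a"
```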
49.
def foo(a, b)
  c = 1
  d = a + c
end
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
2 b = recv_pre_reqd_arg(1)
3 %block = recv_closure
4 thread_poll
5 line_num(1)
6 c = 1
7 line_num(2)
8 %v_0 = call(:+, a, [c])
9 d = copy(%v_0)
10 return(%v_0)
Register-based
3 address format
IR Instructions
Semantic Analysis
50. -Xir.passes=LocalOptimizationPass,DeadCodeElimination
def foo(a, b)
  c = 1
  d = a + c
end
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
2 b = recv_pre_reqd_arg(1)
3 %block = recv_closure
4 thread_poll
5 line_num(1)
6 c = 1
7 line_num(2)
8 %v_0 = call(:+, a, [c])
9 d = copy(%v_0)
10 return(%v_0)
Optimization
51.
def foo(a, b)
  c = 1
  d = a + c
end
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
5 line_num(1)
6 c = 1
7 line_num(2)
8 %v_0 = call(:+, a, [c])
9 d = copy(%v_0)
10 return(%v_0)
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
5 line_num(1)
6 c = 1
7 line_num(2)
8 %v_0 = call(:+, a, [1])
9 d = copy(%v_0)
10 return(%v_0)
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
5 line_num(1)
7 line_num(2)
8 %v_0 = call(:+, a, [1])
9 d = copy(%v_0)
10 return(%v_0)
Optimization -Xir.passes=LocalOptimizationPass,DeadCodeElimination
52.
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
5 line_num(1)
7 line_num(2)
8 %v_0 = call(:+, a, [1])
9 d = copy(%v_0)
10 return(%v_0)
0 check_arity(2, 0, -1)
1 a = recv_pre_reqd_arg(0)
4 thread_poll
7 line_num(2)
8 %v_0 = call(:+, a, [1])
9 d = copy(%v_0)
10 return(%v_0)
Optimization -Xir.passes=LocalOptimizationPass,DeadCodeElimination
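The two passes can be illustrated with a toy model in plain Ruby. This is a sketch under stated assumptions, not JRuby's implementation: the Instr struct, the op names, and both functions are invented here, variables are Symbols, and constants are Integers. Unlike the slide output, this toy also drops d = copy(%v_0), because it treats any unread copy as dead.

```ruby
# Toy three-address instruction: result = op(args). All names are hypothetical.
Instr = Struct.new(:result, :op, :args)

ir = [
  Instr.new(:a,  :recv_arg,  [0]),
  Instr.new(:b,  :recv_arg,  [1]),
  Instr.new(:c,  :copy,      [1]),
  Instr.new(:v0, :call_plus, [:a, :c]),
  Instr.new(:d,  :copy,      [:v0]),
  Instr.new(nil, :return,    [:v0])
]

# Local constant propagation over straight-line code: once a variable is
# known to hold a constant, later reads of it are replaced by that constant.
def propagate_constants(ir)
  constants = {}
  ir.map do |ins|
    args = ins.args.map { |arg| constants.fetch(arg, arg) }
    if ins.op == :copy && args.first.is_a?(Integer)
      constants[ins.result] = args.first
    else
      constants.delete(ins.result)
    end
    Instr.new(ins.result, ins.op, args)
  end
end

# Ops assumed to have no side effects, so they are removable when unread.
PURE_OPS = [:recv_arg, :copy].freeze

# Dead code elimination via a backwards liveness walk.
def eliminate_dead_code(ir)
  live = []
  kept = []
  ir.reverse_each do |ins|
    next if PURE_OPS.include?(ins.op) && !live.include?(ins.result)
    kept.unshift(ins)
    live.delete(ins.result)
    live.concat(ins.args.select { |arg| arg.is_a?(Symbol) })
  end
  kept
end

optimized = eliminate_dead_code(propagate_constants(ir))
optimized.each do |ins|
  puts [ins.result, ins.op, ins.args.inspect].compact.join(' ')
end
```

Running the passes in this order matters: c = 1 only becomes dead after its single use has been replaced by the constant.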
53. Inlining
• 500 pound gorilla of optimizations
• shove method/closure back to callsite
• eliminate stack frame
• eliminate parameter passing/return
• eliminate additional allocation
Optimization
54. Today’s Inliner
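def decrement_one(i)
  i - 1
end
i = 1_000_000
while i > 0
  i = decrement_one(i)
end

def decrement_one(i)
  i - 1
end
i = 1_000_000
while i > 0
  if guard_same? self # pseudocode guard: is the inlined target still valid?
    i = i - 1 # inlined body of decrement_one
  else
    i = decrement_one(i) # guard failed: make the normal call
  end
end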
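The guard matters because Ruby methods can be redefined at any time. The sketch below imitates a guarded inline by hand in plain Ruby. The guard is a hypothetical generation counter bumped from the method_added hook, not JRuby's guard_same?, and the Counter class and count_down helper are invented for illustration.

```ruby
# Hypothetical class that counts how often decrement_one is (re)defined.
class Counter
  @generation = 0

  class << self
    attr_reader :generation

    # Ruby calls this hook whenever an instance method is defined.
    def method_added(name)
      @generation += 1 if name == :decrement_one
      super
    end
  end

  def decrement_one(i)
    i - 1
  end
end

# Counts loop iterations; takes the "inlined" fast path while the guard holds.
def count_down(counter, start, inlined_generation)
  i = start
  steps = 0
  while i > 0
    if Counter.generation == inlined_generation
      i = i - 1                    # inlined body of the original method
    else
      i = counter.decrement_one(i) # guard failed: do the real call
    end
    steps += 1
  end
  steps
end

counter = Counter.new
snapshot = Counter.generation # taken when the call site was "inlined"
fast_steps = count_down(counter, 10, snapshot)

# Redefining the method invalidates the snapshot, so the guard now fails.
class Counter
  def decrement_one(i)
    i - 2
  end
end
slow_steps = count_down(counter, 10, snapshot)

puts fast_steps # 10 iterations using the inlined i - 1
puts slow_steps # 5 iterations using the redefined method
```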
55. Numeric Specialization
• Everything's an object
• JVM has only references and primitives
• Not compatible in bytecode
• Need to optimize numerics as primitive
56.
def looper(n)
  i = 0
  while i < n
    do_something(i)
    i += 1
  end
end
Cached object
Call with i
New Fixnum i + 1
Probably a Fixnum?
57.
def looper(n)
  i = 0
  while i < n
    do_something(i)
    i += 1
  end
end
def looper(long n)
  long i = 0
  while i < n
    do_something(i)
    i += 1
  end
end
Specialize n, i to long
def looper(n)
  i = 0
  while i < n
    do_something(i)
    i += 1
  end
end
Deopt to object version if n or i + 1 is not Fixnum
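The deopt condition can be sketched in plain Ruby as a range check. This is an illustration only, assuming a 64-bit signed long as the primitive type; the constants and the fits_long? and version_for helpers are made up here and are not part of JRuby.

```ruby
# Range of a 64-bit signed long (assumed primitive type for specialization).
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1

# True when the value could be held in a primitive long.
def fits_long?(value)
  value.is_a?(Integer) && value >= LONG_MIN && value <= LONG_MAX
end

# Picks which version of looper would run for a given n and current i.
def version_for(n, i)
  fits_long?(n) && fits_long?(i + 1) ? :long_version : :object_version
end

puts version_for(1_000, 0)       # long_version
puts version_for(2**64, 0)       # object_version: n is too large
puts version_for(1_000, LONG_MAX) # object_version: i + 1 overflows a long
puts version_for(1.5, 0)         # object_version: n is not an integer
```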
58. JVM Futures
• We're good friends with OpenJDK folks
• Working to improve JVM as well
• FFI being added at JVM level
• AOT compilation for startup perf
59. FFI in JVM
• Project Panama (JEP-191)
• Native support for FFI
• Code generators for binding
• JIT support for calling
• API support for userland
64. Startup Time
• By far our greatest challenge
• Everything starts cold: parser, interpreter,
compiler, core classes, boot logic
• Increasing amount of Ruby in JRuby
• Aggravates the problem
66. --dev
• Disables JRuby JIT
• Sets JVM to reduced optimization mode
• 50% reduction in startup time
• Much lower peak perf
67. JRuby --dev
[Chart: startup time in seconds (lower is better), axis 0s to 14s. Series: C Ruby, JRuby, JRuby --dev. Commands measured: -e 1, gem --list, rake -T in Rails app.]
68. AOT
• Precompile JVM bytecode to native
• Focus on hot code
• Save original structure for optimization
• Get JRuby running native right away
• AOT compile Ruby to native in future
69. Getting There
[Chart: time in seconds (lower is better), axis 0s to 14s, for rake -T in Rails app. Series: C Ruby, JRuby, JRuby --dev, Non-opto AOT, Opto AOT.]
70. AOT Future
• AOT might be available in Java 9
• Many tweaks we can make to help it
• Ideal: all code run at boot runs native
• Should get closer to MRI