When Herb Sutter announced that the free lunch is over ("The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software"), concurrency became part of our everyday life. A big change is coming to Java: Project Loom, and with it new terms such as "virtual threads", "continuations" and "structured concurrency". If you have been wondering what they will change in our daily work,
whether it is worth rewriting your Tomcat-based application on top of super-efficient reactive Netty, or whether to wait for Project Loom, this presentation is for you.
I will talk about Project Loom and the new possibilities that come with virtual threads and structured concurrency. I will explain how it works, what can be achieved with it, and how it affects performance.
25. Platform Thread
Platform Thread
~1ms to schedule
Big memory consumption
Expensive
OS Thread
Task-switching requires switch to kernel: ~100µs (depends on the OS)
Scheduling is a compromise for all usages. Bad cache locality
27. Virtual thread
Virtual thread
Lighter threads
Less memory usage
Fastest blocking code*
Unlike a platform thread, a virtual thread is not a GC root
CPU cache misses are possible
Pay-as-you-go stacks (size 200-300 bytes) stored on the heap
Scales to 1M+ on commodity hardware
Clean stack traces
Your old code just works
Readable sequential code
The natural unit of scheduling for operating systems
28. Virtual thread is cheap
Virtual thread is cheap
Cheap to create
Cheap to destroy
Cheap to block
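The three "cheap" claims above can be tried out with a small sketch. It assumes JDK 21 or later; the class name CheapVirtualThreads, the method runBlockingTasks and the 100 ms sleep are made up for illustration and are not part of the talk.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: names and numbers are assumptions, not taken from the slides.
class CheapVirtualThreads {

    // Starts one virtual thread per task, lets each one block briefly,
    // and returns how many tasks ran to completion.
    static int runBlockingTasks(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        // A new virtual thread is created for every submitted task; nothing is pooled.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    // Blocking parks the virtual thread and frees its carrier thread.
                    Thread.sleep(Duration.ofMillis(100));
                    return completed.incrementAndGet();
                });
            }
        } // close() waits until all submitted tasks have finished
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed = " + runBlockingTasks(10_000));
    }
}
```

Trying the same loop with Executors.newCachedThreadPool() is one way to compare against platform threads on your own machine.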
29. Virtual thread purpose
Virtual thread purpose
Mostly intended for writing I/O-bound applications
servers
message brokers
Higher concurrency, provided the system has the additional resources that concurrency needs
available connections in a connection pool
sufficient memory to serve the increased load
Increased efficiency for short-cycle tasks
30. Virtual thread is not for
Virtual thread is not for
The non-realtime kernels primarily employ time-sharing when the CPU is at 100%
Long-running computations *
CPU bound tasks *
31. Virtual threads are not an execution resource, but a business logic object like a string.
32. Code
Code
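final Thread platformThread = Thread
        .ofPlatform()
        .unstarted(() -> System.out.println("Hello from " + Thread.currentThread()));
final Thread virtualThread = Thread
        .ofVirtual()
        .unstarted(() -> System.out.println("Hello from " + Thread.currentThread()));
// unstarted() only creates the threads; they must be started to produce the output below
platformThread.start();
virtualThread.start();
// virtual threads are daemon threads, so wait for this one before the JVM exits
virtualThread.join();
Hello from Thread[#22,Thread-0,5,main]
Hello from VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1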
33. Fast forward to today
Fast forward to today
Virtual thread = user-mode thread
Scheduled by the JVM, not the OS
A virtual thread is an instance of java.lang.Thread
A platform thread is also an instance of java.lang.Thread, but implemented the "traditional" way, as a thin wrapper
around an OS thread
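A minimal sketch of the point that both kinds are plain java.lang.Thread instances, assuming JDK 21 or later. The class name VirtualIsAThread and the helper kinds are hypothetical.

```java
// Illustrative sketch: the class and method names are assumptions.
class VirtualIsAThread {

    // Starts one platform and one virtual thread and reports isVirtual() for each.
    static boolean[] kinds() {
        // Both builders hand back the same static type: java.lang.Thread.
        Thread platform = Thread.ofPlatform().start(() -> { });
        Thread virtual = Thread.ofVirtual().start(() -> { });
        try {
            platform.join();
            virtual.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new boolean[] { platform.isVirtual(), virtual.isVirtual() };
    }

    public static void main(String[] args) {
        boolean[] k = kinds();
        System.out.println("platform.isVirtual() = " + k[0]);
        System.out.println("virtual.isVirtual()  = " + k[1]);
    }
}
```

Existing code that accepts a Thread therefore works with either kind; isVirtual() is the way to tell them apart when that matters.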
34. How are virtual threads implemented?
How are virtual threads implemented?
35. How are virtual threads implemented?
How are virtual threads implemented?
Built on continuations, a lower-level construct of the VM
Wraps a task in a continuation
Scheduled in FIFO mode
M:N threading model
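The M:N model can be observed with a rough sketch that counts how many distinct carrier threads serve many virtual threads. It assumes JDK 21 or later and relies on the toString() format shown in the earlier output ("VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1"), which is not a documented contract. The class name CarrierCount and the method carriersSeen are made up for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;

// Illustrative sketch: names are assumptions, and parsing toString() is a hack.
class CarrierCount {

    // Runs `tasks` virtual threads and returns the distinct carrier names they reported.
    static Set<String> carriersSeen(int tasks) {
        Set<String> carriers = ConcurrentHashMap.newKeySet();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    // Assumed format: "VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1"
                    String description = Thread.currentThread().toString();
                    int at = description.indexOf('@');
                    carriers.add(at >= 0 ? description.substring(at + 1) : "unmounted");
                });
            }
        } // close() waits for all tasks
        return carriers;
    }

    public static void main(String[] args) {
        Set<String> carriers = carriersSeen(1_000);
        System.out.println("1000 virtual threads ran on " + carriers.size() + " carrier thread(s)");
    }
}
```

The number of carriers depends on the machine and on scheduler settings, so no particular count should be expected.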
37. The virtual thread is not an atomic construct, but a composition of two concerns — a scheduler
and a continuation…
39. Delimited Continuation with Scheduler
Delimited Continuation with Scheduler
Adding the ability to manipulate call stacks to the JVM will undoubtedly be required
41. Scheduler
Scheduler
A scheduler assigns continuations to CPU cores, replacing a paused one with another that’s ready to
run, and ensuring that a continuation that is ready to resume will eventually be assigned to a CPU
core.
42. VT wraps a task in a continuation
A virtual thread (VT) wraps a task in a continuation
45. Copy terminology
Copy terminology
Freeze: Suspend a continuation and unmount it by copying frames from OS thread stack →
continuation object
Thaw: Mount a suspended continuation by copying frames from continuation object → OS thread
stack
48.
private static void enter(Continuation c, boolean isContinue) {
    // This method runs in the "entry frame".
    // A yield jumps to this method's caller as if returning from this method.
    try {
        c.enter0();
    } finally {
        c.finish();
    }
}

private void enter0() {
    target.run();
}
72. I/O
I/O
The java.nio.channels classes (SocketChannel, ServerSocketChannel and DatagramChannel) were
retrofitted to become virtual-thread-friendly. When their synchronous operations, such as read and
write, are performed on a virtual thread, only non-blocking I/O is used under the covers.
"Old" I/O networking (java.net.Socket, ServerSocket and DatagramSocket) has been
reimplemented in Java on top of NIO, so it immediately benefits from NIO's virtual-thread-friendliness.
DNS lookups by the getHostName, getCanonicalHostName and getByName methods of
java.net.InetAddress (and other classes that use them) are still delegated to the operating system,
which only provides an OS-thread-blocking API. Alternatives are being explored.
73. I/O
I/O
Process pipes will similarly be made virtual-thread-friendly, except maybe on Windows, where this
requires a greater effort
Console I/O has also been retrofitted. Http(s)URLConnection and the implementation of TLS/SSL
were changed to rely on j.u.c locks and avoid pinning.
File I/O is problematic. Internally, the JDK uses buffered I/O for files, which always reports available
bytes even when a read will block. On Linux, we plan to use io_uring for asynchronous file I/O, and in
the meantime we’re using the ForkJoinPool.ManagedBlocker mechanism to smooth over blocking file
I/O operations by adding more OS threads to the worker pool when a worker is blocked.
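The remark about relying on j.u.c locks to avoid pinning can be illustrated with a small sketch. On JDK 21, a virtual thread that blocks inside a synchronized block stays pinned to its carrier, whereas one that blocks while holding a ReentrantLock can unmount. The class name LockInsteadOfSynchronized, its methods and the 1 ms sleep are assumptions made for this example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch: names and timings are assumptions, not taken from the slides.
class LockInsteadOfSynchronized {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    // Guards a blocking section with a j.u.c lock rather than a synchronized block.
    void slowIncrement() throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(1); // stands in for blocking I/O done while holding the lock
            counter++;
        } finally {
            lock.unlock();
        }
    }

    // Reads the counter under the same lock so the value is safely published.
    int value() {
        lock.lock();
        try {
            return counter;
        } finally {
            lock.unlock();
        }
    }

    // Runs `tasks` virtual threads against one shared instance and returns the final count.
    static int run(int tasks) {
        var shared = new LockInsteadOfSynchronized();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    shared.slowIncrement();
                    return null;
                });
            }
        } // close() waits for all tasks
        return shared.value();
    }

    public static void main(String[] args) {
        System.out.println("counter = " + run(200));
    }
}
```

On JDK 21, running with -Djdk.tracePinnedThreads=full prints a stack trace whenever a virtual thread blocks while pinned, which helps locate synchronized sections worth converting.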
76.
bool JfrThreadSampleClosure::sample_thread_in_java(JavaThread* thread, JfrStackFrame* frames, u4 max_frames) {
  // Process the oops in the thread head before calling into code that wants to
  // stack walk over Loom continuations. The stack walking code will otherwise
  // skip frames in stack chunks on the Java heap.
  StackWatermarkSet::start_processing(thread, StackWatermarkKind::gc);
  OSThreadSampler sampler(thread, *this, frames, max_frames);
  sampler.take_sample();
}
https://github.com/openjdk/jdk/blob/d0761c19d1ddafbcb5ea97334335462e716de250/src/hotspot/share/jfr/
77.
if (Continuation::is_return_barrier_entry(sender_pc)) {
  // If our sender_pc is the return barrier, then our "real" sender is the continuation entry
  frame s = Continuation::continuation_bottom_sender(thread, *this, sender_sp);
  sender_sp = s.sp();
  sender_pc = s.pc();
}
https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/hotspot/cpu/x86/frame_x86.cpp#L158
78. — Andrei Pangin
Async-profiler works on the native threads level. Reconstruction of virtual stack traces is not its
goal at this point.
79. It makes async-profiler essentially incompatible with loom. If the problem was limited to
reconstruction of the chain of deliberately launched structured concurrency scopes, that would
probably not be an issue for most applications, but if any potentially yielding method can be
yanked onto a random carrier thread without any stack history, the flame graphs will be very
hard to interpret.
83. Structured concurrency treats groups of related tasks running in different threads as a single unit
of work, thereby streamlining error handling and cancellation, improving reliability, and
enhancing observability.
85. Before JDK21
Before JDK21
Response handle() throws ExecutionException, InterruptedException {
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        Future<String> user = scope.fork(() -> findUser());
        Future<Integer> order = scope.fork(() -> fetchOrder());
        scope.join();          // Join both forks
        scope.throwIfFailed(); // ... and propagate errors
        // Here, both forks have succeeded, so compose their results
        return new Response(user.resultNow(), order.resultNow());
    }
}
86. JDK21
JDK21
Response handle() throws ExecutionException, InterruptedException {
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        java.util.concurrent.StructuredTaskScope.Subtask<String> user = scope.fork(() -> findUser());
        java.util.concurrent.StructuredTaskScope.Subtask<Integer> order = scope.fork(() -> fetchOrder());
        scope.join();          // Join both forks
        scope.throwIfFailed(); // ... and propagate errors
        // user.state();     UNAVAILABLE, SUCCESS, FAILED
        // user.exception();
        // user.task();      Callable<? extends T>
        // Here, both forks have succeeded, so compose their results
        return new Response(user.get(), order.get());
    }
}
90. Project Synergies
Project Synergies
Data more local than ever
Less reason to manually share data across thread pools
Some data is now private in the per-request model
Garbage is collected when the thread terminates
The virtual thread stack object itself is thread-local
102. Takeaways
Takeaways
Nothing is changed 😃
A virtual thread is a java.lang.Thread — in code, at runtime, in the debugger and in the profiler
Lighter threads
Pay-as-you-go stacks (size 200-300 bytes) stored on the heap
Scales to 1M+ on commodity hardware
Clean stack traces
Your old code just works
Readable sequential code
103. Takeaways
Takeaways
The natural unit of scheduling for operating systems
Clean stack traces
A virtual thread is not a wrapper around an OS thread, but a Java entity.
Creating a virtual thread is cheap — have millions, and don’t pool them!
Blocking a virtual thread is cheap — be synchronous!
No language changes are needed.
Pluggable schedulers offer the flexibility of asynchronous programming.
112. Are virtual threads scheduled preemptively***, not cooperatively?
113. VT hang
VT hang
This means that the runtime decides when to de-schedule (preempt) one thread and
schedule another, without cooperation from user code
115.
Map<Integer, Integer> map = new ConcurrentHashMap<>();
for (int i = 0; i < 1_000; i++) {
    int finalI = i;
    Thread.startVirtualThread(() ->
        map.computeIfAbsent(finalI % 3, key -> {
            try {
                // blocks while computeIfAbsent is still holding the map's internal lock
                Thread.sleep(2_000);
            } catch (InterruptedException e) {
                throw new CancellationException("interrupted");
            }
            return finalI;
        }));
}
long time = System.nanoTime();
try {
    Thread.startVirtualThread(() -> System.out.println("Hi, I'm an innocent virtual thread"))
    // ... (the rest of this slide is cut off in the source)
116. — ConcurrentHashMap#computeIfAbsent
Some attempted update operations on this map by other threads may be blocked while
computation is in progress, so the computation should be short and simple, and must not
attempt to update any other mappings of this map.
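One possible way to follow that advice is to do the slow work before touching the map and keep the map update itself short. The sketch below assumes JDK 21 or later; slowValue is a hypothetical stand-in for the blocking computation, and the class name ComputeOutsideTheMap is made up. Unlike computeIfAbsent, this approach may compute a value more than once for the same key, which is the trade-off it accepts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;

// Illustrative sketch: names, keys and timings are assumptions, not taken from the slides.
class ComputeOutsideTheMap {

    // Hypothetical slow lookup standing in for the blocking work.
    static int slowValue(int key) throws InterruptedException {
        Thread.sleep(50);
        return key * 10;
    }

    // Fills the map from `tasks` virtual threads without blocking inside a map operation.
    static Map<Integer, Integer> fill(int tasks) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                int key = i % 3;
                executor.submit(() -> {
                    if (!map.containsKey(key)) {
                        int value = slowValue(key);  // block outside the map's internal lock
                        map.putIfAbsent(key, value); // short, non-blocking update
                    }
                    return null;
                });
            }
        } // close() waits for all tasks
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(1_000));
    }
}
```

If duplicate computation is too expensive, storing a future per key is a common alternative, at the cost of more code.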
119.
private static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

private static String refresh(String key) {
    try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
        scope.fork(() -> UUID.randomUUID().toString());
        scope.join();
        return scope.result();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

public static void main(String[] args) throws Exception {
    final int cpus = Runtime.getRuntime().availableProcessors();
    final List<Future> futureList = new ArrayList<>();
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
        // ... (the rest of this slide is cut off in the source)
120. It needs to allocate a lot of ReentrantLock instances, and those are 40 bytes per lock
130. Summary
Summary
Move to simpler blocking/synchronous code
Migrate tasks to virtual threads, rather than converting platform threads to virtual threads
Use Semaphores or similar to limit concurrency
Try not to cache expensive objects in ThreadLocals
Avoid pinning
Avoid reusing
Avoid pooling
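The advice to limit concurrency with a semaphore instead of a thread pool could look roughly like this. The sketch assumes JDK 21 or later; the class name LimitWithSemaphore, the method maxConcurrent, the permit count and the 10 ms sleep are all made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: names and numbers are assumptions, not taken from the slides.
class LimitWithSemaphore {

    // Runs `tasks` virtual threads but lets at most `permits` of them into the
    // guarded section at once. Returns the highest concurrency observed there.
    static int maxConcurrent(int tasks, int permits) {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    semaphore.acquire(); // parks the virtual thread until a permit is free
                    try {
                        int now = inside.incrementAndGet();
                        max.accumulateAndGet(now, Math::max);
                        Thread.sleep(10); // stands in for a call to a limited resource
                    } finally {
                        inside.decrementAndGet();
                        semaphore.release();
                    }
                    return null;
                });
            }
        } // close() waits for all tasks
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent = " + maxConcurrent(500, 10));
    }
}
```

The threads stay cheap and unpooled; only access to the scarce resource (for example a database connection) is throttled.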