As a follow-up to the previous session about TFB (the TechEmpower Framework Benchmarks), we will discuss the tuning applied to the mORMot library, and to its associated TFB sample implementation, to reach the top scores in the charts. How can a pure Pascal project reach 7 million HTTP requests per second? How do you scale and measure on high-end hardware? Are ORM frameworks doomed to slow everything down? How do you work around the lack of “async” programming at the language level? How realistic is such a benchmark?
7. Languages & Frameworks
Performance Bottlenecks
• The Web Server
• The Database layer
• The threading/execution Model
• The RTL (JSON, heap…)
Disclaimer:
slide borrowed from yesterday’s session
8. Menu du jour
• The TFB Challenge
• Async Web Server
• Async Database Access
• JSON, RTTI, Mustache
11. The TFB Challenge
Web Frameworks Benchmarks
• Since 2013, a collaborative project
• Now one official round per year
• Hundreds of frameworks tested
• Seven web /endpoints tested
• 24/7 continuous tests on dedicated HW
12. The TFB Challenge
One official round per year
• Round 22 has just finished
• mORMot is #12 out of 301 frameworks!
(with current weighting)
18. The TFB Challenge
endpoint                         rps % vs #1
/plaintext                       99.4%
/json                            91.8%
/db single SELECT potential      59.7%
/query?queries=###               89.7%
/cached_queries?count=###        99.8%
/update?queries=###              81.5%
/fortunes                        59.5%
19. The TFB Challenge
24/7 continuous tests on big HW
Latest runs are available at
https://tfb-status.techempower.com
20. The TFB Challenge
24/7 continuous tests on big HW
• Several strategies to fill all CPU cores
pin to CPU, several bound servers, threads
• Several strategies for database access
ORM, direct, async
• A lot of tests & trials
guessing is usually wrong
no definitive/logical rules
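The "pin to CPU" and "several bound servers" strategies above can be sketched with plain OS facilities. This is a minimal Python illustration, not the code of the mORMot TFB sample: it assumes a Linux-like system where SO_REUSEPORT lets several listening sockets share one port, and the helper names (open_reuseport_listener, pin_current_process) are hypothetical.

```python
import os
import socket

def open_reuseport_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Open a listening TCP socket that may share its port with siblings."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT lets several sockets bind the same address/port;
    # the kernel then spreads incoming connections among them.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(128)
    return s

def pin_current_process(cpu: int) -> bool:
    """Pin the calling process to one CPU core when the OS allows it."""
    if not hasattr(os, "sched_setaffinity"):
        return False                      # no affinity API on this platform
    if cpu not in os.sched_getaffinity(0):
        return False                      # core not available to this process
    os.sched_setaffinity(0, {cpu})
    return True

# Two "bound servers" on one port (port 0 lets the kernel pick a free one).
first = open_reuseport_listener(0)
port = first.getsockname()[1]
second = open_reuseport_listener(port)
```

In a real setup each listener would live in its own process or thread group, and each group would call something like pin_current_process() with a different core. Whether this beats a single server with more threads is exactly the kind of question that has to be measured rather than guessed.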
21. The TFB Challenge
Demanding but not realistic
• Measured with wrk over a single endpoint
• Such a perfect/optimal network does not exist
• Very small dataset
• No long-running tests
• Memory consumption is not measured
23. Async Web Server
Several Web Servers
• THttpServer
1 thread per HTTP/1.1 client
thread pool for HTTP/1.0 requests (proxy)
• THttpApiServer
Windows-specific – using http.sys API
• THttpAsyncServer
thread pool for all requests
24. Async Web Server
Several WebSockets Servers
• TWebSocketServerRest
1 thread per WebSockets client
• THttpApiWebSocketServer
Windows-specific – using http.sys API
• TWebSocketAsyncServerRest
thread pool for all requests
25. Async Web Server
THttpAsyncServer
• Non-blocking read/write state machine
Using epoll API on Linux
• Based on abstract THttpAsyncConnections
HTTP-over-TCP per-connection architecture
• Optional TLS layer
with OpenSSL or Windows SSPI
and ACME/Let’s Encrypt built-in support
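To make the "non-blocking read/write state machine" idea concrete, here is a small Python sketch. It is not mORMot code: the Connection class, its state names and the fixed RESPONSE are assumptions made for illustration. Python's selectors.DefaultSelector picks epoll on Linux, which matches the mechanism named on the slide.

```python
import selectors
import socket

RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"

class Connection:
    """Per-connection state: READ_HEADERS -> WRITE_RESPONSE -> DONE."""

    def __init__(self, sock):
        self.sock = sock
        self.state = "READ_HEADERS"
        self.inbuf = b""
        self.outbuf = b""

    def on_readable(self):
        try:
            chunk = self.sock.recv(4096)
        except BlockingIOError:
            return                          # nothing to read yet: keep state
        if not chunk:
            self.state = "DONE"             # peer closed the connection
            return
        self.inbuf += chunk
        if b"\r\n\r\n" in self.inbuf:       # end of the HTTP headers
            self.outbuf = RESPONSE
            self.state = "WRITE_RESPONSE"

    def on_writable(self):
        try:
            sent = self.sock.send(self.outbuf)
        except BlockingIOError:
            return                          # socket buffer full: retry later
        self.outbuf = self.outbuf[sent:]
        if not self.outbuf:
            self.state = "DONE"

def serve_one(server_side):
    """Drive a single connection until its state machine reaches DONE."""
    server_side.setblocking(False)
    conn = Connection(server_side)
    sel = selectors.DefaultSelector()       # epoll on Linux
    sel.register(server_side, selectors.EVENT_READ, conn)
    while conn.state != "DONE":
        for key, events in sel.select(timeout=1.0):
            c = key.data
            if events & selectors.EVENT_READ and c.state == "READ_HEADERS":
                c.on_readable()
            if events & selectors.EVENT_WRITE and c.state == "WRITE_RESPONSE":
                c.on_writable()
            if c.state == "WRITE_RESPONSE":
                # Only ask for write readiness once there is data to send.
                sel.modify(c.sock, selectors.EVENT_WRITE, c)
    sel.unregister(server_side)
    sel.close()
```

A real server would keep one such state object per connection and poll all of them from a few threads, so no thread is ever parked on a single slow client.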
26. Async Web Server
THttpAsyncServer
• Can sustain thousands of concurrent clients
• With minimal memory/cpu consumption
• Also exists as a TWebSocketAsyncServerRest flavor
• THttpServer may be more responsive
when there are only a few connections
27. Async Web Server
THttpAsyncServer
• Responses are computed in a thread pool
with regular blocking end-user code
• Optionally returned out-of-order
e.g. for asynchronous DB operations
• Only wake up threads if needed
to avoid syscalls on small queries
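The combination of "regular blocking end-user code" and responses "returned out-of-order" can be illustrated with a plain thread pool. This Python sketch is only an analogy under stated assumptions: the two handlers and the Event that simulates a slow database are hypothetical, and it says nothing about how THttpAsyncServer schedules its own threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def run():
    db_ready = threading.Event()

    def slow_db_request():
        # Ordinary blocking code: waits for a (simulated) database answer.
        db_ready.wait(timeout=5)
        return "db-result"

    def fast_plain_request():
        return "hello"

    completed = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            pool.submit(slow_db_request): 1,    # request #1 arrives first
            pool.submit(fast_plain_request): 2, # request #2 arrives second
        }
        for fut in as_completed(futures):
            # Responses are collected as they become ready, not in
            # arrival order.
            completed.append((futures[fut], fut.result()))
            # Simulation only: the "database" answers after the first
            # response has been sent.
            db_ready.set()
    return completed
```

Here request #2 is answered before request #1 even though it arrived later, so the slow database call never delays the fast one.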
28. Async Web Server
THttpAsyncServer
• Avoid memory allocation
e.g. HTTP buffers reuse between connections
optional string interning of HTTP headers
• Non-blocking process
smallest possible granularity locks
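Buffer reuse and header interning both aim at avoiding repeated allocations. The following Python sketch shows the two ideas in their simplest form; BufferPool and intern_header_name are hypothetical names, and the real server works on its own Pascal buffers and strings rather than on these types.

```python
import sys
from collections import deque

class BufferPool:
    """Hand out fixed-size buffers and take them back for later reuse."""

    def __init__(self, size: int = 8192):
        self._size = size
        self._free = deque()

    def acquire(self) -> bytearray:
        try:
            return self._free.pop()          # reuse a released buffer
        except IndexError:
            return bytearray(self._size)     # pool empty: allocate once

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)

def intern_header_name(raw: bytes) -> str:
    """Return one shared string object per distinct header name."""
    return sys.intern(raw.decode("ascii").lower())
```

A connection would acquire() a buffer when it starts reading and release() it when the response is sent, so a burst of short-lived connections keeps recycling the same few buffers.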
29. Async Web Server
THttpAsyncServer
• Minimized syscalls
profiled with strace on Linux
libc vDSO for clock_gettime()
our own light locks with no mutex
wake up threads using eventfd() – only if needed
per-second cache of date/time or timeout ticks
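The "per-second cache of date/time" item is easy to picture: the HTTP Date header only changes once per second, so its text can be formatted once and reused for every response in that second. A minimal Python sketch, assuming a hypothetical DateHeaderCache class with an injectable clock (not thread-safe as written):

```python
import time
from email.utils import formatdate

class DateHeaderCache:
    """Format the HTTP Date header at most once per second."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the cache can be tested
        self._second = -1
        self._text = ""

    def get(self) -> str:
        now = int(self._clock())
        if now != self._second:
            # Only reformat when the wall-clock second has changed.
            self._text = formatdate(now, usegmt=True)
            self._second = now
        return self._text
```

At millions of requests per second, this turns millions of date-formatting calls into one per second; the same pattern applies to timeout ticks.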
31. Async Web Server
THttpAsyncServer
• Very efficient URI routing
with O(1) parsing of routes and parameters
• Can use the RTTI over methods
to define the endpoints
• Can redirect/rewrite URI before a REST layer
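One way to get route lookups whose cost does not grow with the number of registered routes is a tree keyed by path segments. The Python sketch below is an illustration only: the Router class and the `<name>` placeholder syntax are assumptions for this example, not a description of the mORMot router's internals.

```python
PARAM = object()   # tree key used for any "<name>" placeholder segment

class Router:
    def __init__(self):
        self._root = {}

    def add(self, pattern: str, handler) -> None:
        node = self._root
        names = []
        for seg in pattern.strip("/").split("/"):
            if seg.startswith("<") and seg.endswith(">"):
                names.append(seg[1:-1])
                node = node.setdefault(PARAM, {})
            else:
                node = node.setdefault(seg, {})
        node[None] = (handler, names)    # None marks "a route ends here"

    def lookup(self, path: str):
        node = self._root
        values = []
        for seg in path.strip("/").split("/"):
            if seg in node:              # static segments win over params
                node = node[seg]
            elif PARAM in node:
                values.append(seg)
                node = node[PARAM]
            else:
                return None
        if None not in node:
            return None
        handler, names = node[None]
        return handler, dict(zip(names, values))
```

Each lookup does one dictionary probe per URI segment, so its cost depends on the length of the URI and not on how many routes were registered. This sketch does no backtracking, which a production router may need for ambiguous patterns.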
32. Async Web Server
THttpAsyncServer
• Perfect for the TFB use-case
top of /plaintext or /cached_queries
• Has been tuned to increase TFB numbers
the gains are also noticeable on other HW or OS
33. Async Web Server
mORMot 2 TFB Sample
https://github.com/synopse/mORMot2/tree/master/ex/techempower-bench
36. Async Database Access
mormot.db.sql.postgres.pas
• Direct libpq client access
• Written from scratch
• Using mormot.db.sql.pas simplified design
cached statements with no TDataSet overhead
• Optional array binding
• Optional pipelined mode support
• Optional asynchronous / non-blocking API
39. JSON, RTTI, Mustache
JSON
• mORMot is natively UTF-8 and JSON
on all platforms and compilers (even Delphi 7)
• mORMot 2 JSON core has been rewritten
for performance
43. JSON, RTTI, Mustache
RTTI
• mORMot has its own RTTI cache
on all platforms and compilers
• mORMot 2 RTTI core has been rewritten
for performance and maintainability
(mORMot 1 had duplicated logic)
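The benefit of a type-information cache can be shown by analogy in Python: look up the field list of a type once, keep it, and reuse it for every serialization. This is not mORMot's RTTI layer; the World record, field_names and to_json are hypothetical names for this sketch.

```python
import dataclasses
import json

_FIELDS_CACHE = {}   # type -> tuple of field names, filled on first use

def field_names(cls) -> tuple:
    names = _FIELDS_CACHE.get(cls)
    if names is None:
        # The (relatively slow) reflection call happens once per type.
        names = tuple(f.name for f in dataclasses.fields(cls))
        _FIELDS_CACHE[cls] = names
    return names

def to_json(obj) -> str:
    data = {name: getattr(obj, name) for name in field_names(type(obj))}
    return json.dumps(data, separators=(",", ":"))

@dataclasses.dataclass
class World:
    id: int
    randomNumber: int
```

After the first call for a given type, serialization only touches the cached tuple, which is the general idea behind keeping one shared cache for JSON, ORM and other RTTI consumers.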
44. JSON, RTTI, Mustache
Mustache
• mORMot has its own Mustache renderer
• mORMot 2 Mustache
can work directly on in-memory data
instead of TDocVariant containers
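Rendering "directly on in-memory data" means the template engine reads fields from the application's own objects, with no intermediate generic container. The Python sketch below implements a tiny subset of Mustache (escaped variables and list sections only) to show the idea; the Fortune record and the render function are assumptions for this example and do not mirror the mORMot renderer.

```python
import html
import re
from dataclasses import dataclass

# One pattern for both supported tags: {{#name}}...{{/name}} and {{name}}.
_TAG = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}|\{\{(\w+)\}\}", re.DOTALL)

def _get(ctx, name):
    """Read a value from a dict or straight from an object's attributes."""
    if isinstance(ctx, dict):
        return ctx.get(name, "")
    return getattr(ctx, name, "")

def render(template: str, ctx) -> str:
    def expand(m):
        if m.group(1) is not None:               # section tag
            value = _get(ctx, m.group(1))
            if not value:
                return ""
            items = value if isinstance(value, (list, tuple)) else [ctx]
            return "".join(render(m.group(2), item) for item in items)
        # Variable tag: HTML-escaped, as Mustache does for {{name}}.
        return html.escape(str(_get(ctx, m.group(3))))
    return _TAG.sub(expand, template)

@dataclass
class Fortune:
    id: int
    message: str
```

For example, render("<ul>{{#fortunes}}<li>{{id}}: {{message}}</li>{{/fortunes}}</ul>", {"fortunes": [Fortune(1, "<script>")]}) reads id and message directly from the Fortune instance and escapes the message, which is the shape of work the /fortunes endpoint requires.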