This document discusses parallel complex event processing using Esper, an open source complex event processing (CEP) engine. Esper lets you define event processing logic in an Event Processing Language (EPL) similar to SQL. The slides cover stream types, views and data windows, filtering, and scaling Esper with context partitions within one machine or partitioned streams across multiple machines.
5. Esper
● Two editions:
  ― Open source library
  ― Enterprise server based on Jetty
● The core component of Esper is a CEP engine.
● The CEP engine works like a database turned upside-down.
● Expressions are defined in the Event Processing Language (EPL)
  ― Declarative domain-specific language
  ― Similar to the SQL query language, but it uses views rather than tables and events instead of records (rows)
  ― Views are reused among EPL statements for efficiency!
select * from OrderEvent.win:length(5)
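The "database turned upside-down" idea can be sketched in a few lines of Python: the engine stores the queries and pushes each incoming event through them. This is a toy illustration only, not Esper's API; `TinyEngine`, `add_statement`, and `send_event` are hypothetical names.

```python
class TinyEngine:
    """Toy engine: it stores queries and runs events through them."""

    def __init__(self):
        # Each stored "statement" is a (predicate, listener) pair.
        self.statements = []

    def add_statement(self, predicate, listener):
        self.statements.append((predicate, listener))

    def send_event(self, event):
        # Every incoming event is evaluated against every stored statement.
        for predicate, listener in self.statements:
            if predicate(event):
                listener(event)


matches = []
engine = TinyEngine()
engine.add_statement(lambda e: e["itemType"] == "shirt", matches.append)

engine.send_event({"itemType": "shirt", "price": 20})
engine.send_event({"itemType": "trousers", "price": 35})
```

Only the shirt order reaches the listener; the trousers order is evaluated and dropped.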
6. Streams
● A complex event can be built from several data streams.
select * from AlertEvent as a, NewsEvent as n
where a.symbol = n.symbol
● Esper defines two types of data streams:
  ― Filter-based event stream
select * from OrderEvent(itemType='shirt')
  ― Pattern-based event stream
select * from pattern [
OrderEvent(itemType='shirt') -> OrderEvent(itemType='trousers')]
● Filter-based and pattern-based streams can be joined!
● Events can be forwarded to other streams using the INSERT INTO keywords.
● An event can also be updated (using the UPDATE keyword) before it is applied to any selecting statements.
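A rough Python sketch of what the pattern-based stream above looks for: a shirt order followed, at some later point, by a trousers order. It assumes the pattern has no `every` operator, so it matches at most once; `first_followed_by` is a hypothetical helper, not part of Esper.

```python
def first_followed_by(events, is_first, is_second):
    """Return (a, b): a is the first event matching is_first, b is the first
    later event matching is_second. Return None if the sequence never occurs."""
    pending = None
    for event in events:
        if pending is None:
            if is_first(event):
                pending = event  # left-hand side matched, now wait for the right
        elif is_second(event):
            return pending, event
    return None


def is_shirt(event):
    return event["itemType"] == "shirt"


def is_trousers(event):
    return event["itemType"] == "trousers"


orders = [
    {"id": 1, "itemType": "trousers"},  # ignored: no shirt seen yet
    {"id": 2, "itemType": "shirt"},
    {"id": 3, "itemType": "hat"},
    {"id": 4, "itemType": "trousers"},
]
match = first_followed_by(orders, is_shirt, is_trousers)
```

The trousers order with id 1 does not count, because it arrives before any shirt order.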
7. Views
● Events are derived from streams (both filter- and pattern-based) by views.
● The default view encloses all events from the stream since the statement was added to the engine.
● View types:
  – Data windows (e.g. length, time)
  – Named windows
  – Extension views (sorted window, ranked window, time-order view)
  – Standard views (unique, grouped, size, lastevent)
  – Statistics views (univariate, regression, correlation)
8. Esper processing
● Update listeners and subscriber objects are associated with EPL statements.
● By default, listeners and subscribers are notified when a new event that matches the EPL query arrives (insert stream).
● In addition, listeners and subscribers can be notified when an event that matches the EPL query is removed from the stream because of the limit of a particular window (remove stream).
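The insert and remove notifications can be illustrated with a minimal Python sketch of a length window. It is an assumption-laden toy, not Esper code: `LengthWindow` and `push` are hypothetical names, and each call returns the pair (inserted events, removed events).

```python
from collections import deque


class LengthWindow:
    """Keeps the last `size` events and reports what entered and what left."""

    def __init__(self, size):
        self.size = size
        self.events = deque()

    def push(self, event):
        self.events.append(event)
        removed = []
        if len(self.events) > self.size:
            # The oldest event no longer fits, so it leaves the window.
            removed.append(self.events.popleft())
        return [event], removed  # (insert stream, remove stream)


window = LengthWindow(2)
log = [window.push(e) for e in ["a", "b", "c"]]
```

The third push is the first one that produces a remove-stream entry, because the window can hold only two events.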
10. Filtering
Esper provides two types of filtering:
● Stream-level filtering
select * from OrderEvent(type = 'shirt')
● Post-data-window filtering
select * from OrderEvent where type = 'shirt'
13. Stream-level filtering vs post-data-window filtering
select * from OrderEvent(type = 'shirt')
vs
select * from OrderEvent where type = 'shirt'
The first form is preferred, but sometimes post-data-window filtering is desired:
Keep the last one hundred orders and calculate the average price of the trousers among them.
select avg(price) from OrderEvent.win:length(100)
where type = 'trousers'
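The difference between the two filtering styles shows up once a data window is involved. The Python sketch below is a simplified model, not Esper code: the function names are hypothetical and the window size is shrunk to 2 so the effect is visible with four orders.

```python
from collections import deque


def average(prices):
    return sum(prices) / len(prices) if prices else None


def avg_stream_level(orders, size, item_type):
    # Filter first: the window only ever sees orders of the requested type.
    window = deque((o for o in orders if o["type"] == item_type), maxlen=size)
    return average([o["price"] for o in window])


def avg_post_window(orders, size, item_type):
    # Window first: it holds the last `size` orders of any type,
    # and the filter is applied to the window contents afterwards.
    window = deque(orders, maxlen=size)
    return average([o["price"] for o in window if o["type"] == item_type])


orders = [
    {"type": "trousers", "price": 10},
    {"type": "shirt", "price": 5},
    {"type": "trousers", "price": 30},
    {"type": "shirt", "price": 7},
]
stream_level = avg_stream_level(orders, 2, "trousers")  # window holds prices 10 and 30
post_window = avg_post_window(orders, 2, "trousers")    # window holds trousers 30, shirt 7
```

With stream-level filtering the average covers the last two trousers orders; with post-data-window filtering it covers only the trousers that happen to be among the last two orders overall.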
14. Data Windows
● Basic windows:
  ― Length window (win:length)
  ― Length batch window (win:length_batch)
  ― Time window (win:time)
  ― Time batch window (win:time_batch)
● Advanced time windows
  ― Externally-timed window (win:ext_timed)
  ― Externally-timed batch window (win:ext_timed_batch)
  ― Time-Length combination batch window (win:time_length_batch)
  ― Time-Accumulating window (win:time_accum)
  ― Keep-All window (win:keepall)
  ― First Length (win:firstlength)
  ― First Time (win:firsttime)
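To make two of the window types above concrete, here is a small Python sketch of a sliding time window and a length batch window. Both classes are hypothetical simplifications: timestamps are passed in explicitly (in seconds), and an event is assumed to expire once it is at least `seconds` old.

```python
from collections import deque


class TimeWindow:
    """Sliding window: keeps events younger than `seconds`."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()  # (timestamp, event) pairs, oldest first

    def push(self, timestamp, event):
        self.events.append((timestamp, event))
        # Drop events whose age has reached the window length.
        while self.events and self.events[0][0] <= timestamp - self.seconds:
            self.events.popleft()
        return [e for _, e in self.events]


class LengthBatchWindow:
    """Batch window: releases events only in complete batches of `size`."""

    def __init__(self, size):
        self.size = size
        self.buffer = []

    def push(self, event):
        self.buffer.append(event)
        if len(self.buffer) == self.size:
            batch, self.buffer = self.buffer, []
            return batch
        return None  # batch not full yet


time_window = TimeWindow(10)
time_window.push(0, "a")
time_window.push(5, "b")
current = time_window.push(10, "c")

batch_window = LengthBatchWindow(3)
batches = [batch_window.push(n) for n in (1, 2, 3, 4)]
```

The sliding window changes on every event, while the batch window stays silent until a full batch has accumulated.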
17. Scaling Esper
● According to the documentation, Esper exceeds 500 000 events/s on dual-CPU 2 GHz Intel-based hardware, with average engine latency below 3 microseconds (below 10 microseconds with more than 99% predictability), on a VWAP benchmark with 1000 statements registered in the system. This tops out at 70 Mbit/s at 85% CPU usage.
● Parallel processing
  – Within one machine
    - Context partitions
  – With multiple machines
    - Partitioned stream
    - Partition by use case
19. Keyed Segmented Context
create context ByCustomerAndAccount
partition by custId and account from BankTxn
context ByCustomerAndAccount
select custId, account, sum(amount) from BankTxn
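Implicit grouping in the select statement: each context partition aggregates only its own (custId, account) events, so no group by clause is needed.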
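A keyed segmented context behaves roughly like one independent aggregation per key combination. The Python sketch below models that idea with a dictionary keyed by (custId, account); `keyed_sums` is a hypothetical helper and the transactions are made-up sample data.

```python
from collections import defaultdict


def keyed_sums(txns):
    # One "context partition" per (custId, account) pair,
    # each holding its own running sum.
    partitions = defaultdict(int)
    for txn in txns:
        partitions[(txn["custId"], txn["account"])] += txn["amount"]
    return dict(partitions)


txns = [
    {"custId": "c1", "account": "a1", "amount": 100},
    {"custId": "c1", "account": "a2", "amount": 40},
    {"custId": "c2", "account": "a1", "amount": 7},
    {"custId": "c1", "account": "a1", "amount": 5},
]
sums = keyed_sums(txns)
```

Because the partitions are independent, they can in principle be processed in parallel without sharing aggregation state.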
20. Hash Segmented Context
Assigns events to context partitions based on the result of a hash function and a modulo operation.
create context SegmentedByCustomerHash coalesce by hash_code (custId) from
BankTxn granularity 16 preallocate
context SegmentedByCustomerHash
select custId, account, sum(amount) from BankTxn group by custId, account
No implicit grouping in the select statement! Several custId values can land in the same partition, so the group by clause is required.
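The hash-and-modulo assignment can be sketched in Python as follows. This is an illustration under assumptions: CRC32 stands in for whatever hash function the context uses, `partition_index` and `hash_segmented_sums` are hypothetical names, and the granularity is set to 2 so that three customers must share partitions.

```python
import zlib
from collections import defaultdict


def partition_index(cust_id, granularity):
    # Hash the key, then take the remainder to pick one of `granularity` partitions.
    return zlib.crc32(str(cust_id).encode("utf-8")) % granularity


def hash_segmented_sums(txns, granularity):
    # Inside each partition an explicit grouping by (custId, account) is still
    # needed, because one partition may hold several customers.
    partitions = defaultdict(lambda: defaultdict(int))
    for txn in txns:
        idx = partition_index(txn["custId"], granularity)
        partitions[idx][(txn["custId"], txn["account"])] += txn["amount"]
    return {idx: dict(groups) for idx, groups in partitions.items()}


txns = [
    {"custId": "c1", "account": "a1", "amount": 10},
    {"custId": "c2", "account": "a1", "amount": 5},
    {"custId": "c3", "account": "a1", "amount": 7},
    {"custId": "c1", "account": "a1", "amount": 1},
]
result = hash_segmented_sums(txns, granularity=2)
```

With three customers and only two partitions, at least one partition necessarily contains more than one customer, which is why the grouping inside the partition cannot be dropped.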
21. Category Segmented Context
Assigns events to context partitions based on the values of one or more event properties, using one or more predicate expressions to define context partition membership.
create context CategoryByTemp
group temp < 65 as cold,
group temp between 65 and 85 as normal,
group temp > 85 as large
from SensorEvent
context CategoryByTemp
select context.label, count(*) from SensorEvent
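The category context above amounts to evaluating each predicate against every event and counting per label. A minimal Python sketch, assuming that `between 65 and 85` is inclusive on both ends; `labels_for` and `count_by_label` are hypothetical names and the temperatures are sample data.

```python
from collections import Counter


def labels_for(temp):
    # Each predicate is checked independently, so an event could in principle
    # match several categories if the predicates overlapped.
    labels = []
    if temp < 65:
        labels.append("cold")
    if 65 <= temp <= 85:
        labels.append("normal")
    if temp > 85:
        labels.append("large")
    return labels


def count_by_label(temps):
    counts = Counter()
    for temp in temps:
        for label in labels_for(temp):
            counts[label] += 1
    return dict(counts)


counts = count_by_label([60, 65, 85, 90, 70])
```

Here the three predicates do not overlap, so every reading is counted exactly once.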
22. Non-overlapping context
A non-overlapping context is created when the start condition is met and ended when the end condition is met. There is always either one or zero context partitions.
create context NineToFive start (0, 9, *, *, *) end (0, 17, *, *, *)
context NineToFive select * from TrafficEvent(speed >= 100)
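The NineToFive context can be pictured as a switch that is on from 9:00 until 17:00 each day. The Python sketch below models that reading of the two crontab expressions (minute 0 of hour 9, minute 0 of hour 17); the helper names and sample events are hypothetical.

```python
from datetime import datetime


def in_nine_to_five(ts):
    # The context starts at 09:00 and ends at 17:00, so 17:00 itself is outside.
    return 9 <= ts.hour < 17


def speeding_during_context(events):
    # Events are considered only while the single context partition exists.
    return [e for e in events if in_nine_to_five(e["time"]) and e["speed"] >= 100]


events = [
    {"time": datetime(2024, 1, 8, 8, 59), "speed": 120},   # before the context starts
    {"time": datetime(2024, 1, 8, 9, 0), "speed": 110},    # inside, fast enough
    {"time": datetime(2024, 1, 8, 12, 30), "speed": 90},   # inside, too slow
    {"time": datetime(2024, 1, 8, 17, 0), "speed": 130},   # context has ended
]
selected = speeding_during_context(events)
```

Outside the 9:00 to 17:00 interval there is no context partition, so the statement sees no events at all.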
23. Overlapping context
This context initiates a new context partition when an initiating condition occurs, and
terminates one or more context partitions when the terminating condition occurs.
create context CtxTrainEnter initiated by TrainEnterEvent as te
terminated after 5 minutes
context CtxTrainEnter select t1 from pattern [t1=TrainEnterEvent ->
timer:interval(5 min) and not TrainLeaveEvent(trainId =
context.te.trainId)]
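What makes this context "overlapping" is that every initiating event opens its own partition, and several partitions can be alive at once. The Python sketch below models only that lifecycle, not the pattern inside the statement; `active_partitions` is a hypothetical helper, times are in seconds, and a partition is assumed to end exactly 5 minutes after its initiating event.

```python
def active_partitions(enter_events, now, lifetime=300):
    """Return the train ids whose context partition is alive at time `now`.

    enter_events: (timestamp_seconds, train_id) pairs, one per TrainEnterEvent.
    """
    return [
        train_id
        for ts, train_id in enter_events
        if ts <= now < ts + lifetime
    ]


enters = [(0, "T1"), (120, "T2"), (400, "T3")]

at_200 = active_partitions(enters, now=200)  # T1 and T2 overlap
at_350 = active_partitions(enters, now=350)  # T1 has ended, T3 not started
at_410 = active_partitions(enters, now=410)  # T2 and T3 overlap
```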
24. Context nesting
In the case of nested contexts, the context declared first controls the
lifecycle of the context(s) declared thereafter.
create context NineToFiveSegmented
context NineToFive start (0, 9, *, *, *) end (0, 17, *, *, *),
context SegmentedByCustomer partition by custId from BankTxn
context NineToFiveSegmented
select custId, account, sum(amount) from BankTxn group by account
25. Partitioning without context declaration
Grouped data window std:groupwin()
What is the difference between:
select avg(price) from OrderEvent.std:groupwin(itemType).win:length(10)
And
select avg(price) from OrderEvent.win:length(10) group by itemType
?
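One way to explore the question is a small Python model of both statements. It rests on an assumed reading of the semantics: `std:groupwin` keeps a separate length window per itemType while the aggregate (with no group by) spans all of those windows, whereas `group by` keeps a single shared window and aggregates per itemType. The function names are hypothetical and the window length is reduced to 2.

```python
from collections import defaultdict, deque


def avg(prices):
    return sum(prices) / len(prices) if prices else None


def groupwin_then_length(orders, size):
    # One length window per itemType; a single average over all windows.
    windows = defaultdict(lambda: deque(maxlen=size))
    for item_type, price in orders:
        windows[item_type].append(price)
    return avg([p for window in windows.values() for p in window])


def length_then_group_by(orders, size):
    # One shared length window; one average per itemType found in it.
    window = deque(orders, maxlen=size)
    groups = defaultdict(list)
    for item_type, price in window:
        groups[item_type].append(price)
    return {item_type: avg(prices) for item_type, prices in groups.items()}


orders = [("shirt", 10), ("shirt", 20), ("shirt", 30), ("trousers", 40)]
per_group_windows = groupwin_then_length(orders, 2)  # shirt: 20, 30; trousers: 40
shared_window = length_then_group_by(orders, 2)      # window: shirt 30, trousers 40
```

Under this model the first form retains up to `size` events for every itemType, so a rarely ordered item is never pushed out by a frequently ordered one; in the second form all item types compete for the same window slots.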
CEP assumes multiple sources
Synonym of CEP is 'event correlation'.
Whereas a typical database stores data and runs queries against the data, a CEP engine stores queries and runs data through the queries.
Language to specify expression-based event pattern matching
Is 'insert into' used to insert into other streams of events, or only into named windows?
Data windows will be discussed later
Stream-level-filtering allows only simple filters
With post-data-window filtering, the update listener is not notified for events that fail the filter, but the window is still used (filled) by them
Post-data-window filtering allows more sophisticated filtering
Stream-level filtering has built-in optimization. Sometimes post-data-window filtering is needed
The granularity defines the maximum degree of parallelism