This presentation about a ModSecurity log file parser was given at the OWASP CRS Community Summit on 14 February 2023.
The tool is available here: https://github.com/digitalwave/libmsclogparser
2. Content
● idea
● goals and motivations
● internal operation
● examples
● future plans
3. Idea
● we use ModSecurity on more and more servers, both Apache and Nginx
○ which means more very unpleasant false positives (FPs)
● demand: process logs for a custom dashboard
○ both developers and admins can check the WAF results
● collect the virtual hosts' logs from the servers
○ side note: discover the servers' configuration with httpd_pyparser
○ collect information about the server configuration, incl. log paths, CRS setup (PL), ModSecurity customization
● which log?
○ audit.log?
○ error.log? - this one was chosen: we also need the lines that are "only" a 'Warning'
● how to transport the log(s) to the dashboard app?
○ on the fly - through pipe (Apache) or through syslog (Nginx)
○ copy the logs and load them in place
4. First steps
● facing the problems:
○ log parts and their limitations - see later
○ truncated fields
○ falsified data - see later
○ differences between the engines
○ portability - what if we want several different applications?
○ performance - may only become an issue in extreme cases
● it's not as trivial as it seems
5. Falsified data
Example: how to fill the log with false data
curl -v -X POST -d ") [file "/dev/null"] [line "-2"] [id "3.1415"] [msg "Empty"] [data
"[file "/dev/random"] [line "inf"]"] [severity "NORMAL"] =@attack"...
will produce this funny line - a game: find the correct "file" field!
ModSecurity: Warning. Matched phrase "pattern_from_attack" at ARGS_NAMES:) [file "/dev/null"]
[line "-2"] [id "3.1415"] [msg "Empty"] [data "[file "/dev/random"] [line "inf"]"] [severity
"NORMAL"] . [file "/usr..
Fortunately, this cannot happen in the later fields (e.g. %{MATCHED_VAR} in [data]), because those fields are encoded.
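To illustrate why such a line is a problem, here is a minimal Python sketch of a naive regex-based extraction applied to the example line above. The regexes and variable names are illustrative assumptions and are not part of libmsclogparser; the log line is the one from the slide and ends where the slide cuts it off.

```python
import re

# The example line from above, joined into one string. It stops at
# '[file "/usr..' exactly as on the slide, so the genuine field is incomplete.
line = (
    'ModSecurity: Warning. Matched phrase "pattern_from_attack" at ARGS_NAMES:) '
    '[file "/dev/null"] [line "-2"] [id "3.1415"] [msg "Empty"] '
    '[data "[file "/dev/random"] [line "inf"]"] [severity "NORMAL"] . '
    '[file "/usr..'
)

# Naive approach (an assumption for illustration, NOT what the library does):
# collect every complete [file "..."] and [id "..."] pair found in the line.
naive_files = re.findall(r'\[file "([^"]*)"\]', line)
naive_ids = re.findall(r'\[id "([^"]*)"\]', line)

# Every value found here was injected through the POST body.
print(naive_files)
print(naive_ids)
```

Both lists contain only attacker-supplied values, which is why a parser cannot simply trust the first field it finds.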
6. Goals and motivations
● parse the lines of the logfiles into a structure
○ the structure contains as much detail as possible
○ helps with row filtering or making exclusions
■ e.g. give access to a site developer, who can see the affected logs and mark the FPs
○ indicates if the line is wrong
● do it as fast as possible
● can be moved to other platforms (e.g. from Python to PHP)
● with minimal dependencies (fewer 3rd-party libraries)
7. Internal operation
● libmsclogparser is written in C
○ therefore it can be used from other applications written in C, C++, Go or Rust
○ by default it has these bindings: Lua, PHP, Python, Ruby
○ available on GitHub: https://github.com/digitalwave/libmsclogparser, under the AGPL license
● no other libraries needed (e.g. regex, glib)
● uses standard string functions
○ the compiler can optimize these functions
● the main function is parse()
○ expects a line as a string, the length of the string, the type of the line (Apache or Nginx) and an empty structure
○ in the bindings this is simpler: only the line and the type of the line are needed
○ the function fills the empty structure; the bindings return the structure
● a helper function: read_msclog_err() - returns a list of error messages and their positions
8. Internal operation
It was important to understand how a log entry is created by each of the two engines.
Let's see how they work!
● logs are written through the web server
● the length of a log line (and thus its content) is limited
● the authors of the code decided which parts can be truncated
9. Internal operation - parts of a log line
Core parts (with bold)
[Tue Feb 14 09:00:00.123456 2023] [security2:error] [pid 364323:tid 139847182132992] [client
216.244.66.246:57996] [client 216.244.66.246] ModSecurity:...
2023/02/14 09:00:00 [info] 1419350#1419350: *12986 ModSecurity:... , client: 162.214.112.108,
server: my.virtualserver.com, request: "GET /.env HTTP/1.1", host: "my.virtualserver.com"
● these parts are always present, every line contains them
○ therefore they are never truncated and have a fixed length
● generated by the core logger - not the module!
● the rest is generated by the module
10. Internal operation - parts of a log line
Module parts (with bold) - Apache
[client 216.244.66.246:57996] [client 216.244.66.246] ModSecurity: Warning. Operator GE matched 5
at TX:inbound_anomaly_score. [file "/…/RESPONSE-980-CORRELATION.conf"] [line "92"] [id "980130"]
[msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 5 - SQLI=0,...,SESS=0): individual
paranoia level scores: 5, 0, 0, 0"] [ver "OWASP_CRS/3.3.4"] [tag "event-correlation"] [hostname
"my.virtualserver.com"] [uri "/robots.txt"] [unique_id "Y-PIh32oaSsbx0Ag_pH6agAAAEE"]
● the second (bold) [client] is a duplicate (without the port)
● the part in red is the "message"; without the "Warning. " string, its maximum length is 252 + " …"
● the part in black is the metadata - see later
● the part in brown is appended at the end of the process; it can't be truncated
11. Internal operation - parts of a log line
Metadata in module parts - Apache
[file "/…/RESPONSE-980-CORRELATION.conf"] [line "92"] [id "980130"] [msg "Inbound Anomaly Score
Exceeded (Total Inbound Score: 5 - SQLI=0,...,SESS=0): individual paranoia level scores: 5, 0, 0,
0"] [ver "OWASP_CRS/3.3.4"] [tag "event-correlation"]
● strict order: file, line, id, rev, msg, data, severity, ver, maturity, accuracy, tag
● fields are optional - if the rule has no value for a field, the field won't appear
● the [data] value has a maximum length of 512; if it is longer, it is truncated and the '…"]' tail is appended
● [tag] can appear many times
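A minimal Python sketch of walking such a metadata string in the fixed field order. It assumes an untruncated Apache-style string in which fields are separated by a single space and values contain no unescaped '"]' sequence. The function name `parse_metadata` is hypothetical, and this is not how the C library itself is implemented.

```python
FIELD_ORDER = ["file", "line", "id", "rev", "msg", "data",
               "severity", "ver", "maturity", "accuracy", "tag"]

def parse_metadata(meta):
    """Return {field name: [values]} for an untruncated metadata string."""
    fields = {}
    pos = 0
    for name in FIELD_ORDER:
        opener = '[' + name + ' "'
        # Optional fields are skipped; [tag] may repeat, hence the loop.
        while meta.startswith(opener, pos):
            end = meta.find('"]', pos + len(opener))
            if end == -1:
                return fields  # closing '"]' is missing: stop here
            fields.setdefault(name, []).append(meta[pos + len(opener):end])
            pos = end + 2
            if meta.startswith(" ", pos):
                pos += 1  # skip the separator before the next field
    return fields

# Metadata from the slide above, joined into one string.
meta = (
    '[file "/…/RESPONSE-980-CORRELATION.conf"] [line "92"] [id "980130"] '
    '[msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 5 - '
    'SQLI=0,...,SESS=0): individual paranoia level scores: 5, 0, 0, 0"] '
    '[ver "OWASP_CRS/3.3.4"] [tag "event-correlation"]'
)
result = parse_metadata(meta)
print(result["id"], result["tag"], "rev" in result)
```

Every value is stored in a list so that repeated [tag] fields and single-valued fields can be handled the same way.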
12. Internal operation - parts of a log line
Tail in module parts - Apache
[hostname "my.virtualserver.com"] [uri "/robots.txt"] [unique_id "Y-PIh32oaSsbx0Ag_pH6agAAAEE"]
These fields are always present in the log file as they are, regardless of their length.
13. Internal operation - parts of a log line
Concatenate module parts - Apache
● the core parts come from the server
● the leading text "[client a.b.c.d] ModSecurity: Warning." is always present
● the message is added: "Operator GE matched…"
○ there can be many kinds of messages!
● the metadata is added
● the leading text, message and metadata together can have any length up to 1024 bytes
● after that, the tail part is added
● effect: there can be a truncated field, usually after the data
○ eg:
[data "some long text …"] [severity "CRITICAL"] [v [hostname "my.virtualhost.com"]
[ver "OWASP_CRS/3.3.2"] [tag "application-multi"] [tag "lang [hostname "my.virtualhost.com"]
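The effect can be reproduced with a short Python sketch. The helper names `assemble` and `cut_field` are hypothetical and are not the library's API. The head is cut at a character count passed in by the caller (the slide states 1024 bytes for Apache; the demo uses a much smaller limit to stay readable), and the single space before the tail is an assumption based on the examples above.

```python
FIELD_ORDER = ["file", "line", "id", "rev", "msg", "data",
               "severity", "ver", "maturity", "accuracy", "tag"]

def assemble(head, tail, limit):
    """Cut leading text + message + metadata at `limit`, then append the tail."""
    return head[:limit] + " " + tail

def cut_field(head_part):
    """Guess which field was cut off at the end of head_part, or None."""
    start = head_part.rfind(" [")
    if start == -1 or head_part.endswith('"]'):
        return None  # nothing to inspect, or the last field is complete
    word = head_part[start + 2:].split(" ", 1)[0]
    if not word:
        return None
    # Reverse order, so a bare ' [m' resolves to [maturity] rather than [msg].
    for name in reversed(FIELD_ORDER):
        if name.startswith(word):
            return name
    return None

head = ('[data "some long text …"] [severity "CRITICAL"] '
        '[ver "OWASP_CRS/3.3.2"] [tag "application-multi"]')
tail = '[hostname "my.virtualhost.com"]'

limit = head.index(" [ver") + 3          # demo limit: cut right after ' [v'
line = assemble(head, tail, limit)
print(line)
print(cut_field(head[:limit]))
```

The heuristic only looks at the last ' [' before the cut, so a bracket inside a truncated value would mislead it; it is meant to show the idea, not to cover every case.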
14. Internal operation - parts of a log line
Module parts (with bold) - Nginx
12986 ModSecurity: Warning. Matched "Operator `PmFromFile' with parameter `restricted-files.data'
against variable `REQUEST_FILENAME' (Value: `/.env' )
[file "/../REQUEST-930-APPLICATION-ATTACK-LFI.conf"] [line "106"] [id "930130"] [rev ""]
[msg "Restricted File Access Attempt"] [data "Matched Data: /.env found within REQUEST_FILENAME: /.env"]
[severity "2"] [ver "OWASP_CRS/3.3.4"] [maturity "0"] [accuracy "0"] [tag "application-multi"]
[tag "language-multi"] [tag "platform-multi"] [tag "attack-lfi"] [tag "paranoia-level/1"]
[tag "OWASP_CRS"] [tag "capec/1000/255/153/126"] [tag "PCI/6.5.4"] [hostname "my.virtualserver.com"]
[uri "/.env"] [unique_id "167466178181.669649"] [ref "..."], client: …
● the part in red is the "message", with a very strictly defined content
● the part in black is the metadata - see later
● the part in brown is appended at the end of the process; it can't be truncated
15. Internal operation - parts of a log line
Metadata in module parts - Nginx
[file "/../REQUEST-930-APPLICATION-ATTACK-LFI.conf"] [line "106"] [id "930130"] [rev ""]
[msg "Restricted File Access Attempt"] [data "Matched Data: /.env found within REQUEST_FILENAME: /.env"]
[severity "2"] [ver "OWASP_CRS/3.3.4"] [maturity "0"] [accuracy "0"] [tag "application-multi"]
[tag "language-multi"] [tag "platform-multi"] [tag "attack-lfi"] [tag "paranoia-level/1"]
[tag "OWASP_CRS"] [tag "capec/1000/255/153/126"] [tag "PCI/6.5.4"]
● strict order: file, line, id, rev, msg, data, severity, ver, maturity, accuracy, tag
● fields are not optional (except [tag]) - if a value does not exist, the field is still there with an empty value or 0
● the [data] value has a maximum length of 200; if it is longer, it is truncated and the 'N characters omitted)' tail is appended
● [tag] can appear many times
16. Internal operation - parts of a log line
Tail in module parts - Nginx
[hostname "my.virtualserver.com"] [uri "/.env"] [unique_id "167466178181.669649"] [ref "..."]
These fields are always present in the log file as they are; [uri] is limited to 200 characters, the length of the others does not matter.
17. Internal operation - parts of a log line
Concatenate module parts - Nginx
● the core parts come from the server
● the leading text "ModSecurity: Warning." is always present
○ there are only two types of messages; the other one is "Access denied"
● the message is added: "Operator `OP' with parameter `PARAM' against variable `KEY' (Value: `VALUE' )"
● PARAM is limited to 200, VALUE to 100 characters
● the metadata is added
● the leading text, message and metadata together can have any length up to 2048 bytes
● after that, the tail part is added
● no randomly truncated fields: all truncated parts are marked explicitly
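Because the Nginx message has such a fixed shape, its four parts can be pulled out with one pattern. The following Python sketch is only an illustration: the regex, the group names and the assumption that the message has already been separated from the metadata are mine, not taken from the library.

```python
import re

# Pattern for the message shape shown above. Group names are hypothetical.
NGINX_MESSAGE = re.compile(
    r"Matched \"Operator `(?P<operator>[^']*)' "
    r"with parameter `(?P<operand>.*)' "
    r"against variable `(?P<target>[^']*)' "
    r"\(Value: `(?P<value>.*)' \)"
)

# Message part of the example line from the earlier Nginx slide.
message = (
    "Matched \"Operator `PmFromFile' with parameter `restricted-files.data' "
    "against variable `REQUEST_FILENAME' (Value: `/.env' )"
)

match = NGINX_MESSAGE.search(message)
info = match.groupdict() if match else {}
print(info)
```

PARAM and VALUE can contain request data, so a pattern like this is only safe once the message borders have been determined by other means.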
18. Internal operation - splitting the line
● challenge: split the line into the parts above
● find the left border of the message; this is easy: it is the first occurrence of "ModSecurity:"
● find the right border: remember the falsified data - look for the last ' [file "' pattern in the line
○ a space, an opening bracket, the word 'file', a space, a quotation mark
○ later (e.g. in [data]) this pattern appears in hex-encoded form: ' [file 22'
● the tail starts with the fixed ' [hostname "'...
● parsing the metadata part: find the next possible field (strict order!)
● trick: the right border of the next field is the left border of the current one (step back 2 chars)
● but allow the search method to find the shortest possible pattern, e.g. ' [v' ([ver]), ' [t' ([tag]) and so on
● due to the strict order, they exclude each other; ' [v' is the shortened form of ' [ver "'
○ the only possible conflict is ' [m': [msg] and [maturity] - but [msg] comes before [data], so it is never truncated
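The border search described above can be sketched in Python with plain string searches. This is a minimal illustration and not the library's C code: the function name `split_line` and the returned keys are hypothetical, and the sample line is stitched together from fragments shown on earlier slides rather than taken from a real log.

```python
def split_line(line):
    """Split an Apache-style ModSecurity line into message, metadata and tail."""
    left = line.find("ModSecurity:")          # first occurrence: left border
    meta = line.rfind(' [file "')             # last occurrence: real metadata
    if left == -1 or meta == -1:
        return None
    tail = line.find(' [hostname "', meta)    # the tail follows the metadata
    if tail == -1:
        return None
    return {
        "message": line[left:meta],
        "metadata": line[meta + 1:tail],
        "tail": line[tail + 1:],
    }

# Sample assembled from slide fragments (an assumption, not a real log line):
# the message carries an injected [file] field, the genuine one comes later.
sample = (
    '[client 216.244.66.246] ModSecurity: Warning. Matched phrase '
    '"pattern_from_attack" at ARGS_NAMES:) [file "/dev/null"] [line "-2"] . '
    '[file "/…/RESPONSE-980-CORRELATION.conf"] [line "92"] [id "980130"] '
    '[hostname "my.virtualserver.com"] [uri "/robots.txt"]'
)
parts = split_line(sample)
print(parts["metadata"])
```

Searching from the right for ' [file "' is what keeps the injected field inside the message instead of letting it pose as metadata.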
19. Internal operation - getting more information
Find the type of message!
● "ModSecurity: Warning. ", "ModSecurity: Access Denied. ", "ModSecurity: Rule error", …
○ this is important information
○ some of them modify the line structure:
■ …message… [id "-"][file "/…"][line "345"] Execution error - PCRE limits exceeded (-8): (null)
■ the parser is able to recognize these lines too!
● Get the reason ("Pattern match"), operand and target - this makes exclusions easier
○ in case of Nginx, more information is available: operator, operand, target name, target value
● Show the truncated field - if any
● Mark all errors: truncated field, missing fields (e.g. the disk was full and the line was truncated)
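Recognizing the message type can be sketched in Python as a prefix check on the module part of the line. The prefixes are the ones listed above; the labels, the function name `message_type` and the case-insensitive comparison are assumptions made for this illustration.

```python
# Prefixes from the list above (lowercased); the labels are hypothetical.
MESSAGE_TYPES = [
    ("modsecurity: warning.", "warning"),
    ("modsecurity: access denied", "access_denied"),
    ("modsecurity: rule error", "rule_error"),
]

def message_type(line):
    """Return a label for the kind of ModSecurity message found in the line."""
    start = line.find("ModSecurity:")
    if start == -1:
        return None  # not a ModSecurity line at all
    rest = line[start:].lower()
    for prefix, label in MESSAGE_TYPES:
        if rest.startswith(prefix):
            return label
    return "other"

print(message_type("[client 216.244.66.246] ModSecurity: Warning. "
                   "Operator GE matched 5 at TX:inbound_anomaly_score."))
```

A type found this way can then select the parsing path, since some types change the order of the fields that follow.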
20. Example
Using parser in a Python script:
● open the file given as argument
● set the type (can be an argument too)
● parse the lines
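import json
import sys
import mscpylogparser

# log type of the input file (this could also come from an argument)
lt = mscpylogparser.LOG_TYPE_APACHE
with open(sys.argv[1], "r") as fp:
    for l in fp:
        # parse one line into a structure and print it as JSON
        r = mscpylogparser.parse(l, len(l), lt)
        print(json.dumps(r))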
24. Future plans
● adding more pattern recognition from message
● getting more details from existing types
○ eg. 'Operator GE matched 5 at TX:inbound_anomaly_score.'
■ 'GE' is the operator
■ 5 is the operand
● distinguish between unset and empty values
○ [ver ""] produces "" value instead of None/NULL
● more bindings (if necessary)
○ Perl
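The planned extraction from the 'Operator GE matched 5 at TX:inbound_anomaly_score.' message could look roughly like the Python sketch below. This is only a guess at the idea: the regex, the group names and the extra "target" group are assumptions, and nothing here reflects code that exists in the library.

```python
import re

# Hypothetical pattern for messages of the form
# 'Operator <OP> matched <OPERAND> at <TARGET>.'
OPERATOR_MESSAGE = re.compile(
    r"Operator (?P<operator>\w+) matched (?P<operand>\S+) at (?P<target>\S+?)\.?$"
)

message = "Operator GE matched 5 at TX:inbound_anomaly_score."
match = OPERATOR_MESSAGE.search(message)
details = match.groupdict() if match else {}
print(details)
```

The operand is kept as a string here; whether it should be converted to a number is left open.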