The JSON parser is the simplest option: if the original log source is a JSON map string, it takes its structure and converts it directly to the internal binary representation.
A simple configuration, found in the default parsers configuration file, is the entry to parse Docker log files (when the tail input plugin is used):
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S %z
The following log entry is valid content for the parser defined above:
{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}
After processing, its internal representation will be:
[1154103724, {"key1"=>12345, "key2"=>"abc"}]
The time has been converted to a Unix timestamp (UTC) and the map reduced to each component of the original message.
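The general shape of this transformation can be sketched in Python. This is only an illustration of the idea (parse the map, lift the time key out, emit a timestamp plus the remaining keys), not Fluent Bit's implementation. The function name `parse_json_record`, the sample record, and the Python time format (written without the space before `%z`) are assumptions made for this sketch.

```python
import json
from datetime import datetime


def parse_json_record(line, time_key="time", time_format="%Y-%m-%dT%H:%M:%S%z"):
    """Return [timestamp, record] for one JSON log line (illustrative only)."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON map")
    # Lift the time key out of the map, mirroring the output shown above,
    # where "time" no longer appears among the record keys.
    raw_time = record.pop(time_key)
    # Python's %z accepts a trailing "Z" as UTC (Python 3.7+).
    timestamp = int(datetime.strptime(raw_time, time_format).timestamp())
    return [timestamp, record]


# Hypothetical record, chosen so the timestamp is easy to check by hand.
json_event = parse_json_record(
    '{"key1": 12345, "key2": "abc", "time": "1970-01-02T00:00:00Z"}'
)
print(json_event)
```

The returned list pairs the Unix timestamp with the remaining keys, which is the same two-part shape as the internal representation above.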
The ltsv parser parses LTSV-formatted text.
Labeled Tab-separated Values (LTSV) format is a variant of Tab-separated Values (TSV). Each record in an LTSV file is represented as a single line. Each field is separated by a TAB and has a label and a value. The label and the value are separated by ':'.
Here is an example of how to use this format with the Apache access log.
Configure this in httpd.conf:
LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\treferer:%{Referer}i\tua:%{User-Agent}i" combined_ltsv
CustomLog "logs/access_log" combined_ltsv
The parser.conf:
[PARSER]
Name access_log_ltsv
Format ltsv
Time_Key time
Time_Format [%d/%b/%Y:%H:%M:%S %z]
Types status:integer size:integer
The following log entries are valid content for the parser defined above:
host:127.0.0.1 ident:- user:- time:[10/Jul/2018:13:27:05 +0200] req:GET / HTTP/1.1 status:200 size:16218 referer:http://127.0.0.1/ ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1 ident:- user:- time:[10/Jul/2018:13:27:05 +0200] req:GET /assets/plugins/bootstrap/css/bootstrap.min.css HTTP/1.1 status:200 size:121200 referer:http://127.0.0.1/ ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1 ident:- user:- time:[10/Jul/2018:13:27:05 +0200] req:GET /assets/css/headers/header-v6.css HTTP/1.1 status:200 size:37706 referer:http://127.0.0.1/ ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1 ident:- user:- time:[10/Jul/2018:13:27:05 +0200] req:GET /assets/css/style.css HTTP/1.1 status:200 size:1279 referer:http://127.0.0.1/ ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
After processing, the internal representation will be:
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET / HTTP/1.1", "status"=>200, "size"=>16218, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/plugins/bootstrap/css/bootstrap.min.css HTTP/1.1", "status"=>200, "size"=>121200, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/css/headers/header-v6.css HTTP/1.1", "status"=>200, "size"=>37706, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/css/style.css HTTP/1.1", "status"=>200, "size"=>1279, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
The time has been converted to a Unix timestamp (UTC).
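The LTSV steps described above (split on TAB, split each field on the first ':', convert the time key, apply the Types casts) can be sketched in Python. This is a minimal illustration under stated assumptions, not the parser's actual code: `parse_ltsv` is a hypothetical helper, and the sample line is a shortened version of the first entry above with real TAB separators.

```python
from datetime import datetime


def parse_ltsv(line, time_key="time",
               time_format="[%d/%b/%Y:%H:%M:%S %z]", types=None):
    """Return [timestamp, record] for one LTSV line (illustrative only)."""
    types = types or {}
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Split on the first ':' only, since values may contain ':' themselves.
        label, _, value = field.partition(":")
        record[label] = value
    # %b is locale-dependent; this assumes English month abbreviations.
    timestamp = datetime.strptime(record.pop(time_key), time_format).timestamp()
    for key, kind in types.items():
        if kind == "integer" and key in record:
            record[key] = int(record[key])
    return [timestamp, record]


ltsv_line = "\t".join([
    "host:127.0.0.1", "ident:-", "user:-",
    "time:[10/Jul/2018:13:27:05 +0200]",
    "req:GET / HTTP/1.1", "status:200", "size:16218",
    "referer:http://127.0.0.1/",
])
ltsv_event = parse_ltsv(ltsv_line, types={"status": "integer", "size": "integer"})
print(ltsv_event)
```

Because the `+0200` offset is honored during parsing, the resulting timestamp is expressed in UTC regardless of the zone in the log line.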
There are certain cases where the log messages being parsed contain encoded data. A typical use case can be found in containerized environments with Docker: the application logs its data in JSON format, but it becomes an escaped string. Consider the following example.
Original message generated by the application:
{"status": "up and running"}
Then the Docker log message becomes encapsulated as follows:
{"log":"{\"status\": \"up and running\"}\r\n","stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}
As you can see, the original message is handled as an escaped string. Ideally, in Fluent Bit we would like to keep the original structured message and not a string.
Decoders are a built-in feature available through the Parsers file. Each parser definition can optionally set one or multiple decoders. There are two types of decoders:
Decode_Field: if the content can be decoded as a structured message, append the structured message (keys and values) to the original log message.
Decode_Field_As: any decoded content (unstructured or structured) replaces the value of the same key; no extra keys are added.
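The difference between the two decoder types can be sketched in Python using the Docker message shown earlier. This only mirrors the behavior as described in the two items above, with a JSON decoder standing in for both; the helper names `decode_field` and `decode_field_as` are hypothetical and do not correspond to Fluent Bit functions.

```python
import json


def decode_field(record, field):
    """Decode_Field sketch: append the decoded keys to the original record."""
    try:
        decoded = json.loads(record[field])
    except (KeyError, TypeError, ValueError):
        return record  # not decodable: leave the record untouched
    if not isinstance(decoded, dict):
        return record
    merged = dict(record)
    merged.update(decoded)
    return merged


def decode_field_as(record, field):
    """Decode_Field_As sketch: replace the value under the same key."""
    try:
        decoded = json.loads(record[field])
    except (KeyError, TypeError, ValueError):
        return record
    replaced = dict(record)
    replaced[field] = decoded
    return replaced


# The Docker-wrapped message from the text above (raw string keeps the escapes).
docker_line = (r'{"log":"{\"status\": \"up and running\"}\r\n",'
               r'"stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}')
docker_record = json.loads(docker_line)

appended = decode_field(docker_record, "log")
replaced = decode_field_as(docker_record, "log")
print(appended)
print(replaced)
```

In the first result the `status` key sits next to the original `log` string; in the second, `log` itself holds the structured map and no new top-level key appears.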
Our pre-defined Docker parser has the following definition:
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
# Command | Decoder | Field | Optional Action |
# ==============|===========|=======|===================|
Decode_Field_As escaped log
Each line in the parser with a Decode_Field key (or its Decode_Field_As variant) instructs the parser to apply a specific decoder to a given field. Optionally, it can take an extra action if the decoder does not succeed.
Name | Description
json | Handle the field content as a JSON map. If it finds a JSON map, it replaces the content with a structured map.
escaped | Decode an escaped string.
escaped_utf8 | Decode a UTF-8 escaped string.
If a decoder fails to decode the field, or if you want to try the next decoder anyway, you can define an optional action. Available actions are:
Name | Description
try_next | If the decoder failed, apply the next decoder in the list for the same field.
do_next | Whether the decoder succeeded or failed, apply the next decoder in the list for the same field.
Note that actions are affected by some restrictions:
On Decode_Field_As, if it succeeded, another decoder of the same type can be applied to the same field only if the data is still an unstructured message (raw text).
On Decode_Field, if it succeeded, it can only be applied once to the same field. By nature, Decode_Field aims to decode a structured message.
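One way to picture how the try_next and do_next actions chain decoders on a single field is the following Python sketch. It is a simplified model built only from the descriptions above, not Fluent Bit's logic: `run_decoders`, `decode_json`, and `decode_escaped` are hypothetical helpers, and the escaped decoder handles just a handful of escape sequences.

```python
import json

# Minimal set of escapes handled by this stand-in decoder (an assumption).
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}


def decode_escaped(value):
    """Simplified stand-in for the 'escaped' decoder."""
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value) and value[i + 1] in ESCAPES:
            out.append(ESCAPES[value[i + 1]])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def decode_json(value):
    """Stand-in for the 'json' decoder: succeeds only on a JSON map."""
    decoded = json.loads(value)  # raises ValueError on invalid JSON
    if not isinstance(decoded, dict):
        raise ValueError("not a JSON map")
    return decoded


def run_decoders(value, chain):
    """chain: list of (decoder, action), action in {None, 'try_next', 'do_next'}."""
    for decoder, action in chain:
        if not isinstance(value, str):
            break  # already structured: no further decoding of raw text
        try:
            value = decoder(value)
            succeeded = True
        except ValueError:
            succeeded = False
        if action == "do_next":
            continue  # move on regardless of the outcome
        if action == "try_next" and not succeeded:
            continue  # move on only after a failure
        break
    return value


# Unescape first, then always continue to the JSON decoder.
chained = run_decoders(r'{\"status\": \"up and running\"}',
                       [(decode_escaped, "do_next"), (decode_json, None)])
# JSON fails on plain text, so try_next falls through to the escaped decoder.
fallback = run_decoders(r"plain\ttext",
                        [(decode_json, "try_next"), (decode_escaped, None)])
print(chained, repr(fallback))
```

The early exit on non-string values is this sketch's way of modelling the restriction that a further decoder applies only while the data is still raw text.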
Example input (from /path/to/log.log in configuration below)
{"log":"\u0009Checking indexes...\n","stream":"stdout","time":"2018-02-19T23:25:29.1845444Z"}
{"log":"\u0009\u0009Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary\n","stream":"stdout","time":"2018-02-19T23:25:29.1845536Z"}
{"log":"\u0009Done\n","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}
Example output
[24] tail.0: [1519082729.184544400, {"log"=>" Checking indexes...
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845444Z"}]
[25] tail.0: [1519082729.184553600, {"log"=>" Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845536Z"}]
[26] tail.0: [1519082729.184562200, {"log"=>" Done
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845622Z"}]
Configuration file
[SERVICE]
Parsers_File fluent-bit-parsers.conf
[INPUT]
Name tail
Parser docker
Path /path/to/log.log
[OUTPUT]
Name stdout
Match *
The fluent-bit-parsers.conf file:
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S %z
Decode_Field_as escaped_utf8 log
The regex parser lets you define a custom Ruby regular expression that uses named captures to define which content belongs to which key name.
Fluent Bit uses the Onigmo regular expression library in Ruby mode. For testing purposes, you can use the following web editor to test your expressions:
Important: do not attempt to add multiline support in your regular expressions if you are using the Tail input plugin, since each line is handled as a separate entity. Instead, use the Tail plugin's multiline support configuration feature.
Security Warning: Onigmo is a backtracking regex engine. You need to be careful not to use expensive regex patterns, or Onigmo can take a very long time to perform pattern matching. For details, please read the article on OWASP.
Note: understanding how regular expressions work is out of the scope of this content.
From a configuration perspective, when the format is set to regex, it is mandatory that a Regex configuration key exists.
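To make the named-capture idea concrete, here is a small Python sketch showing how each named group becomes a key in the resulting record. Python's `re` module is not Onigmo and spells named groups as `(?P<name>...)` rather than Ruby's `(?<name>...)`, so the sketch rewrites the syntax first. The pattern, the sample line, and the helper names are hypothetical and chosen only for illustration.

```python
import re


def ruby_to_python_named_groups(pattern):
    """Rewrite Ruby-style (?<name>...) groups as Python's (?P<name>...).

    Lookbehinds, (?<=...) and (?<!...), are left untouched. This is a
    simplification and does not cover every Onigmo feature.
    """
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


def parse_with_regex(pattern, line):
    """Return a dict of named captures, or None when the line does not match."""
    match = re.match(ruby_to_python_named_groups(pattern), line)
    return match.groupdict() if match else None


# Hypothetical pattern and log line.
sample_pattern = r"^(?<host>[^ ]*) (?<method>\S+) (?<code>\d+)$"
regex_record = parse_with_regex(sample_pattern, "192.168.2.20 GET 200")
print(regex_record)
```

Every captured value comes back as a string, so a numeric-looking field such as `code` stays text unless a separate type conversion is applied.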
The following parser configuration example aims to provide rules that can be applied to an Apache HTTP Server log entry:
[PARSER]
Name apache
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
As an example, take the following Apache HTTP Server log entry:
192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
The above content does not provide a defined structure for Fluent Bit, but by enabling the proper parser we can make a structured representation of it:
[1154104030, {"host"=>"192.168.2.20",
"user"=>"-",
"method"=>"GET",
"path"=>"/cgi-bin/try/",
"code"=>"200",
"size"=>"3395",
"referer"=>"",
"agent"=>""
}
]
A common pitfall is that you cannot use characters other than letters, numbers, and underscores in group names. For example, a group name like (?<user-name>.*) will cause an error because it contains an invalid character (-).
In order to understand, learn and test regular expressions like the example above, we suggest you try the following Ruby Regular Expression Editor:
The logfmt parser parses the logfmt format described in . A more formal description is in .
Here is an example configuration:
[PARSER]
Name logfmt
Format logfmt
The following log entry is valid content for the parser defined above:
key1=val1 key2=val2
After processing, its internal representation will be:
[1540936693, {"key1"=>"val1",
"key2"=>"val2"}]
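As a rough illustration of how a logfmt line maps to key/value pairs, the following Python sketch splits a line into `key=value` tokens. It is not the parser's implementation: `parse_logfmt` is a hypothetical helper, it relies on shell-style quoting rules from `shlex` to handle quoted values, and giving a key without `=` an empty string is an arbitrary choice made for this sketch.

```python
import shlex


def parse_logfmt(line):
    """Split a logfmt line into a dict of string values (illustrative only)."""
    record = {}
    for token in shlex.split(line):
        # Split on the first '=' only; a key without '=' gets an empty string.
        key, _, value = token.partition("=")
        record[key] = value
    return record


logfmt_record = parse_logfmt("key1=val1 key2=val2")
print(logfmt_record)
```

With this approach a quoted value such as `msg="hello world"` stays together as a single value.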