1 of 4

Parsers

Parsers are an important component of Fluent Bit, with them you can take any unstructured log entry and give them a structure that makes easier it processing and further filtering.

The parser engine is fully configurable and can process log entries based in two types of format:

JSON Maps
Regular Expressions (named capture)

By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases such as logs from:

Apache
Nginx
Docker
Syslog rfc5424
Syslog rfc3164

Parsers are defined in one or multiple configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.

Note: if you are using Regular Expressions note that Fluent Bit uses Ruby based regular expressions and we encourage to use Rubular web site as an online editor to test them.

Configuration Parameters

Multiple parsers can be defined and each section have it own properties. The following table describes the available options for each parser definition:

Parsers Configuration File

All parsers must be defined in a parsers.conf file, not in the Fluent Bit global configuration file. The parsers file expose all parsers available that can be used by the Input plugins that are aware of this feature. A parsers file can have multiple entries like this:

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

[PARSER]
    Name        syslog-rfc5424
    Format      regex
    Regex       ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    Types pid:integer

For more information about the parsers available, please refer to the default parsers file distributed with Fluent Bit source code:

https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf

Time Resolution and Fractional Seconds

Some timestamps might have fractional seconds like 2017-05-17T15:44:31.187512963Z. Since Fluent Bit v0.12 we have full support for nanoseconds resolution, the %L format option for Time_Format is provided as a way to indicate that content must be interpreted as fractional seconds.

Note: The option %L is only valid when used after seconds (%S) or seconds since the Epoch (%s), e.g: %S.%L or %s.%L

JSON Parser

The JSON parser is the simplest option: if the original log source is a JSON map string, it will take it structure and convert it directly to the internal binary representation.

A simple configuration that can be found in the default parsers configuration file, is the entry to parse Docker log files (when the tail input plugin is used):

The following log entry is a valid content for the parser defined above:

After processing, it internal representation will be:

The time has been converted to Unix timestamp (UTC) and the map reduced to each component of the original message.

Regular Expression Parser

The regex parser allows us to define a custom Ruby Regular Expression that will use a named capture feature to define which content belongs to which key name.

Fluent Bit uses Onigmo regular expression library on Ruby mode, for testing purposes you can use the following web editor to test your expressions:

http://rubular.com/

Important: do not attempt to add multiline support in your regular expressions if you are using Tail input plugin since each line is handled as a separated entity. Instead use Tail Multiline support configuration feature.

Note: understanding how regular expressions works is out of the scope of this content.

From a configuration perspective, when the format is set to regex, is mandatory and expected that a Regex configuration key exists.

The following parser configuration example aims to provide rules that can be applied to an Apache HTTP Server log entry:

[PARSER]
    Name   apache
    Format regex
    Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z

As an example, takes the following Apache HTTP Server log entry:

192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

The above content do not provide a defined structure for Fluent Bit, but enabling the proper parser we can help to make a structured representation of it:

[1154104030, {"host"=>"192.168.2.20",
              "user"=>"-",
              "method"=>"GET",
              "path"=>"/cgi-bin/try/",
              "code"=>"200",
              "size"=>"3395",
              "referer"=>"",
              "agent"=>""
              }
]

In order to understand, learn and test regular expressions like the example above, we suggest you try the following Ruby Regular Expression Editor: http://rubular.com/r/X7BH0M4Ivm

Decoders

There are certain cases where the log messages being parsed contains encoded data, a typical use case can be found in containerized environments with Docker: application logs it data in JSON format but becomes an escaped string, Consider the following example

Original message generated by the application:

{"status": "up and running"}

Then the Docker log message become encapsulated as follows:

{"log":"{\"status\": \"up and running\"}\r\n","stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}

as you can see the original message is handled as an escaped string. Ideally in Fluent Bit we would like to keep having the original structured message and not a string.

Getting Started

Decoders are a built-in feature available through the Parsers file, each Parser definition can optionally set one or multiple decoders. There are two type of decoders type:

Decode_Field: if the content can be decoded in a structured message, append that structure message (keys and values) to the original log message.
Decode_Field_As: any content decoded (unstructured or structured) will be replaced in the same key/value, no extra keys are added.

Our pre-defined Docker Parser have the following definition:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command       |  Decoder  | Field | Optional Action   |
    # ==============|===========|=======|===================|
    Decode_Field_As    escaped     log

Each line in the parser with a key Decode_Field instruct the parser to apply a specific decoder on a given field, optionally it offer the option to take an extra action if the decoder cannot succeed.

Decoders

Optional Actions

By default if a decoder fails to decode the field or want to try a next decoder, is possible to define an optional action. Available actions are:

Note that actions are affected by some restrictions:

on Decode_Field_As, if succeeded, another decoder of the same type in the same field can be applied only if the data continue being a unstructed message (raw text).
on Decode_Field, if succeeded, can only be applied once for the same field. By nature Decode_Field aims to decode a structured message.

Examples

escaped_utf8

Example input (from /path/to/log.log in configuration below)

{"log":"\u0009Checking indexes...\n","stream":"stdout","time":"2018-02-19T23:25:29.1845444Z"}
{"log":"\u0009\u0009Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary\n","stream":"stdout","time":"2018-02-19T23:25:29.1845536Z"}
{"log":"\u0009Done\n","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

Example output

[24] tail.0: [1519082729.184544400, {"log"=>"   Checking indexes...                                                   
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845444Z"}]
[25] tail.0: [1519082729.184553600, {"log"=>"           Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845536Z"}]
[26] tail.0: [1519082729.184562200, {"log"=>"   Done                  
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845622Z"}]

Configuration file

[SERVICE]
    Parsers_File fluent-bit-parsers.conf

[INPUT]
    Name        tail
    Parser      docker
    Path        /path/to/log.log

[OUTPUT]
    Name   stdout
    Match  *

The fluent-bit-parsers.conf file,

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S %z
    Decode_Field_as escaped_utf8 log

Regular Expression Parser

The regex parser allows us to define a custom Ruby Regular Expression that will use a named capture feature to define which content belongs to which key name.

Fluent Bit uses Onigmo regular expression library on Ruby mode, for testing purposes you can use the following web editor to test your expressions:

http://rubular.com/

Note: understanding how regular expressions works is out of the scope of this content.

From a configuration perspective, when the format is set to regex, is mandatory and expected that a Regex configuration key exists.

The following parser configuration example aims to provide rules that can be applied to an Apache HTTP Server log entry:

[PARSER]
    Name   apache
    Format regex
    Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z

As an example, takes the following Apache HTTP Server log entry:

192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

The above content do not provide a defined structure for Fluent Bit, but enabling the proper parser we can help to make a structured representation of it:

[1154104030, {"host"=>"192.168.2.20",
              "user"=>"-",
              "method"=>"GET",
              "path"=>"/cgi-bin/try/",
              "code"=>"200",
              "size"=>"3395",
              "referer"=>"",
              "agent"=>""
              }
]

In order to understand, learn and test regular expressions like the example above, we suggest you try the following Ruby Regular Expression Editor: http://rubular.com/r/X7BH0M4Ivm

Decoders

Original message generated by the application:

{"status": "up and running"}

Then the Docker log message become encapsulated as follows:

{"log":"{\"status\": \"up and running\"}\r\n","stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}

as you can see the original message is handled as an escaped string. Ideally in Fluent Bit we would like to keep having the original structured message and not a string.

Getting Started

Decoders are a built-in feature available through the Parsers file, each Parser definition can optionally set one or multiple decoders. There are two type of decoders type:

Decode_Field: if the content can be decoded in a structured message, append that structure message (keys and values) to the original log message.
Decode_Field_As: any content decoded (unstructured or structured) will be replaced in the same key/value, no extra keys are added.

Our pre-defined Docker Parser have the following definition:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command       |  Decoder  | Field | Optional Action   |
    # ==============|===========|=======|===================|
    Decode_Field_As    escaped     log

Decoders

Optional Actions

By default if a decoder fails to decode the field or want to try a next decoder, is possible to define an optional action. Available actions are:

Note that actions are affected by some restrictions:

on Decode_Field_As, if succeeded, another decoder of the same type in the same field can be applied only if the data continue being a unstructed message (raw text).
on Decode_Field, if succeeded, can only be applied once for the same field. By nature Decode_Field aims to decode a structured message.

Examples

escaped_utf8

Example input (from /path/to/log.log in configuration below)

{"log":"\u0009Checking indexes...\n","stream":"stdout","time":"2018-02-19T23:25:29.1845444Z"}
{"log":"\u0009\u0009Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary\n","stream":"stdout","time":"2018-02-19T23:25:29.1845536Z"}
{"log":"\u0009Done\n","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

Example output

[24] tail.0: [1519082729.184544400, {"log"=>"   Checking indexes...                                                   
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845444Z"}]
[25] tail.0: [1519082729.184553600, {"log"=>"           Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845536Z"}]
[26] tail.0: [1519082729.184562200, {"log"=>"   Done                  
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845622Z"}]

Configuration file

[SERVICE]
    Parsers_File fluent-bit-parsers.conf

[INPUT]
    Name        tail
    Parser      docker
    Path        /path/to/log.log

[OUTPUT]
    Name   stdout
    Match  *

The fluent-bit-parsers.conf file,

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S %z
    Decode_Field_as escaped_utf8 log

Parsers

Parsers are an important component of Fluent Bit, with them you can take any unstructured log entry and give them a structure that makes easier it processing and further filtering.

The parser engine is fully configurable and can process log entries based in two types of format:

JSON Maps
Regular Expressions (named capture)

By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases such as logs from:

Apache
Nginx
Docker
Syslog rfc5424
Syslog rfc3164

Parsers are defined in one or multiple configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.

Note: if you are using Regular Expressions note that Fluent Bit uses Ruby based regular expressions and we encourage to use Rubular web site as an online editor to test them.

Configuration Parameters

Multiple parsers can be defined and each section have it own properties. The following table describes the available options for each parser definition:

Parsers Configuration File

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

[PARSER]
    Name        syslog-rfc5424
    Format      regex
    Regex       ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    Types pid:integer

For more information about the parsers available, please refer to the default parsers file distributed with Fluent Bit source code:

https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf

Time Resolution and Fractional Seconds

Note: The option %L is only valid when used after seconds (%S) or seconds since the Epoch (%s), e.g: %S.%L or %s.%L