Parsers

Configuring Parser

Parsers are an important component of Fluent Bit. With them, you can take any unstructured log entry and give it a structure that makes processing and further filtering easier.

The parser engine is fully configurable and can process log entries based on two types of format:

JSON Maps
Regular Expressions (named capture)

By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases such as logs from:

Apache
Nginx
Docker
Syslog rfc5424
Syslog rfc3164

Parsers are defined in one or multiple configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.

Note: If you are using regular expressions, be aware that Fluent Bit uses Ruby-based regular expressions. We encourage you to use the Rubular web site as an online editor to test them.

Configuration Parameters

Multiple parsers can be defined, and each section has its own properties. The following table describes the available options for each parser definition:

Key

Description

Parsers Configuration File

All parsers must be defined in a parsers.conf file, not in the Fluent Bit global configuration file. The parsers file exposes all available parsers that can be used by the input plugins that are aware of this feature. A parsers file can have multiple entries like this:

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

[PARSER]
    Name        syslog-rfc5424
    Format      regex
    Regex       ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    Types pid:integer

For more information about the parsers available, please refer to the default parsers file distributed with Fluent Bit source code:

https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf

Time Resolution and Fractional Seconds

Time resolution and the supported time formats are handled using the strftime(3) libc system function.

In addition, we extended our time resolution to support fractional seconds, such as 2017-05-17T15:44:31.187512963Z. Since Fluent Bit v0.12 we have full support for nanosecond resolution: the %L format option for Time_Format indicates that the content must be interpreted as fractional seconds.

Note: The option %L is only valid when used after seconds (%S) or seconds since the Epoch (%s), for example %S.%L or %s.%L.
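To make the %S.%L convention concrete, here is a minimal Python sketch that splits a timestamp into whole seconds and a nanosecond fraction. It is not Fluent Bit's implementation: the helper name split_fractional is hypothetical, and the sketch assumes a UTC timestamp that ends in Z.

```python
import calendar
import time


def split_fractional(timestamp):
    """Return (epoch_seconds, nanoseconds) for a UTC timestamp ending in 'Z'.

    Hypothetical helper for illustration only.
    """
    body = timestamp.rstrip("Z")
    # Everything after the dot that follows the seconds is the fraction (%L).
    whole, _, fraction = body.partition(".")
    seconds = calendar.timegm(time.strptime(whole, "%Y-%m-%dT%H:%M:%S"))
    # Right-pad to nine digits so that ".5" means 500000000 nanoseconds.
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return seconds, nanos


print(split_fractional("2017-05-17T15:44:31.187512963Z"))
```

The fraction is only meaningful because it directly follows the seconds field, which mirrors the restriction in the note above.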

JSON

The JSON parser is the simplest option: if the original log source is a JSON map string, it will take its structure and convert it directly to the internal binary representation.

A simple configuration that can be found in the default parsers configuration file, is the entry to parse Docker log files (when the tail input plugin is used):

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S %z

The following log entry is valid content for the parser defined above:

{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}

After processing, its internal representation will be:

[1154092924, {"key1"=>12345, "key2"=>"abc"}]

The time has been converted to a Unix timestamp (UTC), and the remaining keys of the original message are kept as the record map.
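The transformation described above can be approximated in a few lines of Python. This is a sketch only, not how Fluent Bit is implemented: parse_json_entry is a hypothetical name, and the time format uses a literal Z (an assumption made here because Python's handling of %z differs from libc).

```python
import calendar
import json
import time


def parse_json_entry(line, time_key="time", time_format="%Y-%m-%dT%H:%M:%SZ"):
    """Mimic the JSON parser: return [timestamp, record] without the time key."""
    record = json.loads(line)
    # Take the time key out of the map, as in the representation shown above.
    raw_time = record.pop(time_key)
    # Interpret the parsed time as UTC and convert it to epoch seconds.
    timestamp = calendar.timegm(time.strptime(raw_time, time_format))
    return [timestamp, record]


entry = '{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}'
print(parse_json_entry(entry))
```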

Regular Expression

The Regex parser lets you define a custom Ruby regular expression that uses a named capture feature to define which content belongs to which key name.

Use Tail Multiline when you need to support regexes across multiple lines from a tail. The Tail input plugin treats each line as a separate entity.

Security Warning: Onigmo is a backtracking regex engine. When using expensive regex patterns Onigmo can take a long time to perform pattern matching. Read "ReDoS" on OWASP for additional information.

Setting the format to regex requires a regex configuration key.

Configuration Parameters

The regex parser supports the following configuration parameters:

Key

Description

Default Value

Fluent Bit uses the Onigmo regular expression library in Ruby mode.

You can use only alphanumeric characters and underscore in group names. For example, a group name like (?<user-name>.*) causes an error due to the invalid dash (-) character. Use the Rubular web editor to test your expressions.

The following parser configuration example provides rules that can be applied to an Apache HTTP Server log entry:

[PARSER]
    Name   apache
    Format regex
    Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z
    Types code:integer size:integer

As an example, review the following Apache HTTP Server log entry:

192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

This log entry doesn't provide a defined structure for Fluent Bit. Enabling the proper parser can help to make a structured representation of the entry:

[1438176430, {"host"=>"192.168.2.20",
              "user"=>"-",
              "method"=>"GET",
              "path"=>"/cgi-bin/try/",
              "code"=>200,
              "size"=>3395,
              "referer"=>"",
              "agent"=>""
              }
]
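To experiment with the named-capture idea outside Fluent Bit, the same pattern can be tried in Python. This is a rough sketch under stated assumptions: Python's re module is not Onigmo, so each (?<name>...) group is rewritten as (?P<name>...), and parse_apache is a hypothetical helper that imitates the Time_Key, Time_Format and Types settings of the parser above.

```python
import re
from datetime import datetime

# Same pattern as the apache parser, with (?<name>...) rewritten as (?P<name>...).
APACHE = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^\"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^\"]*)" "(?P<agent>[^\"]*)")?$'
)


def parse_apache(line):
    """Return [timestamp, record] for one access log line, or None if no match."""
    match = APACHE.match(line)
    if match is None:
        return None
    # Groups that did not take part in the match become empty strings.
    record = match.groupdict(default="")
    # Time_Key / Time_Format: take the time out and convert it to epoch seconds.
    when = datetime.strptime(record.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    # Types code:integer size:integer (values such as "-" are left as strings).
    for key in ("code", "size"):
        if record[key].isdigit():
            record[key] = int(record[key])
    return [int(when.timestamp()), record]


line = '192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395'
print(parse_apache(line))
```

Each key in the resulting record comes straight from a group name in the pattern, which is why group names must be valid identifiers.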

LTSV

The ltsv parser parses LTSV formatted text.

Labeled Tab-separated Values (LTSV) is a variant of Tab-separated Values (TSV). Each record in an LTSV file is represented as a single line. Fields are separated by a TAB character, and each field consists of a label and a value separated by ':'.

Here is an example of how to use this format with the Apache access log.

Configure this in httpd.conf:

LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\treferer:%{Referer}i\tua:%{User-Agent}i" combined_ltsv
CustomLog "logs/access_log" combined_ltsv

The parser.conf:

[PARSER]
    Name        access_log_ltsv
    Format      ltsv
    Time_Key    time
    Time_Format [%d/%b/%Y:%H:%M:%S %z]
    Types       status:integer size:integer

The following log entries are valid content for the parser defined above:

host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET / HTTP/1.1      status:200      size:16218      referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET /assets/plugins/bootstrap/css/bootstrap.min.css HTTP/1.1        status:200      size:121200     referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET /assets/css/headers/header-v6.css HTTP/1.1      status:200      size:37706      referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET /assets/css/style.css HTTP/1.1  status:200      size:1279       referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0

After processing, their internal representation will be:

[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET / HTTP/1.1", "status"=>200, "size"=>16218, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/plugins/bootstrap/css/bootstrap.min.css HTTP/1.1", "status"=>200, "size"=>121200, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/css/headers/header-v6.css HTTP/1.1", "status"=>200, "size"=>37706, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/css/style.css HTTP/1.1", "status"=>200, "size"=>1279, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]

The time has been converted to Unix timestamp (UTC).
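The LTSV record layout is simple enough to sketch in a few lines of Python. The function name parse_ltsv is hypothetical, and the sketch covers only the field splitting, not the Time_Key or Types handling of the parser above.

```python
def parse_ltsv(line):
    """Split one LTSV record into a dict of label -> value."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        if not field:
            continue
        # Only the first ':' separates label from value; values may contain ':'.
        label, _, value = field.partition(":")
        record[label] = value
    return record


sample = (
    "host:127.0.0.1\tident:-\tuser:-\t"
    "time:[10/Jul/2018:13:27:05 +0200]\treq:GET / HTTP/1.1\t"
    "status:200\tsize:16218\treferer:http://127.0.0.1/"
)
print(parse_ltsv(sample))
```

Splitting on the first colon only is what keeps values such as the time and the referer URL intact.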

Logfmt

The logfmt parser parses the logfmt format described at https://brandur.org/logfmt . A more formal description is available at https://godoc.org/github.com/kr/logfmt .

Here is an example configuration:

[PARSER]
    Name        logfmt
    Format      logfmt

The following log entry is valid content for the parser defined above:

key1=val1 key2=val2 key3

After processing, its internal representation will be:

[1540936693, {"key1"=>"val1",
              "key2"=>"val2",
              "key3"=>true}]

If you want to be stricter than the logfmt standard and not parse lines where some attributes have no value (such as key3 in the example above), you can configure the parser as follows:

[PARSER]
    Name        logfmt
    Format      logfmt
    Logfmt_No_Bare_Keys true
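The difference between the default and the strict behavior can be sketched in Python. This is a simplified illustration, not Fluent Bit's parser: parse_logfmt and its no_bare_keys argument are hypothetical names, and quoted values containing spaces are not handled.

```python
def parse_logfmt(line, no_bare_keys=False):
    """Parse space-separated key=value pairs; return None if the line is rejected.

    Simplified: quoted values containing spaces are not handled.
    """
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            if no_bare_keys:
                # Strict mode: refuse the whole line when a key has no value.
                return None
            # Default mode: a bare key is stored as boolean true.
            record[key] = True
        else:
            record[key] = value
    return record


print(parse_logfmt("key1=val1 key2=val2 key3"))
print(parse_logfmt("key1=val1 key2=val2 key3", no_bare_keys=True))
```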

Decoders

There are cases where the log messages being parsed contain encoded data. A typical use case can be found in containerized environments with Docker. Docker logs its data in JSON format, which uses escaped strings.

Consider the following message generated by the application:

{"status": "up and running"}

The Docker log message encapsulates something like this:

{"log":"{\"status\": \"up and running\"}\r\n","stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}

The original message is handled as an escaped string. Ideally, Fluent Bit should work with the original structured message rather than with a string.

Getting Started

Decoders are a built-in feature available through the Parsers file. Each parser definition can optionally set one or more decoders. There are two types of decoders:

Decode_Field: If the content can be decoded in a structured message, append the structured message (keys and values) to the original log message.
Decode_Field_As: Any decoded content (unstructured or structured) will be replaced in the same key/value, and no extra keys are added.

Our pre-defined Docker parser has the following definition:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command       |  Decoder  | Field | Optional Action   |
    # ==============|===========|=======|===================|
    Decode_Field_As    escaped     log

Each line in the parser with a key Decode_Field instructs the parser to apply a specific decoder on a given field. Optionally, it offers the option to take an extra action if the decoder doesn't succeed.
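The difference between the two decoder types can be illustrated with a small Python sketch that applies a JSON decoder to the log field of the Docker message shown earlier. It follows the descriptions above but is not Fluent Bit's code: decode_field and decode_field_as are hypothetical helper names.

```python
import json

DOCKER_LINE = (
    r'{"log":"{\"status\": \"up and running\"}\r\n",'
    r'"stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}'
)


def decode_field(record, field):
    """Decode_Field-style: append the decoded keys to the original record."""
    try:
        decoded = json.loads(record[field])
    except (KeyError, TypeError, ValueError):
        return record  # decoding failed: leave the record untouched
    if isinstance(decoded, dict):
        return {**record, **decoded}
    return record


def decode_field_as(record, field):
    """Decode_Field_As-style: replace the field's value, adding no extra keys."""
    try:
        decoded = json.loads(record[field])
    except (KeyError, TypeError, ValueError):
        return record
    return {**record, field: decoded}


record = json.loads(DOCKER_LINE)
print(decode_field(record, "log"))
print(decode_field_as(record, "log"))
```

In the first result the status key sits next to the original log field; in the second, the log field itself holds the decoded map.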

Decoder options

Optional Actions

If a decoder fails to decode the field, or you want to try another decoder, you can define an optional action. Available actions are:

Actions are affected by some restrictions:

Decode_Field_As: If successful, another decoder of the same type and the same field can be applied only if the data continues being an unstructured message (raw text).
Decode_Field: If successful, can only be applied once for the same field. Decode_Field is intended to decode a structured message.

Examples

`escaped_utf8`

Example input from /path/to/log.log:

{"log":"\u0009Checking indexes...\n","stream":"stdout","time":"2018-02-19T23:25:29.1845444Z"}
{"log":"\u0009\u0009Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary\n","stream":"stdout","time":"2018-02-19T23:25:29.1845536Z"}
{"log":"\u0009Done\n","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

Example output:

[24] tail.0: [1519082729.184544400, {"log"=>"   Checking indexes...
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845444Z"}]
[25] tail.0: [1519082729.184553600, {"log"=>"           Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845536Z"}]
[26] tail.0: [1519082729.184562200, {"log"=>"   Done
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845622Z"}]

Decoder configuration file:

[SERVICE]
    Parsers_File fluent-bit-parsers.conf

[INPUT]
    Name        tail
    Parser      docker
    Path        /path/to/log.log

[OUTPUT]
    Name   stdout
    Match  *

The fluent-bit-parsers.conf file:

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Decode_Field_As escaped_utf8 log
