Parsers

JSON

The JSON parser is the simplest option: if the original log source is a JSON map string, it will take its structure and convert it directly to the internal binary representation.

A simple example from the default parsers configuration file is the entry that parses Docker log files (when the tail input plugin is used):

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S %z

The following log entry is valid content for the parser defined above:

{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}

After processing, its internal representation will be:

[1154092924, {"key1"=>12345, "key2"=>"abc"}]

The time has been converted to a Unix timestamp (UTC), and the time key has been removed from the map, which keeps the remaining components of the original message.
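The conversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Fluent Bit's implementation: the function name parse_json_record is hypothetical, and the time format is adapted to Python's strptime (no space before %z), which is an assumption of this sketch rather than the Time_Format from the parser definition.

```python
import json
from datetime import datetime


def parse_json_record(line, time_key="time", time_format="%Y-%m-%dT%H:%M:%S%z"):
    """Split a JSON log line into (unix_timestamp, remaining_fields).

    Hypothetical helper for illustration only.
    """
    record = json.loads(line)
    # Remove the time key from the map; the other keys are kept as-is.
    raw_time = record.pop(time_key)
    # Python's %z accepts a trailing "Z" as UTC (Python 3.7 and later).
    timestamp = int(datetime.strptime(raw_time, time_format).timestamp())
    return timestamp, record


entry = '{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}'
timestamp, fields = parse_json_record(entry)
print(timestamp, fields)
```

The returned pair mirrors the [timestamp, map] shape of the internal representation: one timestamp plus the remaining key/value pairs.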

LTSV

The ltsv parser parses LTSV-formatted text.

Labeled Tab-separated Values (LTSV) format is a variant of Tab-separated Values (TSV). Each record in an LTSV file is represented as a single line. Each field is separated by a TAB and has a label and a value. The label and the value are separated by ':'.

Here is an example of how to use this format with the Apache access log.

Configure this in httpd.conf:

LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\treferer:%{Referer}i\tua:%{User-Agent}i" combined_ltsv
CustomLog "logs/access_log" combined_ltsv

The parser.conf:

[PARSER]
    Name        access_log_ltsv
    Format      ltsv
    Time_Key    time
    Time_Format [%d/%b/%Y:%H:%M:%S %z]
    Types       status:integer size:integer

The following log entries are valid content for the parser defined above:

host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET / HTTP/1.1      status:200      size:16218      referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET /assets/plugins/bootstrap/css/bootstrap.min.css HTTP/1.1        status:200      size:121200     referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET /assets/css/headers/header-v6.css HTTP/1.1      status:200      size:37706      referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
host:127.0.0.1  ident:- user:-  time:[10/Jul/2018:13:27:05 +0200]       req:GET /assets/css/style.css HTTP/1.1  status:200      size:1279       referer:http://127.0.0.1/       ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
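
As a rough illustration of what the parser does with one such line, the following Python sketch splits a record on TAB, splits each field on the first ':', applies the Types conversions, and turns the time field into a Unix timestamp. It is not Fluent Bit's implementation; the names parse_ltsv_line, TIME_FORMAT and TYPES are hypothetical and simply mirror the parser definition above.

```python
from datetime import datetime

# Mirrors Time_Format and Types from the access_log_ltsv parser definition.
TIME_FORMAT = "[%d/%b/%Y:%H:%M:%S %z]"
TYPES = {"status": int, "size": int}


def parse_ltsv_line(line, time_key="time"):
    """Parse one LTSV record into (unix_timestamp, fields). Illustrative only."""
    fields = {}
    for field in line.rstrip("\n").split("\t"):
        # Only the first ':' separates label and value; values may contain ':'.
        label, _, value = field.partition(":")
        fields[label] = TYPES.get(label, str)(value)
    # The time field becomes the record timestamp and is dropped from the map.
    parsed = datetime.strptime(fields.pop(time_key), TIME_FORMAT)
    return int(parsed.timestamp()), fields


line = "\t".join([
    "host:127.0.0.1", "ident:-", "user:-",
    "time:[10/Jul/2018:13:27:05 +0200]",
    "req:GET / HTTP/1.1", "status:200", "size:16218",
    "referer:http://127.0.0.1/",
    "ua:Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0",
])
timestamp, fields = parse_ltsv_line(line)
print(timestamp, fields)
```

Splitting on the first ':' only is what keeps values such as the request line, the referer URL and the bracketed time intact.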

After processing, its internal representation will be:

[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET / HTTP/1.1", "status"=>200, "size"=>16218, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/plugins/bootstrap/css/bootstrap.min.css HTTP/1.1", "status"=>200, "size"=>121200, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/css/headers/header-v6.css HTTP/1.1", "status"=>200, "size"=>37706, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]
[1531222025.000000000, {"host"=>"127.0.0.1", "ident"=>"-", "user"=>"-", "req"=>"GET /assets/css/style.css HTTP/1.1", "status"=>200, "size"=>1279, "referer"=>"http://127.0.0.1/", "ua"=>"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"}]

The time has been converted to a Unix timestamp (UTC).

Logfmt

The logfmt parser parses the logfmt format described at https://brandur.org/logfmt. A more formal description is available at https://godoc.org/github.com/kr/logfmt.

Here is an example configuration:

[PARSER]
    Name        logfmt
    Format      logfmt

The following log entry is valid content for the parser defined above:

key1=val1 key2=val2 key3

After processing, its internal representation will be:

[1540936693, {"key1"=>"val1",
              "key2"=>"val2",
              "key3"=>true}]

If you want to be stricter than the logfmt standard and not parse lines where some attributes have no value (such as key3 in the example above), you can configure the parser as follows:

[PARSER]
    Name        logfmt
    Format      logfmt
    Logfmt_No_Bare_Keys true

Regular Expression

The regex parser lets you define a custom Ruby regular expression that uses named captures to define which content belongs to which key name.

Fluent Bit uses the Onigmo regular expression library in Ruby mode. For testing purposes you can use the following web editor to test your expressions:

http://rubular.com/

Important: do not attempt to add multiline support in your regular expressions if you are using the Tail input plugin, since each line is handled as a separate entity. Instead, use the Tail Multiline support configuration feature.

Security Warning: Onigmo is a backtracking regex engine. Be careful not to use expensive regex patterns, or Onigmo can take a very long time to perform pattern matching. For details, please read the article "ReDoS" on OWASP.

Note: understanding how regular expressions work is out of the scope of this content.

From a configuration perspective, when the format is set to regex, a Regex configuration key is mandatory.

Configuration Parameters

The regex parser supports the following configuration parameters.

Key
Description
Default Value

Skip_Empty_Values

If enabled, the parser ignores empty values of the record.

True

The following parser configuration example provides rules that can be applied to an Apache HTTP Server log entry:

[PARSER]
    Name   apache
    Format regex
    Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z
    Types code:integer size:integer

As an example, take the following Apache HTTP Server log entry:

192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395

The above content does not provide a defined structure for Fluent Bit, but by enabling the proper parser we can produce a structured representation of it:

[1438176430, {"host"=>"192.168.2.20",
              "user"=>"-",
              "method"=>"GET",
              "path"=>"/cgi-bin/try/",
              "code"=>200,
              "size"=>3395,
              "referer"=>"",
              "agent"=>""
              }
]

A common pitfall is that you cannot use characters other than letters, numbers and underscores in group names. For example, a group name like (?<user-name>.*) will cause an error because it contains an invalid character (-).

To understand, learn and test regular expressions like the example above, we suggest you try the following Ruby Regular Expression Editor: http://rubular.com/r/X7BH0M4Ivm
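
To make the named-capture idea concrete, here is a sketch of the Apache example in Python. It is an adaptation, not what Fluent Bit runs: Python's re module spells named groups (?P<name>...) instead of the Ruby/Onigmo form (?<name>...), and the names APACHE_REGEX, TIME_FORMAT, TYPES and parse_apache_line are hypothetical. Dropping groups that did not take part in the match is a choice made for this sketch only.

```python
import re
from datetime import datetime

# Same expression as the apache parser above, with (?<name>...) rewritten
# to Python's (?P<name>...) syntax.
APACHE_REGEX = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^\"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^\"]*)" "(?P<agent>[^\"]*)")?$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"   # mirrors Time_Format
TYPES = {"code": int, "size": int}     # mirrors Types


def parse_apache_line(line):
    """Return (unix_timestamp, fields), or None if the line does not match."""
    match = APACHE_REGEX.match(line)
    if match is None:
        return None
    # Each named group becomes a key; unmatched optional groups are dropped here.
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    parsed = datetime.strptime(fields.pop("time"), TIME_FORMAT)
    for key, cast in TYPES.items():
        if key in fields:
            fields[key] = cast(fields[key])
    return int(parsed.timestamp()), fields


entry = '192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395'
timestamp, fields = parse_apache_line(entry)
print(timestamp, fields)
```

Because the sample entry has no referer or user agent, the trailing optional group does not match and those two keys are absent from the result of this sketch.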

Configuring Parser

Parsers are an important component of Fluent Bit. With them you can take any unstructured log entry and give it a structure that makes processing and further filtering easier.

The parser engine is fully configurable and can process log entries based on two types of format:

  • JSON Maps

  • Regular Expressions (named capture)

By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases such as logs from:

  • Apache

  • Nginx

  • Docker

  • Syslog rfc5424

  • Syslog rfc3164

Parsers are defined in one or multiple configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.

Note: If you are using Regular Expressions, note that Fluent Bit uses Ruby-based regular expressions. We encourage you to use the Rubular web site as an online editor to test them.

Configuration Parameters

Multiple parsers can be defined and each section has its own properties. The following table describes the available options for each parser definition:

Key
Description

Name

Set a unique name for the parser in question.

Format

Specify the format of the parser. The available options are: json, regex, ltsv or logfmt.

Regex

If the format is regex, this option must be set, specifying the Ruby Regular Expression that will be used to parse and compose the structured message.

Time_Key

If the log entry provides a field with a timestamp, this option specifies the name of that field.

Time_Format

Specify the format of the time field so it can be recognized and analyzed properly. Fluent Bit uses strptime(3) to parse time, so you can refer to the strptime documentation for available modifiers.

Time_Offset

Specify a fixed UTC time offset (e.g. -0600, +0200, etc.) for local dates.

Time_Keep

By default, when a time key is recognized and parsed, the parser drops the original time field. Enabling this option makes the parser keep the original time field and its value in the log entry.

Types

Specify the data type of parsed fields. The syntax is Types <field_name_1>:<type_name_1> <field_name_2>:<type_name_2> .... The supported types are string (default), integer, bool, float and hex. The option is supported by ltsv, logfmt and regex.

Decode_Field

Decode a field value. The only decoder available is json. The syntax is: Decode_Field json <field_name>.

Skip_Empty_Values

Specify a boolean which determines if the parser should skip empty values. The default is true.

Parsers Configuration File

All parsers must be defined in a parsers.conf file, not in the Fluent Bit global configuration file. The parsers file exposes all available parsers that can be used by the Input plugins that are aware of this feature. A parsers file can have multiple entries like this:

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On

[PARSER]
    Name        syslog-rfc5424
    Format      regex
    Regex       ^\<(?<pri>[0-9]{1,5})\>1 (?<time>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    Types pid:integer

For more information about the parsers available, please refer to the default parsers file distributed with the Fluent Bit source code:

https://github.com/fluent/fluent-bit/blob/master/conf/parsers.conf

Time Resolution and Fractional Seconds

Time resolution and its supported format are handled by using the strftime(3) libc system function.

In addition, we extended our time resolution to support fractional seconds like 2017-05-17T15:44:31.187512963Z. Since Fluent Bit v0.12 we have full support for nanosecond resolution. The %L format option for Time_Format is provided as a way to indicate that content must be interpreted as fractional seconds.

Note: The option %L is only valid when used after seconds (%S) or seconds since the Epoch (%s), e.g. %S.%L or %s.%L

Decoders

There are certain cases where the log messages being parsed contain encoded data. A typical use case can be found in containerized environments with Docker: the application logs its data in JSON format, but it becomes an escaped string. Consider the following example.

Original message generated by the application:

{"status": "up and running"}

Then the Docker log message becomes encapsulated as follows:

{"log":"{\"status\": \"up and running\"}\r\n","stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}

As you can see, the original message is handled as an escaped string. Ideally, in Fluent Bit we would like to keep the original structured message and not a string.

Getting Started

Decoders are a built-in feature available through the Parsers file. Each Parser definition can optionally set one or multiple decoders. There are two types of decoders:

  • Decode_Field: if the content can be decoded into a structured message, append that structured message (keys and values) to the original log message.

  • Decode_Field_As: any content decoded (unstructured or structured) will be replaced in the same key/value; no extra keys are added.

Our pre-defined Docker Parser has the following definition:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command       |  Decoder  | Field | Optional Action   |
    # ==============|===========|=======|===================|
    Decode_Field_As    escaped     log

Each line in the parser with a key Decode_Field instructs the parser to apply a specific decoder on a given field. Optionally, it can take an extra action if the decoder cannot succeed.

Decoders

Name
Description

json

Handle the field content as a JSON map. If it finds a JSON map, it will replace the content with a structured map.

escaped

Decode an escaped string.

escaped_utf8

Decode a UTF8 escaped string.

Optional Actions

If a decoder fails to decode the field, or you want to try a further decoder, you can define an optional action. Available actions are:

Name
Description

try_next

If the decoder failed, apply the next Decoder in the list for the same field.

do_next

Whether the decoder succeeded or failed, apply the next Decoder in the list for the same field.

Note that actions are affected by some restrictions:

  • On Decode_Field_As, if the decoder succeeded, another decoder of the same type can be applied to the same field only if the data is still an unstructured message (raw text).

  • On Decode_Field, if the decoder succeeded, it can only be applied once for the same field. By nature, Decode_Field aims to decode a structured message.

Examples

escaped_utf8

Example input (from /path/to/log.log in the configuration below):

{"log":"\u0009Checking indexes...\n","stream":"stdout","time":"2018-02-19T23:25:29.1845444Z"}
{"log":"\u0009\u0009Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary\n","stream":"stdout","time":"2018-02-19T23:25:29.1845536Z"}
{"log":"\u0009Done\n","stream":"stdout","time":"2018-02-19T23:25:29.1845622Z"}

Example output:

[24] tail.0: [1519082729.184544400, {"log"=>"   Checking indexes...
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845444Z"}]
[25] tail.0: [1519082729.184553600, {"log"=>"           Validated: _audit _internal _introspection _telemetry _thefishbucket history main snmp_data summary
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845536Z"}]
[26] tail.0: [1519082729.184562200, {"log"=>"   Done
", "stream"=>"stdout", "time"=>"2018-02-19T23:25:29.1845622Z"}]

Configuration file:

[SERVICE]
    Parsers_File fluent-bit-parsers.conf

[INPUT]
    Name        tail
    Parser      docker
    Path        /path/to/log.log

[OUTPUT]
    Name   stdout
    Match  *

The fluent-bit-parsers.conf file:

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S %z
    Decode_Field_as escaped_utf8 log
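
The difference between Decode_Field and Decode_Field_As described in the Decoders section can be sketched in Python with the json decoder and the Docker message shown there. This is a reading of the prose description under stated assumptions, not Fluent Bit's implementation; the function names decode_field_json and decode_field_as_json are hypothetical.

```python
import json


def decode_field_as_json(record, field):
    """Decode_Field_As-style: replace the field's value in place, add no keys."""
    decoded = dict(record)
    try:
        decoded[field] = json.loads(record[field])
    except (TypeError, ValueError):
        pass  # decoding failed: leave the field untouched
    return decoded


def decode_field_json(record, field):
    """Decode_Field-style: append the decoded keys and values to the record."""
    decoded = dict(record)
    try:
        value = json.loads(record[field])
    except (TypeError, ValueError):
        return decoded
    if isinstance(value, dict):
        decoded.update(value)
    return decoded


# The encapsulated Docker message from the Decoders section.
docker_line = (
    r'{"log":"{\"status\": \"up and running\"}\r\n",'
    r'"stream":"stdout","time":"2018-03-09T01:01:44.851160855Z"}'
)
record = json.loads(docker_line)

replaced = decode_field_as_json(record, "log")
appended = decode_field_json(record, "log")
print(replaced)
print(appended)
```

In this sketch the first call turns the log value into a structured map under the same key, while the second keeps the escaped string and adds status as a new top-level key.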