1 of 18

Filters

AWS Metadata

The AWS Filter Enriches logs with AWS Metadata. Currently the plugin adds the EC2 instance ID and availability zone to log records. To use this plugin, you must be running in EC2 and have the instance metadata service enabled.

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Description

Default

Note: If you run Fluent Bit in a container, you may have to use instance metadata v1. The plugin behaves the same regardless of which version is used.

Command Line

$ bin/fluent-bit -c /PATH_TO_CONF_FILE/fluent-bit.conf

[2020/01/17 07:57:17] [ info] [engine] started (pid=32744)
[0] dummy: [1579247838.000171227, {"message"=>"dummy", "az"=>"us-west-2c", "ec2_instance_id"=>"i-0c862eca9038f5aae", "ec2_instance_type"=>"t2.medium", "private_ip"=>"172.31.6.59", "vpc_id"=>"vpc-7ea11c06", "ami_id"=>"ami-0841edc20334f9287", "account_id"=>"YOUR_ACCOUNT_ID", "hostname"=>"ip-172-31-6-59.us-west-2.compute.internal"}]
[0] dummy: [1601274509.970235760, {"message"=>"dummy", "az"=>"us-west-2c", "ec2_instance_id"=>"i-0c862eca9038f5aae", "ec2_instance_type"=>"t2.medium", "private_ip"=>"172.31.6.59", "vpc_id"=>"vpc-7ea11c06", "ami_id"=>"ami-0841edc20334f9287", "account_id"=>"YOUR_ACCOUNT_ID", "hostname"=>"ip-172-31-6-59.us-west-2.compute.internal"}]

Configuration File

[INPUT]
    Name dummy
    Tag dummy

[FILTER]
    Name aws
    Match *
    imds_version v1
    az true
    ec2_instance_id true
    ec2_instance_type true
    private_ip true
    ami_id true
    account_id true
    hostname true
    vpc_id true

[OUTPUT]
    Name stdout
    Match *

CheckList

The following plugin looks up if a value in a specified list exists and then allows the addition of a record to indicate if found. Introduced in version 1.8.4

Configuration Parameters

The plugin supports the following configuration parameters

Key

Description

Example Configuration

[INPUT]
    name           tail
    tag            test1
    path           test1.log
    read_from_head true
    parser         json

[FILTER]
    name       checklist
    match      test1
    file       ip_list.txt
    lookup_key $remote_addr
    record     ioc    abc
    record     badurl null
    log_level  debug

[OUTPUT]
    name       stdout
    match      test1

In the following configuration we will read a file test1.log that includes the following values

{"remote_addr": true, "ioc":"false", "url":"https://badurl.com/payload.htm","badurl":"no"}
{"remote_addr": "7.7.7.2", "ioc":"false", "url":"https://badurl.com/payload.htm","badurl":"no"}
{"remote_addr": "7.7.7.3", "ioc":"false", "url":"https://badurl.com/payload.htm","badurl":"no"}
{"remote_addr": "7.7.7.4", "ioc":"false", "url":"https://badurl.com/payload.htm","badurl":"no"}
{"remote_addr": "7.7.7.5", "ioc":"false", "url":"https://badurl.com/payload.htm","badurl":"no"}
{"remote_addr": "7.7.7.6", "ioc":"false", "url":"https://badurl.com/payload.htm","badurl":"no"}
{"remote_addr": "7.7.7.7", "ioc":"false", "url":"https://badurl.com/payload.htm","badurl":"no"}

Additionally, we will use the following lookup file which contains a list of malicious IPs (ip_list.txt)

1.2.3.4
6.6.4.232
7.7.7.7

In the configuration we are using $remote_addr as the lookup key and 7.7.7.7 is malicious. This means the record we would output for the last record would look like the following

{"remote_addr": "7.7.7.7", "ioc":"abc", "url":"https://badurl.com/payload.htm","badurl":"null"}

Expect

Made for testing: make sure that your records contain the expected key and values

The expect filter plugin allows you to validate that records match certain criteria in their structure, like validating that a key exists or it has a specific value.

The following page just describes the configuration properties available, for a detailed explanation of its usage and use cases, please refer the following page:

Validating and your Data and Structure

Configuration Parameters

The plugin supports the following configuration parameters:

Property

Description

Getting Started

As mentioned on top, refer to the following page for specific details of usage of this filter:

Validating and your Data and Structure

GeoIP2 Filter

Look up Geo data from IP

GeoIP2 Filter allows you to enrich the incoming data stream using location data from GeoIP2 database.

Configuration Parameters

This plugin supports the following configuration parameters:

Key

Description

Getting Started

The following configuration will process incoming remote_addr, and append country information retrieved from GeoLite2 database.

[INPUT]
    Name   dummy
    Dummy  {"remote_addr": "8.8.8.8"}

[FILTER]
    Name geoip2
    Match *
    Database GeoLite2-City.mmdb
    Lookup_key remote_addr
    Record country remote_addr %{country.names.en}
    Record isocode remote_addr %{country.iso_code}

[OUTPUT]
    Name   stdout
    Match  *

Each Record parameter above specifies the following triplet:

The field name to be added to records (country)
The lookup key to process (remote_addr)
The query for GeoIP2 database (%{country.names.en})

By running Fluent Bit with the configuration above, you will see the following output:

{"remote_addr": "8.8.8.8", "country": "United States", "isocode": "US"}

Note that the GeoLite2-City.mmdb database is available from MaxMind's official site.

Grep

Select or exclude records per patterns

The Grep Filter plugin allows you to match or exclude specific records based on regular expression patterns for values or nested values.

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Value Format

Description

Record Accessor Enabled

This plugin enables the feature to specify the KEY. Using the record accessor is suggested if you want to match values against nested values.

Getting Started

In order to start filtering records, you can run the filter from the command line or through the configuration file. The following example assumes that you have a file called lines.txt with the following content:

Command Line

Note: using the command line mode need special attention to quote the regular expressions properly. It's suggested to use a configuration file.

The following command will load the tail plugin and read the content of lines.txt file. Then the grep filter will apply a regular expression rule over the log field (created by tail plugin) and only pass the records which field value starts with aa:

Configuration File

The filter allows to use multiple rules which are applied in order, you can have many Regex and Exclude entries as required.

Nested fields example

if you want to exclude records that match given nested field (for example kubernetes.labels.app), you can use the following rule:

Excluding records missing/invalid fields

It may be that in your processing pipeline you want to drop records that are missing certain keys.

A simple way to do this is just to exclude with a regex that matches anything, a missing key will fail this check.

Here is an example that checks for a specific valid value for the key as well:

The specified key iot_timestamp must match the expected expression - if it does not or is missing/empty then it will be excluded.

Kubernetes

Fluent Bit Kubernetes Filter allows to enrich your log files with Kubernetes metadata.

When Fluent Bit is deployed in Kubernetes as a DaemonSet and configured to read the log files from the containers (using tail or systemd input plugins), this filter aims to perform the following operations:

Analyze the Tag and extract the following metadata:
- Pod Name
- Namespace
- Container Name
- Container ID
Query Kubernetes API Server to obtain extra metadata for the POD in question:
- Pod ID
- Labels
- Annotations

The data is cached locally in memory and appended to each record.

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Description

Default

Processing the 'log' value

Kubernetes Filter aims to provide several ways to process the data contained in the log key. The following explanation of the workflow assumes that your original Docker parser defined in parsers.conf is as follows:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On

Since Fluent Bit v1.2 we are not suggesting the use of decoders (Decode_Field_As) if you are using Elasticsearch database in the output to avoid data type conflicts.

To perform processing of the log key, it's mandatory to enable the Merge_Log configuration property in this filter, then the following processing order will be done:

If a Pod suggest a parser, the filter will use that parser to process the content of log.
If the option Merge_Parser was set and the Pod did not suggest a parser, process the log content using the suggested parser in the configuration.
If no Pod was suggested and no Merge_Parser is set, try to handle the content as JSON.

If log value processing fails, the value is untouched. The order above is not chained, meaning it's exclusive and the filter will try only one of the options above, not all of them.

Kubernetes Annotations

A flexible feature of Fluent Bit Kubernetes filter is that allow Kubernetes Pods to suggest certain behaviors for the log processor pipeline when processing the records. At the moment it support:

Suggest a pre-defined parser
Request to exclude logs

The following annotations are available:

Annotation Examples in Pod definition

Suggest a parser

The following Pod definition runs a Pod that emits Apache logs to the standard output, in the Annotations it suggest that the data should be processed using the pre-defined parser called apache:

apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/parser: apache
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs

Request to exclude logs

There are certain situations where the user would like to request that the log processor simply skip the logs from the Pod in question:

apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/exclude: "true"
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs

Note that the annotation value is boolean which can take a true or false and must be quoted.

Workflow of Tail + Kubernetes Filter

Kubernetes Filter depends on either Tail or Systemd input plugins to process and enrich records with Kubernetes metadata. Here we will explain the workflow of Tail and how it configuration is correlated with Kubernetes filter. Consider the following configuration example (just for demo purposes, not production):

[INPUT]
    Name    tail
    Tag     kube.*
    Path    /var/log/containers/*.log
    Parser  docker

[FILTER]
    Name             kubernetes
    Match            kube.*
    Kube_URL         https://kubernetes.default.svc:443
    Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix  kube.var.log.containers.
    Merge_Log        On
    Merge_Log_Key    log_processed

In the input section, the Tail plugin will monitor all files ending in .log in path /var/log/containers/. For every file it will read every line and apply the docker parser. Then the records are emitted to the next step with an expanded tag.

Tail support Tags expansion, which means that if a tag have a star character (*), it will replace the value with the absolute path of the monitored file, so if you file name and path is:

/var/log/container/apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

then the Tag for every record of that file becomes:

kube.var.log.containers.apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

note that slashes are replaced with dots.

When Kubernetes Filter runs, it will try to match all records that starts with kube. (note the ending dot), so records from the file mentioned above will hit the matching rule and the filter will try to enrich the records

Kubernetes Filter do not care from where the logs comes from, but it cares about the absolute name of the monitored file, because that information contains the pod name and namespace name that are used to retrieve associated metadata to the running Pod from the Kubernetes Master/API Server.

If you have large pod specifications (can be caused by large numbers of environment variables, etc.), be sure to increase the Buffer_Size parameter of the kubernetes filter. If object sizes exceed this buffer, some metadata will fail to be injected to the logs.

If the configuration property Kube_Tag_Prefix was configured (available on Fluent Bit >= 1.1.x), it will use that value to remove the prefix that was appended to the Tag in the previous Input section. Note that the configuration property defaults to kube.var.logs.containers. , so the previous Tag content will be transformed from:

kube.var.log.containers.apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

to:

apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

the transformation above do not modify the original Tag, just creates a new representation for the filter to perform metadata lookup.

that new value is used by the filter to lookup the pod name and namespace, for that purpose it uses an internal Regular expression:

(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

If you want to know more details, check the source code of that definition here.

You can see on Rublar.com web site how this operation is performed, check the following demo link:

https://rubular.com/r/HZz3tYAahj6JCd

Custom Regex

Under certain and not common conditions, a user would want to alter that hard-coded regular expression, for that purpose the option Regex_Parser can be used (documented on top).

Final Comments

So at this point the filter is able to gather the values of pod_name and namespace, with that information it will check in the local cache (internal hash table) if some metadata for that key pair exists, if so, it will enrich the record with the metadata value, otherwise it will connect to the Kubernetes Master/API Server and retrieve that information.

Optional Feature: Using Kubelet to Get Metadata

There is an issue reported about kube-apiserver fall over and become unresponsive when cluster is too large and too many requests are sent to it. For this feature, fluent bit Kubernetes filter will send the request to kubelet /pods endpoint instead of kube-apiserver to retrieve the pods information and use it to enrich the log. Since Kubelet is running locally in nodes, the request would be responded faster and each node would only get one request one time. This could save kube-apiserver power to handle other requests. When this feature is enabled, you should see no difference in the kubernetes metadata added to logs, but the Kube-apiserver bottleneck should be avoided when cluster is large.

Configuration Setup

There are some configuration setup needed for this feature.

Role Configuration for Fluent Bit DaemonSet Example:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentbitds
  namespace: fluentbit-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluentbit
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - nodes
      - nodes/proxy
    verbs: 
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluentbit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentbit
subjects:
  - kind: ServiceAccount
    name: fluentbitds
    namespace: fluentbit-system

The difference is that kubelet need a special permission for resource nodes/proxy to get HTTP request in. When creating the role or clusterRole, you need to add nodes/proxy into the rule for resource.

Fluent Bit Configuration Example:

    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        DB                /var/log/flb_kube.db
        Parser            docker
        Docker_Mode       On
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Buffer_Size         0
        Use_Kubelet         true
        Kubelet_Port        10250

So for fluent bit configuration, you need to set the Use_Kubelet to true to enable this feature.

DaemonSet config Example:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentbit
  namespace: fluentbit-system
  labels:
    app.kubernetes.io/name: fluentbit
spec:
  selector:
    matchLabels:
      name: fluentbit
  template:
    metadata:
      labels:
        name: fluentbit
    spec:
      serviceAccountName: fluentbitds
      containers:
        - name: fluent-bit
          imagePullPolicy: Always
          image: fluent/fluent-bit:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluentbit-config
              mountPath: /fluent-bit/etc/
          resources:
            limits:
              memory: 1500Mi
            requests:
              cpu: 500m
              memory: 500Mi
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluentbit-config
          configMap:
            name: fluentbit-config

The key point is to set hostNetwork to true and dnsPolicy to ClusterFirstWithHostNet that fluent bit DaemonSet could call Kubelet locally. Otherwise it could not resolve the dns for kubelet.

Now you are good to use this new feature!

Verify that the Use_Kubelet option is working

Basically you should see no difference about your experience for enriching your log files with Kubernetes metadata.

To check if Fluent Bit is using the kubelet, you can check fluent bit logs and there should be a log like this:

[ info] [filter:kubernetes:kubernetes.0] testing connectivity with Kubelet...

And if you are in debug mode, you could see more:

[debug] [filter:kubernetes:kubernetes.0] Send out request to Kubelet for pods information.
[debug] [filter:kubernetes:kubernetes.0] Request (ns=<namespace>, pod=node name) http_do=0, HTTP Status: 200
[ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2021/02/05 10:33:35] [debug] [filter:kubernetes:kubernetes.0] Request (ns=<Namespace>, pod=<podName>) http_do=0, HTTP Status: 200
[2021/02/05 10:33:35] [debug] [filter:kubernetes:kubernetes.0] kubelet find pod: <podName> and ns: <Namespace> match

Troubleshooting

The following section goes over specific log messages you may run into and how to solve them to ensure that Fluent Bit's Kubernetes filter is operating properly

I can't see metadata appended to my pod or other Kubernetes objects

If you are not seeing metadata added to your kubernetes logs and see the following in your log message, then you may be facing connectivity issues with the Kubernetes API server.

[2020/10/15 03:48:57] [ info] [filter_kube] testing connectivity with API server...
[2020/10/15 03:48:57] [error] [filter_kube] upstream connection error
[2020/10/15 03:48:57] [ warn] [filter_kube] could not get meta for POD

Potential fix #1: Check Kubernetes roles

When Fluent Bit is deployed as a DaemonSet it generally runs with specific roles that allow the application to talk to the Kubernetes API server. If you are deployed in a more restricted environment check that all the Kubernetes roles are set correctly.

You can test this by running the following command (replace fluentbit-system with the namespace where your fluentbit is installed)

kubectl auth can-i list pods --as=system:serviceaccount:fluentbit-system:fluentbit

If set roles are configured correctly, it should simply respond with yes.

For instance, using Azure AKS, running the above command may respond with:

no - Azure does not have opinion for this user.

If you have connectivity to the API server, but still "could not get meta for POD" - debug logging might give you a message with Azure does not have opinion for this user. Then the following subject may need to be included in the fluentbit ClusterRoleBinding:

appended to subjects array:

- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts

Potential fix #2: Check Kubernetes IPv6

There may be cases where you have IPv6 on in the environment and you need to enable this within Fluent Bit. Under the service tag please set the following option ipv6 to on .

Potential fix #3: Check connectivity to Kube_URL

By default the Kube_URL is set to https://kubernetes.default.svc:443 . Ensure that you have connectivity to this endpoint from within the cluster and that there are no special permission interfering with the connection.

I can't see new objects getting metadata

In some cases, you may only see some objects being appended with metadata while other objects are not enriched. This can occur at times when local data is cached and does not contain the correct id for the kubernetes object that requires enrichment. For most Kubernetes objects the Kubernetes API server is updated which will then be reflected in Fluent Bit logs, however in some cases for Pod objects this refresh to the Kubernetes API server can be skipped, causing metadata to be skipped.

Lua

The Lua filter allows you to modify the incoming records (even split one record into multiple records) using custom scripts.

Due to the necessity to have a flexible filtering mechanism, it is now possible to extend Fluent Bit capabilities by writing custom filters using Lua programming language. A Lua-based filter takes two steps:

Configure the Filter in the main configuration
Prepare a Lua script that will be used by the Filter

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Description

Getting Started

In order to test the filter, you can run the plugin from the command line or through the configuration file. The following examples use the input plugin for data ingestion, invoke Lua filter using the script and call the function which only prints the same information to the standard output:

Command Line

From the command line you can use the following options:

Configuration File

In your main configuration file append the following Input, Filter & Output sections:

Lua Script Filter API

The life cycle of a filter have the following steps:

Upon Tag matching by this filter, it may process or bypass the record.
If tag matched, it will accept the record and invoke the function defined in the call property which basically is the name of a function defined in the Lua script.
Invoke Lua function and pass each record in JSON format.
Upon return, validate return value and continue the pipeline.

Callback Prototype

The Lua script can have one or multiple callbacks that can be used by this filter. The function prototype is as follows:

Function Arguments

Return Values

Each callback must return three values:

Code Examples

For functional examples of this interface, please refer to the code samples provided in the source code of the project located here:

Inline configuration

In classic mode:

Environment variable processing

Kubernetes pods generally have various environment variables set by the infrastructure automatically which may contain useful information.

In this example, we want to extract part of the Kubernetes cluster API name.

The environment variable is set like so: KUBERNETES_SERVICE_HOST: api.sandboxbsh-a.project.domain.com

We want to extract the sandboxbsh name and add it to our record as a special key.

Number Type

+Lua treats number as double. It means an integer field (e.g. IDs, log levels) will be converted double. To avoid type conversion, The type_int_key property is available.

Protected Mode

Record Split

The Lua callback function can return an array of tables (i.e., array of records) in its third record return value. With this feature, the Lua filter can split one input record into multiple records according to custom logic.

For example:

Lua script

Configuration

Input

Output

Parser

The Parser Filter plugin allows for parsing fields in event records.

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Description

Default

Getting Started

Configuration File

This is an example of parsing a record {"data":"100 0.5 true This is example"}.

The plugin needs a parser file which defines how to parse each field.

[PARSER]
    Name dummy_test
    Format regex
    Regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$

The path of the parser file should be written in configuration file under the [SERVICE] section.

[SERVICE]
    Parsers_File /path/to/parsers.conf

[INPUT]
    Name dummy
    Tag  dummy.data
    Dummy {"data":"100 0.5 true This is example"}

[FILTER]
    Name parser
    Match dummy.*
    Key_Name data
    Parser dummy_test

[OUTPUT]
    Name stdout
    Match *

The output is

$ fluent-bit -c dummy.conf
Fluent Bit v1.x.x
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2017/07/06 22:33:12] [ info] [engine] started
[0] dummy.data: [1499347993.001371317, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[1] dummy.data: [1499347994.001303118, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[2] dummy.data: [1499347995.001296133, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[3] dummy.data: [1499347996.001320284, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]

You can see the records {"data":"100 0.5 true This is example"} are parsed.

Preserve original fields

By default, the parser plugin only keeps the parsed fields in its output.

If you enable Reserve_Data, all other fields are preserved:

[PARSER]
    Name dummy_test
    Format regex
    Regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$

[SERVICE]
    Parsers_File /path/to/parsers.conf

[INPUT]
    Name dummy
    Tag  dummy.data
    Dummy {"data":"100 0.5 true This is example", "key1":"value1", "key2":"value2"}

[FILTER]
    Name parser
    Match dummy.*
    Key_Name data
    Parser dummy_test
    Reserve_Data On

This will produce the output:

$ fluent-bit -c dummy.conf
Fluent-Bit v0.12.0
Copyright (C) Treasure Data

[2017/07/06 22:33:12] [ info] [engine] started
[0] dummy.data: [1499347993.001371317, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}, "key1":"value1", "key2":"value2"]
[1] dummy.data: [1499347994.001303118, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}, "key1":"value1", "key2":"value2"]
[2] dummy.data: [1499347995.001296133, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}, "key1":"value1", "key2":"value2"]
[3] dummy.data: [1499347996.001320284, {"INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}, "key1":"value1", "key2":"value2"]

If you enable Reserved_Data and Preserve_Key, the original key field will be preserved as well:

[PARSER]
    Name dummy_test
    Format regex
    Regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$

[SERVICE]
    Parsers_File /path/to/parsers.conf

[INPUT]
    Name dummy
    Tag  dummy.data
    Dummy {"data":"100 0.5 true This is example", "key1":"value1", "key2":"value2"}

[FILTER]
    Name parser
    Match dummy.*
    Key_Name data
    Parser dummy_test
    Reserve_Data On
    Preserve_Key On

This will produce the following output:

$ fluent-bit -c dummy.conf
Fluent-Bit v0.12.0
Copyright (C) Treasure Data

[2017/07/06 22:33:12] [ info] [engine] started
[0] dummy.data: [1499347993.001371317, {"data":"100 0.5 true This is example", "INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[1] dummy.data: [1499347994.001303118, {"data":"100 0.5 true This is example", "INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[2] dummy.data: [1499347995.001296133, {"data":"100 0.5 true This is example", "INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]
[3] dummy.data: [1499347996.001320284, {"data":"100 0.5 true This is example", "INT"=>"100", "FLOAT"=>"0.5", "BOOL"=>"true", "STRING"=>"This is example"}]

Record Modifier

The Record Modifier Filter plugin allows to append fields or to exclude specific fields.

Configuration Parameters

The plugin supports the following configuration parameters: Remove_key and Allowlist_key are exclusive.

Key

Description

Getting Started

In order to start filtering records, you can run the filter from the command line or through the configuration file.

This is a sample in_mem record to filter.

{"Mem.total"=>1016024, "Mem.used"=>716672, "Mem.free"=>299352, "Swap.total"=>2064380, "Swap.used"=>32656, "Swap.free"=>2031724}

Append fields

The following configuration file is to append product name and hostname (via environment variable) to record.

[INPUT]
    Name mem
    Tag  mem.local

[OUTPUT]
    Name  stdout
    Match *

[FILTER]
    Name record_modifier
    Match *
    Record hostname ${HOSTNAME}
    Record product Awesome_Tool

You can also run the filter from command line.

$ fluent-bit -i mem -o stdout -F record_modifier -p 'Record=hostname ${HOSTNAME}' -p 'Record=product Awesome_Tool' -m '*'

The output will be

[0] mem.local: [1492436882.000000000, {"Mem.total"=>1016024, "Mem.used"=>716672, "Mem.free"=>299352, "Swap.total"=>2064380, "Swap.used"=>32656, "Swap.free"=>2031724, "hostname"=>"localhost.localdomain", "product"=>"Awesome_Tool"}]

Remove fields with Remove_key

The following configuration file is to remove 'Swap.*' fields.

[INPUT]
    Name mem
    Tag  mem.local

[OUTPUT]
    Name  stdout
    Match *

[FILTER]
    Name record_modifier
    Match *
    Remove_key Swap.total
    Remove_key Swap.used
    Remove_key Swap.free

You can also run the filter from command line.

$ fluent-bit -i mem -o stdout -F  record_modifier -p 'Remove_key=Swap.total' -p 'Remove_key=Swap.free' -p 'Remove_key=Swap.used' -m '*'

The output will be

[0] mem.local: [1492436998.000000000, {"Mem.total"=>1016024, "Mem.used"=>716672, "Mem.free"=>295332}]

Remove fields with Allowlist_key

The following configuration file is to remain 'Mem.*' fields.

[INPUT]
    Name mem
    Tag  mem.local

[OUTPUT]
    Name  stdout
    Match *

[FILTER]
    Name record_modifier
    Match *
    Allowlist_key Mem.total
    Allowlist_key Mem.used
    Allowlist_key Mem.free

You can also run the filter from command line.

$ fluent-bit -i mem -o stdout -F  record_modifier -p 'Allowlist_key=Mem.total' -p 'Allowlist_key=Mem.free' -p 'Allowlist_key=Mem.used' -m '*'

The output will be

[0] mem.local: [1492436998.000000000, {"Mem.total"=>1016024, "Mem.used"=>716672, "Mem.free"=>295332}]

Modify

The Modify Filter plugin allows you to change records using rules and conditions.

Example usage

As an example using JSON notation to,

Rename Key2 to RenamedKey
Add a key OtherKey with value Value3 if OtherKey does not yet exist

Example (input)

{
  "Key1"     : "Value1",
  "Key2"     : "Value2"
}

Example (output)

{
  "Key1"       : "Value1",
  "RenamedKey" : "Value2",
  "OtherKey"   : "Value3"
}

Configuration Parameters

Rules

The plugin supports the following rules:

Rules are case insensitive, parameters are not
Any number of rules can be set in a filter instance.
Rules are applied in the order they appear, with each rule operating on the result of the previous rule.

Conditions

The plugin supports the following conditions:

Conditions are case insensitive, parameters are not
Any number of conditions can be set.
Conditions apply to the whole filter instance and all its rules. Not to individual rules.
All conditions have to be true for the rules to be applied.
You can set Record Accessor as STRING:KEY for nested key.

Example #1 - Add and Rename

In order to start filtering records, you can run the filter from the command line or through the configuration file. The following invokes the Memory Usage Input Plugin, which outputs the following (example),

[0] memory: [1488543156, {"Mem.total"=>1016044, "Mem.used"=>841388, "Mem.free"=>174656, "Swap.total"=>2064380, "Swap.used"=>139888, "Swap.free"=>1924492}]
[1] memory: [1488543157, {"Mem.total"=>1016044, "Mem.used"=>841420, "Mem.free"=>174624, "Swap.total"=>2064380, "Swap.used"=>139888, "Swap.free"=>1924492}]
[2] memory: [1488543158, {"Mem.total"=>1016044, "Mem.used"=>841420, "Mem.free"=>174624, "Swap.total"=>2064380, "Swap.used"=>139888, "Swap.free"=>1924492}]
[3] memory: [1488543159, {"Mem.total"=>1016044, "Mem.used"=>841420, "Mem.free"=>174624, "Swap.total"=>2064380, "Swap.used"=>139888, "Swap.free"=>1924492}]

Using command Line

Note: Using the command line mode requires quotes parse the wildcard properly. The use of a configuration file is recommended.

bin/fluent-bit -i mem \
  -p 'tag=mem.local' \
  -F modify \
  -p 'Add=Service1 SOMEVALUE' \
  -p 'Add=Service2 SOMEVALUE3' \
  -p 'Add=Mem.total2 TOTALMEM2' \
  -p 'Rename=Mem.free MEMFREE' \
  -p 'Rename=Mem.used MEMUSED' \
  -p 'Rename=Swap.total SWAPTOTAL' \
  -p 'Add=Mem.total TOTALMEM' \
  -m '*' \
  -o stdout

Configuration File

[INPUT]
    Name mem
    Tag  mem.local

[OUTPUT]
    Name  stdout
    Match *

[FILTER]
    Name modify
    Match *
    Add Service1 SOMEVALUE
    Add Service3 SOMEVALUE3
    Add Mem.total2 TOTALMEM2
    Rename Mem.free MEMFREE
    Rename Mem.used MEMUSED
    Rename Swap.total SWAPTOTAL
    Add Mem.total TOTALMEM

Result

The output of both the command line and configuration invocations should be identical and result in the following output.

[2018/04/06 01:35:13] [ info] [engine] started
[0] mem.local: [1522980610.006892802, {"Mem.total"=>4050908, "MEMUSED"=>738100, "MEMFREE"=>3312808, "SWAPTOTAL"=>1046524, "Swap.used"=>0, "Swap.free"=>1046524, "Service1"=>"SOMEVALUE", "Service3"=>"SOMEVALUE3", "Mem.total2"=>"TOTALMEM2"}]
[1] mem.local: [1522980611.000658288, {"Mem.total"=>4050908, "MEMUSED"=>738068, "MEMFREE"=>3312840, "SWAPTOTAL"=>1046524, "Swap.used"=>0, "Swap.free"=>1046524, "Service1"=>"SOMEVALUE", "Service3"=>"SOMEVALUE3", "Mem.total2"=>"TOTALMEM2"}]
[2] mem.local: [1522980612.000307652, {"Mem.total"=>4050908, "MEMUSED"=>738068, "MEMFREE"=>3312840, "SWAPTOTAL"=>1046524, "Swap.used"=>0, "Swap.free"=>1046524, "Service1"=>"SOMEVALUE", "Service3"=>"SOMEVALUE3", "Mem.total2"=>"TOTALMEM2"}]
[3] mem.local: [1522980613.000122671, {"Mem.total"=>4050908, "MEMUSED"=>738068, "MEMFREE"=>3312840, "SWAPTOTAL"=>1046524, "Swap.used"=>0, "Swap.free"=>1046524, "Service1"=>"SOMEVALUE", "Service3"=>"SOMEVALUE3", "Mem.total2"=>"TOTALMEM2"}]

Example #2 - Conditionally Add and Remove

Configuration File

[INPUT]
    Name mem
    Tag  mem.local
    Interval_Sec 1

[FILTER]
    Name    modify
    Match   mem.*

    Condition Key_Does_Not_Exist cpustats
    Condition Key_Exists Mem.used

    Set cpustats UNKNOWN

[FILTER]
    Name    modify
    Match   mem.*

    Condition Key_Value_Does_Not_Equal cpustats KNOWN

    Add sourcetype memstats

[FILTER]
    Name    modify
    Match   mem.*

    Condition Key_Value_Equals cpustats UNKNOWN

    Remove_wildcard Mem
    Remove_wildcard Swap
    Add cpustats_more STILL_UNKNOWN

[OUTPUT]
    Name           stdout
    Match          *

Result

[2018/06/14 07:37:34] [ info] [engine] started (pid=1493)
[0] mem.local: [1528925855.000223110, {"cpustats"=>"UNKNOWN", "sourcetype"=>"memstats", "cpustats_more"=>"STILL_UNKNOWN"}]
[1] mem.local: [1528925856.000064516, {"cpustats"=>"UNKNOWN", "sourcetype"=>"memstats", "cpustats_more"=>"STILL_UNKNOWN"}]
[2] mem.local: [1528925857.000165965, {"cpustats"=>"UNKNOWN", "sourcetype"=>"memstats", "cpustats_more"=>"STILL_UNKNOWN"}]
[3] mem.local: [1528925858.000152319, {"cpustats"=>"UNKNOWN", "sourcetype"=>"memstats", "cpustats_more"=>"STILL_UNKNOWN"}]

Example #3 - Emoji

Configuration File

[INPUT]
    Name mem
    Tag  mem.local

[OUTPUT]
    Name  stdout
    Match *

[FILTER]
    Name modify
    Match *

    Remove_Wildcard Mem
    Remove_Wildcard Swap
    Set This_plugin_is_on 🔥
    Set 🔥 is_hot
    Copy 🔥 💦
    Rename  💦 ❄️
    Set ❄️ is_cold
    Set 💦 is_wet

Result

[2018/06/14 07:46:11] [ info] [engine] started (pid=21875)
[0] mem.local: [1528926372.000197916, {"This_plugin_is_on"=>"🔥", "🔥"=>"is_hot", "❄️"=>"is_cold", "💦"=>"is_wet"}]
[1] mem.local: [1528926373.000107868, {"This_plugin_is_on"=>"🔥", "🔥"=>"is_hot", "❄️"=>"is_cold", "💦"=>"is_wet"}]
[2] mem.local: [1528926374.000181042, {"This_plugin_is_on"=>"🔥", "🔥"=>"is_hot", "❄️"=>"is_cold", "💦"=>"is_wet"}]
[3] mem.local: [1528926375.000090841, {"This_plugin_is_on"=>"🔥", "🔥"=>"is_hot", "❄️"=>"is_cold", "💦"=>"is_wet"}]
[0] mem.local: [1528926376.000610974, {"This_plugin_is_on"=>"🔥", "🔥"=>"is_hot", "❄️"=>"is_cold", "💦"=>"is_wet"}]

Multiline

Concatenate Multiline or Stack trace log messages. Available on Fluent Bit >= v1.8.2.

The Multiline Filter helps to concatenate messages that originally belong to one context but were split across multiple records or log lines. Common examples are stack traces or applications that print logs in multiple lines.

As part of the built-in functionality, without major configuration effort, you can enable one of ours built-in parsers with auto detection and multi format support:

go
python
java (Google Cloud Platform Java stacktrace format)

Some comments about this filter:

The usage of this filter depends on a previous configuration of a definition.
If you wish to concatenate messages read from a log file, it is highly recommended to use the multiline support in the itself. This is because performing concatenation while reading the log file is more performant. Concatenating messages originally split by Docker or CRI container engines, is supported in the .

This filter only performs buffering that persists across different Chunks when Buffer is enabled. Otherwise, the filter will process one Chunk at a time and is not suitable for most inputs which might send multiline messages in separate chunks.

When buffering is enabled, the filter does not immediately emit messages it receives. It uses the in_emitter plugin, same as the , and emits messages once they are fully concatenated, or a timeout is reached.

Since concatenated records are re-emitted to the head of the Fluent Bit log pipeline, you can not configure multiple multiline filter definitions that match the same tags. This will cause an infinite loop in the Fluent Bit pipeline; to use multiple parsers on the same logs, configure a single filter definitions with a comma separated list of parsers for multiline.parser. For more, see issue .

Secondly, for the same reason, the multiline filter should be the first filter. Logs will be re-emitted by the multiline filter to the head of the pipeline- the filter will ignore its own re-emitted records, but other filters won't. If there are filters before the multiline filter, they will be applied twice.

Configuration Parameters

The plugin supports the following configuration parameters:

Property

Description

Configuration Example

The following example aims to parse a log file called test.log that contains some full lines, a custom Java stacktrace and a Go stacktrace.

Example files content:

This is the primary Fluent Bit configuration file. It includes the parsers_multiline.conf and tails the file test.log by applying the multiline parsers multiline-regex-test and go. Then it sends the processing to the standard output.

This second file defines a multiline parser for the example. Note that a second multiline parser called go is used in fluent-bit.conf, but this one is a built-in parser.

An example file with multiline and multiformat content:

By running Fluent Bit with the given configuration file you will obtain:

The lines that did not match a pattern are not considered as part of the multiline message, while the ones that matched the rules were concatenated properly.

Docker Partial Message Use Case

Fluent Bit can re-combine these logs that were split by the runtime and remove the partial message fields. The filter example below is for this use case.

The two options for mode are mutually exclusive in the filter. If you set the mode to partial_message then the multiline.parser option is not allowed.

Nest

The Nest Filter plugin allows you to operate on or with nested data. Its modes of operation are

nest - Take a set of records and place them in a map
lift - Take a map by key and lift its records up

Example usage (nest)

As an example using JSON notation, to nest keys matching the Wildcard value Key* under a new key NestKey the transformation becomes,

Example (input)

Example (output)

Example usage (lift)

As an example using JSON notation, to lift keys nested under the Nested_under value NestKey* the transformation becomes,

Example (input)

Example (output)

Configuration Parameters

The plugin supports the following configuration parameters:

Getting Started

Example #1 - nest

Command Line

Note: Using the command line mode requires quotes parse the wildcard properly. The use of a configuration file is recommended.

The following command will load the mem plugin. Then the nest filter will match the wildcard rule to the keys and nest the keys matching Mem.* under the new key NEST.

Configuration File

Result

The output of both the command line and configuration invocations should be identical and result in the following output.

Example #2 - nest and lift undo

This example nests all Mem.* and Swap,* items under the Stats key and then reverses these actions with a lift operation. The output appears unchanged.

Configuration File

Result

Example #3 - nest 3 levels deep

This example takes the keys starting with Mem.* and nests them under LAYER1, which itself is then nested under LAYER2, which is nested under LAYER3.

Configuration File

Result

Example #4 - multiple nest and lift filters with prefix

This example starts with the 3-level deep nesting of Example 2 and applies the lift filter three times to reverse the operations. The end result is that all records are at the top level, without nesting, again. One prefix is added for each level that is lifted.

Configuration file

Result

Nightfall

The Nightfall filter scans logs for sensitive data and redacts the sensitive portions. This filter supports scanning for various sensitive information, ranging from API keys and personally identifiable information(PII) to custom regexes you define. You can configure what to scan for in the .

This filter is not enabled by default in 1.9.0 due to a typo. It must be enabled by setting flag -DFLB_FILTER_NIGHTFALL=ON when building. In 1.9.1 and above this is fixed.

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Description

Default

Command Line

Configuration File

Rewrite Tag

Powerful and flexible routing

Tags are what makes possible. Tags are set in the configuration of the Input definitions where the records are generated, but there are certain scenarios where might be useful to modify the Tag in the pipeline so we can perform more advanced and flexible routing.

The rewrite_tag filter, allows to re-emit a record under a new Tag. Once a record has been re-emitted, the original record can be preserved or discarded.

How it Works

The way it works is defining rules that matches specific record key content against a regular expression, if a match exists, a new record with the defined Tag will be emitted, entering from the beginning of the pipeline. Multiple rules can be specified and they are processed in order until one of them matches.

The new Tag to define can be composed by:

Alphabet characters & Numbers
Original Tag string or part of it
Regular Expressions groups capture
Any key or sub-key of the processed record
Environment variables

Configuration Parameters

The rewrite_tag filter supports the following configuration parameters:

Key

Description

Rules

A rule aims to define matching criteria and specify how to create a new Tag for a record. You can define one or multiple rules in the same configuration section. The rules have the following format:

Key

The key represents the name of the record key that holds the value that we want to use to match our regular expression. A key name is specified and prefixed with a $. Consider the following structured record (formatted for readability):

If we wanted to match against the value of the key name we must use $name. The key selector is flexible enough to allow to match nested levels of sub-maps from the structure. If we wanted to check the value of the nested key s2 we can do it specifying $ss['s1']['s2'], for short:

$name = "abc-123"
$ss['s1']['s2'] = "flb"

Note that a key must point a value that contains a string, it's not valid for numbers, booleans, maps or arrays.

Regex

Using a simple regular expression we can specify a matching pattern to use against the value of the key specified above, also we can take advantage of group capturing to create custom placeholder values.

If we wanted to match any record that it $name contains a value of the format string-number like the example provided above, we might use:

Note that in our example we are using parentheses, this teams that we are specifying groups of data. If the pattern matches the value a placeholder will be created that can be consumed by the NEW_TAG section.

If $name equals abc-123 , then the following placeholders will be created:

$0 = "abc-123"
$1 = "abc"
$2 = "123"

If the Regular expression do not matches an incoming record, the rule will be skipped and the next rule (if any) will be processed.

New Tag

If a regular expression has matched the value of the defined key in the rule, we are ready to compose a new Tag for that specific record. The tag is a concatenated string that can contain any of the following characters: a-z,A-Z, 0-9 and .-,.

A Tag can take any string value from the matching record, the original tag it self, environment variable or general placeholder.

Consider the following incoming data on the rule:

Tag = aa.bb.cc
Record = {"name": "abc-123", "ss": {"s1": {"s2": "flb"}}}
Environment variable $HOSTNAME = fluent

With such information we could create a very custom Tag for our record like the following:

the expected Tag to generated will be:

We make use of placeholders, record content and environment variables.

Keep

If a rule matches a rule the filter will emit a copy of the record with the new defined Tag. The property keep takes a boolean value to define if the original record with the old Tag must be preserved and continue in the pipeline or just be discarded.

You can use true or false to decide the expected behavior. There is no default value and this is a mandatory field in the rule.

Configuration Example

The following configuration example will emit a dummy (hand-crafted) record, the filter will rewrite the tag, discard the old record and print the new record to the standard output interface:

The original tag test_tag will be rewritten as from.test_tag.new.fluent.bit.out:

Monitoring

Since rewrite_tag emit new records that goes through the beginning of the pipeline, it exposes an additional metric called emit_records that summarize the total number of emitted records.

Understanding the Metrics

Using the configuration provided above, if we query the metrics exposed in the HTTP interface we will see the following:

Command:

Metrics output:

The dummy input generated two records, the filter dropped two from the chunks and emitted two new ones under a different Tag.

The records generated are handled by the internal Emitter, so the new records are summarized in the Emitter metrics, take a look at the entry called emitter_for_rewrite_tag.0.

What is the Emitter ?

The Emitter is an internal Fluent Bit plugin that allows other components of the pipeline to emit custom records. On this case rewrite_tag creates an Emitter instance to use it exclusively to emit records, on that way we can have a granular control of who is emitting what.

The Emitter name in the metrics can be changed setting up the Emitter_Name configuration property described above.

Standard Output

The stdout filter plugin allows printing to the standard output the data flowed through the filter plugin, which can be very useful while debugging.

The plugin has no configuration parameters, is very simple to use.

Command Line

We have specified to gather usage metrics and print them out in a human-readable way when they flow through the stdout plugin.

Throttle

The Throttle Filter plugin sets the average Rate of messages per Interval, based on leaky bucket and sliding window algorithm. In case of overflood, it will leak within certain rate.

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Value Format

Description

Functional description

Lets imagine we have configured:

we received 1 message first second, 3 messages 2nd, and 5 3rd. As you can see, disregard that Window is actually 5, we use "slow" start to prevent overflooding during the startup.

But as soon as we reached Window size * Interval, we will have true sliding window with aggregation over complete window.

When we have average over window is more than Rate, we will start dropping messages, so that

will become:

As you can see, last pane of the window was overwritten and 1 message was dropped.

Interval vs Window size

You might noticed possibility to configure Interval of the Window shift. It is counter intuitive, but there is a difference between two examples above:

and

Even though both examples will allow maximum Rate of 60 messages per minute, first example may get all 60 messages within first second, and will drop all the rest for the entire minute:

While the second example will not allow more than 1 message per second every second, making output rate more smooth:

It may drop some data if the rate is ragged. I would recommend to use bigger interval and rate for streams of rare but important events, while keep Window bigger and Interval small for constantly intensive inputs.

Command Line

Note: It's suggested to use a configuration file.

The following command will load the tail plugin and read the content of lines.txt file. Then the throttle filter will apply a rate limit and only pass the records which are read below the certain rate:

Configuration File

The example above will pass 1000 messages per second in average over 300 seconds.

Kubernetes

Fluent Bit Kubernetes Filter allows to enrich your log files with Kubernetes metadata.

Analyze the Tag and extract the following metadata:
- Pod Name
- Namespace
- Container Name
- Container ID
Query Kubernetes API Server to obtain extra metadata for the POD in question:
- Pod ID
- Labels
- Annotations

The data is cached locally in memory and appended to each record.

Configuration Parameters

The plugin supports the following configuration parameters:

Key

Description

Default

Processing the 'log' value

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On

Since Fluent Bit v1.2 we are not suggesting the use of decoders (Decode_Field_As) if you are using Elasticsearch database in the output to avoid data type conflicts.

To perform processing of the log key, it's mandatory to enable the Merge_Log configuration property in this filter, then the following processing order will be done:

If a Pod suggest a parser, the filter will use that parser to process the content of log.
If the option Merge_Parser was set and the Pod did not suggest a parser, process the log content using the suggested parser in the configuration.
If no Pod was suggested and no Merge_Parser is set, try to handle the content as JSON.

If log value processing fails, the value is untouched. The order above is not chained, meaning it's exclusive and the filter will try only one of the options above, not all of them.

Kubernetes Annotations

A flexible feature of Fluent Bit Kubernetes filter is that allow Kubernetes Pods to suggest certain behaviors for the log processor pipeline when processing the records. At the moment it support:

Suggest a pre-defined parser
Request to exclude logs

The following annotations are available:

Annotation

Description

Default

Annotation Examples in Pod definition

Suggest a parser

The following Pod definition runs a Pod that emits Apache logs to the standard output, in the Annotations it suggest that the data should be processed using the pre-defined parser called apache:

apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/parser: apache
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs

Request to exclude logs

There are certain situations where the user would like to request that the log processor simply skip the logs from the Pod in question:

apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/exclude: "true"
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs

Note that the annotation value is boolean which can take a true or false and must be quoted.

Workflow of Tail + Kubernetes Filter

[INPUT]
    Name    tail
    Tag     kube.*
    Path    /var/log/containers/*.log
    Parser  docker

[FILTER]
    Name             kubernetes
    Match            kube.*
    Kube_URL         https://kubernetes.default.svc:443
    Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix  kube.var.log.containers.
    Merge_Log        On
    Merge_Log_Key    log_processed

Tail support Tags expansion, which means that if a tag have a star character (*), it will replace the value with the absolute path of the monitored file, so if you file name and path is:

/var/log/container/apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

then the Tag for every record of that file becomes:

kube.var.log.containers.apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

note that slashes are replaced with dots.

If you have large pod specifications (can be caused by large numbers of environment variables, etc.), be sure to increase the Buffer_Size parameter of the kubernetes filter. If object sizes exceed this buffer, some metadata will fail to be injected to the logs.

kube.var.log.containers.apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

to:

apache-logs-annotated_default_apache-aeeccc7a9f00f6e4e066aeff0434cf80621215071f1b20a51e8340aa7c35eac6.log

the transformation above do not modify the original Tag, just creates a new representation for the filter to perform metadata lookup.

that new value is used by the filter to lookup the pod name and namespace, for that purpose it uses an internal Regular expression:

(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

If you want to know more details, check the source code of that definition here.

You can see on Rublar.com web site how this operation is performed, check the following demo link:

https://rubular.com/r/HZz3tYAahj6JCd

Custom Regex

Under certain and not common conditions, a user would want to alter that hard-coded regular expression, for that purpose the option Regex_Parser can be used (documented on top).

Final Comments

Optional Feature: Using Kubelet to Get Metadata

Configuration Setup

There are some configuration setup needed for this feature.

Role Configuration for Fluent Bit DaemonSet Example:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentbitds
  namespace: fluentbit-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluentbit
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - nodes
      - nodes/proxy
    verbs: 
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluentbit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentbit
subjects:
  - kind: ServiceAccount
    name: fluentbitds
    namespace: fluentbit-system

Fluent Bit Configuration Example:

    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        DB                /var/log/flb_kube.db
        Parser            docker
        Docker_Mode       On
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Buffer_Size         0
        Use_Kubelet         true
        Kubelet_Port        10250

So for fluent bit configuration, you need to set the Use_Kubelet to true to enable this feature.

DaemonSet config Example:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentbit
  namespace: fluentbit-system
  labels:
    app.kubernetes.io/name: fluentbit
spec:
  selector:
    matchLabels:
      name: fluentbit
  template:
    metadata:
      labels:
        name: fluentbit
    spec:
      serviceAccountName: fluentbitds
      containers:
        - name: fluent-bit
          imagePullPolicy: Always
          image: fluent/fluent-bit:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluentbit-config
              mountPath: /fluent-bit/etc/
          resources:
            limits:
              memory: 1500Mi
            requests:
              cpu: 500m
              memory: 500Mi
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluentbit-config
          configMap:
            name: fluentbit-config

The key point is to set hostNetwork to true and dnsPolicy to ClusterFirstWithHostNet that fluent bit DaemonSet could call Kubelet locally. Otherwise it could not resolve the dns for kubelet.

Now you are good to use this new feature!

Verify that the Use_Kubelet option is working

Basically you should see no difference about your experience for enriching your log files with Kubernetes metadata.

To check if Fluent Bit is using the kubelet, you can check fluent bit logs and there should be a log like this:

[ info] [filter:kubernetes:kubernetes.0] testing connectivity with Kubelet...

And if you are in debug mode, you could see more:

[debug] [filter:kubernetes:kubernetes.0] Send out request to Kubelet for pods information.
[debug] [filter:kubernetes:kubernetes.0] Request (ns=<namespace>, pod=node name) http_do=0, HTTP Status: 200
[ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2021/02/05 10:33:35] [debug] [filter:kubernetes:kubernetes.0] Request (ns=<Namespace>, pod=<podName>) http_do=0, HTTP Status: 200
[2021/02/05 10:33:35] [debug] [filter:kubernetes:kubernetes.0] kubelet find pod: <podName> and ns: <Namespace> match

Troubleshooting

The following section goes over specific log messages you may run into and how to solve them to ensure that Fluent Bit's Kubernetes filter is operating properly

I can't see metadata appended to my pod or other Kubernetes objects

If you are not seeing metadata added to your kubernetes logs and see the following in your log message, then you may be facing connectivity issues with the Kubernetes API server.

[2020/10/15 03:48:57] [ info] [filter_kube] testing connectivity with API server...
[2020/10/15 03:48:57] [error] [filter_kube] upstream connection error
[2020/10/15 03:48:57] [ warn] [filter_kube] could not get meta for POD

Potential fix #1: Check Kubernetes roles

You can test this by running the following command (replace fluentbit-system with the namespace where your fluentbit is installed)

kubectl auth can-i list pods --as=system:serviceaccount:fluentbit-system:fluentbit

If set roles are configured correctly, it should simply respond with yes.

For instance, using Azure AKS, running the above command may respond with:

no - Azure does not have opinion for this user.

appended to subjects array:

- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts

Potential fix #2: Check Kubernetes IPv6

There may be cases where you have IPv6 on in the environment and you need to enable this within Fluent Bit. Under the service tag please set the following option ipv6 to on .

Potential fix #3: Check connectivity to Kube_URL