The Modify Filter plugin allows you to change records using rules and conditions.
As an example using JSON notation, to:
- Rename Key2 to RenamedKey
- Add a key OtherKey with value Value3 if OtherKey does not yet exist
Example (input)
Example (output)
The plugin supports the following rules:
Rules are case insensitive; parameters are not.
Any number of rules can be set in a filter instance.
Rules are applied in the order they appear, with each rule operating on the result of the previous rule.
The plugin supports the following conditions:
Conditions are case insensitive; parameters are not.
Any number of conditions can be set.
Conditions apply to the whole filter instance and all of its rules, not to individual rules.
All conditions must be true for the rules to be applied.
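The rule ordering and the all-conditions-must-hold behavior described above can be sketched in Python. This is only an illustration under stated assumptions, not the plugin's implementation: the helper names (`rename`, `add`, `key_exists`, `modify`) and the sample record are hypothetical, and the sketch does not model what happens when a rename target already exists.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]
Rule = Callable[[Record], Record]
Condition = Callable[[Record], bool]


def rename(old: str, new: str) -> Rule:
    """Hypothetical rule: move the value stored under `old` to `new`."""
    def apply(record: Record) -> Record:
        if old not in record:
            return record
        out = {k: v for k, v in record.items() if k != old}
        out[new] = record[old]
        return out
    return apply


def add(key: str, value: Any) -> Rule:
    """Hypothetical rule: add `key` only if it does not yet exist."""
    def apply(record: Record) -> Record:
        if key in record:
            return record
        return {**record, key: value}
    return apply


def key_exists(key: str) -> Condition:
    """Hypothetical condition: true when `key` is present."""
    return lambda record: key in record


def modify(record: Record, rules: Iterable[Rule],
           conditions: Iterable[Condition] = ()) -> Record:
    # Conditions gate the whole filter instance: every one must hold.
    if not all(cond(record) for cond in conditions):
        return record
    # Each rule operates on the result of the previous rule.
    for rule in rules:
        record = rule(record)
    return record


if __name__ == "__main__":
    sample = {"Key1": "Value1", "Key2": "Value2"}  # assumed input
    print(modify(sample, [rename("Key2", "RenamedKey"),
                          add("OtherKey", "Value3")]))
```

Because rules are chained, swapping their order can change the result: a rename that produces OtherKey before the add rule runs means the add rule finds the key already present and does nothing.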
You can use a Record Accessor as STRING:KEY to refer to a nested key.
In order to start filtering records, you can run the filter from the command line or through the configuration file. The following invokes the Memory Usage Input Plugin, which produces output such as:
Note: Using the command line mode requires quotes to parse the wildcard properly. The use of a configuration file is recommended.
The output of both the command line and configuration invocations should be identical and result in the following output.
The Nest Filter plugin allows you to operate on or with nested data. Its modes of operation are:
- nest: Take a set of records and place them in a map
- lift: Take a map by key and lift its records up
As an example using JSON notation, to nest keys matching the Wildcard value Key* under a new key NestKey, the transformation becomes:
Example (input)
Example (output)
As an example using JSON notation, to lift keys nested under the Nested_under value NestKey*, the transformation becomes:
Example (input)
Example (output)
The plugin supports the following configuration parameters:
In order to start filtering records, you can run the filter from the command line or through the configuration file. The following invokes the Memory Usage Input Plugin, which produces output such as:
Note: Using the command line mode requires quotes to parse the wildcard properly. The use of a configuration file is recommended.
The following command will load the mem plugin. Then the nest filter will match the wildcard rule to the keys and nest the keys matching Mem.* under the new key NEST.
The output of both the command line and configuration invocations should be identical and result in the following output.
This example nests all Mem.* and Swap.* items under the Stats key and then reverses these actions with a lift operation. The output appears unchanged.
This example takes the keys starting with Mem.* and nests them under LAYER1, which itself is then nested under LAYER2, which is nested under LAYER3.
This example starts with the 3-level deep nesting of Example 2 and applies the lift filter three times to reverse the operations. The end result is that all records are at the top level again, without nesting. One prefix is added for each level that is lifted.
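The nest and lift operations described in this section can be sketched in Python as follows. This is a simplified model, not the plugin's code: the function names, the `add_prefix` argument, the prefix strings, and the sample record are assumptions made for illustration, and `lift` here matches the nested key by exact name only.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

Record = Dict[str, Any]


def nest(record: Record, wildcard: str, nest_under: str) -> Record:
    """Move every key matching `wildcard` into a new map stored at `nest_under`."""
    nested = {k: v for k, v in record.items() if fnmatchcase(k, wildcard)}
    if not nested:
        return dict(record)
    out = {k: v for k, v in record.items() if k not in nested}
    out[nest_under] = nested
    return out


def lift(record: Record, nested_under: str, add_prefix: str = "") -> Record:
    """Lift the entries of the map at `nested_under` one level up.

    `add_prefix` (an assumed option name) is prepended to each lifted key.
    """
    out: Record = {}
    for key, value in record.items():
        if key == nested_under and isinstance(value, dict):
            for inner_key, inner_value in value.items():
                out[add_prefix + inner_key] = inner_value
        else:
            out[key] = value
    return out


if __name__ == "__main__":
    sample = {"Mem.total": 1024, "Mem.used": 512, "Swap.total": 0}  # assumed input
    one = nest(sample, "Mem.*", "LAYER1")
    two = nest(one, "LAYER1*", "LAYER2")
    print(two)
    # Lifting twice without a prefix restores the original keys.
    print(lift(lift(two, "LAYER2"), "LAYER1"))
    # Lifting with a prefix marks each lifted key.
    print(lift(one, "LAYER1", add_prefix="Lifted_"))
```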
The AWS Filter enriches logs with AWS metadata. Currently, the plugin adds the EC2 instance ID and availability zone to log records. To use this plugin, you must be running in EC2 and have the instance metadata service enabled.
The plugin supports the following configuration parameters:
Key | Description | Default |
---|---|---|
Note: If you run Fluent Bit in a container, you may have to use instance metadata v1. The plugin behaves the same regardless of which version is used.
The ECS Filter enriches logs with AWS Elastic Container Service metadata. The plugin can enrich logs with task, cluster, and container metadata, which it obtains from the ECS Agent introspection API. The filter only works with the ECS EC2 launch type: Fluent Bit must be running on an ECS EC2 Container Instance and have access to the ECS Agent introspection API. The filter is not supported on ECS Fargate. To obtain metadata on ECS Fargate, use the built-in FireLens metadata or the AWS for Fluent Bit init project.
The plugin supports the following configuration parameters:
Key | Description | Default |
---|---|---|
The following template variables can be used for values with the Add option. See the tutorial below for examples.
Variable | Description | Supported with Cluster_Metadata_Only On |
---|---|---|
The output log should be similar to:
The output log would be similar to:
Notice that the template variables in the value for the resource key are separated by dot characters. Please see the section below about limitations on which characters can be used to separate template variables.
This example shows a use case for the Cluster_Metadata_Only option: attaching cluster metadata to ECS Agent logs.
Notice in example 2 that the template values are separated by dot characters. This is important: the Fluent Bit record_accessor library limits which characters can separate template variables. Only dots and commas (. and ,) can come after a template variable, because the templating library must parse the template and determine where a variable ends.
The following would be invalid templates because the two template variables are not separated by commas or dots:
$TaskID-$ECSContainerName
$TaskID/$ECSContainerName
$TaskID_$ECSContainerName
$TaskIDfooo$ECSContainerName
However, the following are valid:
$TaskID.$ECSContainerName
$TaskID.ecs_resource.$ECSContainerName
$TaskID.fooo.$ECSContainerName
And the following are valid since they only contain one template variable with nothing after it:
fooo$TaskID
fooo____$TaskID
fooo/bar$TaskID
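The separator rule can be expressed as a small Python check. This is a rough sketch that reproduces the rule as stated in this section for the examples listed; it is not how the record_accessor library actually parses templates. The function name and the set of known variable names (limited here to the two variables used in the examples) are assumptions.

```python
import re
from typing import Iterable

# Only the two variables that appear in the examples above (assumed list).
KNOWN_VARIABLES = frozenset({"TaskID", "ECSContainerName"})

# A "$" followed by the longest run of identifier-like characters.
_VARIABLE = re.compile(r"\$([A-Za-z0-9_]*)")


def is_valid_template(template: str,
                      known: Iterable[str] = KNOWN_VARIABLES) -> bool:
    """Return True if every variable is followed by '.', ',' or end of string."""
    known = set(known)
    for match in _VARIABLE.finditer(template):
        # Trailing characters such as "_" or "fooo" become part of the name,
        # so the name no longer matches a known variable.
        if match.group(1) not in known:
            return False
        following = template[match.end():match.end() + 1]
        if following not in ("", ".", ","):
            return False
    return True


if __name__ == "__main__":
    for candidate in ("$TaskID-$ECSContainerName",
                      "$TaskID.$ECSContainerName",
                      "fooo/bar$TaskID"):
        print(candidate, is_valid_template(candidate))
```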
This plugin checks whether a value exists in a specified list and, if it is found, can add a record to indicate the match. It was introduced in version 1.8.4.
The plugin supports the following configuration parameters:
Key | Description |
---|---|
In the following configuration we will read a file test1.log that includes the following values:
Additionally, we will use the following lookup file, ip_list.txt, which contains a list of malicious IPs:
In the configuration we use $remote_addr as the lookup key, and 7.7.7.7 is in the list of malicious IPs. This means the output for the last record would look like the following:
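As an illustration of the lookup described in this section, the following Python sketch marks a record when its lookup key appears in a list. It is only a model of the idea: the function names, the added key and value (`ioc`, `abc`), and every sample address other than 7.7.7.7 are hypothetical.

```python
from typing import Any, Dict, Iterable, Set

Record = Dict[str, Any]


def load_list(lines: Iterable[str]) -> Set[str]:
    """Build the lookup set from the lines of a list file, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def check_list(record: Record, lookup_key: str, entries: Set[str],
               add_key: str, add_value: Any) -> Record:
    """Add `add_key` to the record when the value at `lookup_key` is listed."""
    value = record.get(lookup_key)
    if isinstance(value, str) and value in entries:
        return {**record, add_key: add_value}
    return record


if __name__ == "__main__":
    # Assumed list contents; only 7.7.7.7 comes from the text above.
    malicious = load_list(["1.2.3.4\n", "\n", "7.7.7.7\n"])
    print(check_list({"remote_addr": "7.7.7.7"}, "remote_addr",
                     malicious, "ioc", "abc"))
    print(check_list({"remote_addr": "8.8.8.8"}, "remote_addr",
                     malicious, "ioc", "abc"))
```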
Made for testing: make sure that your records contain the expected key and values
The expect filter plugin allows you to validate that records match certain criteria in their structure, like validating that a key exists or it has a specific value.
This page only describes the available configuration properties. For a detailed explanation of usage and use cases, please refer to the following page:
The plugin supports the following configuration parameters:
Property | Description |
---|---|
As mentioned above, refer to the following page for specific details on using this filter:
Select or exclude records per patterns
The Grep Filter plugin allows you to match or exclude specific records based on regular expression patterns for values or nested values.
The plugin supports the following configuration parameters:
Key | Value Format | Description |
---|---|---|
This plugin enables the Record Accessor feature to specify the KEY. Using the record accessor is suggested if you want to match values against nested values.
In order to start filtering records, you can run the filter from the command line or through the configuration file. The following example assumes that you have a file called lines.txt with the following content:
Note: using the command line mode requires special attention to quote the regular expressions properly. It's suggested to use a configuration file.
The following command will load the tail plugin and read the content of the lines.txt file. Then the grep filter will apply a regular expression rule over the log field (created by the tail plugin) and only pass the records whose field value starts with aa:
The filter allows multiple rules, which are applied in order; you can have as many Regex and Exclude entries as required.
If you want to match or exclude records based on nested values, you can use a Record Accessor format as the KEY name. Consider the following record example:
If you want to exclude records that match a given nested field (for example kubernetes.labels.app), you can use the following rule:
It may be that in your processing pipeline you want to drop records that are missing certain keys.
A simple way to do this is to exclude with a regex that matches anything; a missing key will fail this check.
Here is an example that checks for a specific valid value for the key as well:
The specified key iot_timestamp must match the expected expression. If it does not match, or is missing or empty, the record is excluded.
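The keep-or-drop decision described in this section can be modeled in Python. This is a simplified sketch under stated assumptions: the function name is hypothetical, the timestamp pattern is an assumed example, Python's `re` module stands in for the regular expression engine Fluent Bit uses, and the sketch treats every Regex rule as required while any matching Exclude rule drops the record.

```python
import re
from typing import Any, Dict, Iterable, Tuple

Record = Dict[str, Any]
RuleList = Iterable[Tuple[str, str]]  # (key, pattern) pairs


def keep_record(record: Record, regex: RuleList = (),
                exclude: RuleList = ()) -> bool:
    """Return True if the record passes the filter, False if it is dropped."""
    for key, pattern in regex:
        value = record.get(key)
        # A missing or non-string value cannot match, so the record is dropped.
        if not isinstance(value, str) or re.search(pattern, value) is None:
            return False
    for key, pattern in exclude:
        value = record.get(key)
        if isinstance(value, str) and re.search(pattern, value) is not None:
            return False
    return True


if __name__ == "__main__":
    lines = [{"log": "aa"}, {"log": "ba"}, {"log": "aab"}]  # assumed input
    print([r for r in lines if keep_record(r, regex=[("log", "^aa")])])
    # Assumed timestamp pattern; a record without the key is dropped.
    rule = [("iot_timestamp", r"^\d{4}-\d{2}-\d{2}")]
    print(keep_record({"iot_timestamp": "2024-01-31"}, regex=rule))
    print(keep_record({"other": "x"}, regex=rule))
```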
Look up Geo data from IP
The GeoIP2 Filter allows you to enrich the incoming data stream using location data from a GeoIP2 database.
This plugin supports the following configuration parameters:
Key | Description |
---|---|
The following configuration will process the incoming remote_addr and append country information retrieved from the GeoLite2 database.
Each Record parameter above specifies the following triplet:
- The field name to be added to records (country)
- The lookup key to process (remote_addr)
- The query for the GeoIP2 database (%{country.names.en})
By running Fluent Bit with the configuration above, you will see the following output:
Note that the GeoLite2-City.mmdb database is available from MaxMind's official site.
The Nightfall filter scans logs for sensitive data and redacts the sensitive portions. This filter supports scanning for various kinds of sensitive information, ranging from API keys and personally identifiable information (PII) to custom regexes you define. You can configure what to scan for in the Nightfall Dashboard.
This filter is not enabled by default in 1.9.0 due to a typo. It must be enabled by setting the flag -DFLB_FILTER_NIGHTFALL=ON when building. In 1.9.1 and above this is fixed.
The plugin supports the following configuration parameters:
Key | Description | Default |
---|---|---|
Powerful and flexible routing
Tags are what make routing possible. Tags are set in the configuration of the Input definitions where the records are generated, but there are certain scenarios where it might be useful to modify the Tag in the pipeline so we can perform more advanced and flexible routing.
The rewrite_tag filter allows you to re-emit a record under a new Tag. Once a record has been re-emitted, the original record can be preserved or discarded.
It works by defining rules that match specific record key content against a regular expression. If a match exists, a new record with the defined Tag is emitted, entering from the beginning of the pipeline. Multiple rules can be specified, and they are processed in order until one of them matches.
The new Tag can be composed of:
Alphabetic characters and numbers
Original Tag string or part of it
Regular expression capture groups
Any key or sub-key of the processed record
Environment variables
The rewrite_tag filter supports the following configuration parameters:
Key | Description |
---|---|
A rule aims to define matching criteria and specify how to create a new Tag for a record. You can define one or multiple rules in the same configuration section. The rules have the following format:
The key represents the name of the record key that holds the value that we want to match against our regular expression. A key name is specified and prefixed with a $. Consider the following structured record (formatted for readability):
If we wanted to match against the value of the key name, we must use $name. The key selector is flexible enough to match nested levels of sub-maps from the structure. If we wanted to check the value of the nested key s2, we can do it by specifying $ss['s1']['s2']. In short:
$name = "abc-123"
$ss['s1']['s2'] = "flb"
Note that a key must point to a value that contains a string; it's not valid for numbers, booleans, maps or arrays.
Using a simple regular expression we can specify a matching pattern to use against the value of the key specified above. We can also take advantage of group capturing to create custom placeholder values.
If we wanted to match any record whose $name contains a value of the format string-number, like the example provided above, we might use:
Note that in our example we are using parentheses; this means that we are specifying groups of data. If the pattern matches the value, placeholders will be created that can be consumed by the NEW_TAG section.
If $name equals abc-123, then the following placeholders will be created:
$0 = "abc-123"
$1 = "abc"
$2 = "123"
If the regular expression does not match an incoming record, the rule will be skipped and the next rule (if any) will be processed.
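The way capture groups turn into placeholders can be illustrated in Python. The pattern below is an assumed example for the string-number format and the function name is hypothetical; Python's `re` module is used here only as a stand-in for the regular expression engine in Fluent Bit.

```python
import re
from typing import Dict, Optional

# Assumed pattern for a "string-number" value such as abc-123.
PATTERN = re.compile(r"^([a-z]+)-([0-9]+)$")


def placeholders(value: str,
                 pattern: "re.Pattern[str]" = PATTERN) -> Optional[Dict[str, str]]:
    """Return the $0..$N placeholders for a match, or None when the rule is skipped."""
    match = pattern.search(value)
    if match is None:
        return None
    # Group 0 is the whole match; groups 1..N are the parenthesized captures.
    return {f"${index}": match.group(index)
            for index in range(pattern.groups + 1)}


if __name__ == "__main__":
    print(placeholders("abc-123"))
    print(placeholders("no match here"))
```

Returning `None` corresponds to the case where the rule is skipped and the next rule, if any, is tried.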
If a regular expression has matched the value of the defined key in the rule, we are ready to compose a new Tag for that specific record. The tag is a concatenated string that can contain any of the following characters: a-z, A-Z, 0-9 and the characters .-, (dot, hyphen and comma).
A Tag can take any string value from the matching record, the original tag itself, an environment variable or a general placeholder.
Consider the following incoming data on the rule:
Tag = aa.bb.cc
Record = {"name": "abc-123", "ss": {"s1": {"s2": "flb"}}}
Environment variable $HOSTNAME = fluent
With such information we could create a very custom Tag for our record like the following:
The expected Tag to be generated will be:
We make use of placeholders, record content and environment variables.
If a rule matches, the filter will emit a copy of the record with the newly defined Tag. The property keep takes a boolean value that defines whether the original record with the old Tag must be preserved and continue in the pipeline or be discarded.
You can use true or false to decide the expected behavior. There is no default value and this is a mandatory field in the rule.
The following configuration example will emit a dummy (hand-crafted) record; the filter will rewrite the tag, discard the old record and print the new record to the standard output interface:
The original tag test_tag will be rewritten as from.test_tag.new.fluent.bit.out:
As described in the Monitoring section, every component of the Fluent Bit pipeline exposes metrics. The basic metrics exposed by this filter are drop_records and add_records, which summarize the total number of records dropped from the incoming data chunk and the number of new records added.
Since rewrite_tag emits new records that go through the beginning of the pipeline, it exposes an additional metric called emit_records that summarizes the total number of emitted records.
Using the configuration provided above, if we query the metrics exposed in the HTTP interface we will see the following:
Command:
Metrics output:
The dummy input generated two records, the filter dropped two from the chunks and emitted two new ones under a different Tag.
The records generated are handled by the internal Emitter, so the new records are summarized in the Emitter metrics; take a look at the entry called emitter_for_rewrite_tag.0.
The Emitter is an internal Fluent Bit plugin that allows other components of the pipeline to emit custom records. In this case rewrite_tag creates an Emitter instance to use exclusively for emitting records; that way we have granular control over who is emitting what.
The Emitter name in the metrics can be changed by setting the Emitter_Name configuration property described above.
The Parser Filter plugin allows for parsing fields in event records.
The plugin supports the following configuration parameters:
Key | Description | Default |
---|---|---|
This is an example of parsing a record {"data":"100 0.5 true This is example"}.
The plugin needs a parser file which defines how to parse each field.
The path of the parser file should be written in the configuration file under the [SERVICE] section.
The output is:
You can see the records {"data":"100 0.5 true This is example"} are parsed.
By default, the parser plugin only keeps the parsed fields in its output.
If you enable Reserve_Data, all other fields are preserved:
This will produce the output:
If you enable Reserve_Data and Preserve_Key, the original key field will be preserved as well:
This will produce the following output:
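As an illustration of the Reserve_Data and Preserve_Key behavior described in this section, here is a minimal Python sketch. It is not the plugin's implementation: the regular expression and its field names (INT, FLOAT, BOOL, STRING) are assumptions standing in for a parser definition, type conversion is not modeled (all parsed values stay strings), and Preserve_Key is modeled only in combination with Reserve_Data, which is the combination the text describes.

```python
import re
from typing import Any, Dict

Record = Dict[str, Any]

# Assumed parser: four space-separated fields, the last taking the remainder.
PARSER = re.compile(
    r"^(?P<INT>[^ ]+) (?P<FLOAT>[^ ]+) (?P<BOOL>[^ ]+) (?P<STRING>.+)$"
)


def parse_field(record: Record, key_name: str,
                reserve_data: bool = False,
                preserve_key: bool = False) -> Record:
    """Parse the string at `key_name` and return the resulting record."""
    value = record.get(key_name)
    match = PARSER.match(value) if isinstance(value, str) else None
    if match is None:
        return record  # nothing parsed: leave the record as it is
    out: Record = {}
    if reserve_data:
        # Keep the other fields of the original record.
        out.update({k: v for k, v in record.items() if k != key_name})
        if preserve_key:
            # Keep the original, unparsed field as well.
            out[key_name] = value
    out.update(match.groupdict())
    return out


if __name__ == "__main__":
    sample = {"data": "100 0.5 true This is example", "key1": "value1"}
    print(parse_field(sample, "data"))
    print(parse_field(sample, "data", reserve_data=True))
    print(parse_field(sample, "data", reserve_data=True, preserve_key=True))
```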
The Tensorflow Filter allows running machine learning inference tasks on the records of data coming from input plugins or the stream processor. This filter uses Tensorflow Lite as the inference engine, and requires the Tensorflow Lite shared library to be present during build and at runtime.
Tensorflow Lite is a lightweight open-source deep learning framework used for mobile and IoT applications. Tensorflow Lite only handles inference (not training); therefore, it loads pre-trained models (.tflite files) that have been converted into the Tensorflow Lite format (FlatBuffer). You can read more on converting Tensorflow models here.
The plugin supports the following configuration parameters:
Key | Description | Default |
---|---|---|
Clone the Tensorflow repository, install the bazel package manager, and run the following command to create the shared library:
The script creates the shared library bazel-bin/tensorflow/lite/c/libtensorflowlite_c.so. You need to copy the library to a location (such as /usr/lib) that can be used by Fluent Bit.
The Tensorflow filter plugin is disabled by default. You need to build Fluent Bit with the Tensorflow plugin enabled. In addition, it requires access to the Tensorflow Lite header files to compile. Therefore, you also need to pass the path of the Tensorflow source code on your machine to the build script:
If the Tensorflow plugin initializes correctly, it reports successful creation of the interpreter and prints a summary of the model's input/output types and dimensions.
Currently supports single-input models
Uses Tensorflow 2.3 header files
Use Wasm programs as a filter
The Wasm Filter allows you to modify the incoming records using Wasm technology.
Because a flexible filtering mechanism is needed, it is now possible to extend Fluent Bit capabilities by writing custom filters using built Wasm programs and the Wasm runtime. A Wasm-based filter takes two steps, plus an optional one:
- (Optional) Compile the program as AOT (Ahead Of Time) objects to optimize the Wasm execution pipeline
- Configure the Filter in the main configuration
- Prepare a Wasm program that will be used by the Filter
The plugin supports the following configuration parameters:
Key | Description |
---|---|
Here is a configuration example.
The Fluent Bit Kubernetes Filter allows you to enrich your log files with Kubernetes metadata.
When Fluent Bit is deployed in Kubernetes as a DaemonSet and configured to read the log files from the containers (using tail or systemd input plugins), this filter aims to perform the following operations:
Analyze the Tag and extract the following metadata:
Pod Name
Namespace
Container Name
Container ID
Query the Kubernetes API Server to obtain extra metadata for the Pod in question:
Pod ID
Labels
Annotations
The data is cached locally in memory and appended to each record.
The plugin supports the following configuration parameters:
Key | Description | Default |
---|---|---|
Kubernetes Filter aims to provide several ways to process the data contained in the log key. The following explanation of the workflow assumes that your original Docker parser defined in parsers.conf is as follows:
Since Fluent Bit v1.2, we do not suggest using decoders (Decode_Field_As) if you are using Elasticsearch in the output, to avoid data type conflicts.
To process the log key, it's mandatory to enable the Merge_Log configuration property in this filter. Processing then follows this order:
If a Pod suggests a parser, the filter will use that parser to process the content of log.
If the option Merge_Parser was set and the Pod did not suggest a parser, process the log content using the parser suggested in the configuration.
If the Pod did not suggest a parser and no Merge_Parser is set, try to handle the content as JSON.
If log value processing fails, the value is untouched. The order above is not chained: it is exclusive, and the filter will try only one of the options above, not all of them.
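The exclusive selection order above can be sketched in Python. This is a simplified model and not the filter's code: the function name, the `parsers` registry, and the convention of returning `None` to mean "leave the value untouched" are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict, Optional

Parser = Callable[[str], Dict[str, Any]]


def merge_log(log: str,
              pod_parser: Optional[str] = None,
              merge_parser: Optional[str] = None,
              parsers: Optional[Dict[str, Parser]] = None) -> Optional[Dict[str, Any]]:
    """Process `log` with exactly one option; None means the value stays untouched."""
    parsers = parsers or {}
    # Exactly one option is chosen: Pod suggestion, then Merge_Parser, then JSON.
    chosen = pod_parser or merge_parser
    try:
        if chosen is not None:
            parsed = parsers[chosen](log)
        else:
            parsed = json.loads(log)
    except (KeyError, ValueError):
        # Unknown parser or failed parse: no fallback to the other options.
        return None
    return parsed if isinstance(parsed, dict) else None


if __name__ == "__main__":
    def failing_parser(text: str) -> Dict[str, Any]:
        raise ValueError("does not match")  # hypothetical parser that always fails

    registry = {"apache": failing_parser}
    print(merge_log('{"status": 200}'))
    # The Pod-suggested parser fails, and JSON is NOT tried as a fallback.
    print(merge_log('{"status": 200}', pod_parser="apache", parsers=registry))
```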
A flexible feature of the Fluent Bit Kubernetes filter is that it allows Kubernetes Pods to suggest certain behaviors for the log processor pipeline when processing the records. At the moment it supports:
Suggest a pre-defined parser
Request to exclude logs
The following annotations are available:
The following Pod definition runs a Pod that emits Apache logs to the standard output. In the annotations it suggests that the data should be processed using the pre-defined parser called apache:
There are certain situations where the user would like to request that the log processor simply skip the logs from the Pod in question:
Note that the annotation value is a boolean which can take true or false, and it must be quoted.
Kubernetes Filter depends on either the Tail or Systemd input plugins to process and enrich records with Kubernetes metadata. Here we will explain the workflow of Tail and how its configuration is correlated with the Kubernetes filter. Consider the following configuration example (just for demo purposes, not production):
In the input section, the Tail plugin will monitor all files ending in .log in path /var/log/containers/. For every file it will read every line and apply the docker parser. Then the records are emitted to the next step with an expanded tag.
Tail supports Tag expansion, which means that if a tag has a star character (*), it will replace the star with the absolute path of the monitored file, so if your file name and path is:
then the Tag for every record of that file becomes:
Note that slashes are replaced with dots.
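The tag expansion just described can be illustrated with a few lines of Python. The function name and the sample file path are hypothetical; the sketch only mirrors the stated behavior (the star is replaced by the file path, with slashes turned into dots).

```python
def expand_tag(tag: str, path: str) -> str:
    """Replace the star in `tag` with the absolute path, using dots for slashes."""
    # Drop the leading slash so the result does not contain an empty segment.
    dotted = path.lstrip("/").replace("/", ".")
    return tag.replace("*", dotted, 1)


if __name__ == "__main__":
    # Hypothetical file name under the monitored directory.
    print(expand_tag("kube.*", "/var/log/containers/apache.log"))
```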
When the Kubernetes Filter runs, it will try to match all records whose tag starts with kube. (note the ending dot), so records from the file mentioned above will hit the matching rule and the filter will try to enrich the records.
The Kubernetes Filter does not care where the logs come from, but it cares about the absolute name of the monitored file, because that information contains the pod name and namespace name that are used to retrieve the metadata associated with the running Pod from the Kubernetes Master/API Server.
If you have large pod specifications (which can be caused by large numbers of environment variables, etc.), be sure to increase the Buffer_Size parameter of the kubernetes filter. If object sizes exceed this buffer, some metadata will fail to be injected into the logs.
If the configuration property Kube_Tag_Prefix was configured (available on Fluent Bit >= 1.1.x), the filter will use that value to remove the prefix that was appended to the Tag in the previous Input section. Note that the configuration property defaults to kube.var.log.containers. , so the previous Tag content will be transformed from:
to:
The transformation above does not modify the original Tag; it just creates a new representation for the filter to perform the metadata lookup.
That new value is used by the filter to look up the pod name and namespace. For that purpose it uses an internal regular expression:
If you want to know more details, check the source code of that definition here.
You can see how this operation is performed on the Rubular.com web site; check the following demo link:
Under certain uncommon conditions, a user might want to alter that hard-coded regular expression. For that purpose the option Regex_Parser can be used (documented above).
At this point the filter is able to gather the values of pod_name and namespace. With that information it checks the local cache (an internal hash table) for metadata for that key pair. If it exists, the filter enriches the record with the metadata value; otherwise it connects to the Kubernetes Master/API Server and retrieves that information.
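The cache-then-fetch lookup described above can be sketched as follows. This is an illustrative model only: the class and function names, the `fetch` callable that stands in for a request to the API server, and the use of a `kubernetes` key for the attached metadata are assumptions.

```python
from typing import Any, Callable, Dict, Tuple

Metadata = Dict[str, Any]
Fetch = Callable[[str, str], Metadata]


class MetadataCache:
    """In-memory cache keyed by (namespace, pod_name)."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch  # stand-in for the call to the API server
        self._entries: Dict[Tuple[str, str], Metadata] = {}

    def get(self, namespace: str, pod_name: str) -> Metadata:
        key = (namespace, pod_name)
        if key not in self._entries:
            # Cache miss: ask the server once and remember the answer.
            self._entries[key] = self._fetch(namespace, pod_name)
        return self._entries[key]


def enrich(record: Dict[str, Any], namespace: str, pod_name: str,
           cache: MetadataCache) -> Dict[str, Any]:
    """Return a copy of the record with the pod metadata attached."""
    return {**record, "kubernetes": cache.get(namespace, pod_name)}


if __name__ == "__main__":
    calls = []

    def fake_fetch(namespace: str, pod_name: str) -> Metadata:
        calls.append((namespace, pod_name))
        return {"namespace_name": namespace, "pod_name": pod_name}

    cache = MetadataCache(fake_fetch)
    enrich({"log": "first"}, "default", "apache", cache)
    enrich({"log": "second"}, "default", "apache", cache)
    print(len(calls))  # the second record is served from the cache
```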
There is a reported issue where kube-apiserver falls over and becomes unresponsive when the cluster is too large and too many requests are sent to it. With this feature, the Fluent Bit Kubernetes filter sends the request to the kubelet /pods endpoint instead of kube-apiserver to retrieve the pod information and uses it to enrich the log. Since the kubelet runs locally on each node, the request is answered faster and each node only gets one request at a time. This saves kube-apiserver capacity to handle other requests. When this feature is enabled, you should see no difference in the Kubernetes metadata added to logs, but the kube-apiserver bottleneck should be avoided when the cluster is large.
Some configuration setup is needed for this feature.
Role Configuration for Fluent Bit DaemonSet Example:
The difference is that the kubelet needs a special permission for the resource nodes/proxy to accept HTTP requests. When creating the role or clusterRole, you need to add nodes/proxy to the rule for resources.
Fluent Bit Configuration Example:
In the Fluent Bit configuration, you need to set Use_Kubelet to true to enable this feature.
DaemonSet config Example:
The key point is to set hostNetwork to true and dnsPolicy to ClusterFirstWithHostNet so that the Fluent Bit DaemonSet can call the kubelet locally. Otherwise it cannot resolve the DNS name for the kubelet.
Now you are ready to use this new feature!
You should see no difference in your experience of enriching your log files with Kubernetes metadata.
To check if Fluent Bit is using the kubelet, you can check the Fluent Bit logs; there should be a log like this:
And if you are in debug mode, you can see more:
The following section goes over specific log messages you may run into and how to solve them to ensure that Fluent Bit's Kubernetes filter is operating properly.
If you are not seeing metadata added to your Kubernetes logs and see the following in your log messages, then you may be facing connectivity issues with the Kubernetes API server.
Potential fix #1: Check Kubernetes roles
When Fluent Bit is deployed as a DaemonSet it generally runs with specific roles that allow the application to talk to the Kubernetes API server. If you are deployed in a more restricted environment check that all the Kubernetes roles are set correctly.
You can test this by running the following command (replace fluentbit-system with the namespace where your Fluent Bit is installed):
If the roles are configured correctly, it should simply respond with yes.
For instance, using Azure AKS, running the above command may respond with:
If you have connectivity to the API server but still see "could not get meta for POD", debug logging might give you a message with Azure does not have opinion for this user. Then the following subject may need to be included in the fluentbit ClusterRoleBinding, appended to the subjects array:
Potential fix #2: Check Kubernetes IPv6
There may be cases where IPv6 is enabled in the environment and you need to enable it within Fluent Bit. Under the service tag, set the option ipv6 to on.
Potential fix #3: Check connectivity to Kube_URL
By default the Kube_URL is set to https://kubernetes.default.svc:443. Ensure that you have connectivity to this endpoint from within the cluster and that there are no special permissions interfering with the connection.
In some cases, you may only see some objects being appended with metadata while other objects are not enriched. This can occur when local data is cached and does not contain the correct id for the Kubernetes object that requires enrichment. For most Kubernetes objects the Kubernetes API server is updated, which will then be reflected in Fluent Bit logs. However, in some cases for Pod objects this refresh to the Kubernetes API server can be skipped, causing metadata to be skipped.
Concatenate Multiline or Stack trace log messages. Available on Fluent Bit >= v1.8.2.
The Multiline Filter helps to concatenate messages that originally belong to one context but were split across multiple records or log lines. Common examples are stack traces or applications that print logs in multiple lines.
As part of the built-in functionality, without major configuration effort, you can enable one of our built-in parsers with auto-detection and multi-format support:
go
python
ruby
java (Google Cloud Platform Java stacktrace format)
Some comments about this filter:
The usage of this filter depends on a previous configuration of a Multiline Parser definition.
If you wish to concatenate messages read from a log file, it is highly recommended to use the multiline support in the Tail plugin itself. This is because performing concatenation while reading the log file is more performant. Concatenating messages originally split by Docker or CRI container engines is supported in the Tail plugin.
This filter only performs buffering that persists across different Chunks when Buffer is enabled. Otherwise, the filter will process one Chunk at a time and is not suitable for most inputs which might send multiline messages in separate chunks.
When buffering is enabled, the filter does not immediately emit messages it receives. It uses the in_emitter plugin, same as the Rewrite Tag Filter, and emits messages once they are fully concatenated, or a timeout is reached.
Since concatenated records are re-emitted to the head of the Fluent Bit log pipeline, you cannot configure multiple multiline filter definitions that match the same tags. This will cause an infinite loop in the Fluent Bit pipeline. To use multiple parsers on the same logs, configure a single filter definition with a comma-separated list of parsers for multiline.parser. For more, see issue #5235.
Secondly, for the same reason, the multiline filter should be the first filter. Logs will be re-emitted by the multiline filter to the head of the pipeline. The filter will ignore its own re-emitted records, but other filters won't. If there are filters before the multiline filter, they will be applied twice.
The plugin supports the following configuration parameters:
Property | Description |
---|---|
The following example aims to parse a log file called test.log that contains some full lines, a custom Java stacktrace and a Go stacktrace.
The following example files can be located at: https://github.com/fluent/fluent-bit/tree/master/documentation/examples/multiline/filter_multiline
Example files content:
This is the primary Fluent Bit configuration file. It includes the parsers_multiline.conf and tails the file test.log by applying the multiline parsers multiline-regex-test and go. Then it sends the processing to the standard output.
This second file defines a multiline parser for the example. Note that a second multiline parser called go is used in fluent-bit.conf, but this one is a built-in parser.
An example file with multiline and multiformat content:
By running Fluent Bit with the given configuration file you will obtain:
The lines that did not match a pattern are not considered as part of the multiline message, while the ones that matched the rules were concatenated properly.
When Fluent Bit is consuming logs from a container runtime, such as Docker, log lines above a certain size limit, usually 16KB, will be split. If your application emits a 100K log line, it will be split into 7 partial messages. If you are using the Fluentd Docker Log Driver to send the logs to Fluent Bit, they might look like this:
Fluent Bit can re-combine these logs that were split by the runtime and remove the partial message fields. The filter example below is for this use case.
The two options for mode are mutually exclusive in the filter. If you set the mode to partial_message then the multiline.parser option is not allowed.
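The splitting and recombination of partial messages described in this section can be modeled in Python. This is a sketch, not the filter's code: the 16KB limit is the usual value mentioned above, and the field names (`partial_message`, `partial_id`, `partial_ordinal`, `partial_last`) are assumptions about what the log driver attaches to each part.

```python
import math
from typing import Dict, List

LIMIT = 16 * 1024  # usual runtime split limit, in bytes


def partial_count(size: int, limit: int = LIMIT) -> int:
    """Number of partial messages a log line of `size` bytes is split into."""
    return math.ceil(size / limit)


def split_log(log: str, partial_id: str, limit: int = LIMIT) -> List[Dict[str, str]]:
    """Split a long line the way a runtime might, tagging each part (assumed fields)."""
    chunks = [log[i:i + limit] for i in range(0, len(log), limit)]
    return [
        {
            "log": chunk,
            "partial_message": "true",
            "partial_id": partial_id,
            "partial_ordinal": str(ordinal),
            "partial_last": "true" if ordinal == len(chunks) else "false",
        }
        for ordinal, chunk in enumerate(chunks, start=1)
    ]


def recombine(parts: List[Dict[str, str]]) -> Dict[str, str]:
    """Join the parts in ordinal order and drop the partial message fields."""
    ordered = sorted(parts, key=lambda part: int(part["partial_ordinal"]))
    return {"log": "".join(part["log"] for part in ordered)}


if __name__ == "__main__":
    line = "x" * (100 * 1024)
    parts = split_log(line, partial_id="example-id")
    print(partial_count(len(line)), len(parts))
    print(recombine(parts)["log"] == line)
```

With a 16KB limit, a 100K line needs 100/16 = 6.25 chunks, which rounds up to the 7 partial messages mentioned above.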
The Throttle Filter plugin sets the average Rate of messages per Interval, based on the leaky bucket and sliding window algorithms. In case of overflow, it will leak at a certain rate.
The plugin supports the following configuration parameters:
Key | Value Format | Description |
---|---|---|
Let's imagine we have configured:
We received 1 message in the first second, 3 messages in the second, and 5 in the third. Even though Window is actually 5, a "slow" start is used to prevent overflooding during startup.
As soon as Window size * Interval has elapsed, a true sliding window applies, with aggregation over the complete window.
When the average over the window exceeds Rate, messages start being dropped, so that
will become:
As you can see, the last pane of the window was overwritten and 1 message was dropped.
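The pane-by-pane behavior described above can be approximated with a small simulation. This is a simplified sketch, not Fluent Bit's implementation: the class name, the settings (`rate=3`, `window=5`) and the traffic pattern are all hypothetical, and the "slow" start is modeled by averaging only over the panes opened so far.

```python
from collections import deque


class SlidingWindowThrottle:
    """Simplified sliding-window rate limiter (illustrative only)."""

    def __init__(self, rate, window):
        self.rate = rate
        # One counter per interval; the deque discards the oldest pane when full.
        self.panes = deque([0], maxlen=window)

    def tick(self):
        """Start a new interval by opening an empty pane."""
        self.panes.append(0)

    def offer(self):
        """Accept one message only if the window average stays within rate."""
        average = (sum(self.panes) + 1) / len(self.panes)
        if average > self.rate:
            return False
        self.panes[-1] += 1
        return True


throttle = SlidingWindowThrottle(rate=3, window=5)  # hypothetical settings
offered = [1, 3, 5, 10, 2]  # messages arriving in each interval
accepted = []
for i, count in enumerate(offered):
    if i > 0:
        throttle.tick()
    accepted.append(sum(throttle.offer() for _ in range(count)))

print(accepted)  # [1, 3, 5, 3, 2]
```

In the fourth interval only 3 of the 10 offered messages pass, because accepting a fourth would push the average over the four open panes above 3.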
You might have noticed that the Interval of the Window shift is configurable. It is counterintuitive, but there is a difference between the following two examples:
and
Even though both examples allow a maximum Rate of 60 messages per minute, the first example may let all 60 messages through within the first second and then drop everything else for the rest of the minute:
The second example will not allow more than 1 message per second, every second, making the output rate smoother:
It may drop some data if the rate is ragged. A bigger Interval and Rate are recommended for streams of rare but important events, while a bigger Window with a small Interval suits constantly intensive inputs.
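To make the difference between a coarse and a fine Interval concrete, the sketch below feeds the same burst into two hypothetical settings that both allow 60 messages per minute: Rate 60 with one-minute panes, and Rate 1 with one-second panes. The function and the chosen values are assumptions for illustration, and the averaging rule is a simplification rather than the plugin's exact logic.

```python
def accepted_per_interval(rate, window, offered):
    """Return how many messages pass in each interval (simplified model)."""
    panes = []
    accepted = []
    for count in offered:
        # Open a new pane and keep only the most recent `window` panes.
        panes.append(0)
        panes = panes[-window:]
        for _ in range(count):
            if (sum(panes) + 1) / len(panes) <= rate:
                panes[-1] += 1
        accepted.append(panes[-1])
    return accepted


# 300 messages arriving during the first three seconds.
# One-minute panes: the whole burst lands in a single pane.
coarse = accepted_per_interval(rate=60, window=5, offered=[300])
# One-second panes: the burst is spread over three panes.
fine = accepted_per_interval(rate=1, window=300, offered=[100, 100, 100])

print(coarse)  # [60]
print(fine)    # [1, 1, 1]
```

With the coarse setting, the 60 accepted messages can all belong to the first second of the minute. The fine setting releases one message per second.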
Note: It's suggested to use a configuration file.
The following command will load the tail plugin and read the content of the lines.txt file. Then the throttle filter will apply a rate limit and only pass the records which are read below the configured rate:
The example above will pass 1000 messages per second on average over 300 seconds.
The stdout filter plugin prints the data flowing through the filter to the standard output, which can be very useful while debugging.
The plugin has no configuration parameters and is very simple to use.
We have specified to gather usage metrics and print them in a human-readable way as they flow through the stdout plugin.
The Lua filter allows you to modify the incoming records (even split one record into multiple records) using custom scripts.
To provide a flexible filtering mechanism, Fluent Bit can be extended with custom filters written in the Lua programming language. A Lua-based filter takes two steps:
Configure the Filter in the main configuration
Prepare a Lua script that will be used by the Filter
The plugin supports the following configuration parameters:
Key | Description |
---|
In order to test the filter, you can run the plugin from the command line or through the configuration file. The following examples use the input plugin for data ingestion, invoke Lua filter using the script and call the function which only prints the same information to the standard output:
From the command line you can use the following options:
In your main configuration file append the following Input, Filter & Output sections:
The life cycle of a filter has the following steps:
Upon Tag matching by this filter, it may process or bypass the record.
If the tag matches, it accepts the record and invokes the function defined in the `call` property, which is the name of a function defined in the Lua `script`.
Invoke Lua function and pass each record in JSON format.
Upon return, validate return value and continue the pipeline.
The Lua script can have one or multiple callbacks that can be used by this filter. The function prototype is as follows:
Each callback must return three values:
For functional examples of this interface, please refer to the code samples provided in the source code of the project located here:
In classic mode:
Kubernetes pods generally have various environment variables set by the infrastructure automatically which may contain useful information.
In this example, we want to extract part of the Kubernetes cluster API name.
The environment variable is set like so: KUBERNETES_SERVICE_HOST: api.sandboxbsh-a.project.domain.com
We want to extract the sandboxbsh
name and add it to our record as a special key.
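The extraction itself is a small pattern match. The sketch below shows the idea in Python rather than Lua, purely to illustrate the string handling. The function name, the regular expression and the `cluster` key are assumptions, not the names used by the original script.

```python
import re


def extract_cluster_name(service_host):
    """Return the label after 'api.' up to the first '-' or '.', or None."""
    match = re.match(r"api\.([^.-]+)", service_host)
    return match.group(1) if match else None


# Value taken from the example environment variable above.
host = "api.sandboxbsh-a.project.domain.com"
record = {"log": "example"}  # hypothetical record content

name = extract_cluster_name(host)
if name is not None:
    record["cluster"] = name  # "cluster" is a hypothetical key name

print(record)  # {'log': 'example', 'cluster': 'sandboxbsh'}
```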
Lua treats numbers as doubles. This means an integer field (e.g. IDs, log levels) will be converted to a double. To avoid this type conversion, the `type_int_key` property is available.
The Lua callback function can return an array of tables (i.e., array of records) as its third return value (record). With this feature, the Lua filter can split one input record into multiple records according to custom logic.
For example:
In this example, we want to filter Istio logs to exclude lines with response codes between 1 and 399. Istio is configured to write the logs in JSON format.
Script response_code_filter.lua
Configuration to get Istio logs and apply the response code filter to them.
In the output, only the messages with response code 0 or greater than 399 are shown.
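The filtering rule can be summarized in a few lines. The sketch below mirrors the callback convention (return -1 to drop a record, 0 to keep it unchanged) in Python for illustration only. It is not the content of `response_code_filter.lua`, and the `response_code` field name is an assumption about the Istio log format.

```python
def response_code_filter(tag, timestamp, record):
    """Drop records whose response code is between 1 and 399 (inclusive)."""
    code = record.get("response_code")  # hypothetical field name
    if isinstance(code, int) and 1 <= code <= 399:
        return -1, timestamp, record  # -1: drop the record
    return 0, timestamp, record  # 0: keep the record unmodified


for code in (0, 200, 399, 400, 503):
    action, _, _ = response_code_filter("istio", 0.0, {"response_code": code})
    print(code, "dropped" if action == -1 else "kept")
```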
The Record Modifier Filter plugin allows you to append fields or to exclude specific fields.
The plugin supports the following configuration parameters. `Remove_key` and `Allowlist_key` are mutually exclusive.
Key | Description |
---|
In order to start filtering records, you can run the filter from the command line or through the configuration file.
This is a sample in_mem record to filter.
The following configuration file appends a product name and hostname (via environment variable) to the record.
You can also run the filter from the command line.
The output will be
The following configuration file removes the 'Swap.*' fields.
You can also run the filter from the command line.
The output will be
The following configuration file keeps only the 'Mem.*' fields.
You can also run the filter from the command line.
The output will be
Operation | Parameter 1 | Parameter 2 | Description |
---|---|---|---|
Condition | Parameter | Parameter 2 | Description |
---|---|---|---|
Key | Value Format | Operation | Description |
---|---|---|---|
Annotation | Description | Default |
---|---|---|
name | description |
---|
name | data type | description |
---|
The include examples to verify during CI.
As an example that combines a bit of LUA processing with the that demonstrates using environment variables with LUA regex and substitutions.
Fluent Bit supports protected mode to prevent crashes when it executes an invalid Lua script.

Key | Description | Default |
---|---|---|
imds_version | Specify which version of the instance metadata service to use. Valid values are 'v1' or 'v2'. | v2 |
az | The availability zone; for example, "us-east-1a". | true |
ec2_instance_id | The EC2 instance ID. | true |
ec2_instance_type | The EC2 instance type. | false |
private_ip | The EC2 instance private IP. | false |
ami_id | The EC2 instance image ID. | false |
account_id | The account ID for the current EC2 instance. | false |
hostname | The hostname for the current EC2 instance. | false |
vpc_id | The VPC ID for the current EC2 instance. | false |

Key | Description | Default |
---|---|---|
Add | This parameter is similar to the ADD option in the modify filter. You can specify it any number of times and it takes two arguments, a KEY name and VALUE. The value uses Fluent Bit record_accessor syntax to create a template that uses ECS Metadata values. See the list below for supported metadata templating keys. This option is designed to give you full power to control both the key names for metadata as well as the format for metadata values. See the examples below for more. | No default |
ECS_Tag_Prefix | This parameter is similar to the Kube_Tag_Prefix option in the Kubernetes filter and performs the same function. The full log tag should be prefixed with this string, and after the prefix the filter must find the next characters in the tag to be the Docker Container Short ID (the first 12 characters of the full container ID). The filter uses this to identify which container the log came from so it can find which task it is a part of. See the design section below for more information. If not specified, it defaults to an empty string, meaning that the tag must be prefixed with the 12 character container short ID. If you just want to attach cluster metadata to system/OS logs from processes that do not run as part of containers or ECS Tasks, then do not set this parameter and enable the Cluster_Metadata_Only option. | empty string |
Cluster_Metadata_Only | When enabled, the plugin will only attempt to attach cluster metadata values. This is useful if you want to attach cluster metadata to system/OS logs from processes that do not run as part of containers or ECS Tasks. | Off |
ECS_Meta_Cache_TTL | The filter builds a hash table in memory mapping each unique container short ID to its metadata. This option sets a max TTL for objects in the hash table. You should set this if you have frequent container/task restarts. For example, if your cluster runs short running batch jobs that complete in less than 10 minutes, there is no reason to keep any stored metadata longer than 10 minutes, so you would set this parameter to "10m". | 1h (1 hour) |

Variable | Description | Supported with Cluster_Metadata_Only On |
---|---|---|
$ClusterName | The ECS cluster name. Fluent Bit is running on EC2 instance(s) that are part of this cluster. | Yes |
$ContainerInstanceARN | The full ARN of the ECS EC2 Container Instance. This is the instance that Fluent Bit is running on. | Yes |
$ContainerInstanceID | The ID of the ECS EC2 Container Instance. | Yes |
$ECSAgentVersion | The Version string of the ECS Agent that is running on the container instance. | Yes |
$ECSContainerName | The name of the container from which the log originated. This is the name in your ECS Task Definition. | No |
$DockerContainerName | The name of the container from which the log originated. This is the name obtained from Docker and is the name shown if you run `docker ps` on the instance. | No |
$ContainerID | The ID of the container from which the log originated. This is the full 64 character long container ID. | No |
$TaskDefinitionFamily | The family name of the task definition for the task from which the log originated. | No |
$TaskDefinitionVersion | The version/revision of the task definition for the task from which the log originated. | No |
$TaskID | The ID of the ECS Task from which the log originated. | No |
$TaskARN | The full ARN of the ECS Task from which the log originated. | No |

Key | Description |
---|---|
file | The single value file that Fluent Bit will use as a lookup table to determine if the specified `lookup_key` exists. |
lookup_key | The specific key to look up and determine if it exists. Supports record accessor. |
record | The record to add if the `lookup_key` is found in the specified `file`. Note you may add multiple record parameters. |
mode | Set the check mode. `exact` and `partial` are supported. Default: `exact`. |
print_query_time | Print to stdout the elapsed query time for every matched record. Default: false |
ignore_case | Compare strings by ignoring case. Default: false |

Property | Description |
---|---|
key_exists | Check if a key with a given name exists in the record. |
key_not_exists | Check if a key does not exist in the record. |
key_val_is_null | Check that the value of the key is NULL. |
key_val_is_not_null | Check that the value of the key is NOT NULL. |
key_val_eq | Check that the value of the key equals the given value in the configuration. |
action | Action to take when a rule does not match. The available options are `warn`, `exit` or `result_key`. On `warn`, a warning message is sent to the logging layer when a mismatch of the rules above is found; using `exit` makes Fluent Bit abort with status code `255`; `result_key` adds a matching result to each record. |
result_key | Specify a key name for the matching result. This key is to be used only when `action` is `result_key`. |

Key | Value Format | Description |
---|---|---|
Regex | KEY REGEX | Keep records in which the content of KEY matches the regular expression. |
Exclude | KEY REGEX | Exclude records in which the content of KEY matches the regular expression. |

Key | Description |
---|---|
database | Path to the GeoIP2 database. |
lookup_key | Field name to process. |
record | Defines the `KEY LOOKUP_KEY VALUE` triplet. See below for how to set up this option. |

Key | Description | Default |
---|---|---|
nightfall_api_key | The Nightfall API key to scan your logs with, obtainable from the Nightfall Dashboard. | |
policy_id | The Nightfall dev platform policy to scan your logs with, configurable in the Nightfall Dashboard. | |
sampling_rate | The rate controlling how much of your logs you wish to be scanned; must be a float in the range (0,1]. 1 means all logs will be scanned. Useful for avoiding rate limits in conjunction with Fluent Bit's match rule. | 1 |
tls.debug | Debug level between 0 (nothing) and 4 (every detail). | 0 |
tls.verify | When enabled, turns on certificate validation when connecting to the Nightfall API. | true |
tls.ca_path | Absolute path to root certificates, required if tls.verify is true. | |

Operation | Parameter 1 | Parameter 2 | Description |
---|---|---|---|
Set | STRING:KEY | STRING:VALUE | Add a key/value pair with key `KEY` and value `VALUE`. If `KEY` already exists, this field is overwritten |
Add | STRING:KEY | STRING:VALUE | Add a key/value pair with key `KEY` and value `VALUE` if `KEY` does not exist |
Remove | STRING:KEY | NONE | Remove a key/value pair with key `KEY` if it exists |
Remove_wildcard | WILDCARD:KEY | NONE | Remove all key/value pairs with key matching wildcard `KEY` |
Remove_regex | REGEXP:KEY | NONE | Remove all key/value pairs with key matching regexp `KEY` |
Rename | STRING:KEY | STRING:RENAMED_KEY | Rename a key/value pair with key `KEY` to `RENAMED_KEY` if `KEY` exists AND `RENAMED_KEY` does not exist |
Hard_rename | STRING:KEY | STRING:RENAMED_KEY | Rename a key/value pair with key `KEY` to `RENAMED_KEY` if `KEY` exists. If `RENAMED_KEY` already exists, this field is overwritten |
Copy | STRING:KEY | STRING:COPIED_KEY | Copy a key/value pair with key `KEY` to `COPIED_KEY` if `KEY` exists AND `COPIED_KEY` does not exist |
Hard_copy | STRING:KEY | STRING:COPIED_KEY | Copy a key/value pair with key `KEY` to `COPIED_KEY` if `KEY` exists. If `COPIED_KEY` already exists, this field is overwritten |
Move_to_start | WILDCARD:KEY | NONE | Move key/value pairs with keys matching `KEY` to the start of the message |
Move_to_end | WILDCARD:KEY | NONE | Move key/value pairs with keys matching `KEY` to the end of the message |
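The difference between the soft and hard variants is easiest to see on a small record. The following sketch models a subset of the rules (Set, Add, Remove, Rename, Hard_rename) on a Python dictionary. It illustrates the semantics described in the table and is not the plugin's code; the function name and the sample record are hypothetical, and key ordering in the real plugin may differ.

```python
def apply_rules(record, rules):
    """Apply modify-style rules in order, each on the result of the previous."""
    out = dict(record)
    for name, *params in rules:
        op = name.lower()  # rule names are case insensitive
        if op == "set":
            key, value = params
            out[key] = value  # overwrite if present
        elif op == "add":
            key, value = params
            out.setdefault(key, value)  # only when the key is absent
        elif op == "remove":
            out.pop(params[0], None)
        elif op == "rename":
            key, new_key = params
            if key in out and new_key not in out:
                out[new_key] = out.pop(key)
        elif op == "hard_rename":
            key, new_key = params
            if key in out:
                out[new_key] = out.pop(key)  # overwrite if present
        else:
            raise ValueError(f"unsupported rule in this sketch: {name}")
    return out


record = {"Key1": "Value1", "Key2": "Value2"}  # hypothetical input
rules = [("Rename", "Key2", "RenamedKey"), ("Add", "OtherKey", "Value3")]
print(apply_rules(record, rules))
# {'Key1': 'Value1', 'RenamedKey': 'Value2', 'OtherKey': 'Value3'}
```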

Condition | Parameter | Parameter 2 | Description |
---|---|---|---|
Key_exists | STRING:KEY | NONE | Is `true` if `KEY` exists |
Key_does_not_exist | STRING:KEY | NONE | Is `true` if `KEY` does not exist |
A_key_matches | REGEXP:KEY | NONE | Is `true` if a key matches regex `KEY` |
No_key_matches | REGEXP:KEY | NONE | Is `true` if no key matches regex `KEY` |
Key_value_equals | STRING:KEY | STRING:VALUE | Is `true` if `KEY` exists and its value is `VALUE` |
Key_value_does_not_equal | STRING:KEY | STRING:VALUE | Is `true` if `KEY` exists and its value is not `VALUE` |
Key_value_matches | STRING:KEY | REGEXP:VALUE | Is `true` if key `KEY` exists and its value matches `VALUE` |
Key_value_does_not_match | STRING:KEY | REGEXP:VALUE | Is `true` if key `KEY` exists and its value does not match `VALUE` |
Matching_keys_have_matching_values | REGEXP:KEY | REGEXP:VALUE | Is `true` if all keys matching `KEY` have values that match `VALUE` |
Matching_keys_do_not_have_matching_values | REGEXP:KEY | REGEXP:VALUE | Is `true` if all keys matching `KEY` have values that do not match `VALUE` |

Key | Value Format | Operation | Description |
---|---|---|---|
Operation | ENUM [`nest` or `lift`] | | Select the operation `nest` or `lift` |
Wildcard | FIELD WILDCARD | `nest` | Nest records whose field matches the wildcard |
Nest_under | FIELD STRING | `nest` | Nest records matching the `Wildcard` under this key |
Nested_under | FIELD STRING | `lift` | Lift records nested under the `Nested_under` key |
Add_prefix | FIELD STRING | ANY | Prefix affected keys with this string |
Remove_prefix | FIELD STRING | ANY | Remove prefix from affected keys if it matches this string |
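The two operations are inverses of each other, which a short model makes visible. The sketch below implements `nest` and `lift` on Python dictionaries as an illustration of the table above, not as the plugin's implementation. The function names and the sample record are hypothetical, and prefix handling (`Add_prefix`, `Remove_prefix`) is left out.

```python
from fnmatch import fnmatchcase


def nest(record, wildcard, nest_under):
    """Move keys matching the wildcard into a new map stored under nest_under."""
    nested = {k: v for k, v in record.items() if fnmatchcase(k, wildcard)}
    if not nested:
        return dict(record)
    out = {k: v for k, v in record.items() if k not in nested}
    out[nest_under] = nested
    return out


def lift(record, nested_under):
    """Move the entries of the map stored under nested_under up one level."""
    inner = record.get(nested_under)
    if not isinstance(inner, dict):
        return dict(record)
    out = {k: v for k, v in record.items() if k != nested_under}
    out.update(inner)
    return out


record = {"Key1": "Value1", "Key2": "Value2", "OtherKey": "Value3"}
nested = nest(record, "Key*", "NestKey")
print(nested)
# {'OtherKey': 'Value3', 'NestKey': {'Key1': 'Value1', 'Key2': 'Value2'}}
print(lift(nested, "NestKey") == record)  # True
```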

Key | Description |
---|---|
Rule | Defines the matching criteria and the format of the Tag for the matching record. The Rule format has four components: `KEY REGEX NEW_TAG KEEP`. For more specific details of the Rule format and its composition, read the next section. |
Emitter_Name | When the filter emits a record under the new Tag, there is an internal emitter plugin that takes care of the job. Since this emitter exposes metrics like any other component of the pipeline, you can use this property to configure an optional name for it. |
Emitter_Storage.type | Define a buffering mechanism for the new records created. Note these records are part of the emitter plugin. This option supports the values `memory` (default) or `filesystem`. If the destination for the new records generated might face backpressure due to latency or a slow network, we strongly recommend enabling the `filesystem` mode. |
Emitter_Mem_Buf_Limit | Set a limit on the amount of memory the tag rewrite emitter can consume if the outputs provide backpressure. The default for this limit is `10M`. The pipeline will pause once the buffer exceeds the value of this setting. For example, if the value is set to `10M`, then the pipeline will pause if the buffer exceeds `10M`. The pipeline will remain paused until the output drains the buffer below the `10M` limit. |

Key | Description | Default |
---|---|---|
Key_Name | Specify field name in record to parse. | |
Parser | Specify the parser name to interpret the field. Multiple Parser entries are allowed (one per line). | |
Preserve_Key | Keep original `Key_Name` field in the parsed result. If false, the field will be removed. | False |
Reserve_Data | Keep all other original fields in the parsed result. If false, all other original fields will be removed. | False |

Key | Description | Default |
---|---|---|
input_field | Specify the name of the field in the record to apply inference on. | |
model_file | Path to the model file (`.tflite`) to be loaded by Tensorflow Lite. | |
include_input_fields | Include all input fields in the filter's output. | True |
normalization_value | Divide input values by normalization_value. | |

Key | Description |
---|---|
Wasm_Path | Path to the built Wasm program that will be used. This can be a relative path against the main configuration file. |
Function_Name | Wasm function name that will be triggered to do filtering. It's assumed that the function is built inside the Wasm program specified above. |
Accessible_Paths | Specify the whitelist of paths that Wasm programs are allowed to access. |

Key | Description | Default |
---|---|---|
Buffer_Size | Set the buffer size for the HTTP client when reading responses from the Kubernetes API server. The value must be according to the Unit Size specification. A value of `0` results in no limit, and the buffer will expand as needed. Note that if pod specifications exceed the buffer limit, the API response will be discarded when retrieving metadata, and some Kubernetes metadata will fail to be injected into the logs. | 32k |
Kube_URL | API Server end-point | |
Kube_CA_File | CA certificate file | /var/run/secrets/kubernetes.io/serviceaccount/ca.crt |
Kube_CA_Path | Absolute path to scan for certificate files | |
Kube_Token_File | Token file | /var/run/secrets/kubernetes.io/serviceaccount/token |
Kube_Tag_Prefix | When the source records come from the Tail input plugin, this option specifies the prefix used in the Tail configuration. | kube.var.log.containers. |
Merge_Log | When enabled, it checks if the `log` field content is a JSON string map; if so, it appends the map fields as part of the log structure. | Off |
Merge_Log_Key | When `Merge_Log` is enabled, the filter tries to assume the `log` field from the incoming message is a JSON string message and makes a structured representation of it at the same level as the `log` field in the map. If `Merge_Log_Key` is set (a string name), all the new structured fields taken from the original `log` content are inserted under the new key. | |
Merge_Log_Trim | When `Merge_Log` is enabled, trim (remove possible \n or \r) field values. | On |
Merge_Parser | Optional parser name to specify how to parse the data contained in the log key. Recommended use is for developers or testing only. | |
Keep_Log | When `Keep_Log` is disabled, the `log` field is removed from the incoming message once it has been successfully merged (`Merge_Log` must be enabled as well). | On |
tls.debug | Debug level between 0 (nothing) and 4 (every detail). | -1 |
tls.verify | When enabled, turns on certificate validation when connecting to the Kubernetes API server. | On |
Use_Journal | When enabled, the filter reads logs coming in Journald format. | Off |
Cache_Use_Docker_Id | When enabled, metadata will be fetched from K8s when docker_id is changed. | Off |
Regex_Parser | Set an alternative Parser to process the record Tag and extract pod_name, namespace_name, container_name and docker_id. The parser must be registered in a parsers file (refer to parser filter-kube-test as an example). | |
K8S-Logging.Parser | Allow Kubernetes Pods to suggest a pre-defined Parser (read more about it in the Kubernetes Annotations section). | Off |
K8S-Logging.Exclude | Allow Kubernetes Pods to exclude their logs from the log processor (read more about it in the Kubernetes Annotations section). | Off |
Labels | Include Kubernetes resource labels in the extra metadata. | On |
Annotations | Include Kubernetes resource annotations in the extra metadata. | On |
Kube_meta_preload_cache_dir | If set, Kubernetes metadata can be cached/pre-loaded from files in JSON format in this directory, named as namespace-pod.meta | |
Dummy_Meta | If set, use dummy metadata (for test/dev purposes). | Off |
DNS_Retries | DNS lookup retries N times until the network starts working. | 6 |
DNS_Wait_Time | DNS lookup interval between network status checks. | 30 |
Use_Kubelet | An optional feature flag to get metadata information from the kubelet instead of calling the Kube Server API to enhance the log. This could mitigate the Kube API heavy traffic issue for large clusters. | Off |
Kubelet_Port | Kubelet port used for HTTP requests. This only works when `Use_Kubelet` is set to On. | 10250 |
Kubelet_Host | Kubelet host used for HTTP requests. This only works when `Use_Kubelet` is set to On. | 127.0.0.1 |
Kube_Meta_Cache_TTL | Configurable TTL for K8s cached metadata. By default, it is set to 0, which means TTL for cache entries is disabled and cache entries are evicted at random when capacity is reached. To enable this option, set the number to a time interval. For example, set this value to 60 or 60s and cache entries which were created more than 60s ago will be evicted. | 0 |
Kube_Token_TTL | Configurable 'time to live' for the K8s token. By default, it is set to 600 seconds. After this time, the token is reloaded from Kube_Token_File or the Kube_Token_Command. | 600 |
Kube_Token_Command | Command to get the Kubernetes authorization token. By default, it will be `NULL` and the token file is used to get the token. If you want to manually choose a command to get it, you can set the command here. For example, run `aws-iam-authenticator -i your-cluster-name token --token-only` to set the token. This option is currently Linux-only. | |

Annotation | Description | Default |
---|---|---|
fluentbit.io/parser[_stream][-container] | Suggest a pre-defined parser. The parser must be registered already by Fluent Bit. This option will only be processed if the Fluent Bit configuration (Kubernetes Filter) has enabled the option K8S-Logging.Parser. If present, the stream (stdout or stderr) will restrict that specific stream. If present, the container can override a specific container in a Pod. | |
fluentbit.io/exclude[_stream][-container] | Request to Fluent Bit to exclude or not the logs generated by the Pod. This option will only be processed if the Fluent Bit configuration (Kubernetes Filter) has enabled the option K8S-Logging.Exclude. | False |

Property | Description |
---|---|
multiline.parser | Specify one or multiple Multiline Parser definitions to apply to the content. You can specify multiple multiline parsers to detect different formats by separating them with a comma. |
multiline.key_content | Key name that holds the content to process. Note that a Multiline Parser definition can already specify the `key_content` to use, but this option allows overwriting that value for the purpose of the filter. |
mode | Mode can be `parser` for regex concat, or `partial_message` to concat split Docker logs. |
buffer | Enable buffered mode. In buffered mode, the filter can concatenate multilines from inputs that ingest records one by one (e.g. Forward), rather than in chunks, re-emitting them into the beginning of the pipeline (with the same tag) using the in_emitter instance. With buffer off, this filter will not work with most inputs, except tail. |
flush_ms | Flush time for pending multiline records. Defaults to 2000. |
emitter_name | Name for the emitter input instance which re-emits the completed records at the beginning of the pipeline. |
emitter_storage.type | The storage type for the emitter input instance. This option supports the values `memory` (default) and `filesystem`. |
emitter_mem_buf_limit | Set a limit on the amount of memory the emitter can consume if the outputs provide backpressure. The default for this limit is `10M`. The pipeline will pause once the buffer exceeds the value of this setting. For example, if the value is set to `10M`, then the pipeline will pause if the buffer exceeds `10M`. The pipeline will remain paused until the output drains the buffer below the `10M` limit. |

Key | Value Format | Description |
---|---|---|
Rate | Integer | Amount of messages for the time. |
Window | Integer | Amount of intervals to calculate average over. Default 5. |
Interval | String | Time interval, expressed in "sleep" format, e.g. 3s, 1.5m, 0.5h etc. |
Print_Status | Bool | Whether to print status messages with current rate and the limits to information logs |

name | description |
---|---|
tag | Name of the tag associated with the incoming record. |
timestamp | Unix timestamp with nanoseconds associated with the incoming record. The original format is a double (seconds.nanoseconds) |
record | Lua table with the record content |

name | data type | description |
---|---|---|
code | integer | The code return value represents the result and the further action that may follow. If code equals -1, the record is dropped. If code equals 0, the record is not modified. If code equals 1, the original timestamp and record have been modified, so they must be replaced by the returned timestamp (second return value) and record (third return value). If code equals 2, the original timestamp is not modified but the record has been, so it must be replaced by the returned record (third return value). Code 2 is supported from v1.4.3. |
timestamp | double | If code equals 1, the original record timestamp will be replaced with this new value. |
record | table | If code equals 1, the original record information will be replaced with this new value. Note that the record value must be a valid Lua table. This value can be an array of tables (i.e., array of objects in JSON format), and in that case the input record is effectively split into multiple records. (see below for more details) |
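To make the return-code convention concrete, the sketch below models how a caller could interpret the three returned values. It is written in Python purely as an illustration of the table above. The function name is hypothetical and this is not how Fluent Bit itself is implemented; in particular, applying one timestamp to every record of a split is an assumption.

```python
def apply_callback_result(timestamp, record, result):
    """Return the list of (timestamp, record) pairs that continue down the pipeline."""
    code, new_timestamp, new_record = result
    if code == -1:
        return []  # record dropped
    if code == 0:
        return [(timestamp, record)]  # record kept as-is
    if code == 1:
        out_timestamp = new_timestamp  # timestamp and record replaced
    elif code == 2:
        out_timestamp = timestamp  # only the record replaced
    else:
        raise ValueError(f"unexpected return code: {code}")
    # An array of tables splits one input record into several records.
    records = new_record if isinstance(new_record, list) else [new_record]
    return [(out_timestamp, r) for r in records]


original = (1.5, {"log": "a b"})
print(apply_callback_result(*original, (-1, 0, {})))
# []
print(apply_callback_result(*original, (2, 0, [{"log": "a"}, {"log": "b"}])))
# [(1.5, {'log': 'a'}), (1.5, {'log': 'b'})]
```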

Key | Description |
---|---|
script | Path to the Lua script that will be used. This can be a relative path against the main configuration file. |
call | Lua function name that will be triggered to do filtering. It's assumed that the function is declared inside the script parameter defined above. |
type_int_key | If these keys are matched, the fields are converted to integer. If more than one key, delimit by space. Note that starting from Fluent Bit v1.6 integer data types are preserved and not converted to double as in previous versions. |
type_array_key | If these keys are matched, the fields are handled as arrays. If more than one key, delimit by space. This is useful when the array can be empty. |
protected_mode | If enabled, the Lua script will be executed in protected mode. It prevents Fluent Bit from crashing when an invalid Lua script is executed or the triggered Lua function throws exceptions. Default is true. |
time_as_table | By default when the Lua script is invoked, the record timestamp is passed as a floating number which might lead to precision loss when it is converted back. If you desire timestamp precision, enabling this option will pass the timestamp as a Lua table with keys |
code | Inline Lua code instead of loading from a path via |

Key | Description |
---|---|
Record | Append fields. This parameter needs a key and value pair. |
Remove_key | If the key is matched, that field is removed. |
Allowlist_key | If the key is not matched, that field is removed. |
Whitelist_key | An alias of |
Uuid_key | If set, the plugin appends a UUID to each record. The value assigned becomes the key in the map. |