Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Fluent Bit is an open source, multi-platform log processor that aims to be a generic Swiss Army knife for log processing and distribution.
The number of sources of information in our environments is ever increasing. Handling data collection at scale is complex, and collecting and aggregating diverse data requires a specialized tool that can deal with:
Different sources of information
Different data formats
Data Reliability
Security
Flexible Routing
Multiple destinations
Fluent Bit has been designed with performance and low resource consumption in mind.
Every project has a story
In 2014, the Fluentd team at Treasure Data forecast the need for a lightweight log processor for constrained environments like Embedded Linux and Gateways. The project aimed to be part of the Fluentd ecosystem; we called it Fluent Bit, fully open source and available under the terms of the Apache License v2.0.
After the project had been around for some time, it gained traction in the Embedded market, but we also started getting requests for several features from the Cloud community, like more inputs, filters, and outputs. Not long after that, Fluent Bit became one of the preferred solutions to solve the logging challenges in Cloud environments.
There are a few key concepts that are really important to understand how Fluent Bit operates.
Before diving into Fluent Bit it’s good to get acquainted with some of the key concepts of the service. This document provides a gentle introduction to those concepts and common Fluent Bit terminology. We’ve provided a list below of all the terms we’ll cover, but we recommend reading this document from start to finish to gain a more general understanding of our log and stream processor.
Event or Record
Filtering
Tag
Timestamp
Match
Structured Message
Every incoming piece of data that belongs to a log or a metric that is retrieved by Fluent Bit is considered an Event or a Record.
As an example consider the following content of a Syslog file:
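An illustrative sample of such a file:

```
Jan 18 12:52:16 flb systemd[2222]: Starting GNOME Terminal Server
Jan 18 12:52:16 flb dbus-daemon[2243]: Successfully activated service 'org.gnome.Terminal'
Jan 18 12:52:16 flb systemd[2222]: Started GNOME Terminal Server.
Jan 18 12:52:16 flb gsd-media-keys[2640]: # watch_fast: "/org/gnome/terminal/legacy/"
```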
It contains four lines, each of which represents an independent Event.
Internally, an Event always has two components (in array form): the Timestamp and the Message.
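```
[TIMESTAMP, MESSAGE]
```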
In some cases it is necessary to modify the content of an Event; the process to alter, enrich or drop Events is called Filtering.
There are many use cases where Filtering is required, such as:
Append specific information to the Event like an IP address or metadata.
Select a specific piece of the Event content.
Drop Events that match a certain pattern.
Every Event that gets into Fluent Bit gets assigned a Tag. This tag is an internal string that is used in a later stage by the Router to decide which Filter or Output phase it must go through.
Most tags are assigned manually in the configuration. If a tag is not specified, Fluent Bit assigns the name of the Input plugin instance where the Event was generated.
The only input plugin that does NOT assign tags is Forward input. This plugin speaks the Fluentd wire protocol called Forward where every Event already comes with a Tag associated. Fluent Bit will always use the incoming Tag set by the client.
A Tagged record must always have a Matching rule. To learn more about Tags and Matches check the Routing section.
The Timestamp represents the time when an Event was created. Every Event has an associated Timestamp. The Timestamp is a numeric fractional integer in the format SECONDS.NANOSECONDS, where:
SECONDS is the number of seconds that have elapsed since the Unix epoch.
NANOSECONDS is the fractional second, in nanoseconds (one thousand-millionth of a second).
A timestamp always exists, either set by the Input plugin or discovered through a data parsing process.
Fluent Bit allows you to deliver your collected and processed Events to one or multiple destinations; this is done through a routing phase. A Match represents a simple rule to select Events whose Tag matches a defined rule.
To learn more about Tags and Matches check the Routing section.
Source events may or may not have a structure. A structure defines a set of keys and values inside the Event message. As an example, consider the following two messages:
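```
"Project Fluent Bit created on 1398289291"

{"project": "Fluent Bit", "created": 1398289291}
```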
At a low level both are just arrays of bytes, but the structured message defines keys and values; having a structure helps to implement faster operations on data modifications.
Fluent Bit always handles every Event message as a structured message. For performance reasons, we use a binary serialization data format called MessagePack.
Consider MessagePack as a binary version of JSON on steroids.
A Strong Commitment to Openness and Collaboration
Fluent Bit, including its core, plugins and tools, is distributed under the terms of the Apache License v2.0:
The Production Grade Ecosystem
Logging and data processing in general can be complex, especially at scale; that's why Fluentd was born. Fluentd has become more than a simple tool: it has grown into a full-scale ecosystem that contains SDKs for different languages and sub-projects like Fluent Bit.
On this page, we describe the relationship between the Fluentd and Fluent Bit open source projects. As a summary, both are:
Licensed under the terms of Apache License v2.0
Hosted projects by the Cloud Native Computing Foundation (CNCF)
Production Grade solutions: deployed thousands of times every single day, millions per month.
Community driven projects
Widely Adopted by the Industry: trusted by major companies like AWS, Microsoft, Google Cloud and hundreds of others.
Originally created by Treasure Data.
Both projects share many similarities: Fluent Bit is fully designed and built on top of the best ideas of Fluentd's architecture and general design. Which one to use depends on the end-user's needs.
The following table describes a comparison in different areas of the projects:
Both Fluentd and Fluent Bit can work as Aggregators or Forwarders; they can complement each other or be used as standalone solutions.
High Performance Log and Metrics Processor
Fluent Bit is a Fast and Lightweight Logs and Metrics Processor and Forwarder for Linux, OSX, Windows and BSD family operating systems. It has been made with a strong focus on performance to allow the collection of events from different sources without complexity.
High Performance
Metrics Collection (Prometheus compatible)
Reliability and Data Integrity
Backpressure Handling
Data Buffering in memory and file system
Networking
Security: built-in TLS/SSL support
Asynchronous I/O
Pluggable Architecture and Extensibility: Inputs, Filters and Outputs
More than 80 built-in plugins available
Extensibility
Write any input, filter or output plugin in C language
Bonus: write Filters in Lua or Output plugins in Golang
Monitoring: expose internal metrics over HTTP in JSON and Prometheus format
Stream Processing: Perform data selection and transformation using simple SQL queries
Create new streams of data using query results
Aggregation Windows
Data analysis and prediction: Timeseries forecasting
Portable: runs on Linux, MacOS, Windows and BSD systems
Fluent Bit is a CNCF sub-project under the umbrella of Fluentd; it's licensed under the terms of the Apache License v2.0. The project was originally created by Treasure Data and is currently a vendor-neutral and community-driven project.
| | Fluentd | Fluent Bit |
|---|---|---|
| Scope | Containers / Servers | Embedded Linux / Containers / Servers |
| Language | C & Ruby | C |
| Memory | ~40MB | ~650KB |
| Performance | High Performance | High Performance |
| Dependencies | Built as a Ruby Gem, it requires a certain number of gems. | Zero dependencies, unless some special plugin requires them. |
| Plugins | More than 1000 plugins available | Around 70 plugins available |
| License | Apache License v2.0 | Apache License v2.0 |
Performance and Data Safety
When Fluent Bit processes data, it uses the system memory (heap) as the primary and temporary place to store the records before they get delivered; the records are processed in this private memory area.
Buffering refers to the ability to store the records somewhere, and while they are processed and delivered, still be able to store more. Buffering in memory is the fastest mechanism, but there are certain scenarios where the mechanism requires special strategies to deal with backpressure, data safety or reduce memory consumption by the service in constraint environments.
Network failures or latency on third-party services are pretty common, and in scenarios where we cannot deliver data as fast as we receive it, we will likely face backpressure.
Our buffering strategies are designed to solve problems associated with backpressure and general delivery failures.
As buffering strategies, Fluent Bit offers a primary buffering mechanism in memory and an optional secondary one using the file system. With this hybrid solution you can accommodate any use case safely and keep high performance while processing your data.
The two mechanisms are not mutually exclusive: when data is ready to be processed or delivered it will always be in memory, while other data in the queue might be in the file system until it is ready to be processed and moved up to memory.
To learn more about the buffering configuration in Fluent Bit, please jump to the Buffering & Storage section.
Data processing with reliability
As defined in the Buffering concept section, the buffer phase in the pipeline aims to provide a unified and persistent mechanism to store your data, either using the primary in-memory model or the filesystem-based mode. Data in the buffer phase is in an immutable state, meaning no other filter can be applied.
Note that buffered data is not raw text, it's in Fluent Bit's internal binary representation.
Fluent Bit offers a buffering mechanism in the file system that acts as a backup system to avoid data loss in case of system failures.
Modify, Enrich or Drop your records
In production environments we want to have full control of the data we are collecting; filtering is an important feature that allows us to alter the data before delivering it to some destination.
Filtering is implemented through plugins, so each filter available could be used to match, exclude or enrich your logs with some specific metadata.
Many filters are available. A common use case for filtering is Kubernetes deployments, where every Pod log needs the proper metadata associated with it.
Very similar to the input plugins, Filters run in an instance context, which has its own independent configuration. Configuration keys are often called properties.
For more details about the Filters available and their usage, please refer to the Filters section.
Convert Unstructured to Structured messages
Dealing with raw strings or unstructured messages is a constant pain; having a structure is highly desired. Ideally we want to impose a structure on incoming data via the Input Plugins as soon as it is collected:
The Parser allows you to convert from unstructured to structured data. As a demonstrative example consider the following Apache (HTTP Server) log entry:
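An illustrative entry in the common Apache access-log format:

```
192.168.2.10 - - [17/Feb/2018:04:14:58 +0000] "GET / HTTP/1.1" 200 16
```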
The above log line is a raw string without format; ideally we would like to give it a structure that can be processed easily later. With the proper configuration, the log entry could be converted to:
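```
{
  "host":   "192.168.2.10",
  "user":   "-",
  "method": "GET",
  "path":   "/",
  "code":   "200",
  "size":   "16"
}
```

(The exact keys depend on the parser used; these are illustrative.)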
Parsers are fully configurable and are independently and optionally handled by each input plugin, for more details please refer to the Parsers section.
The way to gather data from your sources
Fluent Bit provides different Input Plugins to gather information from different sources, some of them just collect data from log files while others can gather metrics information from the operating system. There are many plugins for different needs.
When an input plugin is loaded, an internal instance is created. Every instance has its own and independent configuration. Configuration keys are often called properties.
Every input plugin has its own documentation section where it's specified how it can be used and what properties are available.
For more details, please refer to the Input Plugins section.
Create flexible routing rules
Routing is a core feature that allows you to route your data through Filters and finally to one or multiple destinations. The router relies on the concept of Tags and Matching rules.
There are two important concepts in Routing:
Tag
Match
When data is generated by an input plugin, it comes with a Tag (most of the time the Tag is configured manually). The Tag is a human-readable indicator that helps to identify the data source.
In order to define where the data should be routed, a Match rule must be specified in the output configuration.
Consider the following configuration example that aims to deliver CPU metrics to an Elasticsearch database and Memory metrics to the standard output interface:
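```
[INPUT]
    Name cpu
    Tag  my_cpu

[INPUT]
    Name mem
    Tag  my_mem

[OUTPUT]
    Name  es
    Match my_cpu

[OUTPUT]
    Name  stdout
    Match my_mem
```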
Note: the above is a simple example demonstrating how Routing is configured.
Routing works automatically by reading the Input Tags and the Output Match rules. If some data has a Tag that doesn't match at routing time, the data is deleted.
Routing is flexible enough to support wildcards in the Match pattern. The example below defines a common destination for both sources of data:
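```
[INPUT]
    Name cpu
    Tag  my_cpu

[INPUT]
    Name mem
    Tag  my_mem

[OUTPUT]
    Name  stdout
    Match my_*
```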
The match rule is set to my_* which means it will match any Tag that starts with my_.
The following article covers the relevant notes for users upgrading from previous Fluent Bit versions. We aim to cover compatibility changes you must be aware of.
If you are migrating from a previous version of Fluent Bit, please review the following important changes:
Tail input plugin: by default the plugin now follows a file from the end once the service starts (the old behavior was to always read from the beginning). Every file found at start is followed from its last position; new files discovered at runtime, or rotated files, are read from the beginning. If you want to keep the old behavior you can set the option read_from_head to true.
If you have any existing queries based on the resource's project_id, please update your query accordingly.
The migration from v1.4 to v1.5 is pretty straightforward.
If you are migrating from Fluent Bit v1.3, there are no breaking changes. Just new exciting features to enjoy :)
If you are migrating from Fluent Bit v1.2 to v1.3, there are no breaking changes. If you are upgrading from an older version please review the incremental changes below.
In Fluent Bit v1.2 we have fixed many issues associated with JSON encoding and decoding; hence, when parsing Docker logs, it is no longer necessary to use decoders. The new Docker parser looks like this:
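```
[PARSER]
    # Docker JSON log parser, no decoders required
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
```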
Note: again, do not use decoders.
We have also improved how the Kubernetes Filter handles stringified log messages. If the option Merge_Log is enabled, it will try to handle the log content as a JSON map; if so, it will add the keys to the root map.
In addition, we have fixed and improved the option called Merge_Log_Key. If the log merge succeeds, all new keys will be packaged under the key specified by this option. A suggested configuration is as follows:
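```
[FILTER]
    Name          kubernetes
    Match         kube.*
    Merge_Log     On
    Merge_Log_Key log_processed
```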
As an example, if the original log content is the following map:
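```
{"key1": 12345, "key2": "abc"}
```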
the final record will be composed as follows:
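```
{
  "log": "{\"key1\": 12345, \"key2\": \"abc\"}",
  "log_processed": {
    "key1": 12345,
    "key2": "abc"
  }
}
```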
If you are upgrading from Fluent Bit <= 1.0.x, you should take into consideration the following relevant changes when switching to the Fluent Bit v1.1 series:
We introduced a new configuration property called Kube_Tag_Prefix to help Tag prefix resolution and address an unexpected behavior that landed in previous versions.
During the 1.0.x release cycle, a commit in the Tail input plugin changed the default behavior of how the Tag was composed when using the wildcard for expansion, breaking compatibility with other services. Consider the following configuration example:
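```
[INPUT]
    Name    tail
    Path    /var/log/containers/*.log
    Tag     kube.*
```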
The expected behavior is that the Tag will be expanded to the full path, e.g. kube.var.log.containers.apache.log for a monitored file named apache.log, but the change introduced in the 1.0 series switched from the absolute path to the base file name only: kube.apache.log.
In the Fluent Bit v1.1 release we restored the default behavior; the Tag is now composed using the absolute path of the monitored file.
Having the absolute path in the Tag is relevant for routing and flexible configuration, and it also helps to keep compatibility with Fluentd behavior.
This behavior switch in the Tail input plugin affects how the Kubernetes Filter operates. When the filter is used, it needs to perform local metadata lookups based on the file names when using Tail as a source. With the new Kube_Tag_Prefix option you can now specify the prefix used in the Tail input plugin; for the configuration example above, the new configuration will look as follows:
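```
[FILTER]
    Name             kubernetes
    Match            kube.*
    Kube_Tag_Prefix  kube.var.log.containers.
```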
The proper value for Kube_Tag_Prefix must be composed of the Tag prefix set in the Tail input plugin plus the monitored directory path with slashes replaced by dots.
For more details about the changes in each release, please refer to the release notes.
The project_id of resources sent to Google Cloud Logging is now set to the project ID rather than the project number. To learn the difference between a project ID and a project number, see the Google Cloud documentation for more details.
If you enabled keepalive mode in your configuration, note that this configuration property has been renamed to net.keepalive. All network I/O keepalive is now enabled by default; to learn more about this and other associated configuration properties, read the Networking section.
If you use the Elasticsearch output plugin, note that the default value of type is now _doc. Many versions of Elasticsearch will tolerate this, but ES v5.6 through v6.1 require a type without a leading underscore. See the Elasticsearch output plugin documentation for more.
Destinations for your data: databases, cloud services and more!
The output interface allows us to define destinations for the data. Common destinations are remote services, local file systems, or standard interfaces, among others. Outputs are implemented as plugins and many are available.
When an output plugin is loaded, an internal instance is created. Every instance has its own independent configuration. Configuration keys are often called properties.
Every output plugin has its own documentation section specifying how it can be used and what properties are available.
For more details, please refer to the Output Plugins section.
The following operating systems and architectures are supported in Fluent Bit.
From an architecture support perspective, Fluent Bit is fully functional on x86_64, Arm64v8 and Arm32v7 based processors.
| Deployment Type |
|---|
| Kubernetes |
| Docker |
| Containers on AWS |

| Linux Packages |
|---|
| CentOS / Red Hat |
| Ubuntu |
| Debian |
| Amazon Linux |
| Raspbian / Raspberry Pi |
| Yocto / Embedded Linux |

| Windows Packages |
|---|
| Windows Server 2019 |
| Windows 10 2019.03 |

| Compile from Source |
|---|
| Linux, FreeBSD, MacOS |
| Windows |

| Operating System | Distribution | Architectures |
|---|---|---|
| Linux | CentOS / Red Hat | x86_64, Arm64v8 |
| Linux | Debian | x86_64, Arm64v8 |
| Linux | Ubuntu | x86_64, Arm64v8 |
| Linux | Amazon Linux | x86_64, Arm64v8 |
| Linux | Yocto / Embedded Linux | x86_64 |
| Linux | Raspbian / Raspberry Pi | Arm32v7 |
| Windows | Windows Server 2019 | x86_64, x86 |
| Windows | Windows 10 2019.03 | x86_64, x86 |

Fluent Bit can also work on OSX and *BSD systems, but not all plugins will be available on all platforms. Official support will expand based on community demand. Fluent Bit may run on older operating systems, though it will need to be built from source or installed from custom packages.
Fluent Bit has very low CPU and memory consumption and is compatible with most x86, x86_64, arm32v7 and arm64v8 based platforms. In order to build it you need the following components in your system:
Compiler: GCC or clang
CMake
Flex & Bison: only if you enable the Stream Processor or Record Accessor feature (both enabled by default)
The core has no other dependencies. Certain features depend on third-party components, like output plugins with special backend libraries (e.g. Kafka); those components are included in the main source code repository.
From the 1.9.0 and 1.8.15 releases please note that the GPG key has been updated at https://packages.fluentbit.io/fluentbit.key so ensure this new one is added.
The GPG Key fingerprint of the new key is:
The previous key is still available at https://packages.fluentbit.io/fluentbit-legacy.key and may be required to install previous versions.
The GPG Key fingerprint of the old key is:
Refer to the supported platform documentation to see which platforms are supported in each release.

Migration to Fluent Bit

From version 1.9, td-agent-bit is a deprecated package and will be removed in the future. The correct package name to use now is fluent-bit. Both are currently provided to allow migration.
Fluent Bit uses CMake as its build system. The suggested procedure to prepare the build system consists of the following steps:
In the following steps you can find the exact commands to build and install the project with the default options. If you already know how CMake works you can skip this part and look at the build options available. Note that Fluent Bit requires CMake 3.x; you may need to use cmake3 instead of cmake to complete the following steps on your system.
Change to the build/ directory inside the Fluent Bit sources:
Let CMake configure the project specifying where the root path is located:
Now you are ready to start the compilation process through the simple make command:
To continue installing the binary on the system, just run make install; you will likely need root privileges, so try prefixing the command with sudo.
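Putting the steps together, a sketch of the default flow from the Fluent Bit sources root:

```
cd build/
cmake ..
make
sudo make install
```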
Fluent Bit provides certain options to CMake that can be enabled or disabled when configuring. Please refer to the following tables under the General Options, Development Options, Input Plugins and Output Plugins sections.
The input plugins provide features to gather information from a specific source type, which can be a network interface, some built-in metric or a specific input device. The following input plugins are available:
The filter plugins allow you to modify, enrich or drop records. The following table describes the filters available in this version:
The output plugins give you the ability to flush information to some external interface, service or terminal. The following table describes the output plugins available in this version:
General Options:

| Option | Description | Default |
|---|---|---|
| FLB_ALL | Enable all features available | No |
| FLB_JEMALLOC | Use Jemalloc as default memory allocator | No |
| FLB_TLS | Build with SSL/TLS support | Yes |
| FLB_BINARY | Build executable | Yes |
| FLB_EXAMPLES | Build examples | Yes |
| FLB_SHARED_LIB | Build shared library | Yes |
| FLB_MTRACE | Enable mtrace support | No |
| FLB_INOTIFY | Enable Inotify support | Yes |
| FLB_POSIX_TLS | Force POSIX thread storage | No |
| FLB_SQLDB | Enable SQL embedded database support | No |
| FLB_HTTP_SERVER | Enable HTTP Server | No |
| FLB_LUAJIT | Enable Lua scripting support | Yes |
| FLB_RECORD_ACCESSOR | Enable record accessor | Yes |
| FLB_SIGNV4 | Enable AWS Signv4 support | Yes |
| FLB_STATIC_CONF | Build binary using static configuration files. The value of this option must be a directory containing configuration files. | |
| FLB_STREAM_PROCESSOR | Enable Stream Processor | Yes |
Development Options:

| Option | Description | Default |
|---|---|---|
| FLB_DEBUG | Build binaries with debug symbols | No |
| FLB_VALGRIND | Enable Valgrind support | No |
| FLB_TRACE | Enable trace mode | No |
| FLB_SMALL | Minimise binary size | No |
| FLB_TESTS_RUNTIME | Enable runtime tests | No |
| FLB_TESTS_INTERNAL | Enable internal tests | No |
| FLB_TESTS | Enable tests | No |
| FLB_BACKTRACE | Enable backtrace/stacktrace support | Yes |
Input Plugins Options (option names follow the FLB_IN_<PLUGIN> pattern):

| Option | Description | Default |
|---|---|---|
| FLB_IN_COLLECTD | Enable Collectd input plugin | On |
| FLB_IN_CPU | Enable CPU input plugin | On |
| FLB_IN_DISK | Enable Disk I/O Metrics input plugin | On |
| FLB_IN_DOCKER | Enable Docker metrics input plugin | On |
| FLB_IN_EXEC | Enable Exec input plugin | On |
| FLB_IN_FORWARD | Enable Forward input plugin | On |
| FLB_IN_HEAD | Enable Head input plugin | On |
| FLB_IN_HEALTH | Enable Health input plugin | On |
| FLB_IN_KMSG | Enable Kernel log input plugin | On |
| FLB_IN_MEM | Enable Memory input plugin | On |
| FLB_IN_MQTT | Enable MQTT Server input plugin | On |
| FLB_IN_NETIF | Enable Network I/O metrics input plugin | On |
| FLB_IN_PROC | Enable Process monitoring input plugin | On |
| FLB_IN_RANDOM | Enable Random input plugin | On |
| FLB_IN_SERIAL | Enable Serial input plugin | On |
| FLB_IN_STDIN | Enable Standard input plugin | On |
| FLB_IN_SYSLOG | Enable Syslog input plugin | On |
| FLB_IN_SYSTEMD | Enable Systemd / Journald input plugin | On |
| FLB_IN_TAIL | Enable Tail (follow files) input plugin | On |
| FLB_IN_TCP | Enable TCP input plugin | On |
| FLB_IN_THERMAL | Enable system temperature(s) input plugin | On |
| FLB_IN_WINLOG | Enable Windows Event Log input plugin (Windows Only) | On |
Filter Plugins Options (option names follow the FLB_FILTER_<PLUGIN> pattern):

| Option | Description | Default |
|---|---|---|
| FLB_FILTER_AWS | Enable AWS metadata filter | On |
| FLB_FILTER_EXPECT | Enable Expect data test filter | On |
| FLB_FILTER_GREP | Enable Grep filter | On |
| FLB_FILTER_KUBERNETES | Enable Kubernetes metadata filter | On |
| FLB_FILTER_LUA | Enable Lua scripting filter | On |
| FLB_FILTER_MODIFY | Enable Modify filter | On |
| FLB_FILTER_NEST | Enable Nest filter | On |
| FLB_FILTER_PARSER | Enable Parser filter | On |
| FLB_FILTER_RECORD_MODIFIER | Enable Record Modifier filter | On |
| FLB_FILTER_REWRITE_TAG | Enable Rewrite Tag filter | On |
| FLB_FILTER_STDOUT | Enable Stdout filter | On |
| FLB_FILTER_THROTTLE | Enable Throttle filter | On |
Output Plugins Options (option names follow the FLB_OUT_<PLUGIN> pattern):

| Option | Description | Default |
|---|---|---|
| FLB_OUT_AZURE | Enable Microsoft Azure output plugin | On |
| FLB_OUT_BIGQUERY | Enable Google BigQuery output plugin | On |
| FLB_OUT_COUNTER | Enable Counter output plugin | On |
| FLB_OUT_CLOUDWATCH_LOGS | Enable Amazon CloudWatch output plugin | On |
| FLB_OUT_DATADOG | Enable Datadog output plugin | On |
| FLB_OUT_ES | Enable Elasticsearch output plugin | On |
| FLB_OUT_FILE | Enable File output plugin | On |
| FLB_OUT_KINESIS_FIREHOSE | Enable Amazon Kinesis Data Firehose output plugin | On |
| FLB_OUT_KINESIS_STREAMS | Enable Amazon Kinesis Data Streams output plugin | On |
| FLB_OUT_FLOWCOUNTER | Enable Flowcounter output plugin | On |
| FLB_OUT_FORWARD | Enable Fluentd output plugin | On |
| FLB_OUT_GELF | Enable Gelf output plugin | On |
| FLB_OUT_HTTP | Enable HTTP output plugin | On |
| FLB_OUT_INFLUXDB | Enable InfluxDB output plugin | On |
| FLB_OUT_KAFKA | Enable Kafka output | Off |
| FLB_OUT_KAFKA_REST | Enable Kafka REST Proxy output plugin | On |
| FLB_OUT_LIB | Enable Lib output plugin | On |
| FLB_OUT_NATS | Enable NATS output plugin | On |
| FLB_OUT_NULL | Enable NULL output plugin | On |
| FLB_OUT_PGSQL | Enable PostgreSQL output plugin | On |
| FLB_OUT_PLOT | Enable Plot output plugin | On |
| FLB_OUT_SLACK | Enable Slack output plugin | On |
| FLB_OUT_S3 | Enable Amazon S3 output plugin | On |
| FLB_OUT_SPLUNK | Enable Splunk output plugin | On |
| FLB_OUT_STACKDRIVER | Enable Google Stackdriver output plugin | On |
| FLB_OUT_STDOUT | Enable STDOUT output plugin | On |
| FLB_OUT_TCP | Enable TCP/TLS output plugin | On |
| FLB_OUT_TD | Enable Treasure Data output plugin | On |
For production systems, we strongly suggest that you always get the latest stable release of the source code, in either zip or tarball format, from GitHub using the following link pattern:
https://github.com/fluent/fluent-bit/archive/refs/tags/v<release version>.tar.gz
https://github.com/fluent/fluent-bit/archive/refs/tags/v<release version>.zip
For example for version 1.8.12 the link is the following: https://github.com/fluent/fluent-bit/archive/refs/tags/v1.8.12.tar.gz
For anyone who aims to contribute to the project by testing or extending the code base, you can get the development version from our Git repository:
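```
git clone https://github.com/fluent/fluent-bit
```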
Note that our master branch is where the development of Fluent Bit happens. Since it's a development version, expect issues when compiling or at run time.
We encourage everybody to help us test every development version; in the end, this is what will become stable.
In normal operation, Fluent Bit can be configured through text files or specific command line arguments. While this is the ideal deployment case, there are scenarios where a more restricted configuration is required: static configuration mode.
Static configuration mode aims to include a built-in configuration in the final binary of Fluent Bit, disabling the usage of external files or flags at runtime.
The following steps assume you are familiar with configuring Fluent Bit using text files and that you have experience building it from scratch, as described in the Build and Install section.
In your file system, prepare a specific directory that will be used as an entry point for the build system to look up and parse the configuration files. This directory must contain at a minimum one configuration file called fluent-bit.conf with the required SERVICE, INPUT and OUTPUT sections. As an example, create a new fluent-bit.conf file with the following content:
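```
[SERVICE]
    Flush     1
    Daemon    off
    Log_Level info

[INPUT]
    Name cpu

[OUTPUT]
    Name  stdout
    Match *
```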
The configuration provided above will calculate CPU metrics from the running system and print them to the standard output interface.
Inside the Fluent Bit source code, get into the build/ directory and run CMake appending the FLB_STATIC_CONF option pointing to the configuration directory recently created, e.g.:
then build it:
At this point the generated fluent-bit binary is ready to run without the need for further configuration:
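Putting the steps together (a sketch; replace /path/to/confdir with the directory created above):

```
cd fluent-bit/build/
cmake -DFLB_STATIC_CONF=/path/to/confdir ..
make
bin/fluent-bit
```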
Fluent Bit is distributed as the td-agent-bit package and is available for the latest (and older) stable Debian systems: Buster, Stretch and Jessie.
The first step is to add our server GPG key to your keyring; that way you can get our signed packages:
From the 1.9.0 and 1.8.15 releases please note that the GPG key has been updated at https://packages.fluentbit.io/fluentbit.key so ensure this new one is added.
The GPG Key fingerprint of the new key is:
The previous key is still available at https://packages.fluentbit.io/fluentbit-legacy.key and may be required to install previous versions.
The GPG Key fingerprint of the old key is:
Refer to the supported platform documentation to see which platforms are supported in each release.
On Debian, you need to add our APT server entry to your sources lists. Please add the following content at the bottom of your /etc/apt/sources.list file:
Now let your system update the apt database:
We recommend upgrading your system (sudo apt-get upgrade); this could avoid potential issues with expired certificates.
Using the following apt-get command you can now install the latest td-agent-bit:
The next step is to instruct systemd to enable the service:
If you do a status check, you should see output similar to this:
The default configuration of td-agent-bit collects CPU usage metrics and sends the records to the standard output; you can see the outgoing data in your /var/log/syslog file.
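A consolidated sketch of the steps above (the suite name, buster here, should match your Debian release):

```
# Add the server GPG key
wget -qO - https://packages.fluentbit.io/fluentbit.key | sudo apt-key add -

# /etc/apt/sources.list entry:
#   deb https://packages.fluentbit.io/debian/buster buster main

sudo apt-get update
sudo apt-get install td-agent-bit

sudo systemctl enable td-agent-bit
sudo systemctl start td-agent-bit
sudo systemctl status td-agent-bit
```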
Fluent Bit is distributed as the td-agent-bit package and is available for the latest stable CentOS systems. The following architectures are supported:
x86_64
aarch64 / arm64v8
We provide td-agent-bit through a Yum repository. To add the repository reference to your system, create a new file called td-agent-bit.repo in /etc/yum.repos.d/ with the following content:
Note: we encourage you to always enable gpgcheck for security reasons. All our packages are signed.
From the 1.9.0 and 1.8.15 releases please note that the GPG key has been updated at https://packages.fluentbit.io/fluentbit.key so ensure this new one is added.
The GPG Key fingerprint of the new key is:
The previous key is still available at https://packages.fluentbit.io/fluentbit-legacy.key and may be required to install previous versions.
The GPG Key fingerprint of the old key is:
Refer to the supported platform documentation to see which platforms are supported in each release.
Once your repository is configured, run the following command to install the package:
The next step is to instruct systemd to enable the service:
If you do a status check, you should see output similar to this:
The default configuration of td-agent-bit collects CPU usage metrics and sends the records to the standard output; you can see the outgoing data in your /var/log/messages file.
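A consolidated sketch of the steps above (the baseurl shown assumes CentOS 7; adjust for your release):

```
# /etc/yum.repos.d/td-agent-bit.repo
[td-agent-bit]
name = TD Agent Bit
baseurl = https://packages.fluentbit.io/centos/7/$basearch/
gpgcheck=1
gpgkey=https://packages.fluentbit.io/fluentbit.key
enabled=1
```

```
sudo yum install td-agent-bit
sudo systemctl enable td-agent-bit
sudo systemctl start td-agent-bit
sudo systemctl status td-agent-bit
```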
Fluent Bit is distributed as the td-agent-bit package and is available for the latest Amazon Linux 2. The following architectures are supported:
x86_64
aarch64 / arm64v8
We provide td-agent-bit through a Yum repository. To add the repository reference to your system, create a new file called td-agent-bit.repo in /etc/yum.repos.d/ with the following content:
Note: we encourage you to always enable gpgcheck for security reasons. All our packages are signed.
From the 1.9.0 and 1.8.15 releases please note that the GPG key has been updated at https://packages.fluentbit.io/fluentbit.key so ensure this new one is added.
The GPG Key fingerprint of the new key is:
The previous key is still available at https://packages.fluentbit.io/fluentbit-legacy.key and may be required to install previous versions.
The GPG Key fingerprint of the old key is:
Refer to the supported platform documentation to see which platforms are supported in each release.
Once your repository is configured, run the following command to install the package:
The next step is to instruct systemd to enable the service:
If you do a status check, you should see output similar to this:
The default configuration of td-agent-bit collects CPU usage metrics and sends the records to the standard output; you can see the outgoing data in your /var/log/messages file.
AWS maintains a distribution of Fluent Bit combining the latest official release with a set of Go Plugins for sending logs to AWS services. AWS and Fluent Bit are working together to rewrite their plugins for inclusion in the official Fluent Bit distribution.
Fluent Bit includes the Amazon CloudWatch Logs plugin named cloudwatch_logs, the Amazon Kinesis Firehose plugin named kinesis_firehose, and the Amazon Kinesis Data Streams plugin named kinesis_streams, which offer higher performance than the Go plugins. Fluent Bit also includes an Amazon S3 output plugin named s3.
AWS vends SSM Public Parameters with the regional repository link for each image. These parameters can be queried by any AWS account.
To see a list of available version tags in a given region, run the following command:
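A sketch of the query (the parameter path is as published by AWS; the region is an example):

```
aws ssm get-parameters-by-path --region us-east-1 --path /aws/service/aws-for-fluent-bit/
```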
To see the ECR repository URI for a given image tag in a given region, run the following:
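For example (the tag 2.21.0 is hypothetical; substitute a real tag from the previous query):

```
aws ssm get-parameter --region us-east-1 --name /aws/service/aws-for-fluent-bit/2.21.0
```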
You can use these SSM public parameters as parameters in your CloudFormation templates:
Fluent Bit is distributed as the td-agent-bit package and is available for the latest stable Ubuntu system: Focal Fossa.
The first step is to add our server GPG key to your keyring; that way you can get our signed packages:
The GPG Key fingerprint of the new key is:
The GPG Key fingerprint of the old key is:
On Ubuntu, you need to add our APT server entry to your sources lists. Please add the following content at the bottom of your /etc/apt/sources.list file:
Now let your system update the apt database:
We recommend upgrading your system (sudo apt-get upgrade); this could avoid potential issues with expired certificates.
Using the following apt-get command you can now install the latest td-agent-bit:
The next step is to instruct systemd to enable the service:
If you do a status check, you should see output similar to this:
The default configuration of td-agent-bit collects CPU usage metrics and sends the records to the standard output; you can see the outgoing data in your /var/log/syslog file.
Raspbian Buster (10)
Raspbian Stretch (9)
Raspbian Jessie (8)
The first step is to add our server GPG key to your keyring; that way you can get our signed packages:
The GPG Key fingerprint of the new key is:
The GPG Key fingerprint of the old key is:
On Debian and derivative systems such as Raspbian, you need to add our APT server entry to your sources lists. Please add the following content at the bottom of your /etc/apt/sources.list file:
Now let your system update the apt database:
We recommend upgrading your system (sudo apt-get upgrade); this could avoid potential issues with expired certificates.
Using the following apt-get command you can now install the latest td-agent-bit:
The next step is to instruct systemd to enable the service:
If you do a status check, you should see output similar to this:
The default configuration of td-agent-bit collects CPU usage metrics and sends the records to the standard output; you can see the outgoing data in your /var/log/syslog file.
Fluent Bit container images are available on Docker Hub ready for production usage. Current available images can be deployed in multiple architectures.
It's strongly suggested that you always use the latest image of Fluent Bit.
In addition, the main manifest provides images for the arm64v8 and arm32v7 architectures. From a deployment perspective there is no need to specify an architecture: the container client tool that pulls the image gets the proper layer for the running architecture.
For every architecture we build the layers using the base images listed in the table later in this document.
Download the last stable image from 1.8 series:
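```
docker pull fluent/fluent-bit:1.8
```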
Once the image is in place, now run the following (useless) test which makes Fluent Bit measure CPU usage by the container:
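```
docker run -ti fluent/fluent-bit:1.8 /fluent-bit/bin/fluent-bit -i cpu -o stdout -f 1
```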
That command will let Fluent Bit measure CPU usage every second and flush the results to the standard output.
Alpine Linux uses the Musl C library instead of Glibc. Musl is not fully compatible with Glibc, which generated many issues in the following areas when used with Fluent Bit:
Memory Allocator: to run Fluent Bit properly in high-load environments, we use Jemalloc as the default memory allocator, which reduces fragmentation and provides better performance for our needs. Jemalloc cannot run smoothly with Musl and requires extra work.
Alpine Linux Musl function bootstrapping has a compatibility issue when loading Golang shared libraries; this generates problems when trying to load Golang output plugins in Fluent Bit.
The Alpine Linux Musl time format parser does not support Glibc extensions.
The maintainers' preference for base images, for security and maintenance reasons, is Distroless and Debian.
Our Docker container images are deployed thousands of times per day; we take security and stability very seriously.
The latest tag most of the time points to the latest stable image. When we release a major update to Fluent Bit, for example from v1.3.x to v1.4.0, we don't move the latest tag until two weeks after the release. That gives us extra time to verify with our community that everything works as expected.
Currently, the image contains Go Plugins for:
AWS vends their container image via Docker Hub and a set of highly available regional Amazon ECR repositories. For more information, see the AWS for Fluent Bit GitHub repository. The AWS for Fluent Bit image uses a custom versioning scheme because it contains multiple projects. To see what each release contains, check out the release notes in that repository.
From the 1.9.0 and 1.8.15 releases please note that the GPG key has been updated at https://packages.fluentbit.io/fluentbit.key, so ensure this new one is added. The previous key is still available at https://packages.fluentbit.io/fluentbit-legacy.key and may be required to install previous versions. Refer to the supported platform documentation to see which platforms are supported in each release.
Fluent Bit is distributed as the td-agent-bit package and is available for Raspberry Pi, specifically the Raspbian distribution; the supported versions are Raspbian Buster (10), Raspbian Stretch (9) and Raspbian Jessie (8). From the 1.9.0 and 1.8.15 releases please note that the GPG key has been updated at https://packages.fluentbit.io/fluentbit.key, so ensure this new one is added. The previous key is still available at https://packages.fluentbit.io/fluentbit-legacy.key and may be required to install previous versions. Refer to the supported platform documentation to see which platforms are supported in each release.
The following table describes the tags that are available on the Docker Hub fluent/fluent-bit repository:

| Tag(s) | Manifest Architectures | Description |
|---|---|---|
| 1.8, 1.8.15 | x86_64, arm64v8, arm32v7 | |
| 1.8-debug, 1.8.15-debug | x86_64 | v1.8.x releases (production + debug) |
| 1.8.14 | x86_64, arm64v8, arm32v7 | |
| 1.8.14-debug | x86_64 | v1.8.x releases (production + debug) |
| 1.8.13 | x86_64, arm64v8, arm32v7 | |
| 1.8.13-debug | x86_64 | v1.8.x releases (production + debug) |
| 1.8.12 | x86_64, arm64v8, arm32v7 | |
| 1.8.12-debug | x86_64 | v1.8.x releases (production + debug) |
| 1.8.11 | x86_64, arm64v8, arm32v7 | |
| 1.8.11-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.10 | x86_64, arm64v8, arm32v7 | |
| 1.8.10-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.9 | x86_64, arm64v8, arm32v7 | |
| 1.8.9-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.8 | x86_64, arm64v8, arm32v7 | |
| 1.8.8-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.7 | x86_64, arm64v8, arm32v7 | |
| 1.8.7-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.6 | x86_64, arm64v8, arm32v7 | |
| 1.8.6-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.5 | x86_64, arm64v8, arm32v7 | |
| 1.8.5-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.4 | x86_64, arm64v8, arm32v7 | |
| 1.8.4-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.3 | x86_64, arm64v8, arm32v7 | |
| 1.8.3-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.2 | x86_64, arm64v8, arm32v7 | |
| 1.8.2-debug | x86_64 | v1.8.x releases + Busybox |
| 1.8.1 | x86_64, arm64v8, arm32v7 | |
| 1.8.1-debug | x86_64 | v1.8.x releases + Busybox |

Our x86_64 stable image is based on Distroless, focusing on security: it contains just the Fluent Bit binary, minimal system libraries and basic configuration. Optionally, we provide debug images for x86_64 which contain a full shell and package manager that can be used to troubleshoot or for testing purposes.

For every architecture we build the layers using the following base images:

| Architecture | Base Image |
|---|---|
| x86_64 | Distroless |
| arm64v8 | arm64v8/debian:bullseye-slim |
| arm32v7 | arm32v7/debian:bullseye-slim |
The Fluent Bit source code provides Bitbake recipes to configure, build and package the software for a Yocto-based image. Note that the specific steps for using these recipes in your Yocto environment (Poky) are out of the scope of this documentation.
We distribute two main recipes: one for testing/development purposes and another for the latest stable release.
It's strongly recommended to always use the stable release recipe of Fluent Bit, and not the one from Git master, for production deployments.
Fluent Bit >= v1.1.x fully supports x86_64, x86, arm32v7 and arm64v8.
Kubernetes Production Grade Log Processor
Fluent Bit is a lightweight and extensible Log Processor that comes with full support for Kubernetes:
Process Kubernetes containers logs from the file system or Systemd/Journald.
Enrich logs with Kubernetes Metadata.
Centralize your logs in third party storage services like Elasticsearch, InfluxDB, HTTP, etc.
Before getting started it is important to understand how Fluent Bit will be deployed. Kubernetes manages a cluster of nodes, so our log agent tool needs to run on every node to collect logs from every Pod; hence Fluent Bit is deployed as a DaemonSet (a Pod that runs on every node of the cluster).
When Fluent Bit runs, it will read, parse and filter the logs of every Pod and will enrich each entry with the following metadata:
Pod Name
Pod ID
Container Name
Container ID
Labels
Annotations
To obtain this information, the built-in filter plugin called kubernetes talks to the Kubernetes API Server to retrieve relevant information such as the pod_id, labels and annotations; other fields such as pod_name, container_id and container_name are retrieved locally from the log file names. All of this is handled automatically; no intervention is required on the configuration side.
Our Kubernetes Filter plugin is fully inspired by the Fluentd Kubernetes Metadata Filter written by Jimmi Dyson.
Fluent Bit must be deployed as a DaemonSet so that it will be available on every node of your Kubernetes cluster. To get started, run the following commands to create the namespace, service account and role setup:
For Kubernetes v1.21 and below
For Kubernetes v1.22
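A sketch of the typical sequence, assuming the manifests hosted in the fluent/fluent-bit-kubernetes-logging repository (file names taken from that repo; the role files have a -1.22 variant for Kubernetes v1.22, shown here):

```
kubectl create namespace logging
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-service-account.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-1.22.yaml
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/fluent-bit-role-binding-1.22.yaml
```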
The next step is to create a ConfigMap that will be used by our Fluent Bit DaemonSet:
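Assuming the Elasticsearch-oriented ConfigMap from the same repository (the path is an assumption):

```
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-configmap.yaml
```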
The default ConfigMap assumes that dockershim is utilized for the cluster. If a CRI runtime, such as containerd or CRI-O, is being utilized, the CRI parser should be used instead; more specifically, change the Parser described in input-kubernetes.conf from docker to cri.
If you are using Red Hat OpenShift you will also need to run the following
For Kubernetes versions older than v1.16, the DaemonSet resource is not available on apps/v1; it is available on apiVersion: extensions/v1beta1. Our current DaemonSet YAML files use the new apiVersion.
If you are using an older Kubernetes version, manually grab a copy of your DaemonSet YAML file and replace the value of apiVersion from apps/v1 to extensions/v1beta1.
You can read more about this deprecation on Kubernetes v1.14 Changelog here:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.14.md#deprecations
Fluent Bit DaemonSet ready to be used with Elasticsearch on a normal Kubernetes Cluster:
If you are using Minikube for testing purposes, use the following alternative DaemonSet manifest:
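A sketch, assuming the DaemonSet manifests from the fluent/fluent-bit-kubernetes-logging repository (paths are assumptions):

```
# Normal cluster
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds.yaml

# Minikube
kubectl create -f https://raw.githubusercontent.com/fluent/fluent-bit-kubernetes-logging/master/output/elasticsearch/fluent-bit-ds-minikube.yaml
```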
Helm is a package manager for Kubernetes and allows you to quickly deploy application packages into your running cluster. Fluent Bit is distributed via a helm chart found in the Fluent Helm Charts repo: https://github.com/fluent/helm-charts.
To add the Fluent Helm Charts repo, use the following command:
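```
helm repo add fluent https://fluent.github.io/helm-charts
```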
To validate that the repo was added, you can run helm search repo fluent to ensure the charts were listed. The default chart can then be installed by running the following:
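```
helm upgrade --install fluent-bit fluent/fluent-bit
```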
The default chart values include configuration to read container logs (with Docker parsing) and systemd logs, apply Kubernetes metadata enrichment, and finally output to an Elasticsearch cluster. You can modify the included values file (https://github.com/fluent/helm-charts/blob/master/charts/fluent-bit/values.yaml) to specify additional outputs, health checks, monitoring endpoints, or other configuration options.
The default configuration of Fluent Bit ensures the following:
Consume all containers logs from the running Node.
The Tail input plugin will not append more than 5MB into the engine until the records are flushed to the Elasticsearch backend. This limit aims to provide a workaround for backpressure scenarios.
The Kubernetes filter will enrich the logs with Kubernetes metadata, specifically labels and annotations. The filter only goes to the API Server when it cannot find the cached info, otherwise it uses the cache.
The default backend in the configuration is Elasticsearch, set by the Elasticsearch Output Plugin. It uses the Logstash format to ingest the logs. If you need a different Index and Type, please refer to the plugin options and make your own adjustments.
There is an option called Retry_Limit set to False, which means that if Fluent Bit cannot flush the records to Elasticsearch, it will retry indefinitely until it succeeds.
Fluent Bit by default assumes that logs are formatted by the Docker interface standard. However, when using CRI you can run into issues with malformed JSON if you do not modify the parser used. Fluent Bit includes a CRI log parser that can be used instead. An example of the parser is shown below:
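```
[PARSER]
    # CRI log format parser
    Name        cri
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
```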
To use this parser, change the Input section of your configuration from docker to cri.
Since v1.5.0, Fluent Bit supports deployment to Windows pods.
When deploying Fluent Bit to Kubernetes, there are three log files that you need to pay attention to.
C:\k\kubelet.err.log
This is the error log file from kubelet daemon running on host.
You will need to retain this file for future troubleshooting (to debug deployment failures etc.)
C:\var\log\containers\<pod>_<namespace>_<container>-<docker>.log
This is the main log file you need to watch. Configure Fluent Bit to follow this file.
It is actually a symlink to the Docker log file in C:\ProgramData\, with some additional metadata in its file name.
C:\ProgramData\Docker\containers\<docker>\<docker>.log
This is the log file produced by Docker.
Normally you don't directly read from this file, but you need to make sure that this file is visible from Fluent Bit.
Typically, your deployment YAML contains the following volume configuration:
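A sketch of the typical hostPath volumes (a fragment; the names are illustrative):

```
# In the Pod spec:
volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: progdata
    hostPath:
      path: /ProgramData

# In the container spec:
volumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: progdata
    mountPath: /ProgramData
```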
Assuming the basic volume configuration described above, you can apply the following config to start logging:
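A sketch of such a config (the parser and output choices are illustrative):

```
[SERVICE]
    Parsers_File parsers.conf

[INPUT]
    Name   tail
    Tag    kube.*
    Path   C:\var\log\containers\*.log
    Parser docker

[FILTER]
    Name   kubernetes
    Match  kube.*

[OUTPUT]
    Name   stdout
    Match  *
```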
Windows pods often lack working DNS immediately after boot (kubernetes issue #78479). To mitigate this issue, filter_kubernetes provides a built-in mechanism to wait until the network starts up:

DNS_Retries - Retries N times until the network starts working (default: 6)
DNS_Wait_Time - Lookup interval between network status checks (default: 30)
By default, Fluent Bit waits for 3 minutes (30 seconds x 6 times). If it's not enough for you, tweak the configuration as follows.
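For example, to wait up to one hour (the values here are just an illustration):

```
[FILTER]
    Name          kubernetes
    Match         kube.*
    DNS_Retries   120
    DNS_Wait_Time 30
```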
Fluent Bit is distributed as the td-agent-bit package for Windows. Fluent Bit has two flavours of Windows installers: a ZIP archive (for quick testing) and an EXE installer (for system installation).
Currently the default configuration is intended for Linux only, so it will not function on Windows. Make sure to provide a valid Windows configuration with the installation; a sample one is shown below:
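A minimal sketch (the winlog channel chosen is illustrative):

```
[SERVICE]
    Flush        5
    Daemon       off
    Log_Level    info

[INPUT]
    # Read from the Windows Event Log
    Name         winlog
    Channels     Setup
    Interval_Sec 1

[OUTPUT]
    Name   stdout
    Match  *
```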
The latest stable version is 1.8.15. Each version is available on the GitHub releases page as well as at https://fluentbit.io/releases/<Major Version>/fluent-bit-<Full Version>-win[32|64].exe:
Legacy td-agent-bit packages are also available; just substitute fluent-bit with td-agent-bit in the URLs above.
To check the integrity, use the Get-FileHash cmdlet on PowerShell.
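```
PS> Get-FileHash fluent-bit-1.8.15-win64.exe
```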
Download a ZIP archive from above. There are installers for 32-bit and 64-bit environments, so choose one suitable for your environment.
Then you need to expand the ZIP archive. You can do this by clicking "Extract All" in Explorer or, if you're using PowerShell, with the Expand-Archive cmdlet.
The ZIP package contains the following set of files.
Now, launch cmd.exe or PowerShell on your machine, and execute fluent-bit.exe as follows:
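```
PS> .\bin\fluent-bit.exe -i dummy -o stdout
```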
If you see the following output, it's working fine!
To halt the process, press CTRL-C in the terminal.
Download an EXE installer from the download page. It has both 32-bit and 64-bit builds. Choose one which is suitable for you.
Then, double-click the EXE installer you've downloaded. The installation wizard will automatically start.
Click Next and proceed. By default, Fluent Bit is installed into C:\Program Files\td-agent-bit\, so you should be able to launch fluent-bit as follows after installation.
The Windows installer is built by CPack using NSIS (https://cmake.org/cmake/help/latest/cpack_gen/nsis.html), and so it supports the default options that all NSIS installers provide for silent installation and for selecting the installation directory.
To silently install to the C:\fluent-bit directory, here is an example:
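A sketch (the installer file name is illustrative; /S and /D= are the standard NSIS silent-install options):

```
PS> .\fluent-bit-1.8.15-win64.exe /S /D=C:\fluent-bit
```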
The automatically provided uninstaller also supports a silent uninstall using the same /S flag. This may be useful for provisioning with automation like Ansible, Puppet, etc.
Windows services are equivalent to "daemons" in UNIX (i.e. long-running background processes). Since v1.5.0, Fluent Bit has native support for Windows services.
Suppose you have the following installation layout:
To register Fluent Bit as a Windows service, you need to execute the following command on Command Prompt. Be careful: a single space is required after binpath=.
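A sketch, assuming an installation layout under C:\fluent-bit:

```
sc.exe create fluent-bit binpath= "C:\fluent-bit\bin\fluent-bit.exe -c C:\fluent-bit\conf\fluent-bit.conf"
```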
Now Fluent Bit can be started and managed as a normal Windows service.
To halt the Fluent Bit service, just execute the "stop" command.
To start Fluent Bit automatically on boot, execute the following:
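```
sc.exe start fluent-bit
sc.exe stop fluent-bit

rem Start automatically on boot
sc.exe config fluent-bit start= auto
```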
Quotations are required if file paths contain spaces, for example paths under C:\Program Files. Here is an example:
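A sketch: the inner quotes are escaped with backslashes so sc.exe passes the spaced paths through intact.

```
sc.exe create fluent-bit binpath= "\"C:\Program Files\fluent-bit\bin\fluent-bit.exe\" -c \"C:\Program Files\fluent-bit\conf\fluent-bit.conf\""
```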
Instead of sc.exe, PowerShell can be used to manage Windows services.
Create a Fluent Bit service:
Start the service:
Query the service status:
Stop the service:
Remove the service (requires PowerShell 6.0 or later)
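A sketch of the equivalent PowerShell cmdlets (paths are assumed; adjust to your layout):

```powershell
# Create the service
New-Service fluent-bit -BinaryPathName '"C:\fluent-bit\bin\fluent-bit.exe" -c "C:\fluent-bit\conf\fluent-bit.conf"' -StartupType Automatic

# Start / query / stop
Start-Service fluent-bit
Get-Service fluent-bit
Stop-Service fluent-bit

# Remove (requires PowerShell 6.0 or later)
Remove-Service fluent-bit
```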
If you need to create a custom executable, you can use the following procedure to compile Fluent Bit by yourself.
First, you need Microsoft Visual C++ to compile Fluent Bit. You can install the minimum toolkit by the following command:
When asked which packages to install, choose "C++ Build Tools" (make sure that "C++ CMake tools for Windows" is selected too) and wait until the process finishes.
You also need to install flex and bison; one way to install them on Windows is to use winflexbison.
Add the path C:\WinFlexBison to your system's "Path" environment variable.
You also need to install Git to pull the source code from the repository.
Open the start menu on Windows and type "Developer Command Prompt".
Clone the source code of Fluent Bit.
Compile the source code.
Now you should be able to run Fluent Bit:
To create a ZIP package, call cpack as follows:
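A consolidated sketch of the clone/build/package sequence (the generator choice is an assumption):

```
git clone https://github.com/fluent/fluent-bit
cd fluent-bit\build
cmake .. -G "NMake Makefiles"
cmake --build .
cpack -G ZIP
```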
Fluent Bit supports the usage of environment variables in any value associated to a key when using a configuration file.
The variables are case sensitive and are referenced in the format ${MY_VARIABLE}.
When Fluent Bit starts, the configuration reader will detect any request for ${MY_VARIABLE} and will try to resolve its value.
Create the following configuration file (fluent-bit.conf):
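```
[SERVICE]
    Flush        1
    Daemon       off
    Log_Level    info

[INPUT]
    Name cpu
    Tag  cpu.local

[OUTPUT]
    Name  ${MY_OUTPUT}
    Match *
```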
Open a terminal and set the environment variable:
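```
export MY_OUTPUT=stdout
```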
The above command sets the value 'stdout' for the variable MY_OUTPUT.
Run Fluent Bit with the recently created configuration file:
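```
fluent-bit -c fluent-bit.conf
```

(Invoke the binary from wherever it is installed, e.g. bin/fluent-bit in a source build.)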
As you can see the service worked properly as the configuration was valid.
Fluent Bit can optionally use a configuration file to define how the service will behave. Before proceeding we need to understand how the configuration schema works.
The schema is defined by three concepts:
Sections
Entries: Key/Value
Indented Configuration Mode
A simple example of a configuration file is as follows:
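```
[SERVICE]
    # This is a commented line
    Daemon    off
    Log_Level debug
```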
A section is defined by a name or title inside brackets. Looking at the example above, a Service section has been set using [SERVICE] definition. Section rules:
All section content must be indented (4 spaces ideally).
Multiple sections can exist on the same file.
A section is expected to have comments and entries; it cannot be empty.
Any commented line under a section must be indented too.
A section may contain Entries. An entry is defined by a line of text that contains a Key and a Value. Using the example above, the [SERVICE] section contains two entries: one is the key Daemon with value off, and the other is the key Log_Level with the value debug. Entry rules:
An entry is defined by a key and a value.
A key must be indented.
A key must contain a value, which ends at the line break.
Multiple keys with the same name can exist.
Commented lines are set by prefixing them with the # character; those lines are not processed, but they must be indented too.
Fluent Bit configuration files are based on a strict indented mode: each configuration file must follow the same pattern of alignment from left to right when writing text. By default an indentation level of four spaces from left to right is suggested. Example:
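```
[FIRST_SECTION]
    # This is a commented line
    Key1  some value
    Key2  another value

[SECOND_SECTION]
    # More comments
    KeyN  3.14
```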
As you can see there are two sections with multiple entries and comments, note also that empty lines are allowed and they do not need to be indented.
This page describes the main configuration file used by Fluent Bit
The main configuration file supports four types of sections:
Service
Input
Filter
Output
In addition, it's also possible to split the main configuration file in multiple files using the feature to include external files:
Include File
The Service section defines global properties of the service, the keys available as of this version are described in the following table:
The following is an example of a SERVICE section:
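```
[SERVICE]
    Flush     5
    Daemon    off
    Log_Level debug
```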
An INPUT section defines a source (related to an input plugin). Here we will describe the base configuration for each INPUT section; note that each input plugin may add its own configuration keys:
The Name is mandatory; it lets Fluent Bit know which input plugin should be loaded. The Tag is mandatory for all plugins except for the input forward plugin (as it provides dynamic tags).
The following is an example of an INPUT section:
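```
[INPUT]
    Name cpu
    Tag  my_cpu
```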
A FILTER section defines a filter (related to a filter plugin). Here we will describe the base configuration for each FILTER section; note that each filter plugin may add its own configuration keys:
The Name is mandatory; it lets Fluent Bit know which filter plugin should be loaded. Match or Match_Regex is mandatory for all plugins; if both are specified, Match_Regex takes precedence.
The following is an example of a FILTER section:
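```
[FILTER]
    Name  grep
    Match *
    # Keep records whose 'log' field matches 'aa'
    Regex log aa
```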
The OUTPUT section specifies a destination that certain records should follow after a Tag match. The configuration supports the following keys:
The following is an example of an OUTPUT section:
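```
[OUTPUT]
    Name  stdout
    Match my*cpu
```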
The following configuration file example demonstrates how to collect CPU metrics and flush the results every five seconds to the standard output:
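```
[SERVICE]
    Flush     5
    Daemon    off
    Log_Level debug

[INPUT]
    Name cpu
    Tag  my_cpu

[OUTPUT]
    Name  stdout
    Match my*cpu
```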
To avoid complicated long configuration files, it is better to split specific parts into different files and call them (include them) from one main file.
Starting from Fluent Bit 0.12 the new configuration command @INCLUDE has been added and can be used in the following way:
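```
@INCLUDE somefile.conf
```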
The configuration reader will try to open the path somefile.conf, if not found, it will assume it's a relative path based on the path of the base configuration file, e.g:
Main configuration file path: /tmp/main.conf
Included file: somefile.conf
Fluent Bit will try to open somefile.conf, if it fails it will try /tmp/somefile.conf.
The @INCLUDE command only works at top-left level of the configuration line; it cannot be used inside sections.
Wildcard character (*) is supported to include multiple files, e.g:
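```
@INCLUDE input_*.conf
```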
One of the ways to configure Fluent Bit is using a main configuration file. Fluent Bit allows the use of one configuration file that works at a global scope and uses the schema defined previously.
You can also visualize Fluent Bit INPUT, FILTER, and OUTPUT configuration via Calyptia.
Version | Description |
---|---|
devel | Build Fluent Bit from GIT master. This recipe aims to be used for development and testing purposes only. |
v1.8.12 | Build the latest stable version of Fluent Bit. |
Base configuration keys for the INPUT section:

Key | Description |
---|---|
Name | Name of the input plugin. |
Tag | Tag name associated to all records coming from this plugin. |
Base configuration keys for the FILTER section:

Key | Description |
---|---|
Name | Name of the filter plugin. |
Match | A pattern to match against the tags of incoming records. It's case sensitive and supports the star (*) character as a wildcard. |
Match_Regex | A regular expression to match against the tags of incoming records. Use this option if you want to use the full regex syntax. |
Base configuration keys for the OUTPUT section:

Key | Description |
---|---|
Name | Name of the output plugin. |
Match | A pattern to match against the tags of incoming records. It's case sensitive and supports the star (*) character as a wildcard. |
Match_Regex | A regular expression to match against the tags of incoming records. Use this option if you want to use the full regex syntax. |
Keys available in the SERVICE section:

Key | Description | Default Value |
---|---|---|
flush | Set the flush time in seconds.nanoseconds. The engine loop uses the Flush timeout to define when the records ingested by input plugins should be flushed through the defined output plugins. | 5 |
grace | Set the grace time in seconds; the engine uses it as a wait time on exit. | 5 |
daemon | Boolean value to set if Fluent Bit should run as a Daemon (background) or not. Allowed values are: yes, no, on and off. Note: if you are using a Systemd based unit, such as the one we provide in our packages, do not turn on this option. | Off |
dns.mode | Set the primary transport layer protocol used by the asynchronous DNS resolver; it can be overridden on a per plugin basis. | UDP |
log_file | Absolute path for an optional log file. By default all logs are redirected to the standard error interface (stderr). | |
log_level | Set the logging verbosity level. Allowed values are: off, error, warn, info, debug and trace. Values are accumulative, e.g: if 'debug' is set, it will include error, warning, info and debug. Note that trace mode is only available if Fluent Bit was built with the WITH_TRACE option enabled. | info |
parsers_file | Path for a parsers configuration file. | |
plugins_file | Path for a plugins configuration file. A plugins configuration file allows the definition of paths for external plugins. | |
streams_file | Path for the Stream Processor configuration file. | |
http_server | Enable the built-in HTTP Server. | Off |
http_listen | Set the listening interface for the HTTP Server when it's enabled. | 0.0.0.0 |
http_port | Set the TCP port for the HTTP Server. | 2020 |
coro_stack_size | Set the coroutine stack size in bytes. The value must be greater than the page size of the running system. Don't set it to a very small value (say 4096), or coroutine threads can overrun the stack buffer. Do not change the default value of this parameter unless you know what you are doing. | 24576 |
scheduler.cap | Set a maximum retry time in seconds. The property is supported from v1.8.7. | 2000 |
scheduler.base | Set the base of the exponential backoff. The property is supported from v1.8.7. | 5 |
Configuration files must be flexible enough for any deployment need, but they must keep a clean and readable format.
Fluent Bit Commands extend a configuration file with specific built-in features. The list of commands available as of the Fluent Bit 0.12 series is:
Configuring a logging pipeline might lead to an extensive configuration file. In order to maintain a human-readable configuration, it's suggested to split the configuration in multiple files.
The @INCLUDE command allows the configuration reader to include an external configuration file, e.g:
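```
[SERVICE]
    Flush 1

@INCLUDE inputs.conf
@INCLUDE outputs.conf
```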
The above example defines the main service configuration and also includes two files to continue the configuration:
Note that despite the order of inclusion, Fluent Bit will ALWAYS respect the following order:
Service
Inputs
Filters
Outputs
Fluent Bit supports configuration variables. One way to expose these variables to Fluent Bit is through setting a Shell environment variable; the other is through the @SET command.
The @SET command can only be used at the root level of each line; it cannot be used inside a section, e.g:
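```
@SET my_input=cpu
@SET my_output=stdout

[SERVICE]
    Flush 1

[INPUT]
    Name ${my_input}

[OUTPUT]
    Name ${my_output}
```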
An Upstream defines a set of nodes that will be targeted by an output plugin; by the nature of the implementation, an output plugin must support the Upstream feature. Currently the Forward output plugin has Upstream support.
The current balancing mode implemented is round-robin.
To define an Upstream it's required to create a specific configuration file that contains an UPSTREAM and one or multiple NODE sections. The following table describes the properties associated with each section; note that all of them are mandatory:
A Node might contain additional configuration keys required by the plugin; in that way we provide enough flexibility for the output plugin. A common use case is the Forward output: if TLS is enabled, it requires a shared key (more details in the example below).
In addition to the properties defined in the table above, the network operations against a defined node can optionally be done through TLS for further encryption and the use of certificates.
The following example defines an Upstream called forward-balancing, which aims to be used by the Forward output plugin; it registers three Nodes:
node-1: connects to 127.0.0.1:43000
node-2: connects to 127.0.0.1:44000
node-3: connects to 127.0.0.1:45000 using TLS without verification. It also defines a specific configuration option required by the Forward output called shared_key.
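Put together, the Upstream definition would look like this (the shared key value is illustrative):

```
[UPSTREAM]
    name       forward-balancing

[NODE]
    name       node-1
    host       127.0.0.1
    port       43000

[NODE]
    name       node-2
    host       127.0.0.1
    port       44000

[NODE]
    name       node-3
    host       127.0.0.1
    port       45000
    tls        on
    tls.verify off
    shared_key secret
```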
Note that every Upstream definition must exist in its own configuration file in the file system. Adding multiple Upstreams in the same file or different files is not allowed.
It's common for Fluent Bit to connect to external services to deliver logs over the network, as is the case for several output plugins. Being able to connect to one node (host) is normal and enough for most of the use cases, but there are other scenarios where balancing across different nodes is required. The Upstream feature provides such capability.
The TLS options available are described in the TLS section of this document and can be added to any Node section.
Certain configuration directives in Fluent Bit refer to unit sizes, such as when defining the size of a buffer or specific limits; these can be found in several plugins and in generic properties.
Starting from v0.11.10, all unit sizes have been standardized across the core and plugins. The following table describes the options that can be used and what they mean:
Command | Prototype | Description |
---|---|---|
@INCLUDE | @INCLUDE FILE | Include a configuration file. |
@SET | @SET KEY=VAL | Set a configuration variable. |
Section | Key | Description |
---|---|---|
UPSTREAM | name | Defines a name for the Upstream in question. |
NODE | name | Defines a name for the Node in question. |
NODE | host | IP address or hostname of the target host. |
NODE | port | TCP port of the target service. |
Suffix | Description | Example |
---|---|---|
(none) | When a suffix is not specified, it's assumed that the value given is a bytes representation. | Specifying a value of 32000 means 32000 bytes. |
k, K, KB, kb | Kilobyte: a unit of memory equal to 1,000 bytes. | 32k means 32000 bytes. |
m, M, MB, mb | Megabyte: a unit of memory equal to 1,000,000 bytes. | 1M means 1000000 bytes. |
g, G, GB, gb | Gigabyte: a unit of memory equal to 1,000,000,000 bytes. | 1G means 1000000000 bytes. |
In an ideal world, applications would log their messages in a single line, but in reality applications generate multiple log messages that sometimes belong to the same context. When it's time to process such information, it gets really complex; consider application stack traces, which always span multiple log lines.
Starting from Fluent Bit v1.8, we have implemented a unified Multiline core functionality to solve all the user corner cases. In this section, you will learn about the features and configuration options available.
The Multiline parser engine exposes two ways to configure and use the functionality:
Built-in multiline parser
Configurable multiline parser
Without any extra configuration, Fluent Bit exposes certain pre-configured parsers (built-in) to solve specific multiline parser cases, e.g:
Besides the built-in parsers listed above, it is possible to define your own Multiline parsers, with their own rules, through the configuration files.
A multiline parser is defined in a parsers configuration file by using a [MULTILINE_PARSER] section definition. The Multiline parser must have a unique name and a type, plus other configured properties associated with each type.
To understand which Multiline parser type is required for your use case, you have to know beforehand what conditions in the content determine the beginning of a multiline message and the continuation of subsequent lines. We provide a regex based configuration that supports states, to handle cases from the simplest to the most difficult.
Before you start configuring your parser, you need to know the answers to the following questions:
What is the regular expression (regex) that matches the first line of a multiline message?
What are the regular expressions (regex) that match the continuation lines of a multiline message?
When matching regexes, we have to define states: some states define the start of a multiline message, while others are states for the continuation of multiline messages. You can have multiple continuation state definitions to solve complex cases.
The regex that matches the start of a multiline message is assigned the state called start_state; the regexes for continuation lines can have different state names.
A rule specifies how to match a multiline pattern and perform the concatenation. A rule is defined by 3 specific components:
state name
regular expression pattern
next state
A rule might be defined as follows (comments added to simplify the definition):
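```
# rules |   state name   | regex pattern                           | next state
# ------|----------------|-----------------------------------------|-----------
rule      "start_state"    "/([A-Za-z]+ \d+ \d+\:\d+\:\d+)(.*)/"     "cont"
rule      "cont"           "/^\s+at.*/"                              "cont"
```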
In the example above, we have defined two rules; each one has its own state name, regex pattern, and next state name. Every field that composes a rule must be inside double quotes.
The state name of the first rule must always be start_state, and its regex pattern must match the first line of a multiline message; a next state must also be set to specify what the possible continuation lines would look like.
To simplify the configuration of regular expressions, you can use the Rubular web site. We have posted an example by using the regex described above plus a log line that matches the pattern: https://rubular.com/r/NDuyKwlTGOvq2g
The following example provides a full Fluent Bit configuration file for multiline parsing by using the definition explained above.
The following example files can be located at: https://github.com/fluent/fluent-bit/tree/master/documentation/examples/multiline/regex-001
Example files content:
This is the primary Fluent Bit configuration file. It includes the parsers_multiline.conf and tails the file test.log, applying the multiline parser multiline-regex-test; then it sends the processed output to the standard output.
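A sketch of that primary file:

```
[SERVICE]
    flush                 1
    log_level             info
    parsers_file          parsers_multiline.conf

[INPUT]
    name                  tail
    path                  test.log
    read_from_head        true
    multiline.parser      multiline-regex-test

[OUTPUT]
    name                  stdout
    match                 *
```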
This second file defines a multiline parser for the example.
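A sketch of the parser file, using the rules described earlier:

```
[MULTILINE_PARSER]
    name          multiline-regex-test
    type          regex
    flush_timeout 1000
    #
    # rules |   state name  | regex pattern                          | next state
    # ------|---------------|----------------------------------------|-----------
    rule      "start_state"   "/([A-Za-z]+ \d+ \d+\:\d+\:\d+)(.*)/"    "cont"
    rule      "cont"          "/^\s+at.*/"                             "cont"
```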
An example file with multiline content:
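```
single line...
Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.oneMoreMethod(MyProject.java:18)
    at com.myproject.module.MyProject.main(MyProject.java:6)
another line...
```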
Running Fluent Bit with the given configuration file, the lines that did not match a pattern are not considered part of the multiline message, while the ones that matched the rules are concatenated properly.
A full feature set to access content of your records
Fluent Bit works internally with structured records, which can be composed of an unlimited number of keys and values. Values can be anything like a number, string, array, or map.
Having a way to select a specific part of the record is critical for certain core functionalities and plugins; this feature is called Record Accessor.
Consider the Record Accessor a simple grammar to specify record content and other miscellaneous values. As an example, take the following structured record:
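```
{
  "log": "some message",
  "stream": "stdout",
  "labels": {
     "color": "blue",
     "unset": null,
     "project": {
         "env": "production"
      }
  }
}
```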
A record accessor rule starts with the character $. Using the structured content above as an example, the following table describes some accessing rules and the expected returned values:
If the accessor key does not exist in the record, as in the last example $labels['undefined'], the operation is simply omitted; no exception will occur.
The feature is enabled on a per plugin basis; not all plugins enable it. As an example, consider a configuration that aims to filter records using the grep filter, matching only records whose labels have the color blue:
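```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name         tail
    path         test.log
    parser       json

[FILTER]
    name         grep
    match        *
    regex        $labels['color'] ^blue$

[OUTPUT]
    name         stdout
    match        *
    format       json_lines
```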
The file content to process in test.log is the following:
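```
{"log": "message 1", "labels": {"color": "blue"}}
{"log": "message 2", "labels": {"color": "red"}}
{"log": "message 3", "labels": {"color": "green"}}
{"log": "message 4", "labels": {"color": "blue"}}
```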
Running Fluent Bit with the configuration above, the output will contain only the records whose $labels['color'] equals blue.
Fluent Bit provides integrated support for Transport Layer Security (TLS) and its predecessor Secure Sockets Layer (SSL); in this section we will refer to both as TLS.
Each output plugin that needs to perform Network I/O can optionally enable TLS and configure its behavior. The following table describes the properties available:
The listed properties can be enabled in the configuration file, specifically on each output plugin section or directly through the command line.
A number of output plugins can take advantage of the full TLS feature set, while other plugins implement a subset of TLS support, meaning restricted configuration.
By default the HTTP output plugin uses plain TCP; enabling TLS from the command line can be done with:
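```
fluent-bit -i cpu -t cpu -o http://192.168.2.3:80/something \
    -p tls=on         \
    -p tls.verify=off \
    -m '*'
```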
In the command line above, the two properties tls and tls.verify were enabled for demonstration purposes (we strongly suggest always keeping verification ON).
The same behavior can be accomplished using a configuration file:
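```
[INPUT]
    Name  cpu
    Tag   cpu

[OUTPUT]
    Name       http
    Match      *
    Host       192.168.2.3
    Port       80
    URI        /something
    tls        On
    tls.verify Off
```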
By default, when Fluent Bit processes data, it uses memory as a primary and temporary place to store the records, but there are certain scenarios where it would be ideal to have a persistent buffering mechanism based on the filesystem to provide aggregation and data safety capabilities.
Choosing the right configuration is critical, and the behavior of the service can be conditioned by the backpressure settings. Before jumping into the configuration properties, let's understand the relationship between Chunks, Memory, Filesystem and Backpressure.
Understanding the chunks, buffering and backpressure concepts is critical for a proper configuration. Let's do a recap of the meaning of these concepts.
When an input plugin (source) emits records, the engine groups the records together in a Chunk. A Chunk's size is usually around 2MB. By configuration, the engine decides where to place this Chunk; the default is that all chunks are created only in memory.
As mentioned above, the Chunks generated by the engine are placed in memory but this is configurable.
If memory is the only mechanism set for the input plugin, it will store as much data as it can in memory. This is the fastest mechanism with the least system overhead, but if the service is not able to deliver the records fast enough, because of a slow network or an unresponsive remote service, Fluent Bit's memory usage will increase, since it accumulates more data than it can deliver.
In a high load environment with backpressure, the risk of high memory usage is the chance of getting killed by the Kernel (OOM Killer). A workaround for this backpressure scenario is to limit the amount of memory in records that an input plugin can register, through the configuration property called mem_buf_limit: if a plugin has enqueued more than mem_buf_limit, it won't be able to ingest more data until its data can be delivered or flushed properly. In this scenario the input plugin in question is paused.
The mem_buf_limit workaround is good for certain scenarios and environments: it helps to control the memory usage of the service. But it comes at a cost: if a file gets rotated while the plugin is paused, you might lose that data, since the plugin won't be able to register new records. This can happen with any input source plugin. The goal of mem_buf_limit is memory control and survival of the service.
For full data safety guarantee, use filesystem buffering.
Filesystem buffering enabled helps with backpressure and overall memory control.
Behind the scenes, the Memory and Filesystem buffering mechanisms are not mutually exclusive. Indeed, when enabling filesystem buffering for your input plugin (source) you get the best of both worlds: performance and data safety.
How does this Filesystem buffering mechanism deal with high memory usage and backpressure? Fluent Bit controls the number of Chunks that are up in memory.
By default, the engine allows a total of 128 Chunks up in memory (considering all Chunks); this value is controlled by the service property storage.max_chunks_up. The active Chunks that are up are either ready for delivery or still receiving records. Any other remaining Chunk is in a down state, which means it exists only in the filesystem and won't be up in memory unless it is ready to be delivered.
If the input plugin has enabled mem_buf_limit and storage.type as filesystem, when reaching the mem_buf_limit threshold, instead of the plugin being paused, all new data will go to Chunks that are down in the filesystem. This allows the service to control memory usage and also provides a guarantee that no data will be lost.
Limiting Filesystem space for Chunks
Fluent Bit implements the concept of logical queues: a Chunk, based on its Tag, can be routed to multiple destinations, so internally we keep a reference to where a Chunk was created and where it needs to go.
It's common to find cases where, if we have multiple destinations for a Chunk, one of the destinations is slower than the others, or one of the destinations is generating backpressure while the others aren't. In this scenario, how do we limit the number of filesystem Chunks that we are logically queueing?
Starting from Fluent Bit v1.6, we introduced a new configuration property for output plugins called storage.total_limit_size, which limits the number of Chunks that exist in the file system for a certain logical output destination. If a destination reaches the storage.total_limit_size limit, the oldest Chunk from its queue for that logical output destination will be discarded.
The storage layer configuration takes place in three areas:
Service Section
Input Section
Output Section
The known Service section configures a global environment for the storage layer, the Input sections define which buffering mechanism to use, and the Output sections define the limits for the logical queues.
As an example, a Service section will look like this:
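```
[SERVICE]
    flush                     1
    log_Level                 info
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.backlog.mem_limit 5M
```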
That configuration sets an optional buffering mechanism where the root for data is /var/log/flb-storage/; it will use normal synchronization mode, without checksum, and up to a maximum of 5MB of memory when processing backlog data.
Optionally, any Input plugin can configure its storage preference; the following table describes the options available:
The following example configures a service that offers filesystem buffering capabilities and two Input plugins, the first based on filesystem and the second with memory only:
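```
[SERVICE]
    flush                     1
    log_Level                 info
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.max_chunks_up     128
    storage.backlog.mem_limit 5M

[INPUT]
    name          cpu
    storage.type  filesystem

[INPUT]
    name          mem
    storage.type  memory
```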
If certain chunks are filesystem based (storage.type filesystem), it's possible to control the size of the logical queue for an output plugin. The following table describes the options available:
The following example creates records with CPU usage samples in the filesystem, which are then delivered to the Google Stackdriver service, limiting the logical queue (buffering) to 5M:
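```
[SERVICE]
    flush                     1
    log_Level                 info
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.checksum          off
    storage.max_chunks_up     128

[INPUT]
    name                      cpu
    storage.type              filesystem

[OUTPUT]
    name                      stackdriver
    match                     *
    storage.total_limit_size  5M
```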
If for some reason Fluent Bit gets offline because of a network issue, it will continue buffering CPU samples, keeping only a maximum of 5M of the newest data.
Fluent Bit supports TLS server name indication (SNI). If you are serving multiple hostnames on a single IP address (a.k.a. virtual hosting), you can make use of tls.vhost to connect to a specific hostname.
The end-goal of Fluent Bit is to collect, parse, filter and ship logs to a central place. In this workflow there are many phases, and one of the critical pieces is the ability to do buffering: a mechanism to place processed data into a temporary location until it is ready to be shipped.
When Filesystem buffering is enabled, the behavior of the engine is different: upon Chunk creation, it stores the content in memory and also maps a copy on disk (through mmap(2)). A Chunk that is active in memory and backed up on disk is said to be up, which means "the chunk content is up in memory".
The Service section refers to the section defined in the main configuration file:
The built-in multiline parsers are:

Parser | Description |
---|---|
docker | Process a log entry generated by a Docker container engine. This parser supports the concatenation of log entries split by Docker. |
cri | Process a log entry generated by the CRI-O container engine. Same as the docker parser, it supports concatenation of log entries. |
go | Process log entries generated by a Go based language application and perform concatenation if multiline messages are detected. |
python | Process log entries generated by a Python based language application and perform concatenation if multiline messages are detected. |
java | Process log entries generated by a Google Cloud Java language application and perform concatenation if multiline messages are detected. |
A multiline parser definition supports the following properties:

Property | Description | Default |
---|---|---|
name | Specify a unique name for the Multiline Parser definition. A good practice is to prefix the name with the word multiline_ to avoid confusion with normal parser definitions. | |
type | Set the multiline mode; for now, we support the type regex. | |
parser | Name of a pre-defined parser that must be applied to the incoming content before applying the regex rule. If no parser is defined, it's assumed that the content is raw text and not a structured message. Note: when a parser is applied to raw text, the regex is applied against a specific key of the structured message by using the key_content configuration property (see below). | |
key_content | For an incoming structured message, specify the key that contains the data that should be processed by the regular expression and possibly concatenated. | |
flush_timeout | Timeout in milliseconds to flush a non-terminated multiline buffer. Default is set to 5 seconds. | 5s |
rule | Configure a rule to match a multiline pattern. The rule has the specific format described earlier. Multiple rules can be defined. | |
Record accessor examples for the structured record shown earlier:

Format | Accessed Value |
---|---|
$log | "some message" |
$labels['color'] | "blue" |
$labels['project']['env'] | "production" |
$labels['unset'] | null |
$labels['undefined'] | |
Available TLS properties:

Property | Description | Default |
---|---|---|
tls | Enable or disable TLS support | Off |
tls.verify | Force certificate validation | On |
tls.debug | Set TLS debug verbosity level. It accepts the following values: 0 (No debug), 1 (Error), 2 (State change), 3 (Informational) and 4 (Verbose) | 1 |
tls.ca_file | Absolute path to CA certificate file | |
tls.ca_path | Absolute path to scan for certificate files | |
tls.crt_file | Absolute path to Certificate file | |
tls.key_file | Absolute path to private Key file | |
tls.key_passwd | Optional password for the tls.key_file file | |
tls.vhost | Hostname to be used for the TLS SNI extension | |
Input section storage properties:

Key | Description | Default |
---|---|---|
storage.type | Specify the buffering mechanism to use. It can be memory or filesystem. | memory |
storage.max_chunks_pause | Specify if file storage is to be paused when reaching the chunk limit. | off |
Output section storage properties:

Key | Description | Default |
---|---|---|
storage.total_limit_size | Limit the maximum number of Chunks in the filesystem for the current output logical destination. | |
In certain environments it is common to see that logs or data are ingested faster than they can be flushed to some destinations. A common case is reading from big log files and dispatching the logs to a backend over the network, which takes some time to respond; this generates backpressure, leading to high memory consumption in the service.
In order to avoid backpressure, Fluent Bit implements a mechanism in the engine that restricts the amount of data that an input plugin can ingest; this is done through the configuration parameter Mem_Buf_Limit.
As described in the Buffering concepts section, Fluent Bit offers a hybrid mode for data handling: in-memory and filesystem (optional).
In-memory buffering is always available and can be restricted with Mem_Buf_Limit. If your plugin gets restricted because of this configuration and you are under a backpressure scenario, you won't be able to ingest more data until the data chunks that are in memory can be flushed.
Depending on the input plugin in use, this might lead to discarding incoming data (e.g. with the TCP input plugin), but you can rely on the secondary filesystem buffering to be safe.
If, in addition to Mem_Buf_Limit, the input plugin defines a storage.type of filesystem (as described in Buffering & Storage), when the limit is reached all new data will be stored safely in the file system.
This option is disabled by default and can be applied to all input plugins. Let's explain its behavior using the following scenario:
Mem_Buf_Limit is set to 1MB (one megabyte)
input plugin tries to append 700KB
engine routes the data to an output plugin
output plugin backend (HTTP Server) is down
engine scheduler will retry the flush after 10 seconds
input plugin tries to append 500KB
At this exact point, the engine will allow appending those 500KB of data into memory, for a total of 1.2MB of enqueued data. The option works in a permissive mode up to the limit, but when the limit is exceeded the following actions are taken:
block local buffers for the input plugin (cannot append more data)
notify the input plugin invoking a pause callback
The engine will protect itself and will not append more data coming from the input plugin in question; note that it is the plugin's responsibility to keep its state and decide what to do in that paused state.
After some seconds, if the scheduler was able to flush the initial 700KB of data, or it gave up after retrying, that amount of memory is released and internally the following actions happen:
Upon data buffer release (700KB), the internal counters get updated
Counters now are set at 500KB
Since 500KB is < 1MB it checks the input plugin state
If the plugin is paused, it invokes a resume callback
input plugin can continue appending more data
Each plugin is independent and not all of them implement the pause and resume callbacks. As said, these callbacks are just a notification mechanism for the plugin.
One plugin that implements the callbacks and keeps a good state is the Tail input plugin. When the pause callback is triggered, it stops its collectors and stops appending data; upon resume, it re-enables the collectors.
In certain scenarios it would be ideal to estimate how much memory Fluent Bit could be using; this is very useful for containerized environments where memory limits are a must.
In order to estimate we will assume that the input plugins have set the Mem_Buf_Limit option (you can learn more about it in the Backpressure section).
Input plugins append data independently, so in order to do an estimation, a limit should be imposed through the Mem_Buf_Limit option. If the limit is set to 10MB, we can estimate that in the worst case the output plugin could use an additional 20MB.
Fluent Bit has an internal binary representation for the data being processed, but when this data reaches an output plugin, the plugin will likely create its own representation in a new memory buffer for processing. The best examples are the InfluxDB and Elasticsearch output plugins; both need to convert the binary representation into their respective custom JSON formats before delivering them to their backend servers.
So, if we impose a limit of 10MB for the input plugins and consider the worst-case scenario of the output plugin consuming 20MB extra, as a minimum we need (30MB x 1.2) = 36MB.
It is well known that in intensive environments, where memory allocations happen in high volumes, the default memory allocator provided by Glibc can lead to high fragmentation, causing the service to report high memory usage.
It's strongly suggested that in any production environment, Fluent Bit should be built with jemalloc enabled (e.g. -DFLB_JEMALLOC=On). Jemalloc is an alternative memory allocator that can reduce fragmentation (among other things), resulting in better performance.
You can check if Fluent Bit has been built with Jemalloc using the following command:
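```
bin/fluent-bit -h | grep JEMALLOC
```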
In the output, look at the Build Flags: if the FLB_HAVE_JEMALLOC option is listed, everything is fine.
Key | Description | Default |
---|---|---|
storage.path | Set an optional location in the file system to store streams and chunks of data. If this parameter is not set, Input plugins can only use in-memory buffering. | |
storage.sync | Configure the synchronization mode used to store the data in the file system. It can take the values normal or full. | normal |
storage.checksum | Enable the data integrity check when writing and reading data from the filesystem. The storage layer uses the CRC32 algorithm. | Off |
storage.max_chunks_up | If the input plugin has enabled filesystem storage type, this property sets the maximum number of Chunks that can be up in memory. | 128 |
storage.backlog.mem_limit | If storage.path is set, Fluent Bit will look for data chunks that were not delivered and are still in the storage layer; these are called backlog data. This option configures a hint of the maximum amount of memory to use when processing these records. | 5M |
storage.metrics | If the http_server option has been enabled in the main [SERVICE] section, this option registers a new endpoint where internal metrics of the storage layer can be consumed. | off |
Fluent Bit has an Engine that helps to coordinate the data ingestion from input plugins, and it calls the Scheduler to decide when it is time to flush the data through one or multiple output plugins. The Scheduler flushes new data at a fixed interval of seconds and schedules retries when asked.
Once an output plugin gets called to flush some data, after processing that data it can notify the Engine of three possible return statuses:
OK
Retry
Error
If the return status was OK, it means the data was successfully processed and flushed; if it returned Error, an unrecoverable error happened and the engine should not try to flush that data again. If a Retry was requested, the Engine will ask the Scheduler to retry flushing that data; the Scheduler will decide how many seconds to wait before that happens.
The Scheduler provides a simple configuration option called Retry_Limit, which can be set independently for each output section. This option allows you to disable retries or impose a limit to try N times and then discard the data after reaching that limit:
The following example configures two outputs, where the HTTP plugin has an unlimited number of retries and the Elasticsearch plugin has a limit of 5 retries:
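```
[OUTPUT]
    Name        http
    Host        192.168.5.6
    Port        8080
    Retry_Limit False

[OUTPUT]
    Name            es
    Host            192.168.5.20
    Port            9200
    Logstash_Format On
    Retry_Limit     5
```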
Fluent Bit implements a unified networking interface that is exposed to components like plugins. This interface abstracts all the complexity of general I/O and is fully configurable.
A common use case is when a component or plugin needs to connect to a service to send and receive data. Although the operational mode sounds easy to deal with, there are many factors that can make things hard: unresponsive services, networking latency or any kind of connectivity error. The networking interface aims to abstract and simplify the network I/O handling, minimize risks and optimize performance.
Most of the time creating a new TCP connection to a remote server is straightforward and takes a few milliseconds. But there are cases where DNS resolving, slow network or incomplete TLS handshakes might create long delays, or incomplete connection statuses.
The net.connect_timeout property allows configuring the maximum time to wait for a connection to be established; note that this value already considers the TLS handshake process.
The net.connect_timeout_log_error property indicates whether an error should be logged in case of a connect timeout. If disabled, the timeout is logged as a debug-level message instead.
In environments with multiple network interfaces, it might be desirable to choose which interface to use for the data that will flow through the network.
The net.source_address property allows specifying which network address must be used for a TCP connection and data flow.
TCP is a connection-oriented channel: to deliver and receive data from a remote end-point, in most cases we use a TCP connection. This TCP connection can be created and destroyed once it is no longer needed; this approach has pros and cons. Here we will refer to the opposite case: keeping the connection open.
The concept of Connection Keepalive refers to the ability of the client (Fluent Bit in this case) to keep the TCP connection open in a persistent way; that means that once the connection is created and used, instead of closing it, it can be recycled. This feature offers many performance benefits, since communication channels are already established beforehand.
Any component that uses TCP channels, like HTTP or TLS, can take advantage of this feature. For configuration purposes use the net.keepalive property.
If keepalive is enabled for a connection, there might be scenarios where the connection remains unused for long periods of time. Having an idle keepalive connection is not helpful, and it is recommendable to keep connections alive only if they are being used.
In order to control how long a keepalive connection can remain idle, we expose the configuration property net.keepalive_idle_timeout.
If a transport layer protocol is specified, the plugin whose configuration section contains the net.dns.mode setting overrides the global dns.mode value and issues DNS requests using the specified protocol, which can be either TCP or UDP.
For plugins that rely on networking I/O, the following section describes the network configuration properties available and how they can be used to optimize performance or adjust to different configuration needs:
As an example, we will send 5 random messages through a TCP output connection; on the remote side we will use the nc (netcat) utility to see the data.
Put the following configuration snippet in a file called fluent-bit.conf:
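```
[SERVICE]
    flush     1
    log_level info

[INPUT]
    name      random
    samples   5

[OUTPUT]
    name      tcp
    match     *
    host      127.0.0.1
    port      9090
    format    json_lines
    # Networking Setup
    net.dns.mode                TCP
    net.connect_timeout         5
    net.source_address          127.0.0.1
    net.keepalive               on
    net.keepalive_idle_timeout  10
```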
In another terminal, start nc and make it listen for messages on TCP port 9090:
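```
nc -l 9090
```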
Now start Fluent Bit with the configuration file written above and you will see the data flowing to netcat.
If the net.keepalive option is not enabled, Fluent Bit will close the TCP connection and netcat will quit; here we can see how the keepalive connection works.
After the 5 records arrive, the connection will remain idle, and after 10 seconds it will be closed due to net.keepalive_idle_timeout.
Value | Description |
---|---|
Retry_Limit N | Integer value to set the maximum number of retries allowed. N must be >= 1 (default: 1). |
Retry_Limit no_limits or False | There is no limit to the number of retries the Scheduler can do. |
Retry_Limit no_retries | Retries are disabled; the Scheduler will not try to send the data to the destination again if the first attempt failed. |
Property | Description | Default |
---|---|---|
net.connect_timeout | Set the maximum time expressed in seconds to wait for a TCP connection to be established; this includes the TLS handshake time. | 10 |
net.connect_timeout_log_error | On connection timeout, specify if it should log an error. When disabled, the timeout is logged as a debug message. | true |
net.source_address | Specify the network address (interface) to use for connection and data traffic. | |
net.keepalive | Enable or disable connection keepalive support. Accepts a boolean value: on / off. | on |
net.keepalive_idle_timeout | Set the maximum time expressed in seconds for an idle keepalive connection. | 30 |
net.keepalive_max_recycle | Set the maximum number of times a keepalive connection can be used before it is destroyed. | 0 |
net.dns.mode | Set the primary transport layer protocol used by the asynchronous DNS resolver for connections established in the plugin where this configuration value is used. | UDP |
Fluent Bit is a powerful log processing tool that can deal with different sources and formats; in addition, it provides several filters that can be used to perform custom modifications. This flexibility is really good, but as your pipeline grows, it's strongly recommended to validate your data and structure.
We encourage Fluent Bit users to integrate data validation in their CI systems
A simplified view of our data processing pipeline is as follows:
In a normal production environment, many Inputs, Filters, and Outputs are defined in the configuration, so integrating a continuous validation of your configuration against expected results is a must. For this requirement, Fluent Bit provides a specific Filter called Expect which can be used to validate expected Keys and Values from your records and takes some action when an exception is found.
As an example, consider the following pipeline where your source of data is a normal file with JSON content on it and then two filters: grep to exclude certain records and record_modifier to alter the record content adding and removing specific keys.
Ideally you want to add validation checkpoints for your data between each step, so you can know if your data structure is correct; we do this by using the expect filter.
The expect filter sets rules that aim to validate certain criteria, like:
does the record contain a key A?
does the record not contain a key A?
is the value of record key A NULL?
is the value of record key A different from NULL?
does the value of record key A equal B?
Every expect filter configuration can expose specific rules to validate the content of your records, it supports the following configuration properties:
Consider the following JSON file called data.log with the following content:
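```
{"color": "blue", "label": {"name": null}}
{"color": "red", "label": {"name": "abc"}, "meta": "data"}
{"color": "green", "label": {"name": "abc"}, "meta": null}
```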
The following Fluent Bit configuration file will configure a pipeline to consume the log above and apply an expect filter to validate that the keys color and label exist:
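```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name         tail
    path         ./data.log
    parser       json
    exit_on_eof  on

# First 'expect' filter to validate that our data was structured properly
[FILTER]
    name         expect
    match        *
    key_exists   color
    key_exists   label
    action       exit

[OUTPUT]
    name         stdout
    match        *
```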
Note that if for some reason the JSON parser fails or is missing in the tail input (line 9 in the example above), the expect filter will trigger the exit action. As a test, go ahead and comment out or remove that line.
As a second step, we will extend our pipeline by adding a grep filter to match records whose label map contains a key called name with value abc, and then an expect filter to re-validate that condition:
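```
[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name         tail
    path         ./data.log
    parser       json
    exit_on_eof  on

# First 'expect' filter to validate that our data was structured properly
[FILTER]
    name         expect
    match        *
    key_exists   color
    key_exists   label
    action       exit

# Match records where the map 'label' has key 'name' = 'abc'
[FILTER]
    name         grep
    match        *
    regex        $label['name'] ^abc$

# Re-validate that the surviving records meet the condition
[FILTER]
    name         expect
    match        *
    key_val_eq   $label['name'] abc
    action       exit

[OUTPUT]
    name         stdout
    match        *
```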
When deploying your configuration in production, you might want to remove the expect filters from your configuration, since they are unnecessary extra work unless you want 100% coverage of checks at runtime.
Learn how to monitor your Fluent Bit data pipelines
Fluent Bit comes with built-in features to allow you to monitor the internals of your pipeline, connect to Prometheus and Grafana, perform health checks, and use external services for such purposes:
Fluent Bit comes with a built-in HTTP Server that can be used to query internal information and monitor metrics of each running plugin.
The monitoring interface can be easily integrated with Prometheus, since we support its native format.
To get started, the first step is to enable the HTTP Server from the configuration file:
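```
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name cpu

[OUTPUT]
    Name  stdout
    Match *
```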
The above configuration snippet will instruct Fluent Bit to start its HTTP Server on TCP port 2020, listening on all network interfaces.
Now a simple curl command is enough to gather some information:
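```
curl -s http://127.0.0.1:2020 | jq
```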
Note that we are piping the curl command output into the jq program, which helps to make the JSON data easy to read from the terminal. Fluent Bit doesn't aim to do JSON pretty-printing.
Fluent Bit aims to expose useful interfaces for monitoring; as of Fluent Bit v0.14 the following endpoints are available:
Query the service uptime with the following command:
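```
curl -s http://127.0.0.1:2020/api/v1/uptime | jq
```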
It prints the uptime both in seconds and in a human-readable form.
Query internal metrics in JSON format with the following command:
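```
curl -s http://127.0.0.1:2020/api/v1/metrics | jq
```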
It prints a JSON document with the metrics of each loaded plugin.
Query internal metrics in Prometheus Text 0.0.4 format:
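```
curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus
```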
This time the same metrics will be in Prometheus format instead of JSON.
By default, configured plugins at runtime get an internal name in the format plugin_name.ID. For monitoring purposes this can be confusing if many plugins of the same type were configured. To make a distinction, each configured input or output section can be given an alias that will be used as the parent name for the metric.
Now when querying the metrics we get the aliases in place instead of the plugin name:
Fluent Bit now supports four configuration properties to set up the health check.
Note: not every error log entry is counted as an error; the error and retry failure counters only track specific kinds of errors, as exemplified in the configuration table description.
The feature works as follows: based on the configured HC_Period, if the real error count is over HC_Errors_Count, or the retry failure count is over HC_Retry_Failure_Count, Fluent Bit is considered unhealthy; the health endpoint will return HTTP status 500 and the string error. Otherwise it is healthy and will return HTTP status 200 and the string ok.
The equation is:
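```
# Evaluated over the last HC_Period seconds (counter names are illustrative):
(error_count > HC_Errors_Count) OR (retry_failure_count > HC_Retry_Failure_Count)
    => unhealthy; otherwise healthy
```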
Note: HC_Errors_Count and HC_Retry_Failure_Count only count for output plugins; they are a sum of the errors and retry failures from all output plugins that are running.
See the configuration example:
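```
[SERVICE]
    HTTP_Server            On
    HTTP_Listen            0.0.0.0
    HTTP_PORT              2020
    Health_Check           On
    HC_Errors_Count        5
    HC_Retry_Failure_Count 5
    HC_Period              5

[INPUT]
    Name cpu

[OUTPUT]
    Name  stdout
    Match *
```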
The command to call the health endpoint:
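```
curl -s http://127.0.0.1:2020/api/v1/health
```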
Based on the Fluent Bit status, the result will be:
HTTP status 200 and "ok" in response to healthy status
HTTP status 500 and "error" in response for unhealthy status
With the example configuration, the health status is determined by the following equation:
If (HC_Errors_Count > 5) OR (HC_Retry_Failure_Count > 5) IN 5 seconds is TRUE, then it's unhealthy.
If (HC_Errors_Count > 5) OR (HC_Retry_Failure_Count > 5) IN 5 seconds is FALSE, then it's healthy.
Registering your Fluent Bit agent will take less than one minute. Steps:
In your Fluent Bit configuration file, append the following configuration section:
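```
[CUSTOM]
    name     calyptia
    api_key  <YOUR_API_KEY>
```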
Make sure to replace the API key placeholder in the configuration with your own key. A few seconds after restarting your Fluent Bit agent, the Calyptia Cloud Dashboard will list it. Metrics will take around 30 seconds to show up.
Fluent Bit v1.4 introduces the Dump Internals feature, which can be triggered easily from the command line by sending the CONT Unix signal.
note: this feature is only available on Linux and BSD family operating systems
Run the following kill command to signal Fluent Bit:
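```
kill -CONT $(pidof fluent-bit)
```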
The pidof command looks up the Process ID of Fluent Bit; you can replace it with the process ID obtained by other means.
Fluent Bit will dump the following information to the standard output interface (stdout):
The dump provides insights for every input instance configured.
Overall ingestion status of the plugin.
When an input plugin ingests data into the engine, a Chunk is created. A Chunk can contain multiple records. At flush time, the engine creates a Task that contains the routes for the Chunk in question.
The Task dump describes the tasks associated to the input plugin:
The Chunks dump gives more details about all the chunks that the input plugin has generated and that are still being processed.
Depending on the buffering strategy and the limits imposed by configuration, some Chunks might be up (in memory) or down (filesystem).
Fluent Bit relies on a custom storage layer interface designed for hybrid buffering. The Storage Layer entry contains a total summary of Chunks registered by Fluent Bit:
The following example sets an alias for an INPUT section that uses the cpu input plugin:
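```
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    Name  cpu
    Alias server1_cpu

[OUTPUT]
    Name  stdout
    Match *
```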
Fluent Bit's exposed Prometheus metrics can be leveraged to create dashboards and alerts.
The provided example dashboard is heavily inspired by existing community dashboards, but with a few key differences, such as the use of the instance label, stacked graphs, and a focus on Fluent Bit metrics. Sample alerts are available as well.
Calyptia Cloud is a hosted service that allows you to monitor your Fluent Bit agents, including data flow, metrics and configurations.
Go to the Calyptia Cloud site and sign in.
In the left menu, generate and copy your API key.
If you want to get in touch with the Calyptia team, just send them an email.
When the service is running we can export metrics to see the overall status of the data flow of the service. But there are other use cases where we would like to know the current status of the internals of the service, specifically to answer questions like: what's the current status of the internal buffers? The Dump Internals feature is the answer.
You may wish to test a logging pipeline locally to observe how it deals with log messages. The following is a walk-through for running Fluent Bit and Elasticsearch locally with Docker Compose, which can serve as an example for testing other plugins locally.
Refer to the Configuration File section to create a configuration to test.
Use Docker Compose to run Fluent Bit (with the configuration file mounted) and Elasticsearch.
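A minimal sketch of such a compose file (image tags, paths and settings are illustrative):

```yaml
version: "3.7"

services:
  fluent-bit:
    image: fluent/fluent-bit
    volumes:
      # mount the local configuration file into the container
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
    depends_on:
      - elasticsearch

  elasticsearch:
    image: elasticsearch:7.17.0
    ports:
      - "9200:9200"
    environment:
      - discovery.type=single-node
```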
Expect filter configuration properties:

Property | Description |
---|---|
key_exists | Check if a key with a given name exists in the record. |
key_not_exists | Check if a key does not exist in the record. |
key_val_is_null | Check that the value of the key is NULL. |
key_val_is_not_null | Check that the value of the key is NOT NULL. |
key_val_eq | Check that the value of the key equals the given value in the configuration. |
action | Action to take when a rule does not match. The available options are warn or exit. On warn, a warning message is sent to the logging layer when a mismatch of the rules above is found; using exit makes Fluent Bit abort with status code 255. |
URI | Description | Data Format |
---|---|---|
/ | Fluent Bit build information | JSON |
/api/v1/uptime | Get uptime information in seconds and human readable format | JSON |
/api/v1/metrics | Internal metrics per loaded plugin | JSON |
/api/v1/metrics/prometheus | Internal metrics per loaded plugin, ready to be consumed by a Prometheus server | Prometheus Text 0.0.4 |
/api/v1/storage | Get internal metrics of the storage layer / buffered data. This option is enabled only if storage.metrics has been enabled in the SERVICE section. | JSON |
/api/v1/health | Fluent Bit health check result | String |
Config Name | Description | Default Value |
---|---|---|
Health_Check | Enable the health check feature. | Off |
HC_Errors_Count | The error count required to meet the unhealthy condition; this is a sum for all output plugins within the defined HC_Period. | 5 |
HC_Retry_Failure_Count | The retry failure count required to meet the unhealthy condition; this is a sum for all output plugins within the defined HC_Period. | 5 |
HC_Period | The time period, in seconds, over which errors and retry failures are counted. | 60 |
Entry | Description |
---|---|
total_tasks | Total number of active tasks associated with data generated by the input plugin. |
new | Number of tasks not yet assigned to an output plugin. Tasks are in a new state for a very short period of time. |
running | Number of active tasks being processed by output plugins. |
size | Amount of memory used by the Chunks being processed (total chunks size). |
Entry | Description |
---|---|
total_chunks | Total number of Chunks generated by the input plugin that are still being processed by the engine. |
up_chunks | Total number of Chunks that are loaded in memory. |
down_chunks | Total number of Chunks that are stored in the filesystem but not loaded in memory yet. |
busy_chunks | Chunks marked as busy (being flushed) or locked. Busy Chunks are immutable and likely are ready to be (or are being) processed. |
size | Amount of bytes used by the Chunk. |
size err | Number of Chunks in an error state where the size could not be retrieved. |
Entry | Description |
---|---|
total chunks | Total number of Chunks. |
mem chunks | Total number of memory-based Chunks. |
fs chunks | Total number of filesystem-based Chunks. |
up | Total number of filesystem chunks up in memory. |
down | Total number of filesystem chunks down (not loaded in memory). |
Enable traffic through a proxy server via HTTP_PROXY environment variable
Fluent Bit supports setting up an HTTP proxy for all egress HTTP/HTTPS traffic by setting the HTTP_PROXY environment variable:
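```
# the proxy host and port here are illustrative
HTTP_PROXY='http://proxy.example.com:8080' fluent-bit -c fluent-bit.conf
```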
You can set up basic authentication with HTTP_PROXY=http://<username>:<password>@<proxy host>:<port> to provide your username and password when connecting to the proxy.
You can also set HTTP_PROXY=http://<proxy host>:<port> to omit the username and password if there are none.
The HTTP_PROXY environment variable is a standard way of setting an HTTP proxy in a containerized environment, and it is also natively supported by any application written in Go. Therefore, we follow and implement the same convention for Fluent Bit.
Note: an HTTP proxy is also supported via the HTTP output plugin's own configuration. That configuration continues to work; however, it should not be used together with the HTTP_PROXY environment variable. This is because, under the hood, the HTTP_PROXY-based proxy support is implemented by setting up a TCP connection tunnel via HTTP CONNECT. Unlike the plugin's implementation, this supports both HTTP and HTTPS egress traffic.
In some environments, we may not want HTTP traffic for certain domains to go through the HTTP proxy; this is where the NO_PROXY environment variable comes in.
NO_PROXY is a comma-separated list of host names that shouldn't go through any proxy (only an asterisk, *, matches all hosts), e.g. foo.com,bar.com. This follows the curl convention.
One typical use case for NO_PROXY is when running fluent-bit in a Kubernetes environment, where we want:
All real egress traffic goes through a HTTP proxy.
All "Kubernetes local" traffic does not go through the HTTP proxy.
We can set NO_PROXY=127.0.0.1,localhost,kubernetes.default.svc in this case.
Entry | Description |
---|---|
overlimit | If the plugin has been configured with Mem_Buf_Limit, this entry reports whether the plugin is over the limit at the moment of the dump; it prints yes or no. |
mem_size | Current memory size in use by the input plugin in-memory. |
mem_limit | Limit set by Mem_Buf_Limit. |
The collectd input plugin allows you to receive datagrams from the collectd service.
The plugin supports the following configuration parameters:
Here is a basic configuration example:
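```
[INPUT]
    Name    collectd
    Listen  0.0.0.0
    Port    25826
    TypesDB /usr/share/collectd/types.db,/etc/collectd/custom.db

[OUTPUT]
    Name   stdout
    Match  *
```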
With this configuration, Fluent Bit listens on 0.0.0.0:25826 and outputs incoming datagram packets to stdout.
You must set the same types.db files that your collectd server uses. Otherwise, Fluent Bit may not be able to interpret the payload properly.
Learn how to monitor your data pipeline with external services
A Data Pipeline represents a flow of data that goes through the inputs (sources), filters, and outputs (sinks). There are a couple of ways to monitor the pipeline. We recommend the following sections for a better understanding and steps to get started:
The cpu input plugin measures the CPU usage of a process or of the whole system (by default, considering each CPU core). It reports values as percentages for every configured interval of time. At the moment this plugin is only available for Linux.
The following table describes the information generated by the plugin. The keys below represent the data used by the overall system; all values associated with the keys are percentages (0 to 100%):
The CPU metrics plugin creates metrics that are log-based (i.e. a JSON payload). If you are looking for Prometheus-based metrics, please see the Node Exporter Metrics input plugin.
In addition to the keys reported in the above table, a similar content is created per CPU core. The cores are listed from 0 to N as the Kernel reports:
The plugin supports the following configuration parameters:
In order to get the CPU usage statistics of your system, you can run the plugin from the command line or through the configuration file:
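```
fluent-bit -i cpu -t my_cpu -o stdout -m '*'
```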
In your main configuration file append the following Input & Output sections:
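```
[INPUT]
    Name cpu
    Tag  my_cpu

[OUTPUT]
    Name  stdout
    Match my_cpu
```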
A plugin based on Prometheus Node Exporter to collect system / host level metrics
The initial release of Node Exporter Metrics contains a subset of collectors and metrics available from Prometheus Node Exporter and we plan to expand them over time.
Important note: Metrics collected with Node Exporter Metrics flow through a separate pipeline from logs and current filters do not operate on top of metrics.
This plugin is currently only supported on Linux based operating systems.
The following table describes the collectors available as part of this plugin. All of them are enabled by default and respect the original metric names, descriptions, and types from Prometheus Exporter, so you can use your current dashboards without any compatibility problems.
note: the Version column specifies the Fluent Bit version where the collector is available.
You can test the exposed metrics by using curl:
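```
curl http://127.0.0.1:2021/metrics
```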
When deploying Fluent Bit in a container you will need to specify additional settings to ensure that Fluent Bit has access to the host operating system. The following docker command deploys Fluent Bit with specific mount paths and settings enabled to ensure that Fluent Bit can collect from the host. These are then exposed over port 2021.
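A sketch of such a command (the image tag and flags are illustrative):

```
docker run -ti -v /proc:/host/proc \
               -v /sys:/host/sys   \
               -p 2021:2021        \
               fluent/fluent-bit:1.8.0 \
               /fluent-bit/bin/fluent-bit \
               -i node_exporter_metrics \
               -p path.procfs=/host/proc -p path.sysfs=/host/sys \
               -o prometheus_exporter -p port=2021 \
               -f 1
```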
If you like dashboards for monitoring, Grafana is one of the preferred options. In our Fluent Bit source code repository, we have pushed a simple docker-compose example. Steps:
Now open your browser at the address http://127.0.0.1:3000. When asked for the credentials to access Grafana, just use the admin username and admin password.
Note that by default Grafana dashboard plots the data from the last 24 hours, so just change it to Last 5 minutes to see the recent data being collected.
As described above, the CPU input plugin gathers the overall usage every second and flushes the information to the output every five seconds. In this example we used the stdout plugin to demonstrate the output records. In a real use-case you may want to flush this information to some central aggregator.
Prometheus Node Exporter is a popular way to collect system level metrics from operating systems, such as CPU / Disk / Network / Process statistics. Fluent Bit 1.8.0 includes the node exporter metrics plugin, which builds on the Prometheus design to collect system level metrics without having to manage two separate processes or agents.
In the following configuration file, the input plugin node_exporter_metrics collects metrics every 2 seconds and exposes them through our prometheus_exporter output plugin on HTTP/TCP port 2021:
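```
# Node Exporter Metrics + Prometheus Exporter
[SERVICE]
    flush           1
    log_level       info

[INPUT]
    name            node_exporter_metrics
    tag             node_metrics
    scrape_interval 2

[OUTPUT]
    name            prometheus_exporter
    match           node_metrics
    host            0.0.0.0
    port            2021
```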
Our current plugin implements a subset of the collectors available in the original Prometheus Node Exporter. If you would like us to prioritize a specific collector, please open a GitHub issue using the corresponding issue template.
Key | Description | Default |
---|---|---|
Listen | Set the address to listen to. | 0.0.0.0 |
Port | Set the port to listen to. | 25826 |
TypesDB | Set the data specification file. | /usr/share/collectd/types.db |
Key | Description |
---|---|
cpu_p | CPU usage of the overall system; this value is the sum of time spent in user and kernel space. The result takes into consideration the number of CPU cores in the system. |
user_p | CPU usage in User mode; in short, the CPU usage by user space programs. The result takes into consideration the number of CPU cores in the system. |
system_p | CPU usage in Kernel mode; in short, the CPU usage by the Kernel. The result takes into consideration the number of CPU cores in the system. |
Key | Description |
---|---|
cpuN.p_cpu | Represents the total CPU usage by core N. |
cpuN.p_user | Total CPU time spent in user mode or user space programs associated with this core. |
cpuN.p_system | Total CPU time spent in system or kernel mode associated with this core. |
Key | Description | Default |
---|---|---|
Interval_Sec | Polling interval in seconds | 1 |
Interval_NSec | Polling interval in nanoseconds | 0 |
PID | Specify the ID (PID) of a running process in the system. By default the plugin monitors the whole system, but if this option is set, it will only monitor the given process ID. | |
Key | Description | Default |
---|---|---|
scrape_interval | The rate at which metrics are collected from the host operating system | 5 seconds |
path.procfs | The mount point used to collect process information and metrics | /proc/ |
path.sysfs | The path in the filesystem used to collect system metrics | /sys/ |
Name | Description | OS | Version |
---|---|---|---|
cpu | Exposes CPU statistics. | Linux | v1.8 |
cpufreq | Exposes CPU frequency statistics. | Linux | v1.8 |
diskstats | Exposes disk I/O statistics. | Linux | v1.8 |
filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr. | Linux | v1.8.2 |
loadavg | Exposes load average. | Linux | v1.8 |
meminfo | Exposes memory statistics. | Linux | v1.8 |
netdev | Exposes network interface statistics such as bytes transferred. | Linux | v1.8.2 |
stat | Exposes various statistics from /proc/stat. | Linux | v1.8 |
time | Exposes the current system time. | Linux | v1.8 |
uname | Exposes system information as provided by the uname system call. | Linux | v1.8 |
vmstat | Exposes statistics from /proc/vmstat. | Linux | v1.8.2 |
The disk input plugin gathers information about the disk throughput of the running system at a fixed interval and reports it.
The Disk I/O metrics plugin creates metrics that are log-based (i.e., a JSON payload). If you are looking for Prometheus-based metrics, please see the Node Exporter Metrics input plugin.
The plugin supports the following configuration parameters:
In order to get disk usage from your system, you can run the plugin from the command line or through the configuration file:
In your main configuration file append the following Input & Output sections:
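```
# A sketch; tag and intervals are illustrative
[INPUT]
    Name          disk
    Tag           disk
    Interval_Sec  1
    Interval_NSec 0

[OUTPUT]
    Name   stdout
    Match  *
```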
Note: Total interval (sec) = Interval_Sec + (Interval_Nsec / 1000000000).
e.g. 1.5s = 1s + 500000000ns
The docker input plugin allows you to collect Docker container metrics such as memory usage and CPU consumption.
The plugin supports the following configuration parameters:
If you set neither Include nor Exclude, the plugin will try to get metrics from all running containers.
Here is an example configuration that collects metrics from two docker instances (6bab19c3a0f9 and 14159be4ca2c).
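A sketch of that configuration:

```
[INPUT]
    Name    docker
    Include 6bab19c3a0f9 14159be4ca2c

[OUTPUT]
    Name   stdout
    Match  *
```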
This configuration will produce records like below.
The exec input plugin allows you to execute an external program and collect event logs from its output.
This plugin will not function in the distroless production images (currently AMD64) as it needs a functional /bin/sh, which is not present. It will function in the -debug images from 1.8.12 onward, as well as the ARM production images, as these include a full shell.
The plugin supports the following configuration parameters:
You can run the plugin from the command line or through the configuration file:
The following example will read events from the output of ls.
In your main configuration file append the following Input & Output sections:
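```
# A sketch; the command, tag and interval are illustrative
[INPUT]
    Name         exec
    Tag          exec_ls
    Command      ls /var/log
    Interval_Sec 1

[OUTPUT]
    Name   stdout
    Match  *
```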
The docker events input plugin uses the Docker API to capture server events. A complete list of possible events returned by this plugin can be found in the Docker documentation.
Key | Description | Default |
---|---|---|
Interval_Sec | Polling interval (seconds). | 1 |
Interval_NSec | Polling interval (nanosecond). | 0 |
Dev_Name | Device name to limit the target (e.g. sda). If not set, in_disk gathers information from all disks and partitions. | all disks |
Key | Description | Default |
---|---|---|
Interval_Sec | Polling interval in seconds | 1 |
Include | A space-separated list of containers to include | |
Exclude | A space-separated list of containers to exclude | |
Key | Description | Default |
---|---|---|
Unix_Path | The docker socket unix path | /var/run/docker.sock |
Buffer_Size | The size of the buffer used to read docker events (in bytes) | 8192 |
Parser | Specify the name of a parser to interpret the entry as a structured message. | None |
Key | When a message is unstructured (no parser applied), it's appended as a string under the key name message. | message |
Reconnect.Retry_limits | The maximum number of retries allowed. The plugin tries to reconnect with the docker socket when EOF is detected. | 5 |
Reconnect.Retry_interval | The retry interval. Unit is seconds. | 1 |
Key | Description |
---|---|
Command | The command to execute. |
Parser | Specify the name of a parser to interpret the entry as a structured message. |
Interval_Sec | Polling interval (seconds). |
Interval_NSec | Polling interval (nanosecond). |
Buf_Size | Size of the buffer (check the Unit Size specification for allowed values). |
Oneshot | Only run once at startup. This allows collection of data prior to Fluent Bit's startup (bool, default: false). |
The dummy input plugin generates dummy events. It is useful for testing, debugging, benchmarking and getting started with Fluent Bit.
The plugin supports the following configuration parameters:
You can run the plugin from the command line or through the configuration file:
In your main configuration file append the following Input & Output sections:
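```
# A sketch; the record content is illustrative
[INPUT]
    Name  dummy
    Tag   dummy.log
    Dummy {"message":"custom dummy"}

[OUTPUT]
    Name   stdout
    Match  *
```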
A plugin to collect Fluent Bit's own metrics
Fluent Bit exposes its own metrics to allow you to monitor the internals of your pipeline. The collected metrics can be processed similarly to those from the Prometheus Node Exporter input plugin. They can be sent to output plugins including Prometheus Exporter or Prometheus Remote Write.
Important note: Metrics collected with Node Exporter Metrics flow through a separate pipeline from logs and current filters do not operate on top of metrics.
In the following configuration file, the input plugin fluentbit_metrics collects metrics every 2 seconds and exposes them through our Prometheus Exporter output plugin on HTTP/TCP port 2021.
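A sketch of that configuration (tag and service values are illustrative):

```
[SERVICE]
    flush           1
    log_level       info

[INPUT]
    name            fluentbit_metrics
    tag             internal_metrics
    scrape_interval 2

[OUTPUT]
    name            prometheus_exporter
    match           internal_metrics
    host            0.0.0.0
    port            2021
```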
You can verify that the metrics are exposed by using curl:
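```
curl http://127.0.0.1:2021/metrics
```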
The head input plugin allows you to read events from the head of a file. Its behavior is similar to the head(1) command.
The plugin supports the following configuration parameters:
This mode is useful to get a specific line. This is an example to get the CPU frequency from /proc/cpuinfo.
/proc/cpuinfo is a special file that provides CPU information.
The CPU frequency appears as "cpu MHz : 2791.009". We can get that line with the following configuration file.
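A sketch of such a configuration (the Lines value depends on where the frequency line appears on your system):

```
[INPUT]
    Name       head
    Tag        head.cpu
    File       /proc/cpuinfo
    Lines      8
    Split_line true

[OUTPUT]
    Name   stdout
    Match  *
```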
Output is
In order to read the head of a file, you can run the plugin from the command line or through the configuration file:
The following example will read events from the /proc/uptime file, tag the records with the uptime name and flush them back to the stdout plugin:
In your main configuration file append the following Input & Output sections:
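```
# A sketch; buffer size and interval are illustrative
[INPUT]
    Name         head
    Tag          uptime
    File         /proc/uptime
    Buf_Size     256
    Interval_Sec 1

[OUTPUT]
    Name   stdout
    Match  *
```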
Note: Total interval (sec) = Interval_Sec + (Interval_Nsec / 1000000000).
e.g. 1.5s = 1s + 500000000ns
Forward is the protocol used by Fluent Bit and Fluentd to route messages between peers. This plugin implements the input service to listen for Forward messages.
The plugin supports the following configuration parameters:
In order to receive Forward messages, you can run the plugin from the command line or through the configuration file as shown in the following examples.
From the command line you can let Fluent Bit listen for Forward messages with the following options:
By default the service will listen on all interfaces (0.0.0.0) through TCP port 24224; optionally you can change this directly, e.g:
In the example, Forward messages will only arrive through the network interface at address 192.168.3.2 and TCP port 9090.
In your main configuration file append the following Input & Output sections:
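```
[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[OUTPUT]
    Name   stdout
    Match  *
```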
Once Fluent Bit is running, you can send some messages using the fluent-cat tool (provided by Fluentd):
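For example (tag and payload are illustrative):

```
echo '{"key 1": 123456789, "key 2": "abcdefg"}' | fluent-cat my_tag
```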
In Fluent Bit we should see the following output:
The kmsg input plugin reads the Linux kernel log buffer from the beginning; it gets every record and parses its fields as priority, sequence, seconds, useconds, and message.
In order to start getting the Linux Kernel messages, you can run the plugin from the command line or through the configuration file:
As described above, the plugin processes all messages that the Linux kernel reported; the output above has been truncated for clarity.
In your main configuration file append the following Input & Output sections:
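```
[INPUT]
    Name kmsg
    Tag  kernel

[OUTPUT]
    Name   stdout
    Match  *
```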
The health input plugin allows you to check how healthy a TCP server is. It does the check by issuing a TCP connection at a fixed interval.
The plugin supports the following configuration parameters:
In order to start performing the checks, you can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit generate the checks with the following options:
In your main configuration file append the following Input & Output sections:
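```
# A sketch; host and port are illustrative
[INPUT]
    Name         health
    Host         127.0.0.1
    Port         80
    Interval_Sec 1

[OUTPUT]
    Name   stdout
    Match  *
```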
Once Fluent Bit is running, you will see some random values in the output interface similar to this:
Key | Description |
---|---|
Dummy | Dummy JSON record. Default: {"message":"dummy"} |
Start_time_sec | Dummy base timestamp in seconds. Default: 0 |
Start_time_nsec | Dummy base timestamp in nanoseconds. Default: 0 |
Rate | Number of events generated per second. Default: 1 |
Samples | If set, the number of events is limited. e.g. if Samples=3, the plugin generates only three events and stops. |
Key | Description | Default |
---|---|---|
scrape_interval | The rate at which metrics are collected from the host operating system | 2 seconds |
scrape_on_start | Scrape metrics upon start, useful to avoid waiting for 'scrape_interval' for the first round of metrics. | false |
Key | Description |
---|---|
File | Absolute path to the target file, e.g: /proc/uptime |
Buf_Size | Buffer size to read the file. |
Interval_Sec | Polling interval (seconds). |
Interval_NSec | Polling interval (nanosecond). |
Add_Path | If enabled, the filepath is appended to each record. Default value is false. |
Key | Rename a key. Default: head. |
Lines | Line number to read. If the number N is set, in_head reads the first N lines like head(1) -n. |
Split_line | If enabled, in_head generates a key-value pair per line. |
Key | Description | Default |
---|---|---|
Listen | Listener network interface. | 0.0.0.0 |
Port | TCP port to listen for incoming connections. | 24224 |
Unix_Path | Specify the path to a Unix socket to receive Forward messages. If set, Listen and Port are ignored. | |
Buffer_Max_Size | Specify the maximum buffer memory size used to receive a Forward message. The value must be according to the Unit Size specification. | 6144000 |
Buffer_Chunk_Size | By default the buffer that stores incoming Forward messages does not allocate the maximum memory allowed up front; instead it allocates memory as required. The allocation rounds are set by Buffer_Chunk_Size. The value must be according to the Unit Size specification. | 1024000 |
Tag_Prefix | Prefix incoming tag with the defined value. | |
Key | Description |
---|---|
Host | Name of the target host or IP address to check. |
Port | TCP port where to perform the connection check. |
Interval_Sec | Interval in seconds between the service checks. Default value is 1. |
Interval_Nsec | Specify a nanoseconds interval for service checks; it works in conjunction with the Interval_Sec configuration key. Default value is 0. |
Alert | If enabled, it will only generate messages if the target TCP service is down. By default this option is disabled. |
Add_Host | If enabled, the hostname is appended to each record. Default value is false. |
Add_Port | If enabled, the port number is appended to each record. Default value is false. |
The mem input plugin gathers information about the memory and swap usage of the running system at a fixed interval and reports the total amount of memory and the amount of free memory available.
In order to get memory and swap usage from your system, you can run the plugin from the command line or through the configuration file:
In your main configuration file append the following Input & Output sections:
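```
[INPUT]
    Name         mem
    Tag          memory
    Interval_Sec 1

[OUTPUT]
    Name   stdout
    Match  *
```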
The MQTT input plugin allows you to retrieve messages/data from MQTT control packets over a TCP connection. The incoming data to receive must be a JSON map.
The plugin supports the following configuration parameters:
In order to start listening for MQTT messages, you can run the plugin from the command line or through the configuration file:
Since the MQTT input plugin lets Fluent Bit behave as a server, we need to dispatch some messages using an MQTT client; in the following example the mosquitto tool is used for this purpose.
The following command line will send a message to the MQTT input plugin:
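For example (topic and payload are illustrative):

```
mosquitto_pub -m '{"key1": 123, "key2": 456}' -t some/topic
```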
In your main configuration file append the following Input & Output sections:
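```
[INPUT]
    Name   mqtt
    Tag    data
    Listen 0.0.0.0
    Port   1883

[OUTPUT]
    Name   stdout
    Match  *
```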
The netif input plugin gathers network traffic information of the running system at a fixed interval and reports it.
The Network I/O Metrics plugin creates metrics that are log-based (i.e., a JSON payload). If you are looking for Prometheus-based metrics, please see the Node Exporter Metrics input plugin.
The plugin supports the following configuration parameters:
In order to monitor network traffic from your system, you can run the plugin from the command line or through the configuration file:
In your main configuration file append the following Input & Output sections:
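```
# A sketch; the interface name is illustrative
[INPUT]
    Name         netif
    Tag          netif
    Interface    eth0
    Interval_Sec 1

[OUTPUT]
    Name   stdout
    Match  *
```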
Note: Total interval (sec) = Interval_Sec + (Interval_Nsec / 1000000000).
e.g. 1.5s = 1s + 500000000ns
The process input plugin allows you to check how healthy a process is. It does so by performing a service check at a fixed interval specified by the user.
The Process metrics plugin creates metrics that are log-based (i.e., a JSON payload). If you are looking for Prometheus-based metrics, please see the Node Exporter Metrics input plugin.
The plugin supports the following configuration parameters:
In order to start performing the checks, you can run the plugin from the command line or through the configuration file:
The following example will check the health of crond process.
In your main configuration file append the following Input & Output sections:
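```
[INPUT]
    Name         proc
    Proc_Name    crond
    Interval_Sec 1
    Fd           true
    Mem          true

[OUTPUT]
    Name   stdout
    Match  *
```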
Once Fluent Bit is running, you will see the health of process:
Key | Description |
---|---|
Listen | Listener network interface, default: 0.0.0.0 |
Port | TCP port where listening for connections, default: 1883 |
Key | Description |
---|---|
Interface | Specify the network interface to monitor, e.g. eth0 |
Interval_Sec | Polling interval (seconds). default: 1 |
Interval_NSec | Polling interval (nanosecond). default: 0 |
Verbose | If true, gather metrics precisely. default: false |
Key | Description |
---|---|
Proc_Name | Name of the target process to check. |
Interval_Sec | Interval in seconds between the service checks. Default value is 1. |
Interval_Nsec | Specify a nanoseconds interval for service checks; it works in conjunction with the Interval_Sec configuration key. Default value is 0. |
Alert | If enabled, it will only generate messages if the target process is down. By default this option is disabled. |
Fd | If enabled, the number of file descriptors is appended to each record. Default value is true. |
Mem | If enabled, the memory usage of the process is appended to each record. Default value is true. |
The stdin plugin allows you to retrieve valid JSON text messages over the standard input interface (stdin). In order to use it, specify the plugin name as the input, e.g:
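```
fluent-bit -i stdin -o stdout
```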
As input data, the stdin plugin recognizes the following JSON data formats:
A better way to demonstrate how it works is through a Bash script that generates messages and writes them to Fluent Bit. Write the following content in a file named test.sh:
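A minimal sketch of such a script (the JSON content is illustrative):

```bash
#!/bin/sh

# Emit a JSON message every second, forever
while :; do
  echo '{"key": "some value"}'
  sleep 1
done
```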
Give the script execution permission:
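```
chmod 755 test.sh
```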
Now let's start the script and Fluent Bit in the following way:
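```
./test.sh | fluent-bit -i stdin -o stdout
```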
The plugin supports the following configuration parameters:
The serial input plugin allows you to retrieve messages/data from a serial interface.
In order to retrieve messages over the Serial interface, you can run the plugin from the command line or through the configuration file:
The following example loads the serial input plugin, sets a bitrate of 9600, listens on the /dev/tnt0 interface, and uses the custom tag data to route the messages.
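A sketch of that configuration:

```
[INPUT]
    Name    serial
    Tag     data
    File    /dev/tnt0
    Bitrate 9600

[OUTPUT]
    Name   stdout
    Match  *
```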
The above interface (/dev/tnt0) is an emulation of a serial interface (more details at the bottom); for demonstration purposes we will write some messages to the other end of the interface, in this case /dev/tnt1, e.g:
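```
echo 'this is some message' > /dev/tnt1
```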
In Fluent Bit you should see an output like this:
Now, using the Separator configuration, we could send multiple messages at once (run this command after starting Fluent Bit):
In your main configuration file append the following Input & Output sections:
The following content is extra information that will allow you to emulate a serial interface on your Linux system, so you can test the serial input plugin locally in case you don't have such an interface on your computer. The following procedure has been tested on Ubuntu 15.04 running Linux kernel 4.0.
Download the sources
Unpack and compile
Copy the new kernel module into the kernel modules directory
Load the module
You should see new serial ports in /dev/ (ls /dev/tnt*). Give appropriate permissions to the new serial ports:
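Assuming the tty0tty null-modem emulator is used, the steps above might look like the following sketch (repository URL and module path are assumptions; adjust for your kernel version):

```bash
# Download the sources
git clone https://github.com/freemed/tty0tty

# Unpack and compile
cd tty0tty/module
make

# Copy the new kernel module into the kernel modules directory
sudo cp tty0tty.ko /lib/modules/$(uname -r)/kernel/drivers/misc/

# Load the module
sudo depmod
sudo modprobe tty0tty

# Give appropriate permissions to the new serial ports
sudo chmod 666 /dev/tnt*
```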
When the module is loaded, it will interconnect the following virtual interfaces:
The random input plugin generates very simple random value samples using the device interface /dev/urandom; if that is not available, it will use a Unix timestamp as the value.
The plugin supports the following configuration parameters:
In order to start generating random samples, you can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit generate the samples with the following options:
In your main configuration file append the following Input & Output sections:
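```
[INPUT]
    Name         random
    Tag          random
    Samples      -1
    Interval_Sec 1

[OUTPUT]
    Name   stdout
    Match  *
```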
Once Fluent Bit is running, you will see the reports in the output interface similar to this:
Key | Description | Default |
---|---|---|
Buffer_Size | Set the buffer size to read data. This value is used to increase the buffer size. The value must be according to the Unit Size specification. | 16k |
Key | Description |
---|---|
File | Absolute path to the device entry, e.g: /dev/ttyS0 |
Bitrate | The bitrate for the communication, e.g: 9600, 38400, 115200, etc |
Min_Bytes | The serial interface will expect at least Min_Bytes to be available before processing the message (default: 1) |
Separator | Allows you to specify a separator string that is used to determine when a message ends. |
Format | Specify the format of the incoming data stream. The only option available is 'json'. Note that Format and Separator cannot be used at the same time. |
Key | Description |
---|---|
Samples | If set, it will only generate a specific number of samples. By default this value is set to -1, which will generate unlimited samples. |
Interval_Sec | Interval in seconds between samples generation. Default value is 1. |
Interval_Nsec | Specify a nanoseconds interval for samples generation; it works in conjunction with the Interval_Sec configuration key. Default value is 0. |
The statsd input plugin allows you to receive metrics via StatsD protocol.
The plugin supports the following configuration parameters:
Here is a configuration example.
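```
[INPUT]
    Name   statsd
    Listen 0.0.0.0
    Port   8125

[OUTPUT]
    Name   stdout
    Match  *
```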
Now you can input metrics through the UDP port as follows:
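For example (metric names and values are illustrative):

```
echo "click:10|c|@0.1" | nc -q0 -u 127.0.0.1 8125
echo "active:99|g"     | nc -q0 -u 127.0.0.1 8125
```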
Fluent Bit will produce the following records:
The systemd input plugin allows you to collect log messages from the Journald daemon in Linux environments.
The plugin supports the following configuration parameters:
In order to receive Systemd messages, you can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit listen for Systemd messages with the following options:
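For instance (the unit name is illustrative):

```
fluent-bit -i systemd -p systemd_filter=_SYSTEMD_UNIT=docker.service -p tag='host.*' -o stdout
```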
In the example above we are collecting all messages coming from the Docker service.
In your main configuration file append the following Input & Output sections:
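```
[SERVICE]
    Flush 1

[INPUT]
    Name           systemd
    Tag            host.*
    Systemd_Filter _SYSTEMD_UNIT=docker.service

[OUTPUT]
    Name   stdout
    Match  *
```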
Key | Description | Default |
---|---|---|
Listen | Listener network interface. | 0.0.0.0 |
Port | UDP port where listening for connections | 8125 |
Key | Description | Default |
---|---|---|
Path | Optional path to the Systemd journal directory; if not set, the plugin will use default paths to read local-only logs. | |
Max_Fields | Set a maximum number of fields (keys) allowed per record. | 8000 |
Max_Entries | When Fluent Bit starts, the Journal might have a high number of logs in the queue. In order to avoid delays and reduce memory usage, this option allows you to specify the maximum number of log entries that can be processed per round. Once the limit is reached, Fluent Bit will continue processing the remaining log entries once Journald performs the notification. | 5000 |
Systemd_Filter | Allows you to perform a query over logs that contain specific Journald key/value pairs, e.g: _SYSTEMD_UNIT=UNIT. The Systemd_Filter option can be specified multiple times in the input section to apply multiple filters as required. | |
Systemd_Filter_Type | Define the filter type when Systemd_Filter is specified multiple times. Allowed values are And and Or. With And a record is matched only when all of the Systemd_Filter have a match. With Or a record is matched when any of the Systemd_Filter has a match. | Or |
Tag | The tag is used to route messages, but on the Systemd plugin there is extra functionality: if the tag includes a star/wildcard, it will be expanded with the Systemd Unit file name. | |
DB | Specify the absolute path of a database file to keep track of the Journald cursor. | |
DB.Sync | Set a default synchronization (I/O) method. Values: Extra, Full, Normal, Off. This flag affects how the internal SQLite engine synchronizes to disk. Note: this option was introduced in Fluent Bit v1.4.6. | Full |
Read_From_Tail | Start reading new entries. Skip entries already stored in Journald. | Off |
Lowercase | Lowercase the Journald field (key). | Off |
Strip_Underscores | Remove the leading underscore of the Journald field (key). For example the Journald field _PID becomes the key PID. | Off |
The tcp input plugin allows you to retrieve structured JSON or raw messages over a TCP network interface (TCP port).
The plugin supports the following configuration parameters:
In order to receive JSON messages over TCP, you can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit listen for JSON messages with the following options:
By default the service will listen on all interfaces (0.0.0.0) through TCP port 5170; optionally you can change this directly, e.g:
In the example, JSON messages will only arrive through the network interface at address 192.168.3.2 and TCP port 9090.
In your main configuration file append the following Input & Output sections:
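```
[INPUT]
    Name        tcp
    Listen      0.0.0.0
    Port        5170
    Chunk_Size  32
    Buffer_Size 64
    Format      json

[OUTPUT]
    Name   stdout
    Match  *
```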
Once Fluent Bit is running, you can send some messages using netcat:
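```
echo '{"key 1": 123456789, "key 2": "abcdefg"}' | nc 127.0.0.1 5170
```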
In Fluent Bit we should see the following output:
When receiving payloads in JSON format, there are high performance penalties. Parsing JSON is a very expensive task, so you can expect CPU usage to increase under high-load environments. To get faster data ingestion, consider using the option Format none to avoid JSON parsing when it is not needed.
Key | Description | Default |
---|---|---|
Listen | Listener network interface. | 0.0.0.0 |
Port | TCP port where listening for connections | 5170 |
Buffer_Size | Specify the maximum buffer size in KB to receive a JSON message. If not set, the default size will be the value of Chunk_Size. | |
Chunk_Size | By default the buffer that stores incoming JSON messages does not allocate the maximum memory allowed up front; instead it allocates memory as required. The allocation rounds are set by Chunk_Size in KB. If not set, Chunk_Size is equal to 32 (32KB). | 32 |
Format | Specify the expected payload format. It supports the options json and none. When using json, it expects JSON maps; when set to none, it will split every record using the defined Separator (option below). | json |
Separator | When the expected Format is set to none, Fluent Bit needs a separator string to split the records. By default it uses the line feed character (LF or 0x0A). | |
The syslog input plugin allows you to collect Syslog messages through a Unix socket server (UDP or TCP) or over the network using TCP or UDP.
The plugin supports the following configuration parameters:
When using the syslog input plugin, Fluent Bit requires access to the parsers.conf file; the path to this file can be specified with the option -R or through the Parsers_File key in the [SERVICE] section (more details below).
When udp or unix_udp is used, the buffer size to receive messages is configurable only through the Buffer_Chunk_Size option, which defaults to 32KB.
In order to receive Syslog messages, you can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit listen for Syslog messages with the following options:
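For example (the parsers path is illustrative):

```
fluent-bit -R /path/to/parsers.conf -i syslog -p path=/tmp/in_syslog -o stdout
```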
By default the service will create and listen for Syslog messages on the Unix socket /tmp/in_syslog.
In your main configuration file append the following Input & Output sections:
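```
[SERVICE]
    Flush        1
    Parsers_File parsers.conf

[INPUT]
    Name  syslog
    Path  /tmp/in_syslog

[OUTPUT]
    Name   stdout
    Match  *
```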
Once Fluent Bit is running, you can send some messages using the logger tool:
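For example (ident and message are illustrative):

```
logger -u /tmp/in_syslog my_ident my_message
```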
In Fluent Bit we should see the following output:
The following content aims to provide configuration examples for different use cases to integrate Fluent Bit and make it listen for Syslog messages from your systems.
Put the following content in your fluent-bit.conf file:
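A sketch of a TCP-mode listener for this use case:

```
[SERVICE]
    Flush        1
    Parsers_File parsers.conf

[INPUT]
    Name   syslog
    Mode   tcp
    Listen 0.0.0.0
    Port   5140

[OUTPUT]
    Name   stdout
    Match  *
```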
then start Fluent Bit.
Add a new file to your rsyslog config rules called 60-fluent-bit.conf inside the directory /etc/rsyslog.d/ and add the following content:
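Assuming the TCP listener above, a forwarding rule might look like:

```
*.* @@127.0.0.1:5140
```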
then make sure to restart your rsyslog daemon:
Put the following content in your fluent-bit.conf file:
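A sketch of a Unix socket listener for this use case (socket path is illustrative):

```
[SERVICE]
    Flush        1
    Parsers_File parsers.conf

[INPUT]
    Name      syslog
    Mode      unix_udp
    Path      /tmp/fluent-bit.sock
    Unix_Perm 0644

[OUTPUT]
    Name   stdout
    Match  *
```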
then start Fluent Bit.
Add a new file to your rsyslog config rules called 60-fluent-bit.conf inside the directory /etc/rsyslog.d/ and place the following content:
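Assuming rsyslog's omuxsock output module, the content might look like:

```
$ModLoad omuxsock
$OMUxSockSocket /tmp/fluent-bit.sock
*.* :omuxsock:
```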
Make sure that the socket file is readable by rsyslog (tweak the Unix_Perm option shown above).
The tail input plugin allows you to monitor one or several text files. Its behavior is similar to the tail -f shell command.
The plugin reads every matched file in the Path pattern and for every new line found (separated by a newline character, \n), it generates a new record. Optionally a database file can be used so the plugin can keep a history of tracked files and a state of offsets; this is very useful to resume a state if the service is restarted.
The plugin supports the following configuration parameters:
Note that if the database parameter DB is not specified, by default the plugin will start reading each target file from the beginning. This might also cause some unwanted behavior; for example, when a line is bigger than Buffer_Chunk_Size and Skip_Long_Lines is not turned on, the file will be read from the beginning on each Refresh_Interval until the file is rotated.
Starting from Fluent Bit v1.8 we have introduced a new Multiline core functionality. For the Tail input plugin, this means that it now supports the old configuration mechanism as well as the new one. In order to avoid breaking changes, we will keep both, but we encourage our users to use the latest one. We will refer to the two mechanisms as:
Multiline Core
Old Multiline
The new multiline core is exposed through the multiline.parser configuration property (see the options table below).
As stated in the Multiline Parser documentation, we now provide built-in configuration modes. Note that when using a new multiline.parser definition, you must disable the old configuration keys in your tail section, such as:
parser
parser_firstline
parser_N
multiline
multiline_flush
docker_mode
If you are running Fluent Bit to process logs coming from containers like Docker or CRI, you can use the new built-in modes for such purposes. This will help to reassembly multiline messages originally split by Docker or CRI:
The two options separated by a comma mean multi-format: try the docker and cri multiline formats, as shown in the sketch below.
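```
[INPUT]
    name              tail
    path              /var/log/containers/*.log
    multiline.parser  docker, cri
```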
We are still working on extending support to do multiline for nested stack traces and such. Over the Fluent Bit v1.8.x release cycle we will be updating the documentation.
For the old multiline configuration, the following options exist to configure the handling of multilines logs:
Docker mode exists to recombine JSON log lines split by the Docker daemon due to its line length limit. To use this feature, configure the tail plugin with the corresponding parser and then enable Docker mode:
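A sketch of such a configuration (the path is illustrative):

```
[SERVICE]
    Parsers_File parsers.conf

[INPUT]
    Name        tail
    Path        /var/log/containers/*.log
    Docker_Mode On
    Parser      docker
```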
In order to tail text or log files, you can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit parse text files with the following options:
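```
fluent-bit -i tail -p path=/var/log/syslog -o stdout
```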
In your main configuration file append the following Input & Output sections. An example visualization can be found here.
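```
[INPUT]
    Name  tail
    Path  /var/log/syslog

[OUTPUT]
    Name   stdout
    Match  *
```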
When using multi-line configuration you need to first specify Multiline On in the configuration and use the Parser_Firstline parameter plus additional parser parameters Parser_N if needed. Suppose we are trying to read the following Java stacktrace as a single event.
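An illustrative stacktrace of that shape:

```
Dec 14 06:41:08 Exception in thread "main" java.lang.RuntimeException: Something has gone wrong, aborting!
    at com.myproject.module.MyProject.badMethod(MyProject.java:22)
    at com.myproject.module.MyProject.main(MyProject.java:6)
```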
We need to specify a Parser_Firstline parameter that matches the first line of a multi-line event. Once a match is made, Fluent Bit will read all subsequent lines until another Parser_Firstline match is made.
In the case above we can use the following parser, which extracts the time as time and the remaining portion of the multiline as log.
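A sketch of such a parser (the name and regex are illustrative):

```
[PARSER]
    Name        multiline
    Format      regex
    Regex       /(?<time>[A-Za-z]+ \d+ \d+\:\d+\:\d+)(?<log>.*)/
    Time_Key    time
    Time_Format %b %d %H:%M:%S
```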
If we want to further parse the entire event, we can add additional parsers with Parser_N, where N is an integer. The final Fluent Bit configuration looks like the following:
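```
# A sketch; the log path and parser name are illustrative
[INPUT]
    Name             tail
    Multiline        On
    Parser_Firstline multiline
    Path             /var/log/java.log

[OUTPUT]
    Name   stdout
    Match  *
```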
Our output will be as follows.
The tail input plugin has a feature to save the state of the tracked files; it is strongly suggested that you enable it. For this purpose the db property is available, e.g:
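```
[INPUT]
    Name  tail
    Path  /var/log/syslog
    DB    /path/to/logs.db
```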
When running, the database file /path/to/logs.db will be created. This database is backed by SQLite3, so if you are interested in exploring the content, you can open it with the SQLite client tool, e.g:
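```
sqlite3 /path/to/logs.db
```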
Make sure to explore the database only when Fluent Bit is not busy writing to it; otherwise you will see Error: database is locked messages.
By default the SQLite client tool does not format the columns in a human-readable way, so to explore the in_tail_files table you can create a config file in ~/.sqliterc with the following content:
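```
.headers on
.mode column
.width 52 7 7 7
```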
Fluent Bit keeps the state, or checkpoint, of each file through a SQLite database file, so if the service is restarted, it can continue consuming files from its last checkpoint position (offset). The default options are set for high performance and corruption safety.
The SQLite journaling mode enabled is Write Ahead Log, or WAL. This improves the performance of read and write operations to disk. When enabled, you will see additional files being created in your file system. Consider the following configuration statement:
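```
[INPUT]
    Name  tail
    Path  /var/log/containers/*.log
    DB    test.db
```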
The above configuration enables a database file called test.db, and in the same path as that file SQLite will create two additional files:
test.db-shm
test.db-wal
Those two files support the WAL mechanism, which helps to improve performance and reduce the number of system calls required. The -wal file stores the new changes to be committed; at some point the WAL file transactions are moved back to the real database file. The -shm file is a shared-memory file that allows concurrent users of the WAL file.
The WAL mechanism gives us higher performance but might also increase the memory usage by Fluent Bit. Most of this usage comes from memory-mapped and cached pages. In some cases you might see that memory usage stays a bit high, giving the impression of a memory leak, but this is not relevant unless you want your memory metrics back to normal. Starting from Fluent Bit v1.7.3 we introduced the option db.journal_mode, which sets the journal mode for databases; by default it is WAL (Write-Ahead Logging). The currently allowed values for db.journal_mode are DELETE | TRUNCATE | PERSIST | MEMORY | WAL | OFF.
File rotation is properly handled, including logrotate's copytruncate mode.
Note that the Path patterns cannot match the rotated files; otherwise, the rotated file would be read again and lead to duplicate records.
The winlog input plugin allows you to read Windows Event Log.
The plugin supports the following configuration parameters:
Note that if you do not set db, the plugin will read channels from the beginning on each startup.
Here is a minimum configuration example.
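```
# A sketch; channels and database path are illustrative
[INPUT]
    Name         winlog
    Channels     Setup,Windows PowerShell
    Interval_Sec 1
    DB           winlog.sqlite

[OUTPUT]
    Name   stdout
    Match  *
```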
Note that some Windows Event Log channels (like Security) require admin privileges for reading. In this case, you need to run fluent-bit as an administrator.
If you want to do a quick test, you can run this plugin from the command line.
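```
fluent-bit -i winlog -p 'channels=Setup' -o stdout
```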
The JSON parser is the simplest option: if the original log source is a JSON map string, it will take its structure and convert it directly to the internal binary representation.
A simple configuration that can be found in the default parsers configuration file is the entry to parse Docker log files (when the tail input plugin is used):
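```
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S %z
```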
The following log entry is a valid content for the parser defined above:
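An illustrative entry of that shape:

```
{"key1": 12345, "key2": "abc", "time": "2006-07-28T13:22:04Z"}
```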
After processing, its internal representation will be:
The time has been converted to Unix timestamp (UTC) and the map reduced to each component of the original message.
The thermal input plugin reports system temperatures periodically; the default interval is one second. Currently this plugin is only available for Linux.
The following tables describe the information generated by the plugin.
The plugin supports the following configuration parameters:
In order to get temperature(s) of your system, you can run the plugin from the command line or through the configuration file:
Some systems provide multiple thermal zones. In this example we monitor only thermal_zone0 by name, once per minute.
In your main configuration file append the following Input & Output sections:
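```
[INPUT]
    Name         thermal
    Tag          temperature
    Interval_Sec 60
    name_regex   thermal_zone0

[OUTPUT]
    Name   stdout
    Match  *
```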
The parser engine is fully configurable and can process log entries based on two types of format: JSON maps and regular expressions (named capture).
By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases such as logs from:
Apache
Nginx
Docker
Syslog rfc5424
Syslog rfc3164
Parsers are defined in one or multiple configuration files that are loaded at start time, either from the command line or through the main Fluent Bit configuration file.
Multiple parsers can be defined and each section has its own properties. The following table describes the available options for each parser definition:
All parsers must be defined in a parsers.conf file, not in the Fluent Bit global configuration file. The parsers file exposes all available parsers that can be used by the input plugins that are aware of this feature. A parsers file can have multiple entries like this:
For more information about the parsers available, please refer to the default parsers file distributed with Fluent Bit source code:
In addition, we extended our time resolution to support fractional seconds like 2017-05-17T15:44:31**.187512963**Z. Since Fluent Bit v0.12 we have full support for nanoseconds resolution, the %L format option for Time_Format is provided as a way to indicate that content must be interpreted as fractional seconds.
Note: The option %L is only valid when used after seconds (%S) or seconds since the Epoch (%s), e.g: %S.%L or %s.%L.
The regex parser allows you to define a custom Ruby regular expression that uses named capture groups to define which content belongs to which key name.
Note: understanding how regular expressions works is out of the scope of this content.
From a configuration perspective, when the format is set to regex, it is mandatory that a Regex configuration key exists.
The following parser configuration example aims to provide rules that can be applied to an Apache HTTP Server log entry:
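A sketch along the lines of the default apache parser shipped with Fluent Bit:

```
[PARSER]
    Name        apache
    Format      regex
    Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z
```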
As an example, take the following Apache HTTP Server log entry:
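An illustrative entry of that shape:

```
192.168.2.10 - - [04/Mar/2017:16:37:41 +0600] "GET /wp-content/uploads/2017/03/go.jpg HTTP/1.0" 200 0 "-" "Mozilla/5.0"
```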
The above content does not provide a defined structure for Fluent Bit, but by enabling the proper parser we can generate a structured representation of it:
A common pitfall is that you cannot use characters other than letters, numbers, and underscores in group names. For example, a group name like (?<user-name>.*) will cause an error because it contains an invalid character (-).
Parsers are an important component of Fluent Bit; with them you can take any unstructured log entry and give it a structure that makes processing and further filtering easier.
Note: If you are using regular expressions, note that Fluent Bit uses Ruby-based regular expressions, and we encourage you to use an online editor to test them.
Time resolution and its supported formats are handled by using the strptime(3) libc system function.
Fluent Bit uses the Onigmo regular expression library in Ruby mode; for testing purposes you can use an online web editor to test your expressions.
Important: do not attempt to add multiline support in your regular expressions if you are using the Tail input plugin, since each line is handled as a separate entity. Instead, use the Tail plugin's multiline support configuration feature.
Security warning: Onigmo is a backtracking regex engine. You need to be careful not to use expensive regex patterns, or Onigmo can take a very long time to perform pattern matching. For details, please read the article on ReDoS on the OWASP site.
In order to understand, learn, and test regular expressions like the example above, we suggest you try a Ruby regular expression editor.
Key | Description | Default |
---|---|---|
Mode | Defines transport protocol mode: unix_udp (UDP over Unix socket), unix_tcp (TCP over Unix socket), tcp or udp | unix_udp |
Listen | If Mode is set to tcp or udp, specify the network interface to bind. | 0.0.0.0 |
Port | If Mode is set to tcp or udp, specify the TCP port to listen for incoming connections. | 5140 |
Path | If Mode is set to unix_tcp or unix_udp, set the absolute path to the Unix socket file. | |
Unix_Perm | If Mode is set to unix_tcp or unix_udp, set the permission of the Unix socket file. | 0644 |
Parser | Specify an alternative parser for the message. If Mode is set to tcp or udp, the default parser is syslog-rfc5424; otherwise syslog-rfc3164-local is used. If your syslog messages have fractional seconds, set this Parser value to syslog-rfc5424 instead. | |
Buffer_Chunk_Size | By default the buffer that stores incoming Syslog messages does not allocate the maximum memory allowed up front; instead it allocates memory as required. The allocation rounds are set by Buffer_Chunk_Size. If not set, Buffer_Chunk_Size is equal to 32000 bytes (32KB). Read the considerations below when using udp or unix_udp mode. | |
Buffer_Max_Size | Specify the maximum buffer size to receive a Syslog message. If not set, the default size will be the value of Buffer_Chunk_Size. | |
Key | Description | Default |
---|---|---|
Buffer_Chunk_Size | Set the initial buffer size to read file data. This value is also used to increase the buffer size. The value must be according to the Unit Size specification. | 32k |
Buffer_Max_Size | Set the limit of the buffer size per monitored file. When a buffer needs to be increased (e.g: very long lines), this value is used to restrict how much the memory buffer can grow. If reading a file exceeds this limit, the file is removed from the monitored file list. The value must be according to the Unit Size specification. | 32k |
Path | Pattern specifying a specific log file or multiple ones through the use of common wildcards. Multiple patterns separated by commas are also allowed. | |
Path_Key | If enabled, it appends the name of the monitored file as part of the record. The value assigned becomes the key in the map. | |
Exclude_Path | Set one or multiple shell patterns separated by commas to exclude files matching certain criteria, e.g: Exclude_Path *.gz,*.zip | |
Offset_Key | If enabled, Fluent Bit appends the offset of the current monitored file as part of the record. The value assigned becomes the key in the map. | |
Read_from_Head | For newly discovered files on start (without a database offset/position), read the content from the head of the file, not the tail. | False |
Refresh_Interval | The interval of refreshing the list of watched files in seconds. | 60 |
Rotate_Wait | Specify the number of extra seconds to monitor a file once it is rotated in case some pending data is flushed. | 5 |
Ignore_Older | Ignores files whose modification date is older than this time in seconds. Supports m, h, d (minutes, hours, days) syntax. | |
Skip_Long_Lines | When a monitored file reaches its buffer capacity due to a very long line (Buffer_Max_Size), the default behavior is to stop monitoring that file. Skip_Long_Lines alters that behavior and instructs Fluent Bit to skip long lines and continue processing other lines that fit into the buffer size. | Off |
Skip_Empty_Lines | Skips empty lines in the log file from any further processing or output. | Off |
DB | Specify the database file to keep track of monitored files and offsets. | |
DB.sync | Set a default synchronization (I/O) method. Values: Extra, Full, Normal, Off. This flag affects how the internal SQLite engine synchronizes to disk; for more details about each option please refer to this section. Most workload scenarios will be fine with normal mode, but if you really need full synchronization after every write operation you should set full mode. Note that full has a high I/O performance cost. | normal |
DB.locking | Specify that the database will be accessed only by Fluent Bit. Enabling this feature helps to increase performance when accessing the database, but it restricts any external tool from querying the content. | false |
DB.journal_mode | Sets the journal mode for databases (WAL). Enabling WAL provides higher performance. Note that WAL is not compatible with shared network file systems. | WAL |
Mem_Buf_Limit | Set a limit of memory that the Tail plugin can use when appending data to the Engine. If the limit is reached, it will be paused; when the data is flushed it resumes. | |
Exit_On_Eof | When reading a file, exit as soon as it reaches the end of the file. Useful for bulk loads and tests. | false |
Parser | Specify the name of a parser to interpret the entry as a structured message. | |
Key | When a message is unstructured (no parser applied), it's appended as a string under the key name log. This option allows you to define an alternative name for that key. | log |
Inotify_Watcher | Set to false to use the file stat watcher instead of inotify. | true |
Tag | Set a tag (with regex-extract fields) that will be placed on lines read, e.g. kube.<namespace_name>.<pod_name>.<container_name>. Note that "tag expansion" is supported: if the tag includes an asterisk (*), that asterisk will be replaced with the absolute path of the monitored file (also see Workflow of Tail + Kubernetes Filter). | |
Tag_Regex | Set a regex to extract fields from the file name, e.g. (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)- | |
Static_Batch_Size | Set the maximum number of bytes to process per iteration for the monitored static files (files that already exist upon Fluent Bit start). | 50M |
Key | Description | Default |
---|---|---|
multiline.parser | Specify one or multiple Multiline Parser definitions to apply to the content. | |
Multiline | If enabled, the plugin will try to discover multiline messages and use the proper parsers to compose the outgoing messages. Note that when this option is enabled, the Parser option is not used. | Off |
Multiline_Flush | Wait period time in seconds to process queued multiline messages. | 4 |
Parser_Firstline | Name of the parser that matches the beginning of a multiline message. Note that the regular expression defined in the parser must include a group name (named capture), and the value of the last match group must be a string. | |
Parser_N | Optional extra parser to interpret and structure multiline entries. This option can be used to define multiple parsers, e.g: Parser_1 ab1, Parser_2 ab2, Parser_N abN. | |
Docker_Mode | If enabled, the plugin will recombine split Docker log lines before passing them to any parser as configured above. This mode cannot be used at the same time as Multiline. | Off |
Docker_Mode_Flush | Wait period time in seconds to flush queued unfinished split lines. | 4 |
Docker_Mode_Parser | Specify an optional parser for the first line of the Docker multiline mode. The parser name must be registered in the parsers.conf file. | |
Key | Description | Default |
---|---|---|
Channels | A comma-separated list of channels to read from. | |
Interval_Sec | Set the polling interval for each channel. (optional) | 1 |
DB | Set the path to save the read offsets. (optional) | |
Name | Description |
---|---|
name | The name of the thermal zone, such as thermal_zone0 |
type | The type of the thermal zone, such as x86_pkg_temp |
temp | Current temperature in Celsius |

Key | Description |
---|---|
Interval_Sec | Polling interval (seconds). default: 1 |
Interval_NSec | Polling interval (nanoseconds). default: 0 |
name_regex | Optional name filter regex. default: None |
type_regex | Optional type filter regex. default: None |
The logfmt parser allows you to parse the logfmt format described in https://brandur.org/logfmt. A more formal description is in https://godoc.org/github.com/kr/logfmt.
Here is an example configuration:
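```
[PARSER]
    Name   logfmt
    Format logfmt
```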
The following log entry is a valid content for the parser defined above:
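An illustrative entry of that shape:

```
key1=val1 key2=val2
```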
After processing, its internal representation will be:
The ltsv parser allows you to parse LTSV-formatted text.
Labeled Tab-separated Values (LTSV) format is a variant of Tab-separated Values (TSV). Each record in an LTSV file is represented as a single line. Each field is separated by a TAB and has a label and a value. The label and the value are separated by ':'.
Here is an example of how to use this format in the Apache access log.
Configure this in httpd.conf:
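A sketch of such a log format definition:

```
LogFormat "host:%h\tident:%l\tuser:%u\ttime:%t\treq:%r\tstatus:%>s\tsize:%b\treferer:%{Referer}i\tua:%{User-Agent}i" combined_ltsv
CustomLog "logs/access_log" combined_ltsv
```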
The parser.conf:
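```
# A sketch; the parser name and Types mapping are illustrative
[PARSER]
    Name        access_log_ltsv
    Format      ltsv
    Time_Key    time
    Time_Format [%d/%b/%Y:%H:%M:%S %z]
    Types       status:integer size:integer
```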
The following log entry is a valid content for the parser defined above:
After processing, its internal representation will be:
The time has been converted to Unix timestamp (UTC).
Key | Description |
---|---|
Name | Set a unique name for the parser in question. |
Format | Specify the format of the parser; the available options are json, regex, ltsv or logfmt. |
Regex | If format is regex, this option must be set specifying the Ruby regular expression that will be used to parse and compose the structured message. |
Time_Key | If the log entry provides a field with a timestamp, this option specifies the name of that field. |
Time_Format | Specify the format of the time field so it can be recognized and analyzed properly. Fluent Bit uses strptime(3) to parse time, so you can refer to the strptime documentation for available modifiers. |
Time_Offset | Specify a fixed UTC time offset (e.g. -0600, +0200, etc.) for local dates. |
Time_Keep | By default when a time key is recognized and parsed, the parser will drop the original time field. Enabling this option will make the parser keep the original time field and its value in the log entry. |
Types | Specify the data type of a parsed field. The syntax is a space-separated list of key:type pairs. |
Decode_Field | Decode a field value; the only decoder available is json. |
There are certain cases where the log messages being parsed contain encoded data. A typical use case can be found in containerized environments with Docker: the application logs its data in JSON format, but it becomes an escaped string. Consider the following example.
Original message generated by the application:
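An illustrative message:

```
{"status": "up and running"}
```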
Then the Docker log message becomes encapsulated as follows:
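```
{"log": "{\"status\": \"up and running\"}\r\n", "stream": "stdout", "time": "2018-03-09T01:01:44.851160855Z"}
```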
As you can see, the original message is handled as an escaped string. Ideally in Fluent Bit we would like to keep the original structured message, not a string.
Decoders are a built-in feature available through the Parsers file; each parser definition can optionally set one or multiple decoders. There are two types of decoders:
Decode_Field: if the content can be decoded into a structured message, append that structured message (keys and values) to the original log message.
Decode_Field_As: any content decoded (unstructured or structured) will be replaced in the same key/value; no extra keys are added.
Our pre-defined Docker parser has the following definition:
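A sketch close to the definition shipped in the default parsers.conf:

```
[PARSER]
    Name            docker
    Format          json
    Time_Key        time
    Time_Format     %Y-%m-%dT%H:%M:%S %z
    Decode_Field_As escaped_utf8 log do_next
    Decode_Field_As escaped      log
```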
Each line in the parser with a key Decode_Field instructs the parser to apply a specific decoder on a given field; optionally, it offers the option to take an extra action if the decoder cannot succeed.
If a decoder fails to decode the field, or you want to try another decoder, it is possible to define an optional action. Available actions are:
Note that actions are affected by some restrictions:
On Decode_Field_As, if successful, another decoder of the same type on the same field can be applied only if the data continues to be an unstructured message (raw text).
On Decode_Field, if successful, it can only be applied once to the same field. By nature, Decode_Field aims to decode a structured message.
Example input (from /path/to/log.log in the configuration below)
Example output
Configuration file
The fluent-bit-parsers.conf file:
The AWS filter enriches logs with AWS metadata. Currently the plugin adds the EC2 instance ID and availability zone to log records. To use this plugin, you must be running in EC2 and have the instance metadata service enabled.
The plugin supports the following configuration parameters:
Note: If you run Fluent Bit in a container, you may have to use instance metadata v1. The plugin behaves the same regardless of which version is used.
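An example filter entry might look like the following sketch (the keys shown are a subset):

```
[FILTER]
    Name            aws
    Match           *
    imds_version    v2
    az              true
    ec2_instance_id true
```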
This plugin looks up whether a value in a specified list exists and then allows the addition of a record to indicate if it was found. Introduced in version 1.8.4.
The plugin supports the following configuration parameters:
In the following configuration we will read a file test1.log that includes the following values.
Additionally, we will use the following lookup file, which contains a list of malicious IPs (ip_list.txt).
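An illustrative lookup file (one value per line):

```
# ip_list.txt
7.7.7.7
```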
In the configuration we are using $remote_addr as the lookup key, and 7.7.7.7 is malicious. This means the record we would output for the last record would look like the following.
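A sketch of such a filter entry (the record key/value is an assumption for illustration):

```
[FILTER]
    name       checklist
    match      *
    file       ip_list.txt
    lookup_key $remote_addr
    record     ioc_detected true
```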
Look up Geo data from IP
The GeoIP2 filter allows you to enrich the incoming data stream using location data from a GeoIP2 database.
This plugin supports the following configuration parameters:
The following configuration will process the incoming remote_addr field and append country information retrieved from the GeoLite2 database.
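A sketch of that configuration:

```
[FILTER]
    Name       geoip2
    Match      *
    Database   GeoLite2-City.mmdb
    Lookup_key remote_addr
    Record     country remote_addr %{country.names.en}
```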
Each Record parameter above specifies the following triplet:
The field name to be added to records (country)
The lookup key to process (remote_addr)
The query for the GeoIP2 database (%{country.names.en})
By running Fluent Bit with the configuration above, you will see the following output:
Note that the GeoLite2-City.mmdb database is available from MaxMind.
Name | Description |
---|---|
json | Handle the field content as a JSON map. If it finds a JSON map, it will replace the content with a structured map. |
escaped | Decode an escaped string. |
escaped_utf8 | Decode a UTF8 escaped string. |
Name | Description |
---|---|
try_next | If the decoder failed, apply the next decoder in the list for the same field. |
do_next | If the decoder succeeded or failed, apply the next decoder in the list for the same field. |
Key | Description | Default |
---|---|---|
imds_version | Specify which version of the instance metadata service to use. Valid values are 'v1' or 'v2'. | v2 |
az | The availability zone; for example, "us-east-1a". | true |
ec2_instance_id | The EC2 instance ID. | true |
ec2_instance_type | The EC2 instance type. | false |
private_ip | The EC2 instance private IP. | false |
ami_id | The EC2 instance image ID. | false |
account_id | The account ID for the current EC2 instance. | false |
hostname | The hostname for the current EC2 instance. | false |
vpc_id | The VPC ID for the current EC2 instance. | false |
Key | Description |
---|---|
file | The single-value file that Fluent Bit will use as a lookup table to determine if the specified lookup_key exists. |
lookup_key | The specific key to look up and determine if it exists; supports record accessor. |
record | The record to add if the lookup_key is found in the specified file. |
Key | Description |
---|---|
database | Path to the GeoIP2 database. |
lookup_key | Field name to process. |
record | Defines the KEY LOOKUP_KEY VALUE triplet to be added to records (see the example below); can be specified multiple times. |
Made for testing: make sure that your records contain the expected key and values
The expect filter plugin allows you to validate that records match certain criteria in their structure, such as validating that a key exists or that it has a specific value.
This page describes only the available configuration properties; for a detailed explanation of its usage and use cases, please refer to the following page:
The plugin supports the following configuration parameters:
As mentioned on top, refer to the following page for specific details of usage of this filter:
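As an illustration, a minimal sketch of an expect filter entry:

```
[FILTER]
    Name       expect
    Match      *
    key_exists key1
    action     exit
```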
Select or exclude records per patterns
The Grep Filter plugin allows you to match or exclude specific records based on regular expression patterns for values or nested values.
The plugin supports the following configuration parameters:
In order to start filtering records, you can run the filter from the command line or through the configuration file. The following example assumes that you have a file called lines.txt with the following content:
Note: using the command-line mode requires special attention to quote the regular expressions properly. It's suggested to use a configuration file.
The following command will load the tail plugin and read the content of the lines.txt file. Then the grep filter will apply a regular expression rule over the log field (created by the tail plugin) and only pass records whose field value starts with aa:
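```
fluent-bit -i tail -p 'path=lines.txt' -F grep -p 'regex=log aa' -m '*' -o stdout
```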
The filter allows you to use multiple rules, which are applied in order; you can have as many Regex and Exclude entries as required.
If you want to exclude records that match a given nested field (for example kubernetes.labels.app), you can use the following rule:
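A sketch of such a rule (the label value is illustrative):

```
[FILTER]
    Name    grep
    Match   *
    Exclude $kubernetes['labels']['app'] myapp
```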
Due to the necessity of having a flexible filtering mechanism, it is now possible to extend Fluent Bit capabilities by writing custom filters using the Lua programming language. A Lua-based filter requires two steps:
Configure the Filter in the main configuration
Prepare a Lua script that will be used by the Filter
The plugin supports the following configuration parameters:
From the command line you can use the following options:
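For example (script and function names are illustrative):

```
fluent-bit -i dummy -F lua -p script=test.lua -p call=cb_print -m '*' -o stdout
```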
In your main configuration file append the following Input, Filter & Output sections:
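```
# A sketch; script and function names are illustrative
[INPUT]
    Name   dummy

[FILTER]
    Name   lua
    Match  *
    script test.lua
    call   cb_print

[OUTPUT]
    Name   stdout
    Match  *
```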
The life cycle of the filter has the following steps:
Upon Tag matching by this filter, it may process or bypass the record.
If the tag matched, it will accept the record and invoke the function defined in the call property, which is the name of a function defined in the Lua script.
Invoke Lua function and pass each record in JSON format.
Upon return, validate return value and continue the pipeline.
The Lua script can have one or multiple callbacks that can be used by this filter. The function prototype is as follows:
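A sketch of that prototype (the function name is illustrative):

```lua
function cb_print(tag, timestamp, record)
    -- return code 0: keep the record unmodified
    return 0, timestamp, record
end
```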
Each callback must return three values:
For functional examples of this interface, please refer to the code samples provided in the source code of the project located here:
Lua treats numbers as double. This means an integer field (e.g. IDs, log levels) will be converted to double. To avoid this type conversion, the type_int_key property is available.
The Lua callback function can return an array of tables (i.e., array of records) in its third record return value. With this feature, the Lua filter can split one input record into multiple records according to custom logic.
For example:
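A sketch of a callback that splits a record (the key name x is illustrative):

```lua
function cb_split(tag, timestamp, record)
    if record.x ~= nil then
        -- return the array stored under key x as multiple records
        return 2, timestamp, record.x
    else
        return 2, timestamp, record
    end
end
```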
This plugin enables the Record Accessor feature to specify the KEY. Using the record accessor is suggested if you want to match values against nested values.
If you want to match or exclude records based on nested values, you can use a Record Accessor format as the KEY name. Consider the following record example:
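An illustrative record of that shape:

```
{
    "log": "something",
    "kubernetes": {
        "pod_name": "myapp-0",
        "labels": {
            "app": "myapp"
        }
    }
}
```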
The Lua filter allows you to modify the incoming records (even split one record into multiple records) using custom scripts.
In order to test the filter, you can run the plugin from the command line or through the configuration file. The following examples use the dummy input plugin for data ingestion, invoke the Lua filter using the test.lua script, and call the cb_print() function, which only prints the same information to the standard output:
Fluent Bit supports protected mode to prevent crashes when executing an invalid Lua script (see the protected_mode option above).
Property | Description |
---|---|
key_exists | Check if a key with a given name exists in the record. |
key_not_exists | Check if a key does not exist in the record. |
key_val_is_null | Check that the value of the key is NULL. |
key_val_is_not_null | Check that the value of the key is NOT NULL. |
key_val_eq | Check that the value of the key equals the given value in the configuration. |
action | Action to take when a rule does not match. The available options are warn or exit. On warn, a warning message is sent to the logging layer when a mismatch of the rules above is found; using exit makes Fluent Bit abort with status code 255. |
Key | Value Format | Description |
---|---|---|
Regex | KEY REGEX | Keep records in which the content of KEY matches the regular expression. |
Exclude | KEY REGEX | Exclude records in which the content of KEY matches the regular expression. |
Key | Description |
---|---|
script | Path to the Lua script that will be used. This can be a relative path against the main configuration file. |
call | Lua function name that will be triggered to do filtering. It's assumed that the function is declared inside the script parameter defined above. |
type_int_key | If these keys are matched, the fields are converted to integer. If more than one key, delimit by space. Note that starting from Fluent Bit v1.6 integer data types are preserved and not converted to double as in previous versions. |
type_array_key | If these keys are matched, the fields are handled as an array. If more than one key, delimit by space. It is useful when the array can be empty. |
protected_mode | If enabled, the Lua script will be executed in protected mode. It prevents Fluent Bit from crashing when an invalid Lua script is executed or the triggered Lua function throws exceptions. Default is true. |
time_as_table | By default when the Lua script is invoked, the record timestamp is passed as a floating point number, which might lead to precision loss when it is converted back. If you desire timestamp precision, enabling this option will pass the timestamp as a Lua table with the seconds and nanoseconds as separate keys. |
name | description |
---|---|
tag | Name of the tag associated with the incoming record. |
timestamp | Unix timestamp with nanoseconds associated with the incoming record. The original format is a double (seconds.nanoseconds). |
record | Lua table with the record content. |
Name | Data type | Description |
---|---|---|
code | integer | The code return value represents the result and the further action that may follow. If code equals -1, the record will be dropped. If code equals 0, the record will not be modified. If code equals 1, the original timestamp and record have been modified, so they must be replaced by the returned values from timestamp (second return value) and record (third return value). If code equals 2, the original timestamp is not modified and the record has been modified, so it must be replaced by the returned value from record (third return value). Code 2 is supported from v1.4.3. |
timestamp | double | If code equals 1, the original record timestamp will be replaced with this new value. |
record | table | If code equals 1, the original record information will be replaced with this new value. Note that the record value must be a valid Lua table. This value can be an array of tables (i.e., an array of objects in JSON format), and in that case the input record is effectively split into multiple records (see above for more details). |
The Record Modifier filter plugin allows you to append fields to a record or to exclude specific fields.
The plugin supports the following configuration parameters. Note that Remove_key and Allowlist_key are mutually exclusive.
In order to start filtering records, you can run the filter from the command line or through the configuration file.
This is a sample in_mem record to filter.
The following configuration file appends a product name and the hostname (via an environment variable) to the record, as sketched below.
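A sketch along these lines (the product name is illustrative):

```
[INPUT]
    Name  mem
    Tag   mem.local

[FILTER]
    Name    record_modifier
    Match   *
    # Append the host name from the environment and a static product name
    Record  hostname ${HOSTNAME}
    Record  product Awesome_Tool

[OUTPUT]
    Name   stdout
    Match  *
```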
You can also run the filter from command line.
The output will be
The following configuration file is to remove 'Swap.*' fields.
You can also run the filter from command line.
The output will be
The following configuration file keeps only the 'Mem.*' fields.
You can also run the filter from command line.
The output will be
The Parser Filter plugin allows for parsing fields in event records.
The plugin supports the following configuration parameters:
This is an example of parsing a record {"data":"100 0.5 true This is example"}.
The plugin needs a parser file which defines how to parse each field.
The path of the parser file should be written in the configuration file under the [SERVICE] section, as in the sketch below.
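A sketch of a matching parser and configuration (the parser name dummy_test and the file names are illustrative):

```
# parsers.conf: split the "data" field into typed sub-fields
[PARSER]
    Name    dummy_test
    Format  regex
    Regex   ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<STRING>.+)$

# fluent-bit.conf
[SERVICE]
    Parsers_File  parsers.conf

[INPUT]
    Name   dummy
    Tag    dummy.data
    Dummy  {"data":"100 0.5 true This is example"}

[FILTER]
    Name      parser
    Match     dummy.*
    Key_Name  data
    Parser    dummy_test

[OUTPUT]
    Name   stdout
    Match  *
```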
The output is
You can see the record {"data":"100 0.5 true This is example"} is parsed.
By default, the parser plugin only keeps the parsed fields in its output.
If you enable Reserve_Data, all other fields are preserved:
This will produce the output:
If you enable Reserve_Data and Preserve_Key, the original key field will be preserved as well:
This will produce the following output:
The Modify Filter plugin allows you to change records using rules and conditions.
As an example using JSON notation, to:
Rename Key2 to RenamedKey
Add a key OtherKey with value Value3 if OtherKey does not yet exist
Example (input)
Example (output)
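A filter section implementing those two rules might look like this sketch:

```
[FILTER]
    Name    modify
    Match   *
    # Rename Key2 to RenamedKey
    Rename  Key2 RenamedKey
    # Add OtherKey with value Value3 only if it does not yet exist
    Add     OtherKey Value3
```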
The plugin supports the following rules:
Rules are case insensitive, parameters are not
Any number of rules can be set in a filter instance.
Rules are applied in the order they appear, with each rule operating on the result of the previous rule.
The plugin supports the following conditions:
Conditions are case insensitive, parameters are not
Any number of conditions can be set.
Conditions apply to the whole filter instance and all its rules, not to individual rules.
All conditions have to be true for the rules to be applied.
You can set the Record Accessor format as STRING:KEY for nested keys.
In order to start filtering records, you can run the filter from the command line or through the configuration file. The following invokes the Memory Usage input plugin, which outputs data like the following (example):
Note: using command line mode requires quotes so that the wildcard is parsed properly. The use of a configuration file is recommended.
The output of both the command line and configuration invocations should be identical and result in the following output.
The Fluent Bit Kubernetes filter allows you to enrich your log files with Kubernetes metadata.
When Fluent Bit is deployed in Kubernetes as a DaemonSet and configured to read the log files from the containers (using tail or systemd input plugins), this filter aims to perform the following operations:
Analyze the Tag and extract the following metadata:
Pod Name
Namespace
Container Name
Container ID
Query the Kubernetes API Server to obtain extra metadata for the Pod in question:
Pod ID
Labels
Annotations
The data is cached locally in memory and appended to each record.
The plugin supports the following configuration parameters:
Kubernetes Filter aims to provide several ways to process the data contained in the log key. The following explanation of the workflow assumes that your original Docker parser defined in parsers.conf is as follows:
Since Fluent Bit v1.2, we do not suggest the use of decoders (Decode_Field_As) if you are using Elasticsearch in the output, to avoid data type conflicts.
To perform processing of the log key, it's mandatory to enable the Merge_Log configuration property in this filter, then the following processing order will be done:
If a Pod suggests a parser, the filter will use that parser to process the content of log.
If the option Merge_Parser was set and the Pod did not suggest a parser, process the log content using the parser suggested in the configuration.
If the Pod did not suggest a parser and Merge_Parser is not set, try to handle the content as JSON.
If log value processing fails, the value is left untouched. The order above is not chained; it is exclusive, and the filter will try only one of the options above, not all of them.
A flexible feature of the Fluent Bit Kubernetes filter is that it allows Kubernetes Pods to suggest certain behaviors for the log processor pipeline when processing the records. At the moment it supports:
Suggest a pre-defined parser
Request to exclude logs
The following annotations are available:
The following Pod definition runs a Pod that emits Apache logs to the standard output; in its annotations it suggests that the data should be processed using the pre-defined parser called apache, as in the sketch below:
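A sketch of such a Pod definition (the container image is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apache-logs
  labels:
    app: apache-logs
  annotations:
    fluentbit.io/parser: apache
spec:
  containers:
    - name: apache
      image: edsiper/apache_logs
```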
There are certain situations where the user would like to request that the log processor simply skip the logs from the Pod in question:
Note that the annotation value is a boolean, which can be true or false, and must be quoted.
The Kubernetes filter depends on either the Tail or Systemd input plugins to process and enrich records with Kubernetes metadata. Here we will explain the workflow of Tail and how its configuration is correlated with the Kubernetes filter. Consider the following configuration example (just for demo purposes, not production):
In the input section, the Tail plugin will monitor all files ending in .log in path /var/log/containers/. For every file it will read every line and apply the docker parser. Then the records are emitted to the next step with an expanded tag.
Tail supports Tag expansion, which means that if a tag has a star character (*), it will replace the value with the absolute path of the monitored file, so if your file name and path is:
then the Tag for every record of that file becomes:
Note that slashes are replaced with dots.
When the Kubernetes filter runs, it will try to match all records that start with kube. (note the ending dot), so records from the file mentioned above will hit the matching rule and the filter will try to enrich the records.
The Kubernetes filter does not care where the logs come from, but it does care about the absolute name of the monitored file, because that information contains the Pod name and namespace name that are used to retrieve the associated metadata for the running Pod from the Kubernetes API server.
If you have large Pod specifications (which can be caused by large numbers of environment variables, etc.), be sure to increase the Buffer_Size parameter of the Kubernetes filter. If object sizes exceed this buffer, some metadata will fail to be injected into the logs.
If the configuration property Kube_Tag_Prefix was configured (available on Fluent Bit >= 1.1.x), it will use that value to remove the prefix that was appended to the Tag in the previous Input section. Note that the configuration property defaults to kube.var.log.containers., so the previous Tag content will be transformed from:
to:
The transformation above does not modify the original Tag; it just creates a new representation for the filter to perform metadata lookup.
That new value is used by the filter to look up the Pod name and namespace. For that purpose it uses an internal regular expression:
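At the time of writing, that expression looks roughly like the following (a sketch; check the source code linked below for the authoritative version):

```
(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
```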
If you want to know more details, check the source code of that definition here.
You can see how this operation is performed on the Rubular.com web site; check the following demo link:
Under certain uncommon conditions, a user might want to alter that hard-coded regular expression; for that purpose, the option Regex_Parser can be used (documented above).
At this point the filter is able to gather the values of pod_name and namespace. With that information, it checks in the local cache (an internal hash table) whether metadata for that key pair exists. If so, it enriches the record with the metadata value; otherwise it connects to the Kubernetes API server to retrieve that information.
There is a reported issue where the kube-apiserver can fall over and become unresponsive when the cluster is too large and too many requests are sent to it. With this feature, the Fluent Bit Kubernetes filter sends requests to the kubelet /pods endpoint instead of the kube-apiserver to retrieve Pod information and uses it to enrich the logs. Since the kubelet runs locally on each node, requests are answered faster and each node only receives one request. This saves kube-apiserver capacity to handle other requests. When this feature is enabled, you should see no difference in the Kubernetes metadata added to logs, but the kube-apiserver bottleneck is avoided when the cluster is large.
Some configuration setup is needed for this feature.
Role Configuration for Fluent Bit DaemonSet Example:
The difference is that the kubelet needs a special permission for the resource nodes/proxy to accept HTTP requests. When creating the role or clusterRole, you need to add nodes/proxy into the rules for resources.
Fluent Bit Configuration Example:
In the Fluent Bit configuration, you need to set Use_Kubelet to true to enable this feature, as in the sketch below.
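A sketch of the filter section (values illustrative):

```
[FILTER]
    Name          kubernetes
    Match         kube.*
    Kube_URL      https://kubernetes.default.svc:443
    Use_Kubelet   true
    Kubelet_Port  10250
```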
DaemonSet config Example:
The key point is to set hostNetwork to true and dnsPolicy to ClusterFirstWithHostNet so that the Fluent Bit DaemonSet can call the kubelet locally; otherwise it cannot resolve DNS for the kubelet.
Now you are ready to use this new feature. You should see no difference in your experience of enriching your log files with Kubernetes metadata.
To check if Fluent Bit is using the kubelet, check the Fluent Bit logs; there should be a line like this:
And if you are in debug mode, you will see more:
The following section goes over specific log messages you may run into and how to solve them, to ensure that Fluent Bit's Kubernetes filter is operating properly.
If you are not seeing metadata added to your kubernetes logs and see the following in your log message, then you may be facing connectivity issues with the Kubernetes API server.
Potential fix #1: Check Kubernetes roles
When Fluent Bit is deployed as a DaemonSet it generally runs with specific roles that allow the application to talk to the Kubernetes API server. If you are deployed in a more restricted environment check that all the Kubernetes roles are set correctly.
You can test this by running the following command (replace fluentbit-system with the namespace where your Fluent Bit is installed):
If the roles are configured correctly, it should simply respond with yes.
For instance, using Azure AKS, running the above command may respond with:
If you have connectivity to the API server but still see "could not get meta for POD", debug logging might show a message like Azure does not have opinion for this user. In that case, the following subject may need to be appended to the subjects array of the fluentbit ClusterRoleBinding:
Potential fix #2: Check Kubernetes IPv6
There may be cases where IPv6 is enabled in the environment and you need to enable it within Fluent Bit as well. Under the service section, set the option ipv6 to on.
Potential fix #3: Check connectivity to Kube_URL
By default the Kube_URL is set to https://kubernetes.default.svc:443. Ensure that you have connectivity to this endpoint from within the cluster and that there are no special permissions interfering with the connection.
In some cases, you may see only some objects being appended with metadata while other objects are not enriched. This can occur when local data is cached and does not contain the correct ID for the Kubernetes object that requires enrichment. For most Kubernetes objects, updates to the Kubernetes API server are then reflected in the Fluent Bit logs; however, in some cases for Pod objects this refresh to the Kubernetes API server can be skipped, causing metadata to be missed.
Concatenate Multiline or Stack trace log messages. Available on Fluent Bit >= v1.8.2.
The Multiline Filter helps to concatenate messages that originally belong to one context but were split across multiple records or log lines. Common examples are stack traces or applications that print logs in multiple lines.
As part of the built-in functionality, without major configuration effort, you can enable one of our built-in parsers, with auto-detection and multi-format support:
go
python
ruby
java (Google Cloud Platform Java stacktrace format)
Some comments about this filter:
This filter does not perform buffering that persists across different chunks. It processes one chunk at a time, and is not suitable for sources that might send multiline messages in separate chunks.
For cases where multiline mode is required and the source plugin does not support it, please file a GitHub enhancement request with the requirement and specific details of the use case.
The plugin supports the following configuration parameters:
The following example aims to parse a log file called test.log
that contains some full lines, a custom Java stacktrace and a Go stacktrace.
Example files content:
This is the primary Fluent Bit configuration file. It includes the parsers_multiline.conf
and tails the file test.log
by applying the multiline parsers multiline-regex-test
and go
. Then it sends the processing to the standard output.
This second file defines a multiline parser for the example, as sketched below. Note that a second multiline parser called go is used in fluent-bit.conf, but that one is a built-in parser.
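A sketch of such a definition (the rules below match the shape of the example log; adjust the regexes to your data):

```
[MULTILINE_PARSER]
    name           multiline-regex-test
    type           regex
    flush_timeout  1000
    # rules: state name | regex pattern | next state
    # a line starting with a date opens a multiline block...
    rule  "start_state"  "/(Dec \d+ \d+\:\d+\:\d+)(.*)/"  "cont"
    # ...continuation lines (e.g. stack frames) attach to it
    rule  "cont"         "/^\s+at.*/"                     "cont"
```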
An example file with multiline and multiformat content:
By running Fluent Bit with the given configuration file you will obtain:
The lines that did not match a pattern are not considered as part of the multiline message, while the ones that matched the rules were concatenated properly.
Powerful and flexible routing
The rewrite_tag filter allows you to re-emit a record under a new Tag. Once a record has been re-emitted, the original record can be preserved or discarded.
It works by defining rules that match specific record key content against a regular expression. If a match exists, a new record with the defined Tag is emitted, entering from the beginning of the pipeline. Multiple rules can be specified, and they are processed in order until one of them matches.
The new Tag to define can be composed of:
Alphabetic characters and numbers
The original Tag string or part of it
Regular expression group captures
Any key or sub-key of the processed record
Environment variables
The rewrite_tag filter supports the following configuration parameters:
A rule aims to define matching criteria and specify how to create a new Tag for a record. You can define one or multiple rules in the same configuration section. The rules have the following format:
The key represents the name of the record key that holds the value we want to match against the regular expression. A key name is specified and prefixed with a $. Consider the following structured record (formatted for readability):
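```
{
  "name": "abc-123",
  "ss": {
    "s1": {
      "s2": "flb"
    }
  }
}
```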
If we wanted to match against the value of the key name, we must use $name. The key selector is flexible enough to allow matching nested levels of sub-maps in the structure. If we wanted to check the value of the nested key s2, we can do it by specifying $ss['s1']['s2']. In short:
$name
= "abc-123"
$ss['s1']['s2']
= "flb"
Note that a key must point to a value that contains a string; it's not valid for numbers, booleans, maps or arrays.
Using a simple regular expression, we can specify a matching pattern to use against the value of the key specified above; we can also take advantage of group capturing to create custom placeholder values.
If we wanted to match any record whose $name contains a value of the format string-number, like the example provided above, we might use a pattern like the one sketched below.
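For instance, a pattern along these lines (an illustrative sketch; the exact expression depends on your data) captures the two parts of the value:

```
^([a-z]+)-([0-9]+)$
```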
Note that in our example we are using parentheses; this means that we are specifying groups of data. If the pattern matches the value, a placeholder will be created that can be consumed by the NEW_TAG section.
If $name
equals abc-123
, then the following placeholders will be created:
$0
= "abc-123"
$1
= "abc"
$2
= "123"
If the regular expression does not match an incoming record, the rule will be skipped and the next rule (if any) will be processed.
If a regular expression has matched the value of the defined key in the rule, we are ready to compose a new Tag for that specific record. The Tag is a concatenated string that can contain any of the following characters: a-z, A-Z, 0-9 and .-,.
A Tag can take any string value from the matching record, the original Tag itself, an environment variable, or a general placeholder.
Consider the following incoming data on the rule:
Tag = aa.bb.cc
Record = {"name": "abc-123", "ss": {"s1": {"s2": "flb"}}}
Environment variable $HOSTNAME = fluent
With this information we can create a very custom Tag for our record, making use of placeholders, record content, and environment variables.
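An illustrative composition (a sketch using the placeholders defined above, not the only possibility):

```
newtag.$TAG[1].$1.$ss['s1']['s2'].$HOSTNAME
```

With the sample record and environment above, the generated Tag would be newtag.bb.abc.flb.fluent.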
If a rule matches, the filter will emit a copy of the record with the newly defined Tag. The property keep takes a boolean value to define whether the original record with the old Tag must be preserved and continue in the pipeline, or just be discarded.
You can use true or false to decide the expected behavior. There is no default value and this is a mandatory field in the rule.
The following configuration example will emit a dummy (hand-crafted) record; the filter will rewrite the tag, discard the old record, and print the new record to the standard output interface:
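A configuration sketch consistent with the rewritten tag shown below (the dummy record content is illustrative):

```
[INPUT]
    Name   dummy
    Dummy  {"tool": "fluent", "sub": {"s1": {"s2": "bit"}}}
    Tag    test_tag

[FILTER]
    Name          rewrite_tag
    Match         test_tag
    Rule          $tool ^(fluent)$ from.$TAG.new.$tool.$sub['s1']['s2'].out false
    Emitter_Name  re_emitted

[OUTPUT]
    Name   stdout
    Match  from.*
```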
The original tag test_tag will be rewritten as from.test_tag.new.fluent.bit.out:
Since rewrite_tag emits new records that go through the beginning of the pipeline, it exposes an additional metric called emit_records that summarizes the total number of emitted records.
Using the configuration provided above, if we query the metrics exposed in the HTTP interface we will see the following:
Command:
Metrics output:
The dummy input generated two records, the filter dropped two from the chunks and emitted two new ones under a different Tag.
The records generated are handled by the internal Emitter, so the new records are summarized in the Emitter metrics; take a look at the entry called emitter_for_rewrite_tag.0.
The Emitter is an internal Fluent Bit plugin that allows other components of the pipeline to emit custom records. In this case, rewrite_tag creates an Emitter instance and uses it exclusively to emit records; that way we have granular control over who is emitting what.
The Emitter name in the metrics can be changed by setting the Emitter_Name configuration property described above.
The Nest Filter plugin allows you to operate on or with nested data. Its modes of operation are
nest
- Take a set of records and place them in a map
lift
- Take a map by key and lift its records up
As an example using JSON notation, to nest keys matching the Wildcard value Key* under a new key NestKey, the transformation becomes:
Example (input)
Example (output)
As an example using JSON notation, to lift keys nested under the Nested_under value NestKey*, the transformation becomes:
Example (input)
Example (output)
The plugin supports the following configuration parameters:
Note: using command line mode requires quotes so that the wildcard is parsed properly. The use of a configuration file is recommended.
The following command will load the mem plugin. Then the nest filter will match the wildcard rule to the keys and nest the keys matching Mem.* under the new key NEST.
The output of both the command line and configuration invocations should be identical and result in the following output.
This example nests all Mem.* and Swap.* items under the Stats key and then reverses these actions with a lift operation. The output appears unchanged.
This example takes the keys starting with Mem.* and nests them under LAYER1, which itself is then nested under LAYER2, which is nested under LAYER3.
This example starts with the 3-level deep nesting of Example 2 and applies the lift filter three times to reverse the operations. The end result is that all records are at the top level, without nesting, again. One prefix is added for each level that is lifted.
The usage of this filter depends on a previous configuration of a multiline parser definition.
If you aim to concatenate messages split originally by Docker or CRI container engines, we recommend doing the concatenation in the Tail input plugin; this same functionality exists there.
The following example files can be located at:
Tags are what make routing possible. Tags are set in the configuration of the Input definitions where the records are generated, but there are certain scenarios where it might be useful to modify the Tag in the pipeline so we can perform more advanced and flexible routing.
As described in the Monitoring section, every component of the Fluent Bit pipeline exposes metrics. The basic metrics exposed by this filter are drop_records and add_records; they summarize the total of dropped records from the incoming data chunk and the new records added.
In order to start filtering records, you can run the filter from the command line or through the configuration file. The following invokes the Memory Usage input plugin, which outputs data like the following (example):
Key | Description |
---|---|
Record | Append fields. This parameter needs a key/value pair. |
Remove_key | If the key is matched, that field is removed. |
Allowlist_key | If the key is not matched, that field is removed. |
Whitelist_key | An alias of Allowlist_key for backwards compatibility. |
Key | Description | Default |
---|---|---|
Key_Name | Specify the field name in the record to parse. | |
Parser | Specify the parser name to interpret the field. Multiple Parser entries are allowed (one per line). | |
Preserve_Key | Keep the original Key_Name field in the parsed result. If false, the field will be removed. | False |
Reserve_Data | Keep all other original fields in the parsed result. If false, all other original fields will be removed. | False |
Unescape_Key | If the key is an escaped string (e.g. stringified JSON), unescape the string before applying the parser. | False |
Operation | Parameter 1 | Parameter 2 | Description |
---|---|---|---|
Set | STRING:KEY | STRING:VALUE | Add a key/value pair with key KEY and value VALUE. If KEY already exists, this field is overwritten |
Add | STRING:KEY | STRING:VALUE | Add a key/value pair with key KEY and value VALUE if KEY does not exist |
Remove | STRING:KEY | NONE | Remove a key/value pair with key KEY if it exists |
Remove_wildcard | WILDCARD:KEY | NONE | Remove all key/value pairs with key matching wildcard KEY |
Remove_regex | REGEXP:KEY | NONE | Remove all key/value pairs with key matching regexp KEY |
Rename | STRING:KEY | STRING:RENAMED_KEY | Rename a key/value pair with key KEY to RENAMED_KEY if KEY exists AND RENAMED_KEY does not exist |
Hard_rename | STRING:KEY | STRING:RENAMED_KEY | Rename a key/value pair with key KEY to RENAMED_KEY if KEY exists. If RENAMED_KEY already exists, this field is overwritten |
Copy | STRING:KEY | STRING:COPIED_KEY | Copy a key/value pair with key KEY to COPIED_KEY if KEY exists AND COPIED_KEY does not exist |
Hard_copy | STRING:KEY | STRING:COPIED_KEY | Copy a key/value pair with key KEY to COPIED_KEY if KEY exists. If COPIED_KEY already exists, this field is overwritten |
Condition | Parameter | Parameter 2 | Description |
---|---|---|---|
Key_exists | STRING:KEY | NONE | Is true if KEY exists |
Key_does_not_exist | STRING:KEY | NONE | Is true if KEY does not exist |
A_key_matches | REGEXP:KEY | NONE | Is true if a key matches regex KEY |
No_key_matches | REGEXP:KEY | NONE | Is true if no key matches regex KEY |
Key_value_equals | STRING:KEY | STRING:VALUE | Is true if KEY exists and its value is VALUE |
Key_value_does_not_equal | STRING:KEY | STRING:VALUE | Is true if KEY exists and its value is not VALUE |
Key_value_matches | STRING:KEY | REGEXP:VALUE | Is true if key KEY exists and its value matches VALUE |
Key_value_does_not_match | STRING:KEY | REGEXP:VALUE | Is true if key KEY exists and its value does not match VALUE |
Matching_keys_have_matching_values | REGEXP:KEY | REGEXP:VALUE | Is true if all keys matching KEY have values that match VALUE |
Matching_keys_do_not_have_matching_values | REGEXP:KEY | REGEXP:VALUE | Is true if all keys matching KEY have values that do not match VALUE |
Key | Description | Default |
---|---|---|
Buffer_Size | Set the buffer size for the HTTP client when reading responses from the Kubernetes API server. The value must conform to the Unit Size specification. A value of 0 results in no limit, and the buffer will expand as needed. Note that if pod specifications exceed the buffer limit, the API response will be discarded when retrieving metadata, and some Kubernetes metadata will fail to be injected into the logs. | 32k |
Kube_URL | API Server end-point | https://kubernetes.default.svc:443 |
Kube_CA_File | CA certificate file | /var/run/secrets/kubernetes.io/serviceaccount/ca.crt |
Kube_CA_Path | Absolute path to scan for certificate files | |
Kube_Token_File | Token file | /var/run/secrets/kubernetes.io/serviceaccount/token |
Kube_Tag_Prefix | When the source records come from the Tail input plugin, this option specifies the prefix used in the Tail configuration. | kube.var.log.containers. |
Merge_Log | When enabled, check if the log field content is a JSON string map; if so, append the map fields as part of the log structure. | Off |
Merge_Log_Key | When Merge_Log is enabled, the filter assumes the log field from the incoming message is a JSON string and makes a structured representation of it at the same level as the log field in the map. If Merge_Log_Key is set (a string name), all the new structured fields taken from the original log content are inserted under the new key instead. | |
Merge_Log_Trim | When Merge_Log is enabled, trim (remove possible \n or \r) field values. | On |
Merge_Parser | Optional parser name to specify how to parse the data contained in the log key. Recommended for developers or testing only. | |
Keep_Log | When Keep_Log is disabled, the log field is removed from the incoming message once it has been successfully merged (Merge_Log must be enabled as well). | On |
tls.debug | Debug level between 0 (nothing) and 4 (every detail). | -1 |
tls.verify | When enabled, turns on certificate validation when connecting to the Kubernetes API server. | On |
Use_Journal | When enabled, the filter reads logs coming in Journald format. | Off |
Cache_Use_Docker_Id | When enabled, metadata will be fetched from K8s when docker_id is changed. | Off |
Regex_Parser | Set an alternative Parser to process the record Tag and extract pod_name, namespace_name, container_name and docker_id. The parser must be registered in a parsers file (refer to parser filter-kube-test as an example). | |
K8S-Logging.Parser | Allow Kubernetes Pods to suggest a pre-defined Parser (read more about it in the Kubernetes Annotations section). | Off |
K8S-Logging.Exclude | Allow Kubernetes Pods to exclude their logs from the log processor (read more about it in the Kubernetes Annotations section). | Off |
Labels | Include Kubernetes resource labels in the extra metadata. | On |
Annotations | Include Kubernetes resource annotations in the extra metadata. | On |
Kube_meta_preload_cache_dir | If set, Kubernetes metadata can be cached/pre-loaded from files in JSON format in this directory, named as namespace-pod.meta | |
Dummy_Meta | If set, use dummy-meta data (for test/dev purposes) | Off |
DNS_Retries | DNS lookup retries N times until the network starts working | 6 |
DNS_Wait_Time | DNS lookup interval between network status checks | 30 |
Use_Kubelet | Optional feature flag to get metadata information from the kubelet instead of calling the Kube Server API, to enhance the logs. This can mitigate the Kube API heavy traffic issue for large clusters. | Off |
Kubelet_Port | Kubelet port to use for HTTP requests; this only works when Use_Kubelet is set to On. | 10250 |
Kube_Meta_Cache_TTL | Configurable TTL for K8s cached metadata. By default it is set to 0, which means the TTL for cache entries is disabled and cache entries are evicted at random when capacity is reached. To enable this option, set the number to a time interval; for example, set the value to 60 or 60s so that cache entries created more than 60s ago are evicted. | 0 |
Kube_Token_Command | Command to get the Kubernetes authorization token. By default it is NULL and the token file will be used. If you want to manually choose a command to get the token, you can set it here; for example, run aws-iam-authenticator -i your-cluster-name token --token-only to set the token. This option is currently Linux-only. | |
Annotation | Description | Default |
---|---|---|
fluentbit.io/parser[_stream][-container] | Suggest a pre-defined parser. The parser must be registered already in Fluent Bit. This option is only processed if the Fluent Bit configuration (Kubernetes filter) has enabled the K8S-Logging.Parser option. If the stream (stdout or stderr) is present, the annotation is restricted to that specific stream; if the container is present, it can override a specific container in a Pod. | |
fluentbit.io/exclude[_stream][-container] | Request Fluent Bit to exclude (or not) the logs generated by the Pod. This option is only processed if the Fluent Bit configuration (Kubernetes filter) has enabled the K8S-Logging.Exclude option. | False |
Key | Description |
---|---|
Rule | Defines the matching criteria and the format of the Tag for the matching record. The Rule format has four components: KEY REGEX NEW_TAG KEEP, each described in the sections below. |
Emitter_Name | When the filter emits a record under the new Tag, an internal emitter plugin takes care of the job. Since this emitter exposes metrics like any other component of the pipeline, you can use this property to configure an optional name for it. |
Emitter_Storage.type | Defines the buffering mechanism for the new records created. Note these records are part of the emitter plugin. This option supports the values memory (default) and filesystem. |
Emitter_Mem_Buf_Limit | Set a limit on the amount of memory the tag rewrite emitter can consume if the outputs provide backpressure. The default for this limit is 10M. |
Key | Value Format | Operation | Description |
---|---|---|---|
Operation | ENUM [nest or lift] | | Select the operation to perform: nest or lift |
Wildcard | FIELD WILDCARD | nest | Nest records whose field matches the wildcard |
Nest_under | FIELD STRING | nest | Nest records matching the Wildcard under this key |
Nested_under | FIELD STRING | lift | Lift records nested under the Nested_under key |
Add_prefix | FIELD STRING | ANY | Prefix affected keys with this string |
Remove_prefix | FIELD STRING | ANY | Remove prefix from affected keys if it matches this string |
The stdout filter plugin prints the data that flows through the pipeline to the standard output, which can be very useful while debugging.
The plugin has no configuration parameters and is very simple to use.
We have specified to gather CPU usage metrics and print them out in a human-readable way when they flow through the stdout plugin.
Property | Description |
---|---|
multiline.parser | Specify one or multiple multiline parsers to apply to the content. You can specify multiple multiline parsers to detect different formats by separating them with a comma. |
multiline.key_content | Key name that holds the content to process. Note that a multiline parser definition can already specify the key_content to use, but this option allows you to override that value for the purposes of the filter. |
Tensorflow Filter allows running Machine Learning inference tasks on the records of data coming from input plugins or the stream processor. This filter uses Tensorflow Lite as the inference engine, and requires the Tensorflow Lite shared library to be present during build and at runtime.
Tensorflow Lite is a lightweight open-source deep learning framework used for mobile and IoT applications. Tensorflow Lite only handles inference (not training); therefore, it loads pre-trained models (.tflite files) that are converted into the Tensorflow Lite format (FlatBuffer). You can read more on converting Tensorflow models here.
The plugin supports the following configuration parameters:
Clone the Tensorflow repository, install the bazel package manager, and run the following command to create the shared library:
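A sketch of the build command (check the Tensorflow documentation for the authoritative target):

```
bazel build -c opt //tensorflow/lite/c:tensorflowlite_c
```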
The script creates the shared library bazel-bin/tensorflow/lite/c/libtensorflowlite_c.so
. You need to copy the library to a location (such as /usr/lib
) that can be used by Fluent Bit.
Tensorflow filter plugin is disabled by default. You need to build Fluent Bit with Tensorflow plugin enabled. In addition, it requires access to Tensorflow Lite header files to compile. Therefore, you also need to pass the address of the Tensorflow source code on your machine to the build script:
If the Tensorflow plugin initializes correctly, it reports successful creation of the interpreter and prints a summary of the model's input/output types and dimensions.
Currently supports single-input models
Uses Tensorflow 2.3 header files
The Throttle Filter plugin sets the average Rate of messages per Interval, based on a leaky bucket and sliding window algorithm. In case of overflow, it will leak at a certain rate.
The plugin supports the following configuration parameters:
Let's imagine we have configured the filter as in the sketch below.
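For instance, a configuration sketch (values illustrative):

```
[FILTER]
    Name      throttle
    Match     *
    Rate      5
    Window    5
    Interval  1s
```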
Suppose we received 1 message in the first second, 3 messages in the second, and 5 in the third. As you can see, even though the Window is actually 5, we use a "slow" start to prevent flooding during startup.
As soon as we have reached Window size * Interval, we have a true sliding window with aggregation over the complete window.
When the average over the window is more than the Rate, we will start dropping messages, so that the sequence above
will become:
As you can see, the last pane of the window was overwritten and 1 message was dropped.
You might have noticed the possibility to configure the Interval of the window shift. It is counterintuitive, but there is a difference between the two examples above:
and
Even though both examples will allow a maximum Rate of 60 messages per minute, the first example may get all 60 messages within the first second, and will drop all the rest for the entire minute:
While the second example will not allow more than 1 message per second every second, making the output rate smoother:
It may drop some data if the rate is ragged. We recommend using a bigger Interval and Rate for streams of rare but important events, while keeping the Window big and the Interval small for constantly intensive inputs.
Note: It's suggested to use a configuration file.
The following command will load the tail plugin and read the content of the lines.txt file. Then the throttle filter will apply a rate limit and only pass records which are read below the certain rate:
The example above will pass 1000 messages per second on average over 300 seconds.
An output plugin to submit Prometheus Metrics using the remote write protocol
The prometheus remote write plugin allows you to take metrics from Fluent Bit and submit them to a Prometheus server through the remote write mechanism.
Important Note: The Prometheus remote write plugin only works with metric plugins, such as Node Exporter Metrics.
The Prometheus remote write plugin only works with metrics collected by one of the metric input plugins. In the following example, host metrics are collected by the node exporter metrics plugin and then delivered by the prometheus remote write output plugin.
The following are examples of using Prometheus remote write with the hosted services below.
With Grafana Cloud hosted metrics you will need to use the specific host that is mentioned as well as specify the HTTP username and password given within the Grafana Cloud page.
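A sketch for Grafana Cloud (the host shown is an example; use the host and credentials from your Grafana Cloud account):

```
[OUTPUT]
    Name        prometheus_remote_write
    Match       *
    Host        prometheus-us-central1.grafana.net
    Port        443
    Uri         /api/prom/push
    Tls         On
    http_user   <GRAFANA_USERNAME>
    http_passwd <GRAFANA_API_KEY>
```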
With Logz.io hosted prometheus you will need to make use of the header option and add the Authorization Bearer with the proper key. The host and port may also differ within your specific hosted instance.
With Coralogix Metrics you may need to customize the URI. Additionally, you will make use of the header key with Coralogix private key.
Send logs and metrics to Amazon CloudWatch
The Amazon CloudWatch output plugin allows you to ingest your records into the CloudWatch Logs service. Support for CloudWatch Metrics is also provided via EMF.
This is the documentation for the core Fluent Bit CloudWatch plugin written in C. It can replace the aws/amazon-cloudwatch-logs-for-fluent-bit Golang Fluent Bit plugin released last year. The Golang plugin was named cloudwatch
; this new high performance CloudWatch plugin is called cloudwatch_logs
to prevent conflicts/confusion. Check the amazon repo for the Golang plugin for details on the deprecation/migration plan for the original plugin.
See here for details on how AWS credentials are fetched.
In order to send records into Amazon Cloudwatch, you can run the plugin from the command line or through the configuration file:
The cloudwatch plugin can read the parameters from the command line through the -p argument (property), e.g.:
In your main configuration file append the following Output section:
Fluent Bit 1.7 adds a new feature called workers
which enables outputs to have dedicated threads. This cloudwatch_logs
plugin has partial support for workers. The plugin can support a single worker; enabling multiple workers will lead to errors/indeterminate behavior.
Example:
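A sketch enabling one worker (group/stream names are illustrative):

```
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    fluent-bit-cloudwatch
    log_stream_prefix from-fluent-bit-
    auto_create_group On
    workers           1
```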
If you enable a single worker, you are enabling a dedicated thread for your CloudWatch output. We recommend starting without workers, evaluating the performance, and then enabling a worker if needed. For most users, the plugin can provide sufficient throughput without workers.
Fluent Bit has different input plugins (cpu, mem, disk, netif) to collect host resource usage metrics. The cloudwatch_logs output plugin can be used to send these host metrics to CloudWatch in Embedded Metric Format (EMF). If data comes from any of the above-mentioned input plugins, the cloudwatch_logs output plugin will convert them to EMF format and send them to CloudWatch as JSON logs. Additionally, if we set json/emf as the value of the log_format config option, CloudWatch will extract custom metrics from the embedded JSON payload.
Note: Right now, only cpu
and mem
metrics can be sent to CloudWatch.
For using the mem
input plugin and sending memory usage metrics to CloudWatch, we can consider the following example config file. Here, we use the aws
filter which adds ec2_instance_id
and az
(availability zone) to the log records. Later, in the output config section, we set ec2_instance_id
as our metric dimension.
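A sketch of such a config file (names and region are illustrative):

```
[INPUT]
    Name  mem
    Tag   mem_usage

[FILTER]
    Name   aws
    Match  *

[OUTPUT]
    Name              cloudwatch_logs
    Match             mem_usage
    region            us-west-2
    log_group_name    fluent-bit-emf
    log_stream_prefix from-fluent-bit-
    auto_create_group On
    log_format        json/emf
    metric_namespace  local-mem-test
    metric_dimensions ec2_instance_id
```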
The following config will set two dimensions on all of our metrics: ec2_instance_id and az.
Amazon distributes a container image with Fluent Bit and these plugins.
github.com/aws/aws-for-fluent-bit
Our images are available in Amazon ECR Public Gallery. You can download images with different tags using the following command:
For example, you can pull the image with latest version by:
If you see errors for image pull limits, try logging in to the public ECR with your AWS credentials:
You can check the Amazon ECR Public official doc for more details
You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:
For more see the AWS for Fluent Bit github repo.
An output plugin to expose Prometheus Metrics
The prometheus exporter allows you to take metrics from Fluent Bit and expose them such that a Prometheus instance can scrape them.
Important Note: The prometheus exporter only works with metric plugins, such as Node Exporter Metrics
The Prometheus exporter only works with metrics captured from metric plugins. In the following example, host metrics are captured by the node exporter metrics plugin and then are routed to prometheus exporter. Within the output plugin two labels are added app="fluent-bit"
and color="blue"
Send logs to Amazon Kinesis Firehose
In order to send records into Amazon Kinesis Data Firehose, you can run the plugin from the command line or through the configuration file:
The firehose plugin can read the parameters from the command line through the -p argument (property), e.g.:
In your main configuration file append the following Output section:
Fluent Bit 1.7 adds a new feature called workers
which enables outputs to have dedicated threads. This kinesis_firehose
plugin fully supports workers.
Example:
If you enable a single worker, you are enabling a dedicated thread for your Firehose output. We recommend starting without workers, evaluating the performance, and then adding workers one at a time until you reach your desired/needed throughput. For most users, no workers or a single worker will be sufficient.
Amazon distributes a container image with Fluent Bit and these plugins.
Our images are available in Amazon ECR Public Gallery. You can download images with different tags using the following command:
For example, you can pull the image with latest version by:
If you see errors for image pull limits, try logging in to the public ECR with your AWS credentials:
You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:
Send logs to Amazon Kinesis Streams
In order to send records into Amazon Kinesis Data Streams, you can run the plugin from the command line or through the configuration file:
The kinesis_streams plugin can read the parameters from the command line through the -p argument (property), e.g.:
In your main configuration file append the following Output section:
Fluent Bit 1.7 adds a new feature called workers
which enables outputs to have dedicated threads. This kinesis_streams
plugin fully supports workers.
Example:
If you enable a single worker, you are enabling a dedicated thread for your Kinesis output. We recommend starting without workers, evaluating the performance, and then adding workers one at a time until you reach your desired/needed throughput. For most users, no workers or a single worker will be sufficient.
Amazon distributes a container image with Fluent Bit and these plugins.
Our images are available in Amazon ECR Public Gallery. You can download images with different tags using the following command:
For example, you can pull the image with latest version by:
If you see errors for image pull limits, try logging in to the public ECR with your AWS credentials:
You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:
Send logs, data, metrics to Amazon S3
The plugin allows you to specify a maximum file size and a timeout for uploads. A file will be created in S3 when the max size is reached or the timeout is reached, whichever comes first.
Records are stored in files in S3 as newline delimited JSON.
The plugin requires s3:PutObject
permission.
In Fluent Bit, all logs have an associated tag. The s3_key_format
option lets you inject the tag into the s3 key using the following syntax:
$TAG => the full tag
$TAG[n] => the nth part of the tag (index starting at zero). This syntax is copied from the rewrite_tag filter. By default, "parts" of the tag are separated with dots, but you can change this with s3_key_format_tag_delimiters.
In the example below, assume the date is January 1st, 2020 00:00:00 and the tag associated with the logs in question is my_app_name-logs.prod.
With the delimiters as . and -, the tag will be split into parts as follows:
$TAG[0]
= my_app_name
$TAG[1]
= logs
$TAG[2]
= prod
So the key in S3 will be /prod/my_app_name/2020/01/01/00/00/00/bgdHN1NM.gz.
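A key format consistent with that result might look like the following sketch (the bucket name is illustrative):

```
[OUTPUT]
    Name                          s3
    Match                         *
    bucket                        my-bucket
    region                        us-east-1
    s3_key_format                 /$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.gz
    s3_key_format_tag_delimiters  .-
```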
The store_dir is used to temporarily store data before it is uploaded. If Fluent Bit is stopped suddenly, it will try to send all data and complete all uploads before it shuts down. If it cannot send some data, on restart it will look in the store_dir for existing data and will try to send it.
Multipart uploads are ideal for most use cases because they allow the plugin to upload data in small chunks over time. For example, a 1 GB file can be created from 200 5 MB chunks. While the file size in S3 will be 1 GB, only 5 MB will be buffered on disk at any one point in time.
In situations where buffered data cannot be preserved across restarts (see the note on persistent disk below), we recommend using the PutObject API and sending data frequently, to avoid local buffering as much as possible. This will limit data loss in the event Fluent Bit is killed unexpectedly.
The following settings are recommended for this use case:
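A sketch of such settings (bucket and sizes illustrative):

```
[OUTPUT]
    Name             s3
    Match            *
    bucket           my-bucket
    region           us-east-1
    total_file_size  1M
    upload_timeout   1m
    use_put_object   On
```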
Fluent Bit 1.7 adds a new feature called workers
which enables outputs to have dedicated threads. This s3
plugin has partial support for workers. The plugin can only support a single worker; enabling multiple workers will lead to errors/indeterminate behavior.
Example:
If you enable a single worker, you are enabling a dedicated thread for your S3 output. We recommend starting without workers, evaluating the performance, and then enabling a worker if needed. For most users, the plugin can provide sufficient throughput without workers.
In order to send records into Amazon S3, you can run the plugin from the command line or through the configuration file.
The s3 plugin can read the parameters from the command line through the -p argument (property), e.g.:
In your main configuration file append the following Output section:
An example using PutObject instead of multipart:
Amazon distributes a container image with Fluent Bit and this plugin.
Our images are available in Amazon ECR Public Gallery. You can download images with different tags using the following command:
For example, you can pull the image with latest version by:
If you see errors for image pull limits, try logging in to the public ECR with your AWS credentials:
You can use our SSM Public Parameters to find the Amazon ECR image URI in your region:
The Amazon Kinesis Data Firehose output plugin allows you to ingest your records into the Firehose service.
This is the documentation for the core Fluent Bit Firehose plugin written in C. It can replace the Golang Fluent Bit plugin released last year. The Golang plugin was named firehose; this new high performance and highly efficient Firehose plugin is called kinesis_firehose to prevent conflicts/confusion.
See the AWS credentials documentation for details on how AWS credentials are fetched.
You can check the Amazon ECR Public official doc for more details.
For more information, see the AWS for Fluent Bit GitHub repo.
The Amazon Kinesis Data Streams output plugin allows you to ingest your records into the Kinesis Data Streams service.
This is the documentation for the core Fluent Bit Kinesis plugin written in C. It has all the core features of the Golang Fluent Bit plugin released in 2019. The Golang plugin was named kinesis; this new high performance and highly efficient Kinesis plugin is called kinesis_streams to prevent conflicts/confusion.
See the AWS credentials documentation for details on how AWS credentials are fetched.
You can check the Amazon ECR Public official doc for more details.
For more information, see the AWS for Fluent Bit GitHub repo.
The Amazon S3 output plugin allows you to ingest your records into the S3 cloud object store.
The plugin can upload data to S3 using the multipart upload API or using the S3 PutObject API. Multipart is the default and is recommended; Fluent Bit will stream data in a series of 'parts'. This limits the amount of data it has to buffer on disk at any point in time. By default, every time 5 MiB of data have been received, a new 'part' will be uploaded. The plugin can create files up to gigabytes in size from many small chunks/parts using the multipart API. All aspects of the upload process are configurable using the configuration options.
See the AWS credentials documentation for details on how AWS credentials are fetched.
There is one minor drawback to multipart uploads: the file and data will not be visible in S3 until the upload is completed with a CompleteMultipartUpload call. The plugin will attempt to make this call whenever Fluent Bit is shut down, to ensure your data is available in S3. It will also store metadata about each upload in the store_dir, ensuring that uploads can be completed when Fluent Bit restarts (assuming it has access to persistent disk and the store_dir files will still be present on restart).
If you run Fluent Bit in an environment without persistent disk, or without the ability to restart Fluent Bit and give it access to the data stored in the store_dir from previous executions, some considerations apply. This might occur if you run Fluent Bit on AWS Fargate, for example.
You can check the Amazon ECR Public official doc for more details.
For more information, see the AWS for Fluent Bit GitHub repo.
Key | Description | Default |
---|---|---|
input_field | Specify the name of the field in the record to apply inference on. | |
model_file | Path to the model file (.tflite) to be loaded by Tensorflow Lite. | |
include_input_fields | Include all input fields in the filter's output. | True |
normalization_value | Divide input values by normalization_value. | |
Key | Value Format | Description |
---|---|---|
Rate | Integer | Amount of messages for the time. |
Window | Integer | Amount of intervals to calculate average over. Default 5. |
Interval | String | Time interval, expressed in "sleep" format, e.g. 3s, 1.5m, 0.5h, etc. |
Print_Status | Bool | Whether to print status messages with the current rate and the limits to information logs. |
Key | Description | Default |
---|---|---|
host | IP address or hostname of the target HTTP server | 127.0.0.1 |
http_user | Basic Auth Username | |
http_passwd | Basic Auth Password. Requires http_user to be set | |
port | TCP port of the target HTTP server | 80 |
proxy | Specify an HTTP proxy. The expected format of this value is http://host:port. Note that https is not supported yet. Please consider not setting this and using the HTTP_PROXY environment variable instead, which supports both http and https. | |
uri | Specify an optional HTTP URI for the target web server, e.g. /something | / |
header | Add an HTTP header key/value pair. Multiple headers can be set. | |
log_response_payload | Log the response payload within the Fluent Bit log | false |
add_label | This allows you to add custom labels to all metrics exposed through the plugin. You may have multiple of these fields | |
Workers | Enables dedicated thread(s) for this output. The default value applies since version 1.8.13; for previous versions it is 0. | 2 |
Key | Description |
---|---|
region | The AWS region. |
log_group_name | The name of the CloudWatch Log Group that you want log records sent to. |
log_stream_name | The name of the CloudWatch Log Stream that you want log records sent to. |
log_stream_prefix | Prefix for the Log Stream name. The tag is appended to the prefix to construct the full log stream name. Not compatible with the log_stream_name option. |
log_key | By default, the whole log record will be sent to CloudWatch. If you specify a key name with this option, then only the value of that key will be sent to CloudWatch. For example, if you are using the Fluentd Docker log driver, you can specify log_key log and only the log message will be sent to CloudWatch. |
log_format | An optional parameter that can be used to tell CloudWatch the format of the data. A value of json/emf enables CloudWatch to extract custom metrics embedded in a JSON payload. See the Embedded Metric Format. |
role_arn | ARN of an IAM role to assume (for cross account access). |
auto_create_group | Automatically create the log group. Valid values are "true" or "false" (case insensitive). Defaults to false. |
log_retention_days | If set to a number greater than zero, any newly created log group's retention policy is set to this many days. Valid values are: [1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, 3653] |
endpoint | Specify a custom endpoint for the CloudWatch Logs API. |
metric_namespace | An optional string representing the CloudWatch namespace for the metrics. See the Metrics Tutorial section below for a full configuration. |
metric_dimensions | A list of lists containing the dimension keys that will be applied to all metrics. The values within a dimension set MUST also be members on the root-node. For more information about dimensions, see Dimension and Dimensions. In the fluent-bit config, metric_dimensions is a comma and semicolon separated string. If you have only one list of dimensions, put the values as a comma separated string. If you want to put a list of lists, use semicolon separated strings. For example, if you set the value as 'dimension_1,dimension_2;dimension_3', we will convert it to [[dimension_1, dimension_2],[dimension_3]] |
sts_endpoint | Specify a custom STS endpoint for the AWS STS API. |
auto_retry_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. |
Key | Description | Default |
---|---|---|
host | The address Fluent Bit will bind to when hosting Prometheus metrics. | 0.0.0.0 |
port | The port Fluent Bit will bind to when hosting Prometheus metrics. | 2021 |
add_label | This allows you to add custom labels to all metrics exposed through the prometheus exporter. You may have multiple of these fields. | |
Key | Description |
---|---|
region | The AWS region. |
delivery_stream | The name of the Kinesis Firehose Delivery stream that you want log records sent to. |
time_key | Add the timestamp to the record under this key. By default the timestamp from Fluent Bit will not be added to records sent to Kinesis. |
time_key_format | strftime compliant format string for the timestamp; for example, the default is '%Y-%m-%dT%H:%M:%S'. This option is used with time_key. |
log_key | By default, the whole log record will be sent to Firehose. If you specify a key name with this option, then only the value of that key will be sent to Firehose. For example, if you are using the Fluentd Docker log driver, you can specify log_key log and only the log message will be sent to Firehose. |
role_arn | ARN of an IAM role to assume (for cross account access). |
endpoint | Specify a custom endpoint for the Firehose API. |
sts_endpoint | Custom endpoint for the STS API. |
auto_retry_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. |
Key | Description |
---|---|
region | The AWS region. |
stream | The name of the Kinesis Streams Delivery stream that you want log records sent to. |
time_key | Add the timestamp to the record under this key. By default the timestamp from Fluent Bit will not be added to records sent to Kinesis. |
time_key_format | strftime compliant format string for the timestamp; for example, the default is '%Y-%m-%dT%H:%M:%S'. This option is used with time_key. |
log_key | By default, the whole log record will be sent to Kinesis. If you specify a key name with this option, then only the value of that key will be sent to Kinesis. For example, if you are using the Fluentd Docker log driver, you can specify log_key log and only the log message will be sent to Kinesis. |
role_arn | ARN of an IAM role to assume (for cross account access). |
endpoint | Specify a custom endpoint for the Kinesis API. |
sts_endpoint | Custom endpoint for the STS API. |
auto_retry_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. |
region | The AWS region of you S3 bucket | us-east-1 |
bucket | S3 Bucket name | None |
json_date_key | Specify the name of the time key in the output record. To disable the time key just set the value to | date |
json_date_format | Specify the format of the date. Supported formats are double, epoch, iso8601 (eg: 2018-05-30T09:39:52.000681Z) and java_sql_timestamp (eg: 2018-05-30 09:39:52.000681) | iso8601 |
total_file_size | Specifies the size of files in S3. Maximum size is 50G, minimim is 1M. | 100M |
upload_chunk_size | The size of each 'part' for multipart uploads. Max: 50M | 5,242,880 bytes |
upload_timeout | Whenever this amount of time has elapsed, Fluent Bit will complete an upload and create a new file in S3. For example, set this value to 60m and you will get a new file every hour. | 10m |
store_dir | Directory to locally buffer data before sending. When multipart uploads are used, data will only be buffered until the | /tmp/fluent-bit/s3 |
s3_key_format | Format string for keys in S3. This option supports a UUID, strftime time formatters, a syntax for selecting parts of the Fluent log tag using a syntax inspired by the rewrite_tag filter. Add $UUID in the format string to insert a random string. Add $TAG in the format string to insert the full log tag; add $TAG[0] to insert the first part of the tag in the s3 key. The tag is split into “parts” using the characters specified with the | /fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S |
s3_key_format_tag_delimiters | A series of characters which will be used to split the tag into 'parts' for use with the s3_key_format option. See the in depth examples and tutorial in the documentation. | . |
use_put_object | Use the S3 PutObject API, instead of the multipart upload API. When this option is on, key extension is only available when $UUID is specified in s3_key_format. | false |
role_arn | ARN of an IAM role to assume (ex. for cross account access). | None |
endpoint | Custom endpoint for the S3 API. | None |
sts_endpoint | Custom endpoint for the STS API. | None |
canned_acl | Predefined Canned ACL policy for S3 objects. | None |
compression | Compression type for S3 objects. 'gzip' is currently the only supported value. The Content-Encoding HTTP Header will be set to 'gzip'. Compression can be enabled when use_put_object is on. | None |
content_type | A standard MIME type for the S3 object; this will be set as the Content-Type HTTP header. This option can be enabled when use_put_object is on. | None |
send_content_md5 | Send the Content-MD5 header with PutObject and UploadPart requests, as is required when Object Lock is enabled. | false |
auto_retry_requests | Immediately retry failed requests to AWS services once. This option does not affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which may help improve throughput when there are transient/random networking issues. | false |
Send logs, metrics to Azure Log Analytics
In order to insert records into an Azure Log Analytics instance, you can run the plugin from the command line or through the configuration file:
The azure plugin can read the parameters from the command line through the -p argument (property), e.g:
In your main configuration file append the following Input & Output sections:
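The original example is not reproduced here; the following is a minimal sketch using the cpu input, where the Customer_ID and Shared_Key values are placeholders you must replace with your own:

```
[INPUT]
    Name  cpu
    Tag   cpu

[OUTPUT]
    Name        azure
    Match       *
    Customer_ID your-log-analytics-workspace-id
    Shared_Key  your-shared-key
    Log_Type    fluentbit
```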
Counter is a very simple plugin that counts how many records it's getting upon flush time. Plugin output is as follows:
You can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit count up data with the following options:
In your main configuration file append the following Input & Output sections:
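A minimal sketch, using the cpu input as the record source:

```
[INPUT]
    Name  cpu
    Tag   cpu

[OUTPUT]
    Name  counter
    Match *
```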
Once Fluent Bit is running, you will see the reports in the output interface similar to this:
Official and Microsoft Certified Azure Storage Blob connector
Before getting started, make sure you already have an Azure Storage account. As a reference, the following link explains step-by-step how to set up your account:
We expose different configuration properties. The following table lists all the options available, and the next section has specific configuration details for the official service or the emulator.
As mentioned above, you can either deliver records to the official service or an emulator. Below we have an example for each use case.
The following configuration example generates a random message with a custom tag:
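A sketch of such a configuration; the account_name, shared_key, container and tag values below are illustrative placeholders:

```
[INPUT]
    Name  dummy
    Dummy {"name": "Fluent Bit", "year": 2021}
    Tag   var.log.containers.app-default-96cbdef2340.log

[OUTPUT]
    Name                  azure_blob
    Match                 *
    account_name          YOUR_ACCOUNT_NAME
    shared_key            YOUR_SHARED_KEY
    container_name        logs
    auto_create_container on
    path                  kubernetes
    tls                   on
```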
After you run the configuration file above, you will be able to query the data using the Azure Storage Explorer. The example above will generate the following content in the explorer:
The quickest way to get started is to install Azurite using npm:
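The standard npm global install:

```
npm install -g azurite
```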
then run the service:
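A sketch assuming the Blob service emulator on its default host and port:

```
azurite-blob --blobHost 127.0.0.1 --blobPort 10000
```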
After running that Fluent Bit configuration you will see the data flowing into Azurite:
Fluent Bit streams data into an existing BigQuery table using a service account that you specify. Therefore, before using the BigQuery output plugin, you must create a service account, create a BigQuery dataset and table, authorize the service account to write to the table, and provide the service account credentials to Fluent Bit.
To stream data into BigQuery, the first step is to create a Google Cloud service account for Fluent Bit:
Fluent Bit does not create datasets or tables for your data, so you must create these ahead of time. You must also grant the service account WRITER permission on the dataset:
Within the dataset you will need to create a table for the data to reside in. You can follow these instructions for creating your table. Pay close attention to the schema: it must match the schema of your output JSON. Unfortunately, since BigQuery does not allow dots in field names, you will need to use a filter to change the fields for many of the standard inputs (e.g. mem or cpu).
The Fluent Bit BigQuery output plugin uses a JSON credentials file for authentication. Download the credentials file by following these instructions:
If you are using a Google Cloud Credentials File, the following configuration is enough to get you started:
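A minimal sketch; the credentials path, project, dataset and table names are placeholders:

```
[INPUT]
    Name dummy
    Tag  dummy

[OUTPUT]
    Name                       bigquery
    Match                      *
    google_service_credentials /path/to/my_google_service_credentials.json
    project_id                 my_project_id
    dataset_id                 my_dataset
    table_id                   my_table
```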
for S3 objects.
The Azure output plugin allows you to ingest your records into the Azure Log Analytics service.
To get more details about how to set up Azure Log Analytics, please refer to the following documentation:
Key | Description | default |
---|---|---|
The Azure Blob output plugin allows ingesting your records into the Azure Blob Storage service. This connector is designed to use the Append Blob and Block Blob API.
Our plugin works with the official Azure Service and can also be configured to be used with a service emulator such as Azurite.
Key | Description | default |
---|---|---|
Azurite comes with a default account_name and shared_key, so make sure to use the specific values provided in the example below (do an exact copy/paste):
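A sketch of such a configuration; the account_name and shared_key below are Azurite's published development defaults (not real credentials), and the endpoint assumes the emulator's default port:

```
[INPUT]
    Name  dummy

[OUTPUT]
    Name                  azure_blob
    Match                 *
    account_name          devstoreaccount1
    shared_key            Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
    container_name        logs
    auto_create_container on
    tls                   off
    emulator_mode         on
    endpoint              http://127.0.0.1:10000
```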
The BigQuery output plugin is an experimental plugin that allows you to stream records into the Google Cloud BigQuery service. The implementation does not support the following, which would be expected in a full production version:
Data deduplication using insertId.
Template tables using templateSuffix.
Key | Description | default |
---|---|---|
See Google's documentation for further details.
Customer_ID | Customer ID or WorkspaceID string. |
Shared_Key | The primary or the secondary Connected Sources client authentication key. |
Log_Type | The name of the event type. | fluentbit |
google_service_credentials | Absolute path to a Google Cloud credentials JSON file | Value of the environment variable $GOOGLE_SERVICE_CREDENTIALS |
project_id | The project id containing the BigQuery dataset to stream into. | The value of the project_id in the credentials file |
dataset_id | The dataset id of the BigQuery dataset to write into. This dataset must exist in your project. |
table_id | The table id of the BigQuery table to write into. This table must exist in the specified dataset and the schema must match the output. |
skip_invalid_rows | Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist. | Off |
ignore_unknown_values | Accept rows that contain values that do not match the schema. The unknown values are ignored. Default is false, which treats unknown values as errors. | Off |
account_name | Azure Storage account name. This configuration property is mandatory |
shared_key | Specify the Azure Storage Shared Key to authenticate against the service. This configuration property is mandatory. |
container_name | Name of the container that will contain the blobs. This configuration property is mandatory |
blob_type | Specify the desired blob type. Fluent Bit supports appendblob and blockblob. | appendblob |
auto_create_container | If container_name does not exist in the remote service, enabling this option handles the exception and auto-creates the container. | on |
path | Optional path to store your blobs. If your blob name is myblob, you can specify sub-directories to store it in by setting path, e.g. setting path to /logs/ stores your blob as /logs/myblob. |
emulator_mode | If you want to send data to an Azure emulator service like Azurite, enable this option so the plugin will format the requests to the expected format. | off |
endpoint | If you are using an emulator, this option allows you to specify the absolute HTTP address of such service, e.g. http://127.0.0.1:10000. |
tls | Enable or disable TLS encryption. Note that Azure service requires this to be turned on. | off |
Send logs to Datadog
The Datadog output plugin allows you to ingest your logs into Datadog.
Before you begin, you need a Datadog account, a Datadog API key, and you need to activate Datadog Logs Management.
Get started quickly with this configuration file:
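A minimal sketch based on the plugin's properties; the apikey, service, source and tag values are placeholders:

```
[INPUT]
    Name dummy

[OUTPUT]
    Name       datadog
    Match      *
    Host       http-intake.logs.datadoghq.com
    TLS        on
    compress   gzip
    apikey     <your-datadog-api-key>
    dd_service my-service
    dd_source  my-source
    dd_tags    team:logs
```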
If you get a 403 Forbidden
error response, double check that you have a valid Datadog API key and that you have activated Datadog Logs Management.
FlowCounter is a protocol to count records. The flowcounter output plugin counts records and their size.
The plugin supports the following configuration parameters:
You can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit count up data with the following options:
In your main configuration file append the following Input & Output sections:
Once Fluent Bit is running, you will see the reports in the output interface similar to this:
Forward is the protocol used by Fluentd to route messages between peers. The forward output plugin provides interoperability between Fluent Bit and Fluentd. There are no configuration steps required besides specifying where Fluentd is located, which can be a local or a remote destination.
This plugin offers two different transports and modes:
Forward (TCP): It uses a plain TCP connection.
Secure Forward (TLS): when TLS is enabled, the plugin switches to Secure Forward mode.
The following parameters are mandatory for either Forward or Secure Forward mode:
When using Secure Forward mode, TLS must be enabled. The following additional configuration parameters are available:
Before proceeding, make sure that Fluentd is installed; if it is not, please refer to the Fluentd Installation documentation before continuing.
Once Fluentd is installed, create the following configuration file example that will allow us to stream data into it:
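A sketch of such a Fluentd configuration, matching the description just below:

```
<source>
  @type forward
  bind 0.0.0.0
  port 24224
</source>

<match fluent_bit>
  @type stdout
</match>
```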
That configuration file specifies that it will listen for TCP connections on port 24224 through the forward input type. Then, for every message with a fluent_bit TAG, it will print the message to the standard output.
In one terminal launch Fluentd specifying the new configuration file created:
Now that Fluentd is ready to receive messages, we need to specify where the forward output plugin will flush the information using the following format:
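A sketch of the general shape, where INPUT, HOST and PORT are placeholders:

```
fluent-bit -i INPUT -o forward://HOST:PORT
```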
If the TAG parameter is not set, the plugin will retain the tag. Keep in mind that TAG is important for routing rules inside Fluentd.
Using the CPU input plugin as an example we will flush CPU metrics to Fluentd with tag fluent_bit:
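One way to do that, assuming Fluentd listens locally on port 24224 as configured above:

```
fluent-bit -i cpu -t fluent_bit -o forward://127.0.0.1:24224
```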
Now on the Fluentd side, you will see the CPU metrics gathered in the last seconds:
So we gathered CPU metrics and flushed them out to Fluentd properly.
DISCLAIMER: the following example does not consider the generation of certificates for best practice on production environments.
Secure Forward aims to provide a secure channel of communication with the remote Fluentd service using TLS.
Paste this content in a file called flb.conf:
Paste this content in a file called fld.conf:
If you're using Fluentd v1, set it up as below:
Start Fluentd:
Start Fluent Bit:
After five seconds, Fluent Bit will write records to Fluentd. In Fluentd output you will see a message like this:
The file output plugin allows writing the data received through the input plugin to a file.
The plugin supports the following configuration parameters:
Outputs time, tag and JSON records. There are no configuration parameters for out_file.
Outputs the records as JSON (without additional tag and timestamp attributes). There are no configuration parameters for the plain format.
Outputs the records as CSV. CSV supports an additional configuration parameter.
Outputs the records as LTSV. LTSV supports an additional configuration parameter.
Outputs the records using a custom format template. This accepts a formatting template and fills placeholders using corresponding values in a record.
For example, if you set up the configuration as below:
You will get the following output:
You can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit count up data with the following options:
In your main configuration file append the following Input & Output sections:
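A minimal sketch; the Path value is a placeholder directory:

```
[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name  file
    Match *
    Path  /tmp/output
```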
The following instructions assume that you have a fully operational Graylog server running in your environment.
If you're using Fluent Bit to collect Docker logs, note that Docker places your log in JSON under the key log. So you can set log as your Gelf_Short_Message_Key to send everything in Docker logs to Graylog. In this case, you need your log value to be a string, so don't parse it using a JSON parser.
The order of looking up the timestamp in this plugin is as follows:
Value of Gelf_Timestamp_Key provided in configuration
Value of the timestamp key
The timestamp is not set by Fluent Bit. In this case, your Graylog server will set it to the current timestamp (now).
The version field of a GELF message is also mandatory, and Fluent Bit sets it to 1.1, the current latest version of GELF.
If you use udp as the transport protocol and set Compress to true, Fluent Bit compresses your packets in GZIP format, which is the default compression that Graylog offers. This can be used to trade more CPU load for saving network bandwidth.
If you're using Fluent Bit for shipping Kubernetes logs, you can use something like this as your configuration file:
By default, GELF over TCP uses port 12201, and Docker places your logs in the /var/log/containers directory. The log content itself is placed in the value of the log key. For example, this is a log saved by Docker:
Now, this is what happens to this log:
The Fluent Bit GELF plugin adds "version": "1.1" to it.
We used the data key as Gelf_Short_Message_Key, so the GELF plugin changes it to short_message.
Timestamp is generated.
Finally, this is what our Graylog server input sees:
In order to insert records into an HTTP server, you can run the plugin from the command line or through the configuration file:
The http plugin can read the parameters from the command line in two ways, through the -p argument (property) or setting them directly through the service URI. The URI format is the following:
Using the format specified, you could start Fluent Bit through:
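For instance (the address and URI below are examples only):

```
fluent-bit -i cpu -t cpu -o http://192.168.2.3:80/something -m '*'
```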
In your main configuration file, append the following Input & Output sections:
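A minimal sketch of the equivalent configuration file (same example address):

```
[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name  http
    Match *
    host  192.168.2.3
    port  80
    uri   /something
```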
By default, the URI becomes the tag of the message and the original tag is ignored. To retain the tag, multiple configuration sections have to be defined, each flushing to a different URI.
Another supported approach is sending the original message tag in a configurable header. It's up to the receiver to do what it wants with that header field: parse it and use it as the tag, for example.
To configure this behaviour, add this config:
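A sketch using a hypothetical header name FLUENT-TAG:

```
[OUTPUT]
    Name       http
    Match      *
    host       127.0.0.1
    port       9090
    header_tag FLUENT-TAG
```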
Provided you are using Fluentd as the data receiver, you can combine in_http and out_rewrite_tag_filter to make use of this HTTP header.
Notice how we override the tag, which comes from the URI path, with our custom header.
Suggested configuration for Sumo Logic using json_lines with iso8601 timestamps. The PrivateKey is specific to a configured HTTP collector.
Send logs to Elasticsearch (including Amazon OpenSearch Service)
In order to insert records into an Elasticsearch service, you can run the plugin from the command line or through the configuration file:
The es plugin can read the parameters from the command line in two ways, through the -p argument (property) or setting them directly through the service URI. The URI format is the following:
Using the format specified, you could start Fluent Bit through:
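For example (the address, index and type are placeholders):

```
fluent-bit -i cpu -t cpu -o es://192.168.2.3:9200/my_index/my_type -o stdout -m '*'
```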
which is similar to do:
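That is, the URI form is equivalent to passing the same values as properties:

```
fluent-bit -i cpu -t cpu -o es -p Host=192.168.2.3 -p Port=9200 -p Index=my_index -p Type=my_type -o stdout -m '*'
```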
Some input plugins may generate messages where the field names contain dots. Since Elasticsearch 2.0 this is no longer allowed, so the current es plugin replaces them with an underscore, e.g.:
becomes
Since Elasticsearch 6.0, you cannot create multiple types in a single index. This means that you cannot set up your configuration as below anymore.
If you see an error message like below, you'll need to fix your configuration to use a single type on each index.
Rejecting mapping update to [search] as the final mapping would have more than 1 type
The Amazon OpenSearch Service adds an extra security layer where HTTP requests must be signed with AWS Sigv4. Fluent Bit v1.5 introduced full support for Amazon OpenSearch Service with IAM Authentication.
Example configuration:
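A sketch with a placeholder Amazon OpenSearch domain endpoint; note the values called out just below:

```
[OUTPUT]
    Name       es
    Match      *
    Host       my-domain.us-west-2.es.amazonaws.com
    Port       443
    Index      my_index
    AWS_Auth   On
    AWS_Region us-west-2
    tls        On
```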
Notice that the Port is set to 443, tls is enabled, and AWS_Region is set.
Example configuration:
Since v1.8.2, Fluent Bit started using the create method (instead of index) for data submission. This makes Fluent Bit compatible with the Data Streams feature introduced in Elasticsearch 7.9.
If you see action_request_validation_exception errors on your pipeline with Fluent Bit >= v1.8.2, you can fix it up by turning on Generate_ID as follows:
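A sketch of that fix:

```
[OUTPUT]
    Name        es
    Match       *
    Host        127.0.0.1
    Port        9200
    Generate_ID On
```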
Key | Description | Default |
---|---|---|
Key | Description | Default |
---|---|---|
Key | Description | Default |
---|---|---|
Key | Description | Default |
---|---|---|
Key | Description | Default |
---|---|---|
Key | Description |
---|---|
Key | Description |
---|---|
Key | Description |
---|---|
GELF is the Graylog Extended Log Format. The GELF output plugin allows sending logs in GELF format directly to a Graylog input using the TLS, TCP or UDP protocols.
According to the GELF Payload Specification, there are some mandatory and optional fields which are used by Graylog in GELF format. These fields are set with the Gelf_*_Key properties in this plugin.
Key | Description | default |
---|---|---|
The GELF output plugin supports TLS/SSL; for more details about the available properties and general configuration, please refer to the TLS/SSL section.
If you're using the Docker JSON parser, this parser can parse time and use it as the timestamp of the message. If all of the above fail, Fluent Bit tries to get the timestamp extracted by your parser.
Your log timestamp has to be in UNIX epoch timestamp format. If the Gelf_Timestamp_Key value of your log is not in this format, your Graylog server will ignore it.
If you're using Fluent Bit in Kubernetes and you're using the Kubernetes filter, this plugin adds the host value to your log by default, and you don't need to add it yourself.
If you use a Parser like the docker parser shown above, it decodes your message and extracts the data field (and any others present). This is how this log looks after decoding:
The nest filter unnests fields inside the log key. In our example, it puts data alongside stream and time.
The Kubernetes filter adds the host name.
Any custom field (not present in the GELF specification) is prefixed with an underscore.
The http output plugin allows flushing your records into an HTTP endpoint. For now the functionality is pretty basic: it issues a POST request with the data records in MessagePack (or JSON) format.
Key | Description | default |
---|---|---|
The HTTP output plugin supports TLS/SSL; for more details about the available properties and general configuration, please refer to the TLS/SSL section.
A sample Sumo Logic query for the input. (Requires json_lines format with iso8601 date format for the timestamp field.)
The es output plugin allows ingesting your records into an Elasticsearch database. The following instructions assume that you have a fully operational Elasticsearch service running in your environment.
Key | Description | default |
---|---|---|
The parameters index and type can be confusing if you are new to Elastic; if you have used a common relational database before, they can be compared to the database and table concepts.
The Elasticsearch output plugin supports TLS/SSL; for more details about the available properties and general configuration, please refer to the TLS/SSL section.
In your main configuration file append the following Input & Output sections.
For details, please read the official Elasticsearch documentation.
Fluent Bit v1.5 changed the default mapping type from flb_type to _doc, which matches the recommendation from Elasticsearch from version 6.2 forwards. This doesn't work in Elasticsearch versions 5.6 through 6.1. Ensure you set an explicit map (such as doc or flb_type) in the configuration, as seen on the last line:
See the AWS credentials documentation for details on how AWS credentials are fetched.
Fluent Bit supports connecting to Elastic Cloud by providing just the cloud_id and cloud_auth settings.
Host | Required - The Datadog server where you are sending your logs. | http-intake.logs.datadoghq.com |
TLS | Required - End-to-end communications security protocol. Datadog recommends setting this to on. | off |
compress | Recommended - Compresses the payload in GZIP format. Datadog supports and recommends setting this to gzip. |
apikey | Required - Your Datadog API key. |
Proxy | Optional - Specify an HTTP Proxy. The expected format of this value is http://host:port. Note that https is not supported yet. |
provider | To activate the remapping, specify configuration flag provider with value ecs. |
json_date_key | Date key name for output. | timestamp |
include_tag_key | If enabled, a tag is appended to the output. The key name is determined by the tag_key property. | false |
tag_key | The key name of the tag. If include_tag_key is false, this property is ignored. | tagkey |
dd_service | Recommended - The human readable name for your service generating the logs - the name of your application or database. |
dd_source | Recommended - A human readable name for the underlying technology of your service. For example, postgres or nginx. |
dd_tags | Optional - The tags you want to assign to your logs in Datadog. |
dd_message_key | By default, the plugin searches for the key 'log' and remaps the value to the key 'message'. If the property is set, the plugin will search the property name key. |
Unit | The unit of duration. (second/minute/hour/day) | minute |
Host | Target host where Fluent Bit or Fluentd are listening for Forward messages. | 127.0.0.1 |
Port | TCP Port of the target service. | 24224 |
Time_as_Integer | Set timestamps in integer format; it enables compatibility mode for the Fluentd v0.12 series. | False |
Upstream | If Forward will connect to an Upstream instead of a simple host, this property defines the absolute path for the Upstream configuration file; for more details about this refer to the Upstream Servers documentation section. |
Tag | Overwrite the tag as we transmit. This allows the receiving pipeline to start fresh, or to attribute the source. |
Send_options | Always send options (with "size"=count of messages) | False |
Require_ack_response | Send "chunk"-option and wait for "ack" response from server. Enables at-least-once delivery, and the receiving server can control the rate of traffic. (Requires Fluentd v0.14.0+ server) | False |
Compress | Set to "gzip" to enable gzip compression. Incompatible with Time_as_Integer=True and tags set dynamically using the Rewrite Tag filter. (Requires Fluentd v0.14.7+ server) |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 2 |
Shared_Key | A key string known by the remote Fluentd used for authorization. |
Empty_Shared_Key | Use this option to connect to Fluentd with a zero-length secret. | False |
Username | Specify the username to present to a Fluentd server that enables user_auth. |
Password | Specify the password corresponding to the username. |
Self_Hostname | Default value of the auto-generated certificate common name (CN). | localhost |
tls | Enable or disable TLS support | Off |
tls.verify | Force certificate validation | On |
tls.debug | Set TLS debug verbosity level. It accepts the following values: 0 (No debug), 1 (Error), 2 (State change), 3 (Informational) and 4 (Verbose) | 1 |
tls.ca_file | Absolute path to CA certificate file |
tls.crt_file | Absolute path to Certificate file. |
tls.key_file | Absolute path to private Key file. |
tls.key_passwd | Optional password for tls.key_file file. |
Path | Directory path to store files. If not set, Fluent Bit will write the files in its own positioned directory. Note: this option was added in Fluent Bit v1.4.6. |
File | Set file name to store the records. If not set, the file name will be the tag associated with the records. |
Format | The format of the file content. See also the Format section. | out_file |
Mkdir | Recursively create output directory if it does not exist. Permissions set to 0755. |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 1 |
Delimiter | The character to separate each data. | ',' |
Delimiter | The character to separate each pair. | '\t' (TAB) |
Label_Delimiter | The character to separate label and the value. | ':' |
Template | The format string. | '{time} {message}' |
Match | Pattern to match which tags of logs to be outputted by this plugin |
Host | IP address or hostname of the target Graylog server | 127.0.0.1 |
Port | The port that your Graylog GELF input is listening on | 12201 |
Mode | The protocol to use (tls, tcp or udp) | udp |
Gelf_Short_Message_Key | A short descriptive message (MUST be set in GELF) | short_message |
Gelf_Timestamp_Key | Your log timestamp (SHOULD be set in GELF) | timestamp |
Gelf_Host_Key | Key which its value is used as the name of the host, source or application that sent this message. (MUST be set in GELF) | host |
Gelf_Full_Message_Key | Key to use as the long message that can i.e. contain a backtrace. (Optional in GELF) | full_message |
Gelf_Level_Key | Key to be used as the log level. Its value must be a standard syslog level (between 0 and 7). (Optional in GELF) | level |
Packet_Size | If transport protocol is udp, you can set the size of packets to be sent. | 1420 |
Compress | If transport protocol is udp, you can set this if you want your UDP packets to be compressed. | true |
host | IP address or hostname of the target HTTP Server | 127.0.0.1 |
http_User | Basic Auth Username |
http_Passwd | Basic Auth Password. Requires HTTP_User to be set |
port | TCP port of the target HTTP Server | 80 |
Proxy | Specify an HTTP Proxy. The expected format of this value is http://HOST:PORT. Note that HTTPS is not currently supported. It is recommended not to set this and to configure the HTTP proxy environment variables instead, as they support both HTTP and HTTPS. |
uri | Specify an optional HTTP URI for the target web server, e.g: /something | / |
compress | Set payload compression mechanism. Option available is 'gzip' |
format | Specify the data format to be used in the HTTP request body, by default it uses msgpack. Other supported formats are json, json_stream and json_lines and gelf. | msgpack |
allow_duplicated_headers | Specify if duplicated headers are allowed. If a duplicated header is found, the latest key/value set is preserved. | true |
log_response_payload | Specify if the response payload should be logged or not. | true |
header_tag | Specify an optional HTTP header field for the original message tag. |
header | Add a HTTP header key/value pair. Multiple headers can be set. |
json_date_key | Specify the name of the time key in the output record. To disable the time key just set the value to | date |
json_date_format | Specify the format of the date. Supported formats are double, epoch, iso8601 (eg: 2018-05-30T09:39:52.000681Z) and java_sql_timestamp (eg: 2018-05-30 09:39:52.000681) | double |
gelf_timestamp_key | Specify the key to use for |
gelf_host_key | Specify the key to use for the |
gelf_short_message_key | Specify the key to use as the |
gelf_full_message_key | Specify the key to use for the |
gelf_level_key | Specify the key to use for the |
successful_response_code | Specify what a successful HTTP response code is, in case you need to retry for other HTTP codes (e.g. 204). |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 2 |
Host | IP address or hostname of the target Elasticsearch instance | 127.0.0.1 |
Port | TCP port of the target Elasticsearch instance | 9200 |
Path | Elasticsearch accepts new data on HTTP query path "/_bulk". But it is also possible to serve Elasticsearch behind a reverse proxy on a subpath. This option defines such path on the fluent-bit side. It simply adds a path prefix in the indexing HTTP POST URI. | Empty string |
Buffer_Size | Specify the buffer size used to read the response from the Elasticsearch HTTP service. This option is useful for debugging purposes where it is required to read full responses; note that the response size grows depending on the number of records inserted. To set an unlimited amount of memory, set this value to False; otherwise the value must conform to the Unit Size specification. | 4KB |
Pipeline | Newer versions of Elasticsearch allow setting up filters called pipelines. This option defines which pipeline the database should use. For performance reasons it is strongly suggested to do parsing and filtering on the Fluent Bit side and avoid pipelines. |
AWS_Auth | Enable AWS Sigv4 Authentication for Amazon OpenSearch Service | Off |
AWS_Region | Specify the AWS region for Amazon OpenSearch Service |
AWS_STS_Endpoint | Specify the custom sts endpoint to be used with STS API for Amazon OpenSearch Service |
AWS_Role_ARN | AWS IAM Role to assume to put records to your Amazon cluster |
AWS_External_ID | External ID for the AWS IAM Role specified with |
Cloud_ID | If you are using Elastic's Elasticsearch Service you can specify the cloud_id of the cluster running |
Cloud_Auth | Specify the credentials to use to connect to Elastic's Elasticsearch Service running on Elastic Cloud |
HTTP_User | Optional username credential for Elastic X-Pack access |
HTTP_Passwd | Password for user defined in HTTP_User |
Index | Index name | fluent-bit |
Type | Type name | _doc |
Logstash_Format | Enable Logstash format compatibility. This option takes a boolean value: True/False, On/Off | Off |
Logstash_Prefix | When Logstash_Format is enabled, the Index name is composed using a prefix and the date, e.g: if Logstash_Prefix is equal to 'mydata', your index will become 'mydata-YYYY.MM.DD'. The last string appended belongs to the date when the data is being generated. | logstash |
Logstash_DateFormat | Time format (based on strftime) to generate the second part of the Index name. | %Y.%m.%d |
Time_Key | When Logstash_Format is enabled, each record will get a new timestamp field. The Time_Key property defines the name of that field. | @timestamp |
Time_Key_Format | When Logstash_Format is enabled, this property defines the format of the timestamp. | %Y-%m-%dT%H:%M:%S |
Time_Key_Nanos | When Logstash_Format is enabled, enabling this property sends nanosecond precision timestamps. | Off |
Include_Tag_Key | When enabled, it appends the Tag name to the record. | Off |
Tag_Key | When Include_Tag_Key is enabled, this property defines the key name for the tag. | _flb-key |
Generate_ID | When enabled, generate _id for outputs. | Off |
Id_Key | If set, _id will be the value of the key from the incoming record and the Generate_ID option is ignored. |
Replace_Dots | When enabled, replace field name dots with underscore, required by Elasticsearch 2.0-2.3. | Off |
Trace_Output | When enabled print the elasticsearch API calls to stdout (for diag only) | Off |
Trace_Error | When enabled print the elasticsearch API calls to stdout when elasticsearch returns an error (for diag only) | Off |
Current_Time_Index | Use current time for index generation instead of message record | Off |
Logstash_Prefix_Key | When included: the value in the record that belongs to the key will be looked up and will overwrite the Logstash_Prefix for index generation. If the key/value is not found in the record then the Logstash_Prefix option will act as a fallback. Nested keys are not supported (if desired, you can use the nest filter plugin to remove nesting). |
Suppress_Type_Name | When enabled, mapping types are removed and the Type option is ignored. Types are deprecated in APIs in v7.0. This option is for v7.0 or later. | Off |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 2 |
The Kafka output plugin allows ingesting your records into an Apache Kafka service. This plugin uses the official librdkafka C library (built-in dependency).
Setting rdkafka.log.connection.close to false and rdkafka.request.required.acks to 1 are examples of recommended settings for librdkafka properties.
In order to insert records into Apache Kafka, you can run the plugin from the command line or through the configuration file:
The kafka plugin can read the parameters from the command line through the -p argument (property), e.g:
In your main configuration file append the following Input & Output sections:
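A minimal sketch; the broker address and topic are placeholders:

```
[INPUT]
    Name cpu

[OUTPUT]
    Name    kafka
    Match   *
    Brokers 192.168.1.3:9092
    Topics  test
```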
Fluent Bit comes with support for avro encoding for the out_kafka plugin. Avro support is optional and must be activated at build time by using a build definition with cmake: -DFLB_AVRO_ENCODER=On, such as in the following example (shown after this list), which activates:
out_kafka with avro encoding
fluent-bit's prometheus
metrics via an embedded http endpoint
debugging support
builds the test suites
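A sketch of such a build invocation; the exact combination of flags is an assumption based on the list above:

```
cmake -DFLB_AVRO_ENCODER=On \
      -DFLB_HTTP_SERVER=On \
      -DFLB_DEBUG=On \
      -DFLB_TESTS_RUNTIME=On \
      -DFLB_TESTS_INTERNAL=On ../
```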
This example Fluent Bit config tails Kubernetes logs, decorates the log lines with Kubernetes metadata via the kubernetes filter, and then sends the fully decorated log lines to a Kafka broker, encoded with a specific avro schema.
The influxdb output plugin allows flushing your records into an InfluxDB time series database. The following instructions assume that you have a fully operational InfluxDB service running in your system.
The InfluxDB output plugin supports TLS/SSL; for more details about the available properties and general configuration, please refer to the TLS/SSL section.
In order to start inserting records into an InfluxDB service, you can run the plugin from the command line or through the configuration file:
The influxdb plugin can read the parameters from the command line in two ways, through the -p argument (property) or setting them directly through the service URI. The URI format is the following:
Using the format specified, you could start Fluent Bit through:
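For example, a sketch using the URI form with the default host and port:

```
fluent-bit -i cpu -t cpu -o influxdb://127.0.0.1:8086 -m '*'
```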
In your main configuration file append the following Input & Output sections:
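A minimal sketch of the equivalent configuration:

```
[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name     influxdb
    Match    *
    Host     127.0.0.1
    Port     8086
    Database fluentbit
```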
A basic example of Tag_Keys usage:
Using Auto_Tags=On in this example would cause an error, because every parsed field value is a string. This option is best used with metrics-like records where one or more field values are not strings.
A basic example of Tags_List_Key usage:
Before starting Fluent Bit, make sure the target database exists on InfluxDB. Using the above example, we will insert the data into a fluentbit database.
Log into InfluxDB console:
Create the database:
Check the database exists:
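A sketch of those three steps using the influx CLI (InfluxDB 1.x):

```
$ influx
> CREATE DATABASE fluentbit
> SHOW DATABASES
```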
The following command will gather CPU metrics from the system and send the data to InfluxDB database every five seconds:
Note that all records coming from the cpu input plugin have a tag cpu; this tag is used to generate the measurement in InfluxDB.
From InfluxDB console, choose your database:
Now query some specific fields:
The CPU input plugin gathers more metrics per CPU core; in the above example we selected just three specific metrics. The following query will give a full result:
Query tagged keys:
And now query method key values:
The kafka-rest output plugin allows flushing your records into a Kafka REST Proxy server. The following instructions assume that you have fully operational Kafka REST Proxy and Kafka services running in your environment.
The Kafka REST Proxy output plugin supports TLS/SSL; for more details about the available properties and general configuration, please refer to the TLS/SSL section.
In order to insert records into a Kafka REST Proxy service, you can run the plugin from the command line or through the configuration file:
The kafka-rest plugin can read the parameters from the command line through the -p argument (property), e.g:
In your main configuration file append the following Input & Output sections:
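A minimal sketch based on the plugin's defaults:

```
[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name  kafka-rest
    Match *
    Host  127.0.0.1
    Port  8082
    Topic fluent-bit
```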
LogDNA is an intuitive cloud-based log management system that provides an easy interface to query your logs once they are stored.
The Fluent Bit logdna output plugin allows you to send your logs or events to a LogDNA compliant service like:
Before getting started with the plugin configuration, make sure to obtain the proper account to get access to the service. You can start with a free trial at the following link:
One of the features of Fluent Bit + LogDNA integration is the ability to auto enrich each record with further context.
When the plugin processes each record (or log), it tries to look up specific key names that might contain context for the record in question. The following table describes the keys and the discovery logic:
The following configuration example will emit a dummy example record and ingest it on LogDNA. Copy and paste the following content in a file called logdna.conf:
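A sketch of such a file; the api_key is a placeholder and the dummy message is illustrative (the tags aa and bb match the dashboard example further below):

```
[SERVICE]
    Flush 1

[INPUT]
    Name  dummy
    Dummy {"log":"a simple log message"}

[OUTPUT]
    Name    logdna
    Match   *
    api_key your-logdna-api-key
    tags    aa,bb
```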
Run Fluent Bit with the new configuration file:
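```
fluent-bit -c logdna.conf
```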
Fluent Bit output:
Your record will be available and visible in your LogDNA dashboard after a few seconds.
In your LogDNA dashboard, go to the top filters and mark the Tags aa and bb; then you will be able to see your records as in the example below:
In order to flush records, the nats plugin requires two parameters:
In order to override the default configuration values, the plugin uses the optional Fluent Bit network address format, e.g:
As described above, the target service and storage point can be changed, e.g:
For every set of records flushed to a NATS Server, Fluent Bit uses the following JSON format:
Each record is an individual entity represented in a JSON array that contains a UNIX_TIMESTAMP and a JSON map with a set of key/values. A summarized output of the CPU input plugin will look like this:
The Fluent Bit loki built-in output plugin allows you to send your logs or events to a Loki service. It supports data enrichment with Kubernetes labels, custom label keys and Tenant ID, among others.
Loki stores the record logs inside streams. A stream is defined by a set of labels; at least one label is required.
Fluent Bit implements a flexible mechanism to set labels using fixed key/value pairs of text, but it also allows setting as labels certain keys that exist as part of the records being processed. Consider the following JSON record (pretty printed for readability):
If you decide that your Loki Stream will be composed of two labels called job and the value of the record key called stream, your labels configuration properties might look as follows:
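A sketch consistent with the explanation further below (job fixed to fluentbit, plus a record accessor for the nested sub map's stream key):

```
[OUTPUT]
    name   loki
    match  *
    labels job=fluentbit, $sub['stream']
```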
When processing the above configuration, internally the final labels for the stream in question become:
Another feature of Labels management is the ability to provide custom key names: using the same record accessor pattern, we can specify the key name manually and let the value be populated automatically at runtime, e.g:
When processing that new configuration, the internal labels will be:
label_keys property
The additional configuration property called label_keys allows specifying multiple record keys that need to be placed as part of the outgoing Stream Labels; this is a feature similar to the one explained above in the labels property. Consider it another way to set a record key in the Stream, with the limitation that you cannot use a custom name for the key value.
The following configuration examples generate the same Stream Labels:
The above configuration accomplishes the same as this one:
Both will generate the following stream labels:
Note that if you are running in a Kubernetes environment, you might want to enable the option auto_kubernetes_labels, which will auto-populate the streams with the Pod labels for you. Consider the following configuration:
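A minimal sketch of that option:

```
[OUTPUT]
    name                   loki
    match                  *
    labels                 job=fluentbit
    auto_kubernetes_labels on
```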
Based on the JSON example provided above, the internal stream labels will be:
This plugin inherits core Fluent Bit features to customize the network behavior and optionally enable TLS in the communication channel. For more details about the specific options available, refer to the following articles:
Note that all options mentioned in the articles above must be enabled in the plugin configuration in question.
An example configuration - make sure to set the credentials and ensure the host URL matches the correct one for your deployment:
The following configuration example will emit a dummy example record and ingest it on Loki. Copy and paste the following content into a file called out_loki.conf:
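A sketch of such a file, assuming a local Loki instance on its default port:

```
[SERVICE]
    Flush 1

[INPUT]
    Name dummy

[OUTPUT]
    Name   loki
    Match  *
    Host   127.0.0.1
    Port   3100
    Labels job=fluentbit
```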
Run Fluent Bit with the new configuration file:
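```
fluent-bit -c out_loki.conf
```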
Fluent Bit output:
Key | Description | default |
---|---|---|
Key | Description | default |
---|---|---|
Key | Description | default |
---|---|---|
Key | Description | Default |
---|---|---|
Key | Description |
---|---|
The nats output plugin allows flushing your records into a NATS Server end point. The following instructions assume that you have a fully operational NATS Server in place.
parameter | description | default |
---|---|---|
Fluent Bit only requires knowing that it needs to use the nats output plugin; if no extra information is given, it will use the default values specified in the above table.
Loki is a multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate.
Be aware there is a separate Golang output plugin provided by Grafana with different configuration options.
Key | Description | Default |
---|---|---|
As you can see, the label job has the value fluentbit, and the second label is configured to access the nested map called sub, targeting the value of the key stream. Note that the second label name must start with a $; that means it is a record accessor pattern, which provides the ability to retrieve values from nested maps by using key names.
Networking Setup: timeouts, keepalive and source address
TLS/SSL: all about TLS configuration and certificates
Fluent Bit supports sending logs (and metrics) to Grafana Cloud by providing the appropriate URL and ensuring TLS is enabled.
format | Specify data format. Options available: json, msgpack. | json |
message_key | Optional key to store the message |
message_key_field | If set, the value of Message_Key_Field in the record will indicate the message key. If not set nor found in the record, Message_Key will be used (if set). |
timestamp_key | Set the key to store the record timestamp | @timestamp |
timestamp_format | 'iso8601' or 'double' | double |
brokers | Single or multiple list of Kafka Brokers, e.g: 192.168.1.3:9092, 192.168.1.4:9092. |
topics | Single entry or list of topics separated by comma (,) that Fluent Bit will use to send messages to Kafka. If only one topic is set, that one will be used for all records. Instead, if multiple topics exist, the one set in the record by Topic_Key will be used. | fluent-bit |
topic_key | If multiple Topics exist, the value of Topic_Key in the record will indicate the topic to use. E.g: if Topic_Key is router and the record is {"key1": 123, "router": "route_2"}, Fluent Bit will use topic route_2. Note that if the value of Topic_Key is not present in Topics, then by default the first topic in the Topics list will indicate the topic to be used. |
dynamic_topic | Adds unknown topics (found in Topic_Key) to Topics. So in Topics only a default topic needs to be configured. | Off |
queue_full_retries | Fluent Bit queues data into the rdkafka library; if for some reason the underlying library cannot flush the records, the queue might fill up, blocking new additions of records. The queue_full_retries option sets the number of local retries to enqueue the data. The default value is 10 times, and the interval between each retry is 1 second. Setting queue_full_retries to 0 sets an unlimited number of retries. | 10 |
rdkafka.{property} | {property} can be any librdkafka property |
Host | IP address or hostname of the target InfluxDB service | 127.0.0.1 |
Port | TCP port of the target InfluxDB service | 8086 |
Database | InfluxDB database name where records will be inserted | fluentbit |
Bucket | InfluxDB bucket name where records will be inserted - if specified, database is ignored and v2 of the API is used |
Org | InfluxDB organization name where the bucket is (v2 only) | fluent |
Sequence_Tag | The name of the tag whose value is incremented for consecutive simultaneous events. | _seq |
HTTP_User | Optional username for HTTP Basic Authentication |
HTTP_Passwd | Password for user defined in HTTP_User |
HTTP_Token | Authentication token used with InfluxDB v2 - if specified, both HTTP_User and HTTP_Passwd are ignored |
Tag_Keys | Space separated list of keys that need to be tagged |
Auto_Tags | Automatically tag keys where value is string. This option takes a boolean value: True/False, On/Off. | Off |
Tags_List_Enabled | Dynamically tag keys which are in the string array at the Tags_List_Key key. This option takes a boolean value: True/False, On/Off. | Off |
Tags_List_Key | Key of the string array optionally contained within each log record that contains tag keys for that record | tags |
Host | IP address or hostname of the target Kafka REST Proxy server | 127.0.0.1 |
Port | TCP port of the target Kafka REST Proxy server | 8082 |
Topic | Set the Kafka topic | fluent-bit |
Partition | Set the partition number (optional) |
Message_Key | Set a message key (optional) |
Time_Key | The Time_Key property defines the name of the field that holds the record timestamp. | @timestamp |
Time_Key_Format | Defines the format of the timestamp. | %Y-%m-%dT%H:%M:%S |
Include_Tag_Key | Append the Tag name to the final record. | Off |
Tag_Key | If Include_Tag_Key is enabled, this property defines the key name for the tag. | _flb-key |
logdna_host | LogDNA API host address | logs.logdna.com |
logdna_port | LogDNA TCP Port | 443 |
api_key | API key to get access to the service. This property is mandatory. |
hostname | Name of the local machine or device where Fluent Bit is running. When this value is not set, Fluent Bit looks up the hostname and auto-populates the value. If it cannot be found, an unknown value will be set instead. |
mac | Mac address. This value is optional. |
ip | IP address of the local hostname. This value is optional. |
tags | A list of comma separated strings to group records in LogDNA and simplify the query with filters. |
file | Optional name of a file being monitored. Note that this value is only set if the record does not contain a reference to it. |
app | Name of the application. This value is auto discovered on each record; if not found, the default value is used. | Fluent Bit |
level | If the record contains a key called level or severity, it will populate the context level key with that value. If not found, the context key is not set. |
file | If the record contains a key called file, it will populate the context file with the value found; otherwise, if the plugin configuration provided a file property, that value will be used instead (see table above). |
app | If the record contains a key called app, it will populate the context app with the value found; otherwise it will use the value set for app in the configuration property (see table above). |
meta | If the record contains a key called meta, it will populate the context meta with the value found. |
host | IP address or hostname of the NATS Server | 127.0.0.1 |
port | TCP port of the target NATS Server | 4222 |
host | Loki hostname or IP address. Do not include the subpath, i.e. /loki/api/v1/push. | 127.0.0.1 |
port | Loki TCP port | 3100 |
http_user | Set HTTP basic authentication user name |
http_passwd | Set HTTP basic authentication password |
tenant_id | Tenant ID used by default to push logs to Loki. If omitted or empty it assumes Loki is running in single-tenant mode and no X-Scope-OrgID header is sent. |
labels | Stream labels for API request. It can be multiple comma separated strings specifying key=value pairs. | job=fluentbit |
label_keys | Optional list of record keys that will be placed as stream labels. This configuration property is for records key only. More details in the Labels section. |
remove_keys | Optional list of keys to remove. |
drop_single_key | If set to true and after extracting labels only a single key remains, the log line sent to Loki will be the value of that key in line_format. | off |
line_format | Format to use when flattening the record to a log line. Valid values are json or key_value. | json |
auto_kubernetes_labels | If set to true, it will add all Kubernetes labels to the Stream labels | off |
tenant_id_key | Specify the name of the key from the original record that contains the Tenant ID. The value of the key is set as the X-Scope-OrgID HTTP header. |
The null output plugin just throws away events.
The plugin doesn't support configuration parameters.
You can run the plugin from the command line or through the configuration file:
From the command line you can let Fluent Bit throw away events with the following options:
In your main configuration file append the following Input & Output sections:
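A minimal sketch:

```
[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name  null
    Match *
```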
The Slack output plugin delivers records or messages to your preferred Slack channel. It formats the outgoing content in JSON format for readability.
This connector uses the Slack Incoming Webhooks feature to post messages to Slack channels. Using this plugin in conjunction with the Stream Processor is a good combination for alerting.
Before configuring this plugin, make sure to set up your Incoming Webhook. For detailed step-by-step instructions, review the following official documentation:
Once you have obtained the Webhook address you can place it in the configuration below.
Get started quickly with this configuration file:
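A minimal sketch; the webhook URL below is a placeholder you must replace with your own:

```
[OUTPUT]
    Name    slack
    Match   *
    webhook https://hooks.slack.com/services/XXXXXXXX/XXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX
```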
New Relic is a data management platform that gives you real-time insights into your data for developers, operations and management teams.
The Fluent Bit nrlogs output plugin allows you to send your logs to the New Relic service.
Before getting started with the plugin configuration, make sure to obtain the proper account to get access to the service. You can register and start with a free trial at the following link:
The following configuration example will emit a dummy example record and ingest it on New Relic. Copy and paste the following content in a file called newrelic.conf:
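A sketch of such a file; the api_key is a placeholder:

```
[SERVICE]
    Flush 1

[INPUT]
    Name dummy

[OUTPUT]
    Name    nrlogs
    Match   *
    api_key your-api-key-here
```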
Run Fluent Bit with the new configuration file:
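```
fluent-bit -c newrelic.conf
```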
Fluent Bit output:
PostgreSQL is a very popular and versatile open source database management system that supports the SQL language and that is capable of storing both structured and unstructured data, such as JSON objects.
Given that Fluent Bit is designed to work with JSON objects, the pgsql output plugin allows users to send their data to a PostgreSQL database and store it using the JSONB type.
PostgreSQL 9.4 or higher is required.
According to the parameters you have set in the configuration file, the plugin will create the table defined by the table option in the database defined by the database option, hosted on the server defined by the host option. It will use the PostgreSQL user defined by the user option, which needs to have the right privileges to create such a table in that database.
NOTE: If you are not familiar with how PostgreSQL's users and grants system works, you might find it useful to read the recommended links in the "References" section at the bottom.
A typical installation normally consists of a self-contained database for Fluent Bit in which you can store the output of one or more pipelines. Ultimately, it is your choice to store them in the same table, in separate tables, or even in separate databases, based on several factors including workload, scalability, data protection and security.
In this example, for the sake of simplicity, we use a single table called fluentbit in a database called fluentbit that is owned by the user fluentbit. Feel free to use different names. Preferably, for security reasons, do not use the postgres user (which has SUPERUSER privileges).
fluentbit user
Generate a robust random password (e.g. pwgen 20 1) and store it safely. Then, as the postgres system user on the server where PostgreSQL is installed, execute:
At the prompt, please provide the password that you previously generated.
As a result, the user fluentbit without superuser privileges will be created.
If you prefer, instead of the createuser application, you can directly use the SQL command CREATE USER.
fluentbit database
As the postgres system user, please run:
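A sketch using the createdb utility (-O sets the owner):

```
createdb -O fluentbit fluentbit
```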
This will create a database called fluentbit owned by the fluentbit user. As a result, the fluentbit user will be able to safely create the data table.
Alternatively, you can use the SQL command CREATE DATABASE.
Make sure that the fluentbit user can connect to the fluentbit database on the specified target host. This might require you to properly configure the pg_hba.conf file.
Fluent Bit relies on libpq, the PostgreSQL native client API, written in the C language. For this reason, default values might be affected by environment variables and compilation settings. The above table lists, in brackets, the most common default values for each connection option.
For security reasons, it is advised to follow the directives included in the password file section.
In your main configuration file add the following section:
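A minimal sketch matching the user, database and table created above; the host and password are placeholders:

```
[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name     pgsql
    Match    *
    Host     127.0.0.1
    Port     5432
    User     fluentbit
    Password YOUR_PASSWORD
    Database fluentbit
    Table    fluentbit
```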
The output plugin automatically creates a table with the name specified by the table configuration option, made up of the following fields:
tag TEXT
time TIMESTAMP WITHOUT TIME ZONE
data JSONB
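In SQL terms, the created table is equivalent to something like the following (table name per the table option):

```
CREATE TABLE fluentbit (
    tag  TEXT,
    time TIMESTAMP WITHOUT TIME ZONE,
    data JSONB
);
```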
As you can see, the timestamp does not contain any information about the time zone; it therefore refers to the time zone used by the connection to PostgreSQL (the timezone setting).
For more information on the JSONB data type in PostgreSQL, please refer to the JSON types page in the official documentation, where you can find instructions on how to index or query the objects (including jsonpath, introduced in PostgreSQL 12).
PostgreSQL 10 introduces support for declarative partitioning. In order to improve vertical scalability of the database, you can decide to partition your tables on time ranges (for example on a monthly basis). PostgreSQL supports also subpartitions, allowing you to even partition by hash your records (version 11+), and default partitions (version 11+).
For more information on horizontal partitioning in PostgreSQL, please refer to the Table partitioning page in the official documentation.
If you are starting now, our recommendation at the moment is to choose the latest major version of PostgreSQL.
PostgreSQL is a really powerful and extensible database engine. More expert users can indeed take advantage of BEFORE INSERT triggers on the main table and re-route records to normalised tables, depending on tags and the content of the actual JSON objects.
For example, you can use Fluent Bit to send HTTP log records to the landing table defined in the configuration file. This table contains a BEFORE INSERT trigger (a function in plpgsql language) that normalises the content of the JSON object and inserts the record in another table (with its own structure and partitioning model). This kind of trigger allows you to discard the record from the landing table by returning NULL.
Here follows a list of useful resources from the PostgreSQL documentation:
Send logs to Splunk HTTP Event Collector
Connectivity, transport and authentication configuration properties:
Content and Splunk metadata (fields) handling configuration properties:
In order to insert records into a Splunk service, you can run the plugin from the command line or through the configuration file:
The splunk plugin can read the parameters from the command line through the -p argument (property), e.g:
In your main configuration file append the following Input & Output sections:
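A minimal sketch; the Splunk_Token value is a placeholder:

```
[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name         splunk
    Match        *
    Host         127.0.0.1
    Port         8088
    Splunk_Token xxxxx-xxxxx-xxxxx-xxxxx
    TLS          On
    TLS.Verify   Off
```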
By default, the Splunk output plugin nests the record under the event key in the payload sent to the HEC. It will also append the time of the record to a top level time key.
If you would like to customize any of the Splunk event metadata, such as the host or target index, you can set Splunk_Send_Raw On in the plugin configuration, and add the metadata as keys/values in the record. Note: with Splunk_Send_Raw enabled, you are responsible for creating and populating the event section of the payload.
For example, to add a custom index and hostname:
This will create a payload that looks like:
If the option splunk_send_raw has been enabled, the user must take care to put all log details in the event field and only specify fields known to Splunk in the top level object; if there is a mismatch, Splunk will return an HTTP 400 error.
Consider the following example:
splunk_send_raw off
splunk_send_raw on
For up to date information about the valid keys in the top level object, refer to the Splunk documentation:
Sending to a Splunk metric index requires the Splunk_Send_Raw option to be enabled and the message to be formatted properly. This includes three specific operations:
Nest metric events under a "fields" property
Add metric_name: to all metrics
Add index, source, sourcetype as fields in the message
The following configuration gathers CPU metrics, nests the appropriate field, adds the required identifiers and then sends to Splunk.
Before getting started with the plugin configuration, make sure to obtain the proper credentials to get access to the service. We strongly recommend using a common JSON credentials file; reference link:
Your goal is to obtain a credentials JSON file that will be used later by the Fluent Bit Stackdriver output plugin.
If you are using a Google Cloud Credentials File, the following configuration is enough to get started:
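A minimal sketch; the credentials path is a placeholder:

```
[SERVICE]
    Flush 1

[INPUT]
    Name cpu
    Tag  cpu

[OUTPUT]
    Name                       stackdriver
    Match                      *
    google_service_credentials /path/to/my_google_service_credentials.json
```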
Example configuration file for k8s resource type:
The local_resource_id is used by the stackdriver output plugin to set the labels field for different k8s resource types. The Stackdriver plugin will try to find the local_resource_id field in the log entry. If there is no logging.googleapis.com/local_resource_id field in the log, the plugin will construct it using the tag value of the log.
The local_resource_id should be in format:
k8s_container.<namespace_name>.<pod_name>.<container_name>
k8s_node.<node_name>
k8s_pod.<namespace_name>.<pod_name>
This implies that if there is no local_resource_id in the log entry, then the tag of the logs should match this format. Note that there is a tag_prefix option, so it is not mandatory to use k8s_container (or node/pod) as the tag prefix.
An upstream connection error means Fluent Bit was not able to reach Google services; the error looks like this:
This is caused by a network issue in the environment where Fluent Bit is running; make sure that from the host, container or pod you can reach the following Google endpoints:
The error looks like this:
Perform the following checks:
If the log entry does not contain the local_resource_id field, does the tag of the log match the expected format?
If tag_prefix is configured, does the prefix of the tag specified in the input plugin match the tag_prefix?
Other implementations
Here we specify gathering CPU usage metrics and printing them to the standard output in a human-readable way:
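A possible invocation, assuming the stdout output plugin is the intended destination:

```
fluent-bit -i cpu -o stdout -m '*'
```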
The Splunk output plugin allows you to ingest your records into a Splunk service through the HTTP Event Collector (HEC) interface.
To get more details about how to setup the HEC in Splunk please refer to the following documentation:
The Splunk output plugin supports TLS/SSL; for more details about the available properties and general configuration, please refer to the TLS/SSL section.
For more information on the Splunk HEC payload format and all the event metadata Splunk accepts, see here:
With Splunk version 8.0 and above you can also use the Fluent Bit Splunk output plugin to send data to metric indices. This allows you to perform visualizations, metric queries, and analysis alongside other metrics you may be collecting. It builds on Splunk 8.0's support for sending multiple metrics in a single JSON payload; more details can be found in the Splunk documentation:
The Stackdriver output plugin allows you to ingest your records into the Google Cloud Stackdriver Logging service.
GitHub reference:
Stackdriver officially supports a logging agent based on Fluentd.
We plan to support some special fields in structured payloads; use cases for special fields are documented separately.
Key | Description | Default |
---|---|---|
webhook | Absolute address of the Webhook provided by Slack | |
Key | Description | Default |
---|---|---|
base_uri | Full address of the New Relic API endpoint. By default the value points to the US endpoint. If you want to use the EU endpoint you can set this key to https://log-api.eu.newrelic.com/log/v1 | https://log-api.newrelic.com/log/v1 |
api_key | Your key for data ingestion. The API key is also called the ingestion key; you can get more details on how to generate it in the official documentation here. From a configuration perspective either an api_key or a license_key is required; New Relic suggests primarily using the api_key. | |
license_key | Optional authentication parameter for data ingestion. Note that New Relic suggests using the api_key instead. You can read more about the License Key here. | |
compress | Set the compression mechanism for the payload. This option allows two values: gzip (enabled by default) or false to disable compression. | gzip |
Key | Description | Default |
---|---|---|
Host | Hostname/IP address of the PostgreSQL instance | - (127.0.0.1) |
Port | PostgreSQL port | - (5432) |
User | PostgreSQL username | - (current user) |
Password | Password of PostgreSQL username | - |
Database | Database name to connect to | - (current user) |
Table | Table name where to store data | - |
Timestamp_Key | Key in the JSON object containing the record timestamp | date |
Async | Define if we will use async or sync connections | false |
min_pool_size | Minimum number of connections in async mode | 1 |
max_pool_size | Maximum number of connections in async mode | 4 |
cockroachdb | Set to true if you will connect the plugin with a CockroachDB | false |
Key | Description | Default |
---|---|---|
Format | Specify the data format to be printed. Supported formats are msgpack, json, json_lines and json_stream. | msgpack |
json_date_key | Specify the name of the time key in the output record. To disable the time key just set the value to false. | date |
json_date_format | Specify the format of the date. Supported formats are double, epoch, iso8601 (e.g. 2018-05-30T09:39:52.000681Z) and java_sql_timestamp (e.g. 2018-05-30 09:39:52.000681). | double |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 1 |
Key | Description | Default |
---|---|---|
host | IP address or hostname of the target Splunk service. | 127.0.0.1 |
port | TCP port of the target Splunk service. | 8088 |
splunk_token | Specify the Authentication Token for the HTTP Event Collector interface. | |
http_user | Optional username for Basic Authentication on HEC. | |
http_passwd | Password for the user defined in http_user. | |
http_buffer_size | Buffer size used to receive Splunk HTTP responses. | 2M |
compress | Set payload compression mechanism. The only available option is gzip. | |
channel | Specify the X-Splunk-Request-Channel header for the HTTP Event Collector interface. | |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 2 |
splunk_send_raw | When enabled, the record keys and values are set at the top level of the map instead of under the event key. Refer to the Sending Raw Events section of the docs for details on making this option work properly. | off |
event_key | Specify the key name that will be used to send a single value as part of the record. | |
event_host | Specify the key name that contains the host value. This option allows a record accessor pattern. | |
event_source | Set the source value to assign to the event data. | |
event_sourcetype | Set the sourcetype value to assign to the event data. | |
event_sourcetype_key | Set a record key that will populate 'sourcetype'. If the key is found, it will have precedence over the value set in event_sourcetype. | |
event_index | The name of the index by which the event data is to be indexed. | |
event_index_key | Set a record key that will populate the index field. If the key is found, it will have precedence over the value set in event_index. | |
event_field | Set event fields for the record. This option can be set multiple times and the format is key_name record_accessor_pattern. | |
Key | Description | Default |
---|---|---|
google_service_credentials | Absolute path to a Google Cloud credentials JSON file. | Value of environment variable $GOOGLE_SERVICE_CREDENTIALS |
service_account_email | Account email associated to the service. Only available if no credentials file has been provided. | Value of environment variable $SERVICE_ACCOUNT_EMAIL |
service_account_secret | Private key content associated with the service account. Only available if no credentials file has been provided. | Value of environment variable $SERVICE_ACCOUNT_SECRET |
metadata_server | Prefix for a metadata server. Can also be set via the environment variable $METADATA_SERVER. | |
location | The GCP or AWS region in which to store data about the resource. If the resource type is one of generic_node or generic_task, then this field is required. | |
namespace | A namespace identifier, such as a cluster name or environment. If the resource type is one of generic_node or generic_task, then this field is required. | |
node_id | A unique identifier for the node within the namespace, such as a hostname or IP address. If the resource type is generic_node, then this field is required. | |
job | An identifier for a grouping of related tasks, such as the name of a microservice or distributed batch. If the resource type is generic_task, then this field is required. | |
task_id | A unique identifier for the task within the namespace and job, such as a replica index identifying the task within the job. If the resource type is generic_task, then this field is required. | |
export_to_project_id | The GCP project that should receive these logs. | Defaults to the project ID of the google_service_credentials file, or the project_id from Google's metadata.google.internal server. |
resource | Set the resource type of the data. Supported resource types: k8s_container, k8s_node, k8s_pod, global, generic_node, generic_task, and gce_instance. | global, gce_instance |
k8s_cluster_name | The name of the cluster that the container (node or pod, based on the resource type) is running in. If the resource type is one of k8s_container, k8s_node or k8s_pod, then this field is required. | |
k8s_cluster_location | The physical location of the cluster that contains the container (node or pod, based on the resource type). If the resource type is one of k8s_container, k8s_node or k8s_pod, then this field is required. | |
labels_key | The value of this field is used by the Stackdriver output plugin to find the related labels from jsonPayload and then extract their values to set the LogEntry labels. | logging.googleapis.com/labels |
tag_prefix | Set the tag_prefix used to validate the tag of logs with a k8s resource type. Without this option, the tag of the log must be in the format k8s_container(pod/node).* in order to use the k8s_container resource type. The tag prefix is configurable via this option (note the ending dot). | k8s_container., k8s_pod., k8s_node. |
severity_key | Specify the name of the key from the original record that contains the severity information. | |
autoformat_stackdriver_trace | Rewrite the trace field to include the projectID and format it for use with Cloud Trace. When this flag is enabled, the user can get the correct result by printing only the traceID (usually 32 characters). | false |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 2 |
The tcp output plugin allows you to send records to a remote TCP server. The payload can be formatted in different ways as required.
The following parameters are available to configure a secure channel connection through TLS:
Here we specify gathering CPU usage metrics and sending them in JSON lines format to a remote endpoint, using netcat as the receiving service, e.g.:
Run the following in a separate terminal; netcat will start listening for messages on TCP port 5170:
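A plausible listener command (flags vary slightly between netcat implementations):

```
nc -l 5170
```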
Start Fluent Bit
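For instance, mirroring the scenario above (address and format are the ones just described):

```
fluent-bit -i cpu -o tcp://127.0.0.1:5170 -p format=json_lines -v
```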
No more, no less, it just works.
The Syslog output plugin allows you to deliver messages to Syslog servers. It supports RFC3164 and RFC5424 formats through different transports such as UDP, TCP or TLS.
As of Fluent Bit v1.5.3 the configuration is very strict. You must be aware of the structure of your original record so you can configure the plugin to use specific keys to compose your outgoing Syslog message.
Future versions of Fluent Bit are expanding this plugin feature set to support better handling of keys and message composing.
Get started quickly with this configuration file:
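A minimal sketch, assuming a local Syslog server on UDP port 514; the dummy input is used only because its default record carries a message key:

```
[INPUT]
    Name  dummy
    Tag   my_logs

[OUTPUT]
    Name               syslog
    Match              *
    Host               127.0.0.1
    Port               514
    Mode               udp
    Syslog_Format      rfc5424
    Syslog_Message_Key message
```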
The following is an example of how to configure the syslog_sd_key to send Structured Data to the remote Syslog server.
Example log:
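For illustration, assume an incoming record like this (all field names and values are hypothetical):

```
{
    "hostname": "myhost",
    "appname": "myapp",
    "message": "Hello, Syslog!",
    "sd": {
        "mySDID@12345": {
            "key1": "value1"
        }
    }
}
```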
Example configuration file:
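A matching configuration sketch, mapping each record key from the hypothetical log above to its Syslog counterpart:

```
[OUTPUT]
    Name                syslog
    Match               *
    Host                127.0.0.1
    Port                514
    Mode                udp
    Syslog_Format       rfc5424
    Syslog_Hostname_Key hostname
    Syslog_Appname_Key  appname
    Syslog_Message_Key  message
    Syslog_SD_Key       sd
```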
Example output:
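Under those assumptions, the resulting RFC5424 message could look roughly like this (the priority value and timestamp will differ in practice):

```
<14>1 2024-01-01T12:00:00.000000Z myhost myapp - - [mySDID@12345 key1="value1"] Hello, Syslog!
```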
The websocket output plugin allows you to flush your records into a WebSocket endpoint. For now the functionality is fairly basic: it issues an HTTP GET request to perform the handshake and then uses a TCP connection to send the data records in either JSON or MessagePack format.
In order to insert records into an HTTP server, you can run the plugin from the command line or through the configuration file:
The websocket plugin can read the parameters from the command line in two ways: through the -p argument (property) or by setting them directly through the service URI. The URI format is the following:
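A plausible shape (host, port and URI path are placeholders):

```
websocket://host:port/something
```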
Using the format specified, you could start Fluent Bit through:
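A possible invocation under those assumptions:

```
fluent-bit -i cpu -t cpu -o websocket://127.0.0.1:8080/something -m '*'
```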
In your main configuration file, append the following Input & Output sections:
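A sketch of equivalent Input & Output sections (host, port and URI are placeholders):

```
[INPUT]
    Name  cpu
    Tag   cpu

[OUTPUT]
    Name    websocket
    Match   *
    Host    127.0.0.1
    Port    8080
    URI     /something
    Format  json
```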
The websocket plugin works in TCP keepalive mode; please refer to the networking section for details. Since websocket is a stateful plugin, it decides when to send the handshake to the server side, for example when the plugin has just started or after the connection with the server has been dropped. In general, the interval for initiating a new websocket handshake will be less than the keepalive interval. With that strategy, it can detect and resume websocket connections.
Once Fluent Bit is running, you can send some messages using netcat:
In Fluent Bit we should see the following output:
From the Fluent Bit log output, we can see that once data has been ingested into Fluent Bit, the plugin performs the handshake. If no data or traffic flows for a while, the TCP connection is aborted. When another piece of data arrives, the websocket plugin triggers a retry, with another handshake and data flush.
There is another scenario: if the websocket server flaps within a short time, meaning it goes down and comes back up quickly, Fluent Bit will resume the TCP connection immediately. In that case, however, the websocket output plugin is left in a malfunctioning state, and Fluent Bit needs to be restarted to get back to work.
The plugin supports the following configuration parameters:
Ideally you don't want to expose your API key on the command line; using a configuration file is highly recommended.
In your main configuration file append the following Input & Output sections:
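A minimal sketch, assuming this section documents the New Relic (nrlogs) output; the key value is a placeholder:

```
[INPUT]
    Name  cpu
    Tag   cpu

[OUTPUT]
    Name    nrlogs
    Match   *
    api_key YOUR_API_KEY_HERE
```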
The HTTP input plugin allows you to send custom records to an HTTP endpoint.
The http input plugin allows Fluent Bit to open up an HTTP port that you can then route data to in a dynamic way. This plugin supports dynamic tags, which allow you to send data with different tags through the same input. An example curl message can be seen below.
How to set tag
The tag for the HTTP input plugin is set by adding the tag to the end of the request URL. This tag is then used to route the event through the system. For example, in the curl message below the tag set is app.log. If you do not set the tag, http.0 is automatically used. If you have multiple HTTP inputs then they will follow a pattern of http.N, where N is an integer representing the input.
Example Curl message
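A plausible request, assuming the input was configured to listen on port 8888 (the plugin's default is 9880) and using a hypothetical JSON body:

```
curl -d '{"key1":"value1","key2":"value2"}' -XPOST -H "content-type: application/json" http://localhost:8888/app.log
```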
The td output plugin allows you to flush your records into the Treasure Data cloud service.
In order to start inserting records into Treasure Data, you can run the plugin from the command line or through the configuration file:
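A possible command-line sketch; the API key, database and table names are placeholders:

```
fluent-bit -i cpu -o td -p API=YOUR_API_KEY -p Database=my_db -p Table=my_table -m '*'
```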
Key | Description | Default |
---|---|---|
Host | Target host where Fluent Bit or Fluentd are listening for Forward messages. | 127.0.0.1 |
Port | TCP port of the target service. | 5170 |
Format | Specify the data format to be printed. Supported formats are msgpack, json, json_lines and json_stream. | msgpack |
json_date_key | Specify the name of the time key in the output record. To disable the time key just set the value to false. | date |
json_date_format | Specify the format of the date. Supported formats are double, epoch, iso8601 (e.g. 2018-05-30T09:39:52.000681Z) and java_sql_timestamp (e.g. 2018-05-30 09:39:52.000681). | double |
Workers | Enables dedicated thread(s) for this output. The default value is set since version 1.8.13; for previous versions it is 0. | 2 |
Key | Description | Default |
---|---|---|
tls | Enable or disable TLS support | Off |
tls.verify | Force certificate validation | On |
tls.debug | Set TLS debug verbosity level. It accepts the following values: 0 (No debug), 1 (Error), 2 (State change), 3 (Informational) and 4 (Verbose) | 1 |
tls.ca_file | Absolute path to CA certificate file | |
tls.crt_file | Absolute path to Certificate file | |
tls.key_file | Absolute path to private Key file | |
tls.key_passwd | Optional password for tls.key_file file | |
Key | Description | Default |
---|---|---|
host | Domain or IP address of the remote Syslog server. | 127.0.0.1 |
port | TCP or UDP port of the remote Syslog server. | 514 |
mode | Desired transport type. Available options are tcp, tls and udp. | udp |
syslog_format | The Syslog protocol format to use. Available options are rfc3164 and rfc5424. | rfc5424 |
syslog_maxsize | The maximum size allowed per message. The value must be an integer representing the number of bytes allowed. If no value is provided, the default size is set depending on the protocol version specified by syslog_format: rfc3164 sets the max size to 1024 bytes, while rfc5424 sets it to 2048 bytes. | |
syslog_severity_key | The key name from the original record that contains the Syslog severity number. This configuration is optional. | |
syslog_facility_key | The key name from the original record that contains the Syslog facility number. This configuration is optional. | |
syslog_hostname_key | The key name from the original record that contains the hostname that generated the message. This configuration is optional. | |
syslog_appname_key | The key name from the original record that contains the application name that generated the message. This configuration is optional. | |
syslog_procid_key | The key name from the original record that contains the Process ID that generated the message. This configuration is optional. | |
syslog_msgid_key | The key name from the original record that contains the Message ID associated to the message. This configuration is optional. | |
syslog_sd_key | The key name from the original record that contains the Structured Data (SD) content. This configuration is optional. | |
syslog_message_key | The key name from the original record that contains the message to deliver. Note that this property is mandatory; otherwise the message will be empty. | |
Key | Description | Default |
---|---|---|
Host | IP address or hostname of the target WebSocket server | 127.0.0.1 |
Port | TCP port of the target WebSocket server | 80 |
URI | Specify an optional HTTP URI for the target websocket server, e.g. /something | / |
Format | Specify the data format to be used in the HTTP request body. By default it uses msgpack; other supported formats are json, json_stream, json_lines and gelf. | msgpack |
json_date_key | Specify the name of the date field in output | date |
json_date_format | Specify the format of the date. Supported formats are double, epoch, iso8601 (e.g. 2018-05-30T09:39:52.000681Z) and java_sql_timestamp (e.g. 2018-05-30 09:39:52.000681) | double |
Key | Description | Default |
---|---|---|
host | The address to listen on | 0.0.0.0 |
port | The port for Fluent Bit to listen on | 9880 |
buffer_max_size | Specify the maximum buffer size in KB to receive a JSON message. | 4M |
buffer_chunk_size | This sets the chunk size for incoming JSON messages. These chunks are then stored/managed in the space available by buffer_max_size. | 512K |
Key | Description | Default |
---|---|---|
API | The API key. To obtain it, please log into the Treasure Data console and, in the API keys box, copy the API key hash. | |
Database | Specify the name of your target database. | |
Table | Specify the name of your target table where the records will be stored. | |
Region | Set the service region; available values: US and JP. | US |