Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Performance and Data Safety
When Fluent Bit processes data, it uses the system memory (heap) as a primary and temporal place to store the record logs before they get delivered, on this private memory area the records are processed.
Buffering refers to the ability to store the records somewhere, and while they are processed and delivered, still be able to store more. Buffering in memory is the fastest mechanism, but there are certain scenarios where the mechanism requires special strategies to deal with backpressure, data safety or reduce memory consumption by the service in constraint environments.
Network failures or latency on third party service is pretty common, and on scenarios where we cannot deliver data fast enough as we receive new data to process, we likely will face backpressure.
Our buffering strategies are designed to solve problems associated with backpressure and general delivery failures.
Fluent Bit as buffering strategies, offers a primary buffering mechanism in memory and an optional secondary one using the file system. With this hybrid solution you can adjust to any use case safety and keep a high performance while processing your data.
Both mechanisms are not exclusive and when the data is ready to be processed or delivered it will be always in memory, while other data in the queue might be in the file system until is ready to be processed and moved up to memory.
To learn more about the buffering configuration in Fluent Bit, please jump to the Buffering & Storage section.
There are a few key concepts that are really important to understand how Fluent Bit operates.
Before diving into Fluent Bit it’s good to get acquainted with some of the key concepts of the service. This document provides a gentle introduction to those concepts and common Fluent Bit terminology. We’ve provided a list below of all the terms we’ll cover, but we recommend reading this document from start to finish to gain a more general understanding of our log and stream processor.
Event or Record
Filtering
Tag
Timestamp
Match
Structured Message
Every incoming piece of data that belongs to a log or a metric that is retrieved by Fluent Bit is considered an Event or a Record.
As an example consider the following content of a Syslog file:
It contains four lines and each of them represents an independent Event, for a total of four Events.
Internally, an Event always has two components (in an array form):
Some cases require modifications on the Events content. Modifying events by altering, enriching or dropping Events is called Filtering.
There are many use cases when Filtering is required like:
Append specific information to the Event like an IP address or metadata.
Select a specific piece of the Event content.
Drop Events that matches certain pattern.
Every Event that gets into Fluent Bit gets assigned a Tag. This tag is an internal string that is used in a later stage by the Router to decide which Filter or Output phase it must go through.
Most of the tags are assigned manually in the configuration. If a tag is not specified, Fluent Bit will assign the name of the Input plugin instance from where that Event was generated from.
The only input plugin that doesn't assign Tags is Forward input. This plugin speaks the Fluentd wire protocol called Forward where every Event already comes with a Tag associated. Fluent Bit will always use the incoming Tag set by the client.
A Tagged record must always have a Matching rule. To learn more about Tags and Matches check the Routing section.
The Timestamp represents the time when an Event was created. Every Event contains a Timestamp associated. The Timestamp is a numeric fractional integer in the format:
It is the number of seconds that have elapsed since the Unix epoch.
Fractional second or one thousand-millionth of a second.
A timestamp always exists, either set by the Input plugin or discovered through a data parsing process.
Fluent Bit allows to deliver your collected and processed Events to one or multiple destinations, this is done through a routing phase. A Match represent a simple rule to select Events where it Tags matches a defined rule.
To learn more about Tags and Matches check the Routing section.
Source events can have or not have a structure. A structure defines a set of keys and values inside the Event message. As an example consider the following two messages:
At a low level both are just an array of bytes, but the Structured message defines keys and values, having a structure helps to implement faster operations on data modifications.
Fluent Bit always handles every Event message as a structured message. For performance reasons, we use a binary serialization data format called MessagePack.
Consider MessagePack as a binary version of JSON on steroids.
Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Fluent Bit is an open source and multi-platform log processor tool which aims to be a generic Swiss knife for logs processing and distribution.
Nowadays the number of sources of information in our environments is ever increasing. Handling data collection at scale is complex, and collecting and aggregating diverse data requires a specialized tool that can deal with:
Different sources of information
Different data formats
Data Reliability
Security
Flexible Routing
Multiple destinations
Fluent Bit has been designed with performance and low resources consumption in mind.
The Production Grade Ecosystem
Logging and data processing in general can be complex, and substantially more so at scale. That's why Fluentd was born. Now, Fluentd is more than a simple tool. It's a full ecosystem that contains SDKs for different languages and sub projects like Fluent Bit.
On this page, we will describe the relationship between the Fluentd and Fluent Bit open source projects. As a summary, we can say both are:
Licensed under the terms of Apache License v2.0
Hosted projects by the Cloud Native Computing Foundation (CNCF)
Production Grade solutions: deployed thousands of times every single day, millions per month.
Community driven projects
Widely Adopted by the Industry: trusted by all major companies like AWS, Microsoft, Google Cloud and hundred of others.
Originally created by Treasure Data.
Both projects have a lot of similarities. Fluent Bit is fully designed and built on top of the best ideas of the Fluentd architecture and general design. Choosing which one to use depends on the end-user needs.
The following table describes a comparison in different areas of the projects:
Both Fluentd and Fluent Bit can work as Aggregators or Forwarders. They can both complement each other or be used as standalone solutions.
Data processing with reliability
Previously defined in the concept section, the buffer
phase in the pipeline aims to provide a unified and persistent mechanism to store your data, either using the primary in-memory model or using the filesystem based mode.
The buffer
phase already contains the data in an immutable state, meaning, no other filter can be applied.
Note that buffered data is not raw text, it's in Fluent Bit's internal binary representation.
Fluent Bit offers a buffering mechanism in the file system that acts as a backup system to avoid data loss in case of system failures.