Monitoring
Learn how to monitor your Fluent Bit data pipelines
Fluent Bit comes with built-in features that let you monitor the internals of your pipeline, connect to Prometheus and Grafana, run health checks, and use connectors to external services for the same purpose:

HTTP Server

Fluent Bit comes with a built-in HTTP Server that can be used to query internal information and monitor the metrics of each running plugin.
The monitoring interface can be easily integrated with Prometheus since we support its native format.

Getting Started

To get started, the first step is to enable the HTTP Server from the configuration file:
    [SERVICE]
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_PORT    2020

    [INPUT]
        Name cpu

    [OUTPUT]
        Name  stdout
        Match *
The above configuration snippet instructs Fluent Bit to start its HTTP Server on TCP port 2020, listening on all network interfaces:
    $ bin/fluent-bit -c fluent-bit.conf
    Fluent Bit v1.4.0
    * Copyright (C) 2019-2020 The Fluent Bit Authors
    * Copyright (C) 2015-2018 Treasure Data
    * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
    * https://fluentbit.io

    [2020/03/10 19:08:24] [ info] [engine] started
    [2020/03/10 19:08:24] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
Now a simple curl command is enough to gather some information:
    $ curl -s http://127.0.0.1:2020 | jq
    {
      "fluent-bit": {
        "version": "0.13.0",
        "edition": "Community",
        "flags": [
          "FLB_HAVE_TLS",
          "FLB_HAVE_METRICS",
          "FLB_HAVE_SQLDB",
          "FLB_HAVE_TRACE",
          "FLB_HAVE_HTTP_SERVER",
          "FLB_HAVE_FLUSH_LIBCO",
          "FLB_HAVE_SYSTEMD",
          "FLB_HAVE_VALGRIND",
          "FLB_HAVE_FORK",
          "FLB_HAVE_PROXY_GO",
          "FLB_HAVE_REGEX",
          "FLB_HAVE_C_TLS",
          "FLB_HAVE_SETJMP",
          "FLB_HAVE_ACCEPT4",
          "FLB_HAVE_INOTIFY"
        ]
      }
    }
Note that we are piping the curl output to the jq program, which makes the JSON data easy to read in the terminal. Fluent Bit does not aim to do JSON pretty-printing itself.
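When jq is not available, the same response can be inspected from a script. The following is a minimal sketch that checks whether a build flag is listed in the build information. The sample document is a shortened copy of the output above; the helper name `has_flag` and the flag `FLB_HAVE_NOT_LISTED` are hypothetical and exist only for this illustration.

```python
import json

# Shortened copy of the build-information response shown above.
BUILD_INFO = """
{"fluent-bit": {"version": "0.13.0", "edition": "Community",
  "flags": ["FLB_HAVE_TLS", "FLB_HAVE_METRICS", "FLB_HAVE_HTTP_SERVER"]}}
"""


def has_flag(build_info_json: str, flag: str) -> bool:
    """Return True if the build-information document lists the given flag."""
    info = json.loads(build_info_json)["fluent-bit"]
    return flag in info.get("flags", [])


print(has_flag(BUILD_INFO, "FLB_HAVE_METRICS"))     # True
print(has_flag(BUILD_INFO, "FLB_HAVE_NOT_LISTED"))  # False (hypothetical flag)
```

In practice the string passed to `has_flag` would be the body returned by the `/` endpoint rather than a constant.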

REST API Interface

Fluent Bit aims to expose useful interfaces for monitoring. As of Fluent Bit v0.14, the following endpoints are available:
| URI | Description | Data Format |
| --- | --- | --- |
| / | Fluent Bit build information | JSON |
| /api/v1/uptime | Get uptime information in seconds and human-readable format | JSON |
| /api/v1/metrics | Internal metrics per loaded plugin | JSON |
| /api/v1/metrics/prometheus | Internal metrics per loaded plugin, ready to be consumed by a Prometheus Server | Prometheus Text 0.0.4 |
| /api/v1/storage | Get internal metrics of the storage layer / buffered data. This endpoint is enabled only if the property storage.metrics has been enabled in the SERVICE section | JSON |
| /api/v1/health | Fluent Bit health check result | String |
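The endpoint list can be captured in a small lookup table when scripting against the interface. This is only a sketch: the short keys (`build`, `uptime`, and so on), the helper names, and the default base URL `http://127.0.0.1:2020` are assumptions made for the example, while the paths and data formats come from the table.

```python
import json
import urllib.request

# Path and data format for each endpoint, as listed in the table.
ENDPOINTS = {
    "build": ("/", "json"),
    "uptime": ("/api/v1/uptime", "json"),
    "metrics": ("/api/v1/metrics", "json"),
    "prometheus": ("/api/v1/metrics/prometheus", "text"),
    "storage": ("/api/v1/storage", "json"),
    "health": ("/api/v1/health", "text"),
}


def endpoint_url(name: str, base: str = "http://127.0.0.1:2020") -> str:
    """Build the full URL for one of the endpoints above."""
    path, _ = ENDPOINTS[name]
    return base.rstrip("/") + path


def fetch(name: str, base: str = "http://127.0.0.1:2020", timeout: float = 5.0):
    """Fetch an endpoint; decode JSON endpoints, return plain text otherwise.

    urllib raises HTTPError for non-2xx responses, so an unhealthy
    /api/v1/health (HTTP 500) surfaces as an exception here.
    """
    _, fmt = ENDPOINTS[name]
    with urllib.request.urlopen(endpoint_url(name, base), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if fmt == "json" else body
```

With a running Fluent Bit instance, `fetch("uptime")` would return the uptime document as a Python dictionary.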

Uptime Example

Query the service uptime with the following command:
    $ curl -s http://127.0.0.1:2020/api/v1/uptime | jq

It should print output similar to this:

    {
      "uptime_sec": 8950000,
      "uptime_hr": "Fluent Bit has been running:  103 days, 14 hours, 6 minutes and 40 seconds"
    }
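The two fields in the uptime response carry the same information. As a quick check of the arithmetic, this sketch splits `uptime_sec` into days, hours, minutes, and seconds; the function name `split_uptime` is made up for the example.

```python
def split_uptime(uptime_sec: int) -> tuple:
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    days, rest = divmod(uptime_sec, 86400)   # 86400 seconds per day
    hours, rest = divmod(rest, 3600)         # 3600 seconds per hour
    minutes, seconds = divmod(rest, 60)
    return days, hours, minutes, seconds


# Value taken from the sample response above.
print(split_uptime(8950000))  # (103, 14, 6, 40)
```

The result matches the human-readable string in the sample: 103 days, 14 hours, 6 minutes and 40 seconds.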

Metrics Examples

Query internal metrics in JSON format with the following command:
    $ curl -s http://127.0.0.1:2020/api/v1/metrics | jq

It should print output similar to this:

    {
      "input": {
        "cpu.0": {
          "records": 8,
          "bytes": 2536
        }
      },
      "output": {
        "stdout.0": {
          "proc_records": 5,
          "proc_bytes": 1585,
          "errors": 0,
          "retries": 0,
          "retries_failed": 0
        }
      }
    }
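A script can use the JSON metrics to spot outputs that are having trouble. The sketch below works on a copy of the sample response; the helper names are hypothetical, and treating a non-zero `errors` or `retries_failed` counter as "failing" is an assumption chosen for the example, not a rule defined by Fluent Bit.

```python
import json

# Copy of the sample /api/v1/metrics response shown above.
SAMPLE_METRICS = json.loads("""
{
  "input": {"cpu.0": {"records": 8, "bytes": 2536}},
  "output": {"stdout.0": {"proc_records": 5, "proc_bytes": 1585,
                          "errors": 0, "retries": 0, "retries_failed": 0}}
}
""")


def total_input_records(metrics: dict) -> int:
    """Sum the 'records' counter over all input plugins."""
    return sum(c.get("records", 0) for c in metrics.get("input", {}).values())


def failing_outputs(metrics: dict) -> list:
    """Names of outputs whose errors or retries_failed counters are non-zero."""
    return sorted(
        name
        for name, counters in metrics.get("output", {}).items()
        if counters.get("errors", 0) > 0 or counters.get("retries_failed", 0) > 0
    )


print(total_input_records(SAMPLE_METRICS))  # 8
print(failing_outputs(SAMPLE_METRICS))      # []
```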

Metrics in Prometheus format

Query internal metrics in Prometheus Text 0.0.4 format:
    $ curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus

This time the same metrics are in Prometheus format instead of JSON:

    fluentbit_input_records_total{name="cpu.0"} 57 1509150350542
    fluentbit_input_bytes_total{name="cpu.0"} 18069 1509150350542
    fluentbit_output_proc_records_total{name="stdout.0"} 54 1509150350542
    fluentbit_output_proc_bytes_total{name="stdout.0"} 17118 1509150350542
    fluentbit_output_errors_total{name="stdout.0"} 0 1509150350542
    fluentbit_output_retries_total{name="stdout.0"} 0 1509150350542
    fluentbit_output_retries_failed_total{name="stdout.0"} 0 1509150350542
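Each line in this format consists of a metric name, a set of labels, a value, and a timestamp. To illustrate that structure, here is a deliberately simplified parser for lines shaped like the sample above. It is not a complete implementation of the Prometheus text format: it assumes every metric has a label block and that label values contain no escaped quotes. A Prometheus server or an official client library should be used for real ingestion.

```python
import re

# metric_name{labels} value [timestamp]
LINE_RE = re.compile(
    r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'\{(?P<labels>[^}]*)\}\s+'
    r'(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line: str):
    """Parse one sample line into (metric, labels, value, timestamp).

    Returns None for lines that do not match, such as comments or blanks.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels")))
    timestamp = int(match.group("ts")) if match.group("ts") else None
    return match.group("metric"), labels, float(match.group("value")), timestamp


print(parse_line('fluentbit_input_records_total{name="cpu.0"} 57 1509150350542'))
# ('fluentbit_input_records_total', {'name': 'cpu.0'}, 57.0, 1509150350542)
```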

Configuring Aliases

By default, configured plugins get an internal name at runtime in the format plugin_name.ID. For monitoring purposes, this can be confusing if many plugins of the same type are configured. To tell them apart, each configured input or output section can be given an alias that is used as the parent name for the metric.
The following example sets an alias on the INPUT section, which uses the CPU input plugin, and another on the OUTPUT section:
    [SERVICE]
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_PORT    2020

    [INPUT]
        Name  cpu
        Alias server1_cpu

    [OUTPUT]
        Name  stdout
        Alias raw_output
        Match *
Now when querying the metrics we get the aliases in place instead of the plugin name:
    {
      "input": {
        "server1_cpu": {
          "records": 8,
          "bytes": 2536
        }
      },
      "output": {
        "raw_output": {
          "proc_records": 5,
          "proc_bytes": 1585,
          "errors": 0,
          "retries": 0,
          "retries_failed": 0
        }
      }
    }
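When many plugins are configured, it can be useful to check which ones still report under their default plugin_name.ID name. The following sketch scans a metrics document for such names. The helper `unaliased_plugins` is hypothetical, and the check assumes that no alias itself has the shape "name.number".

```python
import re

# Default runtime names have the form plugin_name.ID, for example "cpu.0".
DEFAULT_NAME_RE = re.compile(r"^[A-Za-z0-9_]+\.\d+$")


def unaliased_plugins(metrics: dict) -> list:
    """List 'section:name' entries that still use a default plugin_name.ID name."""
    found = []
    for section, plugins in metrics.items():
        for name in plugins:
            if DEFAULT_NAME_RE.match(name):
                found.append(f"{section}:{name}")
    return found


with_defaults = {"input": {"cpu.0": {}}, "output": {"stdout.0": {}}}
with_aliases = {"input": {"server1_cpu": {}}, "output": {"raw_output": {}}}

print(unaliased_plugins(with_defaults))  # ['input:cpu.0', 'output:stdout.0']
print(unaliased_plugins(with_aliases))   # []
```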

Grafana Dashboard and Alerts

Fluent Bit's exposed Prometheus-style metrics can be used to create dashboards and alerts.
The provided example dashboard is heavily inspired by Banzai Cloud's logging operator dashboard, with a few key differences: the use of the instance label (see why here), stacked graphs, and a focus on Fluent Bit metrics.
[Image: dashboard]

Alerts

Sample alerts are available here.

Health Check for Fluent Bit

Fluent Bit supports four configuration options to set up the health check.
| Config Name | Description | Default Value |
| --- | --- | --- |
| Health_Check | Enable the health check feature | Off |
| HC_Errors_Count | The error count to meet the unhealthy requirement | 5 |
| HC_Retry_Failure_Count | The retry failure count to meet the unhealthy requirement | 5 |
| HC_Period | The time period, in seconds, over which errors and retry failures are counted | 60 |
The feature works as follows: within the time window set by HC_Period, if the number of errors is over HC_Errors_Count or the number of retry failures is over HC_Retry_Failure_Count, Fluent Bit is considered unhealthy. The health endpoint then returns HTTP status 500 and the string error. Otherwise Fluent Bit is healthy, and the endpoint returns HTTP status 200 and the string ok.
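The rule described above can be written down as a small function. This is a sketch of the documented behavior, not Fluent Bit's actual implementation. The class and function names are hypothetical, and the strict "greater than" comparison is an assumption based on the wording "over"; the defaults mirror the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthConfig:
    """Thresholds named after the SERVICE options, with their default values."""
    hc_errors_count: int = 5
    hc_retry_failure_count: int = 5
    hc_period: int = 60  # seconds over which the two counters are collected


def is_healthy(errors_in_period: int, retry_failures_in_period: int,
               cfg: HealthConfig = HealthConfig()) -> bool:
    """Apply the rule: unhealthy if either counter is over its threshold.

    Assumption: "over" is read as strictly greater than.
    """
    too_many_errors = errors_in_period > cfg.hc_errors_count
    too_many_retry_failures = retry_failures_in_period > cfg.hc_retry_failure_count
    return not (too_many_errors or too_many_retry_failures)


print(is_healthy(5, 0))  # True  (5 is not over the threshold of 5)
print(is_healthy(6, 0))  # False
print(is_healthy(0, 6))  # False
```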
The following configuration enables the health check:

    [SERVICE]
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_PORT    2020
        Health_Check On
        HC_Errors_Count 5
        HC_Retry_Failure_Count 5
        HC_Period 5

    [INPUT]
        Name cpu

    [OUTPUT]
        Name  stdout
        Match *
Use the following command to call the health endpoint:

    $ curl -s http://127.0.0.1:2020/api/v1/health
Based on the Fluent Bit status, the result will be:
  • HTTP status 200 and "ok" in response to healthy status
  • HTTP status 500 and "error" in response for unhealthy status
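A monitoring script can turn these two outcomes into a boolean. The sketch below uses only the Python standard library; the function names are hypothetical, the URL assumes the listener from the configuration example above, and treating an unreachable server as unhealthy is a design choice made for this example.

```python
import urllib.error
import urllib.request


def interpret_health(status: int, body: str) -> bool:
    """Healthy means HTTP status 200 with the body "ok"."""
    return status == 200 and body.strip() == "ok"


def probe(url: str = "http://127.0.0.1:2020/api/v1/health",
          timeout: float = 5.0) -> bool:
    """Query the health endpoint and return True only for a healthy reply."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx, which includes the 500 "error" reply.
        return interpret_health(err.code, err.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # Assumption for this sketch: no answer at all counts as unhealthy.
        return False


print(interpret_health(200, "ok"))     # True
print(interpret_health(500, "error"))  # False
```

Calling `probe()` against a running instance returns the same boolean for the live endpoint.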

Calyptia Cloud

Calyptia Cloud is a hosted service that allows you to monitor your Fluent Bit agents including data flow, metrics and configurations.

Get Started with Calyptia Cloud

Registering your Fluent Bit agent takes less than one minute.
In your Fluent Bit configuration file, append the following configuration section:
    [CUSTOM]
        name    calyptia
        api_key <YOUR_API_KEY>
Make sure to replace <YOUR_API_KEY> with your API key in the configuration. A few seconds after you restart your Fluent Bit agent, the Calyptia Cloud Dashboard will list it. Metrics will take around 30 seconds to show up.

Contact Calyptia

If you want to get in touch with the Calyptia team, send an email to [email protected]