Monitoring
Learn how to monitor your Fluent Bit data pipelines
Fluent Bit includes features for monitoring the internals of your pipeline, along with integrations for Prometheus and Grafana, health checks, and connectors to external services.
Fluent Bit includes an HTTP server for querying internal information and monitoring metrics of each running plugin.
You can integrate the monitoring interface with Prometheus.
To get started, enable the HTTP server from the configuration file. The following example (in the classic configuration format, using placeholder cpu input and stdout output plugins) instructs Fluent Bit to start an HTTP server on TCP port 2020 and listen on all network interfaces:
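```
[SERVICE]
    # Enable the built-in HTTP server on all interfaces, TCP port 2020
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    # Placeholder input plugin; any input works here
    Name  cpu

[OUTPUT]
    # Placeholder output plugin; any output works here
    Name   stdout
    Match  *
```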
Apply the configuration file (saved as fluent-bit.conf in this example):
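```
# Adjust the path to wherever you saved the configuration
fluent-bit -c fluent-bit.conf
```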
Fluent Bit starts and generates output in your terminal:
Use curl to gather information about the HTTP server. The following command sends the command output to the jq program, which outputs human-readable JSON data to the terminal.
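```
# Query the root endpoint of the HTTP server enabled above (port 2020)
curl -s http://127.0.0.1:2020 | jq
```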
Fluent Bit exposes the following endpoints for monitoring.
The following descriptions apply to v1 metric endpoints.
/api/v1/metrics/prometheus endpoint
The following descriptions apply to metrics exposed in Prometheus format by the /api/v1/metrics/prometheus endpoint.
The following terms are key to understanding how Fluent Bit processes metrics:
Record: a single message collected from a source, such as a single log line in a file.
Chunk: log records ingested and stored by Fluent Bit input plugin instances. A batch of records in a chunk is tracked together as a single unit. The Fluent Bit engine attempts to fit records into chunks of at most 2 MB, but the size can vary at runtime. Chunks are then sent to an output. An output plugin instance can successfully send the full chunk to the destination and mark it as successful, fail the chunk entirely if it encounters an unrecoverable error, or ask for the chunk to be retried.
/api/v1/storage endpoint
The following descriptions apply to metrics exposed in JSON format by the /api/v1/storage endpoint.
The following descriptions apply to v2 metric endpoints.
/api/v2/metrics/prometheus or /api/v2/metrics endpoint
The following descriptions apply to metrics exposed in Prometheus format by the /api/v2/metrics/prometheus or /api/v2/metrics endpoints.
The following terms are key to understanding how Fluent Bit processes metrics:
Record: a single message collected from a source, such as a single log line in a file.
Chunk: log records ingested and stored by Fluent Bit input plugin instances. A batch of records in a chunk is tracked together as a single unit. The Fluent Bit engine attempts to fit records into chunks of at most 2 MB, but the size can vary at runtime. Chunks are then sent to an output. An output plugin instance can successfully send the full chunk to the destination and mark it as successful, fail the chunk entirely if it encounters an unrecoverable error, or ask for the chunk to be retried.
Storage layer
The following are detailed descriptions for the metrics collected by the storage layer.
Query the service uptime with the following command:
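```
curl -s http://127.0.0.1:2020/api/v1/uptime | jq
```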
The command prints output similar to the following:
Query internal metrics in JSON format with the following command:
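```
curl -s http://127.0.0.1:2020/api/v1/metrics | jq
```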
The command prints output similar to the following:
Query internal metrics in Prometheus Text 0.0.4 format:
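```
curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus
```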
This command returns the same metrics in Prometheus format instead of JSON:
By default, plugins configured at runtime get an internal name in the format plugin_name.ID. For monitoring purposes, this can be confusing if many plugins of the same type are configured. To distinguish them, each configured input or output section can be given an alias that is used as the parent name for the metric.
The following example (classic configuration format) sets an alias on the INPUT section, which uses the CPU input plugin; the alias names themselves are arbitrary examples:
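```
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_PORT    2020

[INPUT]
    # Alias replaces the internal name (cpu.0) in metric output
    Name   cpu
    Alias  server1_cpu

[OUTPUT]
    Name   stdout
    Alias  server1_stdout
    Match  *
```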
When querying the related metrics, the aliases are returned instead of the plugin name:
You can create Grafana dashboards and alerts using the Prometheus-style metrics exposed by Fluent Bit.
The provided example dashboard is heavily inspired by Banzai Cloud's logging operator dashboard with a few key differences, such as the use of the instance
label, stacked graphs, and a focus on Fluent Bit metrics. See this blog post for more information.
Sample alerts are available here.
Fluent Bit supports four configuration options for setting up health checks.
Not every error log entry counts toward the health check. Errors and retry failures are counted only for specific error types, as described for the corresponding configuration options.
Based on the HC_Period setting, if the number of errors exceeds HC_Errors_Count, or the number of retry failures exceeds HC_Retry_Failure_Count, Fluent Bit is considered unhealthy. In that case, the health endpoint returns HTTP status 500 and an error message. Otherwise, the endpoint returns HTTP status 200 and an ok message.
This behavior can be expressed as the following condition:
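```
# Sketch of the health check described above (not the literal internal expression)
unhealthy = (errors within HC_Period > HC_Errors_Count)
            OR (retry failures within HC_Period > HC_Retry_Failure_Count)
```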
HC_Errors_Count and HC_Retry_Failure_Count apply only to output plugins, and they count the sum of errors and retry failures across all running output plugins.
The following example (classic configuration format) shows how to define these settings; Health_Check enables the feature, and the threshold values shown are illustrative:
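```
[SERVICE]
    HTTP_Server             On
    HTTP_Listen             0.0.0.0
    HTTP_PORT               2020
    # Enable the health check and set illustrative thresholds
    Health_Check            On
    HC_Errors_Count         5
    HC_Retry_Failure_Count  5
    HC_Period               5
```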
Use the following command to call the health endpoint:
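```
curl -s http://127.0.0.1:2020/api/v1/health
```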
With the example config, the health status is determined by the following equation:
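```
# Using the illustrative values from the example above
# (HC_Errors_Count = 5, HC_Retry_Failure_Count = 5, HC_Period = 5 seconds)
(errors within the last 5 seconds > 5) OR (retry failures within the last 5 seconds > 5)
```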
If this equation evaluates to TRUE, then Fluent Bit is unhealthy. If this equation evaluates to FALSE, then Fluent Bit is healthy.
Telemetry Pipeline is a hosted service that allows you to monitor your Fluent Bit agents, including data flow, metrics, and configurations.