Topological data analysis
This processor applies Topological Data Analysis (TDA) to incoming metrics using a sliding window and persistent homology computed with Ripser. It computes Betti numbers that characterize the topological shape of the metric signal over time, which can surface structural patterns (such as recurring cycles or anomalies) that traditional statistical methods can miss.
The processor operates only on metrics. Log and trace records pass through unchanged.
Only YAML configuration files support processors.
How it works
On each flush, the processor:
1. Aggregates incoming metrics into a feature vector by collapsing each unique (namespace, subsystem) pair into a single value. Counters are converted to log-scaled rates; gauges are used directly.
2. Appends the feature vector to a sliding ring-buffer window of up to window_size samples.
3. Optionally applies delay embedding (controlled by embed_dim and embed_delay) to reconstruct attractor geometry from the time series.
4. Once the window holds at least min_points samples, builds a pairwise Euclidean distance matrix over the embedded points and runs Ripser to compute persistent homology.
5. Scans across multiple distance thresholds (or uses the quantile supplied in threshold) and emits the Betti numbers that show the strongest topological signal.
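The windowing, delay embedding, and distance-matrix steps can be illustrated with a minimal Python sketch. It is not the processor's implementation: the names SlidingWindow, delay_embed, and distance_matrix are hypothetical, and the ordering of the lagged components inside each embedded point is an assumption. Only the parameter names and defaults come from this page.

```python
import math
from collections import deque


class SlidingWindow:
    """Ring buffer of feature vectors (hypothetical helper, not the plugin's code)."""

    def __init__(self, window_size=60, min_points=10):
        # deque(maxlen=...) drops the oldest sample once the window is full.
        self.samples = deque(maxlen=window_size)
        self.min_points = min_points

    def push(self, feature_vector):
        self.samples.append(list(feature_vector))

    def ready(self):
        # Persistent homology is only computed once enough samples exist.
        return len(self.samples) >= self.min_points


def delay_embed(window, embed_dim, embed_delay):
    """Concatenate embed_dim lagged feature vectors into one point per time step."""
    span = (embed_dim - 1) * embed_delay
    points = []
    for t in range(span, len(window)):
        point = []
        for k in range(embed_dim):
            # Assumed ordering: x_t, x_{t-tau}, x_{t-2*tau}, ...
            point.extend(window[t - k * embed_delay])
        points.append(point)
    return points


def distance_matrix(points):
    """Pairwise Euclidean distances between embedded points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


window = SlidingWindow(window_size=5, min_points=3)
for i in range(7):
    window.push([float(i)])

if window.ready():
    embedded = delay_embed(list(window.samples), embed_dim=3, embed_delay=1)
    dist = distance_matrix(embedded)
```

With embed_dim=1 the embedding returns the raw feature vectors unchanged, which matches the description of disabling delay embedding. In this sketch, an embedding of dimension m and lag τ yields (m − 1)·τ fewer points than there are samples in the window.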
The output is three gauge metrics added to the same metrics context:
fluentbit_tda_betti0: Betti number β₀, the number of connected components in the Vietoris-Rips complex.
fluentbit_tda_betti1: Betti number β₁, the number of independent loops (1-cycles). Elevated values suggest cyclic or periodic patterns.
fluentbit_tda_betti2: Betti number β₂, the number of enclosed voids (2-cycles).
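To make the meaning of β₀ and β₁ concrete, the following Python sketch computes both for a Vietoris-Rips complex at a single fixed distance scale. It is a brute-force illustration under stated assumptions, not how Ripser works internally: the function name betti_0_1 is hypothetical, coefficients are taken in GF(2), and β₂ is omitted to keep the example short.

```python
import math
from itertools import combinations


def betti_0_1(points, eps):
    """Return (beta0, beta1) of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    # Edges: every pair of points at distance <= eps.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}

    # beta0: count connected components with union-find.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Triangles: every triple whose three edges are all present.
    # Each triangle boundary is stored as a bitmask over edge indices.
    boundaries = []
    for a, b, c in combinations(range(n), 3):
        if (a, b) in edge_index and (a, c) in edge_index and (b, c) in edge_index:
            boundaries.append((1 << edge_index[(a, b)])
                              | (1 << edge_index[(a, c)])
                              | (1 << edge_index[(b, c)]))

    # Rank of the triangle boundary matrix via XOR elimination over GF(2).
    pivots = {}  # highest set bit -> reduced row
    for row in boundaries:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                break
            row ^= pivots[high]
    rank_d2 = len(pivots)

    # beta1 = dim ker(d1) - rank(d2), where rank(d1) = n - beta0.
    beta1 = len(edges) - (n - components) - rank_d2
    return components, beta1


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(betti_0_1(square, 0.5))  # four isolated points
print(betti_0_1(square, 1.0))  # sides connected, diagonals not: one loop
print(betti_0_1(square, 1.5))  # diagonals added: the loop is filled in
```

The unit square shows why the distance scale matters: the loop counted by β₁ exists only while the sides are connected and the diagonals are not. Persistent homology tracks such features across all scales instead of a single one.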
Configuration parameters
window_size: Number of samples to keep in the sliding window. Default: 60.
min_points: Minimum number of samples that must be in the window before Ripser runs. Default: 10.
embed_dim: Delay embedding dimension m. Setting m=1 disables delay embedding and uses the raw feature vectors directly. For m>1, each point in the distance matrix is constructed from m consecutive lagged snapshots (for example, m=3 → x_t, x_{t-1}, x_{t-2}). Default: 3.
embed_delay: Lag τ in samples between successive delays in the embedding. Ignored when embed_dim=1. Default: 1.
threshold: Distance scale selector. 0 triggers an automatic multi-quantile scan that picks the threshold maximizing β₁ (or β₀ when all β₁ are zero). A value in (0, 1) is treated as a quantile of the pairwise distance distribution and used directly as the Ripser threshold. Default: 0.
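The two modes of the threshold parameter can be sketched in Python as follows. Several details here are assumptions, because this page does not specify them: the quantile interpolation rule (a simple nearest-rank lookup is used), the list of quantiles scanned in automatic mode (scan_quantiles is hypothetical), and the handling of values outside [0, 1), which this sketch rejects. The betti_fn argument stands in for whatever computes (β₀, β₁) at a given distance scale.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Sorted list of all pairwise Euclidean distances."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def quantile_threshold(sorted_dists, q):
    """Nearest-rank style quantile (the interpolation rule is an assumption)."""
    if not sorted_dists:
        raise ValueError("need at least two points")
    idx = min(int(q * len(sorted_dists)), len(sorted_dists) - 1)
    return sorted_dists[idx]


def select_threshold(sorted_dists, threshold, betti_fn,
                     scan_quantiles=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Pick a distance scale from the threshold setting.

    betti_fn(eps) must return (beta0, beta1) at scale eps.
    scan_quantiles is a hypothetical candidate list for automatic mode.
    """
    if 0.0 < threshold < 1.0:
        # Fixed mode: use the requested quantile directly.
        return quantile_threshold(sorted_dists, threshold)
    if threshold != 0:
        # Not described in the text; rejected in this sketch.
        raise ValueError("threshold must be 0 or in (0, 1)")

    # Automatic mode: evaluate each candidate scale.
    candidates = [quantile_threshold(sorted_dists, q) for q in scan_quantiles]
    scored = [(betti_fn(eps), eps) for eps in candidates]
    if any(b1 > 0 for (_, b1), _ in scored):
        # Prefer the scale with the most loops.
        return max(scored, key=lambda s: s[0][1])[1]
    # No loops anywhere: fall back to the most connected components.
    return max(scored, key=lambda s: s[0][0])[1]


dists = pairwise_distances([[0.0], [1.0], [3.0], [6.0]])
fixed = select_threshold(dists, 0.3, betti_fn=None)  # betti_fn unused in fixed mode
```

A fixed quantile keeps the scale stable relative to the spread of the data, while automatic mode adapts the scale on every run at the cost of evaluating several candidates.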
Configuration example
The following example scrapes Prometheus metrics and runs TDA on the ingested data before forwarding to an OpenTelemetry endpoint:
To disable delay embedding and run TDA directly on the raw metric vectors, set embed_dim: 1:
To fix the distance threshold at a specific quantile of the pairwise distances (for example, the thirtieth percentile), set threshold to a value between 0 and 1: