Topological data analysis

This processor applies Topological Data Analysisarrow-up-right (TDA) to incoming metrics using a sliding window and Ripser persistent homology. It computes Betti numbers that characterize the topological shape of the metric signal over time, which can surface structural patterns (such as recurring cycles or anomalies) that traditional statistical methods miss.

The processor operates only on metrics. Log and trace records pass through unchanged.

circle-info

Only YAML configuration files support processors.

How it works

On each flush, the processor:

  1. Aggregates incoming metrics into a feature vector by collapsing each unique (namespace, subsystem) pair into a single value. Counters are converted to log-scaled rates; gauges are used directly.

  2. Appends the feature vector to a sliding ring-buffer window of up to window_size samples.

  3. Optionally applies delay embedding (controlled by embed_dim and embed_delay) to reconstruct attractor geometry from the time series.

  4. Once the window holds at least min_points samples, builds a pairwise Euclidean distance matrix over the embedded points and runs Ripser to compute persistent homology.

  5. Scans across multiple distance thresholds (or uses the quantile supplied in threshold) and emits the Betti numbers that show the strongest topological signal.

The output is three gauge metrics added to the same metrics context:

Metric
Description

fluentbit_tda_betti0

Betti number β₀—number of connected components in the Vietoris-Rips complex.

fluentbit_tda_betti1

Betti number β₁—number of independent loops (1-cycles). Elevated values suggest cyclic or periodic patterns.

fluentbit_tda_betti2

Betti number β₂—number of enclosed voids (2-cycles).

Configuration parameters

Key
Description
Default

window_size

Number of samples to keep in the sliding window.

60

min_points

Minimum number of samples that must be in the window before Ripser runs.

10

embed_dim

Delay embedding dimension m. Setting m=1 disables delay embedding and uses the raw feature vectors directly. For m>1, each point in the distance matrix is constructed from m consecutive lagged snapshots (for example, m=3x_t, x_{t-1}, x_{t-2}).

3

embed_delay

Lag τ in samples between successive delays in the embedding. Ignored when embed_dim=1.

1

threshold

Distance scale selector. 0 triggers an automatic multi-quantile scan that picks the threshold maximizing β₁ (or β₀ when all β₁ are zero). A value in (0, 1) is treated as a quantile of the pairwise distance distribution and used directly as the Ripser threshold.

0

Configuration example

The following example scrapes Prometheus metrics and runs TDA on the ingested data before forwarding to an OpenTelemetry endpoint:

To disable delay embedding and run TDA directly on the raw metric vectors, set embed_dim: 1:

To fix the distance threshold at a specific quantile of the pairwise distances (for example, the thirtieth percentile), set threshold to a value between 0 and 1:

Last updated

Was this helpful?