# GPU metrics

{% hint style="info" %}
**Supported event types:** `metrics`
{% endhint %}

The *gpu\_metrics* input plugin collects graphics processing unit (GPU) performance metrics from graphics cards on Linux systems. It provides real-time monitoring of GPU utilization, memory usage (VRAM), clock frequencies, power consumption, temperature, and fan speeds.

The plugin reads metrics directly from the Linux `sysfs` filesystem (`/sys/class/drm/`) without requiring external tools or libraries. Only AMD GPUs that use the `amdgpu` kernel driver are supported. NVIDIA and Intel GPUs aren't supported.
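
As a rough illustration of what reading from `sysfs` looks like, the following sketch prints a few attributes that the `amdgpu` driver exposes under `/sys/class/drm/card<N>/device/`. The helper name `read_attr` is hypothetical, and the exact set of files the plugin reads is an assumption here, not something this page specifies.

```shell
# Print one sysfs attribute for a card, or "n/a" when the file is missing.
# Usage: read_attr <sysfs_root> <card_index> <attribute>
# NOTE: read_attr is a hypothetical helper, not part of Fluent Bit.
read_attr() {
  file="$1/class/drm/card$2/device/$3"
  if [ -r "$file" ]; then
    cat "$file"
  else
    echo "n/a"
  fi
}

# Only query card0 when it exists on this machine.
if [ -d /sys/class/drm/card0/device ]; then
  echo "busy percent: $(read_attr /sys 0 gpu_busy_percent)"
  echo "VRAM used:    $(read_attr /sys 0 mem_info_vram_used)"
  echo "VRAM total:   $(read_attr /sys 0 mem_info_vram_total)"
fi
```

The first argument mirrors the `path_sysfs` parameter, so the same helper can be pointed at a copied or mock `sysfs` tree.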

## Metrics collected

The plugin collects the following metrics for each detected GPU:

| Key                       | Description                                                                                                                              |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `gpu_utilization_percent` | GPU core utilization as a percentage (`0` to `100`). Indicates how busy the GPU is when processing workloads.                            |
| `gpu_memory_used_bytes`   | Amount of video RAM (VRAM) currently in use, measured in bytes.                                                                          |
| `gpu_memory_total_bytes`  | Total video RAM (VRAM) capacity available on the GPU, measured in bytes.                                                                 |
| `gpu_clock_mhz`           | Current GPU clock frequency in MHz. This metric has multiple instances with different type labels (see [Clock metrics](#clock-metrics)). |
| `gpu_power_watts`         | Current power consumption in watts. Can be disabled with `enable_power` set to `false`.                                                  |
| `gpu_temperature_celsius` | GPU die temperature in degrees Celsius. Can be disabled with `enable_temperature` set to `false`.                                        |
| `gpu_fan_speed_rpm`       | Fan rotation speed in revolutions per minute (RPM).                                                                                      |
| `gpu_fan_pwm_percent`     | Fan PWM duty cycle as a percentage (`0` to `100`). Indicates how hard the fan is being driven.                                          |

### Clock metrics

The `gpu_clock_mhz` metric is reported separately for three clock domains:

| Type       | Description                      |
| ---------- | -------------------------------- |
| `graphics` | GPU core/shader clock frequency. |
| `memory`   | VRAM clock frequency.            |
| `soc`      | System-on-chip clock frequency.  |
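
The `amdgpu` driver lists the available levels for each clock domain in the `pp_dpm_sclk` (graphics), `pp_dpm_mclk` (memory), and `pp_dpm_socclk` (SoC) files, and marks the active level with an asterisk. To cross-check a reported value by hand, a sketch like the following extracts the active frequency. The helper `active_clock_mhz` is hypothetical, and whether the plugin reads these same files is an assumption.

```shell
# Print the active clock level (the line marked with "*") in MHz.
# Reads pp_dpm_* style input on stdin, for example:
#   0: 500Mhz
#   1: 2304Mhz *
# NOTE: active_clock_mhz is a hypothetical helper, not part of Fluent Bit.
active_clock_mhz() {
  awk '/\*/ { sub(/[Mm][Hh][Zz]/, "", $2); print $2; exit }'
}

# Example for card0; skipped when the file is absent.
f=/sys/class/drm/card0/device/pp_dpm_sclk
if [ -r "$f" ]; then
  active_clock_mhz < "$f"
fi
```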

## Configuration parameters

The plugin supports the following configuration parameters:

| Key                  | Description                                                                                                              | Default |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------ | ------- |
| `cards_exclude`      | Pattern specifying which GPU cards to exclude from monitoring. Uses the same syntax as `cards_include`.                  | *none*  |
| `cards_include`      | Pattern specifying which GPU cards to monitor. Supports wildcards (`*`), ranges (`0-3`), and comma-separated lists (`0,2,4`). | `*`     |
| `enable_power`       | Enable collection of power consumption metrics (`gpu_power_watts`).                                                      | `true`  |
| `enable_temperature` | Enable collection of temperature metrics (`gpu_temperature_celsius`).                                                    | `true`  |
| `path_sysfs`         | Path to the `sysfs` root directory. Typically used for testing or non-standard systems.                                  | `/sys`  |
| `scrape_interval`    | Interval in seconds between metric collection cycles.                                                                    | `5`     |
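
The pattern syntax accepted by `cards_include` and `cards_exclude` can be illustrated with a small shell function that expands a pattern into card indices. This is a sketch of the syntax described in the table, not the plugin's actual parser. The function name `expand_cards` and its second argument (the highest card index on the system) are hypothetical.

```shell
# Expand a card pattern into card indices, one per line.
# Usage: expand_cards <pattern> <highest_card_index>
# NOTE: hypothetical helper that only illustrates the pattern syntax.
expand_cards() {
  printf '%s\n' "$1" | tr ',' '\n' | while read -r part; do
    case "$part" in
      '*') seq 0 "$2" ;;                        # wildcard: every card
      *-*) seq "${part%-*}" "${part#*-}" ;;     # range such as 0-3
      *)   printf '%s\n' "$part" ;;             # single index
    esac
  done
}
```

For example, `expand_cards '0,2-3' 7` prints `0`, `2`, and `3` on separate lines.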

## GPU detection

The GPU metrics plugin scans for any supported AMD GPU using the `amdgpu` kernel driver. Any GPU using legacy drivers is ignored.

To check whether your AMD GPU will be detected, run:

```shell
lspci | grep -i vga | grep -i amd
```

Example output:

```
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M] (rev ce)
73:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Granite Ridge [Radeon Graphics] (rev c5)
```

### Multiple GPU systems

In systems with multiple GPUs, the GPU metrics plugin detects all AMD cards by default. You can control which GPUs are monitored with the `cards_include` and `cards_exclude` parameters.

To list the GPUs in your system, run the following command:

```shell
ls /sys/class/drm/card*/device/vendor
```

Example output:

```
/sys/class/drm/card0/device/vendor
/sys/class/drm/card1/device/vendor
```
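
The listing above shows which cards exist but not who made them or which driver they use. As a sketch, the following function prints the PCI vendor ID and bound kernel driver for each card; AMD devices report vendor ID `0x1002`. The function name `list_cards` and its optional `sysfs` root argument are hypothetical.

```shell
# Print the vendor ID and bound kernel driver for every DRM card.
# Usage: list_cards [sysfs_root]   (defaults to /sys)
# NOTE: list_cards is a hypothetical helper, not part of Fluent Bit.
list_cards() {
  root="${1:-/sys}"
  for dev in "$root"/class/drm/card[0-9]*/device; do
    # Connector entries such as card0-DP-1 have no vendor file; skip them.
    [ -r "$dev/vendor" ] || continue
    card="${dev%/device}"
    card="${card##*/}"
    vendor="$(cat "$dev/vendor")"
    driver="none"
    if [ -L "$dev/driver" ]; then
      driver="$(basename "$(readlink "$dev/driver")")"
    fi
    echo "$card vendor=$vendor driver=$driver"
  done
}

list_cards
```

A card that prints `vendor=0x1002 driver=amdgpu` matches the requirements described in this section.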

## Getting started

To get GPU metrics from your system, run the plugin from either the command line or a configuration file.

### Command line

Run the following command from the command line:

```shell
fluent-bit -i gpu_metrics -o stdout
```

Example output:

```
2025-10-25T20:36:55.236905093Z gpu_utilization_percent{card="1",vendor="amd"} = 2
2025-10-25T20:36:55.237853918Z gpu_utilization_percent{card="0",vendor="amd"} = 0
2025-10-25T20:36:55.236905093Z gpu_memory_used_bytes{card="1",vendor="amd"} = 1580118016
2025-10-25T20:36:55.237853918Z gpu_memory_used_bytes{card="0",vendor="amd"} = 26083328
2025-10-25T20:36:55.236905093Z gpu_memory_total_bytes{card="1",vendor="amd"} = 17163091968
2025-10-25T20:36:55.237853918Z gpu_memory_total_bytes{card="0",vendor="amd"} = 2147483648
2025-10-25T20:36:55.236905093Z gpu_clock_mhz{card="1",vendor="amd",type="graphics"} = 45
2025-10-25T20:36:55.236905093Z gpu_clock_mhz{card="1",vendor="amd",type="memory"} = 96
2025-10-25T20:36:55.236905093Z gpu_clock_mhz{card="1",vendor="amd",type="soc"} = 500
2025-10-25T20:36:55.237853918Z gpu_clock_mhz{card="0",vendor="amd",type="graphics"} = 600
2025-10-25T20:36:55.237853918Z gpu_clock_mhz{card="0",vendor="amd",type="memory"} = 2800
2025-10-25T20:36:55.237853918Z gpu_clock_mhz{card="0",vendor="amd",type="soc"} = 1200
2025-10-25T20:36:55.236905093Z gpu_power_watts{card="1",vendor="amd"} = 28
2025-10-25T20:36:55.236905093Z gpu_temperature_celsius{card="1",vendor="amd"} = 28
2025-10-25T20:36:55.237853918Z gpu_temperature_celsius{card="0",vendor="amd"} = 39
2025-10-25T20:36:55.236905093Z gpu_fan_speed_rpm{card="1",vendor="amd"} = 0
2025-10-25T20:36:55.236905093Z gpu_fan_pwm_percent{card="1",vendor="amd"} = 0
```
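
The two memory metrics are often more useful combined into a usage percentage. The following sketch derives that value per card from text in the format of the example output. The function name `vram_percent` is hypothetical, and the parsing assumes the `stdout` lines look exactly like the example output in this section.

```shell
# Read gpu_metrics lines on stdin and print VRAM usage per card.
# NOTE: vram_percent is a hypothetical helper; it assumes the line layout
#   <timestamp> <metric>{card="N",...} = <value>
vram_percent() {
  awk '
    match($0, /card="[0-9]+"/) {
      # Extract the digits between the quotes of card="N".
      card = substr($0, RSTART + 6, RLENGTH - 7)
      if ($2 ~ /^gpu_memory_used_bytes/)  used[card]  = $NF
      if ($2 ~ /^gpu_memory_total_bytes/) total[card] = $NF
    }
    END {
      for (card in total)
        if (total[card] > 0)
          printf "card %s: %.1f%% VRAM used\n", card, 100 * used[card] / total[card]
    }'
}
```

The function can be fed from a saved copy of the output, for example `vram_percent < metrics.txt`, where `metrics.txt` is a placeholder file name.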

### Configuration file

In your main configuration file, append the following:

{% tabs %}
{% tab title="fluent-bit.yaml" %}

```yaml
pipeline:
  inputs:
    - name: gpu_metrics
      cards_exclude: "0"
      cards_include: "1"
      enable_power: true
      enable_temperature: true
      path_sysfs: /sys
      scrape_interval: 2

  outputs:
    - name: stdout
      match: '*'
```

{% endtab %}

{% tab title="fluent-bit.conf" %}

```
[INPUT]
  Name                gpu_metrics
  Cards_Exclude       0
  Cards_Include       1
  Enable_Power        true
  Enable_Temperature  true
  Path_Sysfs          /sys
  Scrape_Interval     2

[OUTPUT]
  Name   stdout
  Match  *
```

{% endtab %}
{% endtabs %}
