GPU metrics

The gpu_metrics input plugin collects graphics processing unit (GPU) performance metrics from graphics cards on Linux systems. It provides real-time monitoring of GPU utilization, memory usage (VRAM), clock frequencies, power consumption, temperature, and fan speeds.

The plugin reads metrics directly from the Linux sysfs filesystem (/sys/class/drm/) without requiring external tools or libraries. Only AMD GPUs that use the amdgpu kernel driver are supported. NVIDIA and Intel GPUs aren't supported.
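
As an illustration of what reading from sysfs looks like, the following is a minimal Python sketch, not the plugin's implementation. The file names gpu_busy_percent, mem_info_vram_used, and mem_info_vram_total are attributes exposed by the amdgpu driver; that the plugin reads exactly these files is an assumption, and the helper names read_sysfs_int and read_card_basics are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def read_sysfs_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs file, or None if it is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_card_basics(card: int, sysfs_root: str = "/sys") -> Dict[str, Optional[int]]:
    """Read a few per-card values for the given card index.

    Assumption: the attribute names below are amdgpu driver files; the
    plugin may read different or additional files.
    """
    device = Path(sysfs_root) / "class" / "drm" / f"card{card}" / "device"
    return {
        "gpu_utilization_percent": read_sysfs_int(device / "gpu_busy_percent"),
        "gpu_memory_used_bytes": read_sysfs_int(device / "mem_info_vram_used"),
        "gpu_memory_total_bytes": read_sysfs_int(device / "mem_info_vram_total"),
    }


if __name__ == "__main__":
    # Values are None on systems without an amdgpu card at index 0.
    print(read_card_basics(0))
```

The sysfs_root argument mirrors the role of the path_sysfs parameter: pointing it at a different directory makes the sketch usable against a test tree.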

Metrics collected

The plugin collects the following metrics for each detected GPU:

| Key | Description |
| --- | --- |
| gpu_utilization_percent | GPU core utilization as a percentage (0 to 100). Indicates how busy the GPU is when processing workloads. |
| gpu_memory_used_bytes | Amount of video RAM (VRAM) currently in use, measured in bytes. |
| gpu_memory_total_bytes | Total video RAM (VRAM) capacity available on the GPU, measured in bytes. |
| gpu_clock_mhz | Current GPU clock frequency in MHz. This metric has multiple instances with different type labels (see Clock metrics). |
| gpu_power_watts | Current power consumption in watts. Can be disabled by setting enable_power to false. |
| gpu_temperature_celsius | GPU die temperature in degrees Celsius. Can be disabled by setting enable_temperature to false. |
| gpu_fan_speed_rpm | Fan rotation speed in revolutions per minute (RPM). |
| gpu_fan_pwm_percent | Fan PWM duty cycle as a percentage (0 to 100). Indicates fan intensity. |

Clock metrics

The gpu_clock_mhz metric is reported separately for three clock domains:

| Type | Description |
| --- | --- |
| graphics | GPU core/shader clock frequency. |
| memory | VRAM clock frequency. |
| soc | System-on-chip clock frequency. |

Configuration parameters

The plugin supports the following configuration parameters:

| Key | Description | Default |
| --- | --- | --- |
| cards_exclude | Pattern specifying which GPU cards to exclude from monitoring. Uses the same syntax as cards_include. | none |
| cards_include | Pattern specifying which GPU cards to monitor. Supports wildcards (*), ranges (0-3), and comma-separated lists (0,2,4). | * |
| enable_power | Enable collection of power consumption metrics (gpu_power_watts). | true |
| enable_temperature | Enable collection of temperature metrics (gpu_temperature_celsius). | true |
| path_sysfs | Path to the sysfs root directory. Typically used for testing or non-standard systems. | /sys |
| scrape_interval | Interval in seconds between metric collection cycles. | 5 |
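
To make the cards_include and cards_exclude pattern syntax concrete, here is a small Python sketch of one way such patterns could be interpreted. It is not the plugin's parser: the function names are hypothetical, and the rule that exclusions are applied after inclusions is an assumption rather than documented behavior.

```python
from typing import Set


def parse_card_pattern(pattern: str, known_cards: Set[int]) -> Set[int]:
    """Expand a pattern such as "*", "0-3", or "0,2,4" into a set of card IDs.

    Only IDs present in known_cards are returned.
    """
    selected: Set[int] = set()
    for part in pattern.split(","):
        part = part.strip()
        if not part:
            continue
        if part == "*":
            # Wildcard: every detected card.
            selected |= known_cards
        elif "-" in part:
            # Inclusive range, for example "0-3".
            low, high = part.split("-", 1)
            selected |= set(range(int(low), int(high) + 1))
        else:
            # Single card ID.
            selected.add(int(part))
    return selected & known_cards


def select_cards(known_cards: Set[int], include: str = "*", exclude: str = "") -> Set[int]:
    """Return the cards to monitor.

    Assumption: exclusions are applied after inclusions.
    """
    included = parse_card_pattern(include, known_cards)
    excluded = parse_card_pattern(exclude, known_cards)
    return included - excluded


if __name__ == "__main__":
    print(select_cards({0, 1}, include="1", exclude="0"))
```

Under these assumed semantics, a two-card system with cards_include set to 1 and cards_exclude set to 0 monitors only card 1.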

GPU detection

The GPU metrics plugin scans for any supported AMD GPU using the amdgpu kernel driver. Any GPU using legacy drivers is ignored.

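To check whether your system has an AMD GPU, run the following command. It confirms only that the hardware is present; running lspci -k additionally shows the kernel driver in use for each device.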

lspci | grep -i vga | grep -i amd

Example output:

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M] (rev ce)
73:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Granite Ridge [Radeon Graphics] (rev c5)
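
Because detection depends on the amdgpu driver rather than on the hardware alone, it can help to check which driver is bound to a card. The sketch below relies on the standard sysfs layout, where each device directory contains a driver symlink pointing at the bound driver. The function names bound_driver and uses_amdgpu are hypothetical, and this is not necessarily how the plugin performs its own check.

```python
import os
from typing import Optional


def bound_driver(card: int, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the kernel driver bound to a DRM card, or None.

    The name is taken from the target of the device/driver symlink.
    """
    link = os.path.join(sysfs_root, "class", "drm", f"card{card}", "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def uses_amdgpu(card: int, sysfs_root: str = "/sys") -> bool:
    """Return True if the card is bound to the amdgpu kernel driver."""
    return bound_driver(card, sysfs_root) == "amdgpu"


if __name__ == "__main__":
    for index in range(2):
        print(index, bound_driver(index))
```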

Multiple GPU systems

In systems with multiple GPUs, the GPU metrics plugin detects all AMD cards by default. Use the cards_include and cards_exclude parameters to control which GPUs are monitored.

To list the graphics cards present in your system, run the following command:

ls /sys/class/drm/card*/device/vendor

Example output:

/sys/class/drm/card0/device/vendor
/sys/class/drm/card1/device/vendor
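
Each vendor file listed above contains the PCI vendor ID of the card, and AMD's ID is 0x1002. The following Python sketch reads those files to show which card numbers belong to AMD devices. The function names and the sysfs_root parameter are hypothetical and are included only so the sketch can be pointed at a test directory.

```python
import glob
import os
from typing import Dict, List

AMD_PCI_VENDOR_ID = "0x1002"


def list_card_vendors(sysfs_root: str = "/sys") -> Dict[str, str]:
    """Map each DRM card name (for example "card0") to its PCI vendor ID string."""
    pattern = os.path.join(sysfs_root, "class", "drm", "card*", "device", "vendor")
    vendors: Dict[str, str] = {}
    for vendor_file in sorted(glob.glob(pattern)):
        # .../card0/device/vendor -> "card0"
        card_name = os.path.basename(os.path.dirname(os.path.dirname(vendor_file)))
        with open(vendor_file) as handle:
            vendors[card_name] = handle.read().strip().lower()
    return vendors


def amd_cards(sysfs_root: str = "/sys") -> List[str]:
    """Return the names of cards whose vendor ID matches AMD."""
    return [
        name
        for name, vendor in list_card_vendors(sysfs_root).items()
        if vendor == AMD_PCI_VENDOR_ID
    ]


if __name__ == "__main__":
    print(list_card_vendors())
    print(amd_cards())
```

The numeric suffix of each card name is the value to use in cards_include and cards_exclude.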

Getting started

To get GPU metrics from your system, run the plugin from either the command line or a configuration file:

Command line

Run the following command from the command line:

fluent-bit -i gpu_metrics -o stdout

Example output:

2025-10-25T20:36:55.236905093Z gpu_utilization_percent{card="1",vendor="amd"} = 2
2025-10-25T20:36:55.237853918Z gpu_utilization_percent{card="0",vendor="amd"} = 0
2025-10-25T20:36:55.236905093Z gpu_memory_used_bytes{card="1",vendor="amd"} = 1580118016
2025-10-25T20:36:55.237853918Z gpu_memory_used_bytes{card="0",vendor="amd"} = 26083328
2025-10-25T20:36:55.236905093Z gpu_memory_total_bytes{card="1",vendor="amd"} = 17163091968
2025-10-25T20:36:55.237853918Z gpu_memory_total_bytes{card="0",vendor="amd"} = 2147483648
2025-10-25T20:36:55.236905093Z gpu_clock_mhz{card="1",vendor="amd",type="graphics"} = 45
2025-10-25T20:36:55.236905093Z gpu_clock_mhz{card="1",vendor="amd",type="memory"} = 96
2025-10-25T20:36:55.236905093Z gpu_clock_mhz{card="1",vendor="amd",type="soc"} = 500
2025-10-25T20:36:55.237853918Z gpu_clock_mhz{card="0",vendor="amd",type="graphics"} = 600
2025-10-25T20:36:55.237853918Z gpu_clock_mhz{card="0",vendor="amd",type="memory"} = 2800
2025-10-25T20:36:55.237853918Z gpu_clock_mhz{card="0",vendor="amd",type="soc"} = 1200
2025-10-25T20:36:55.236905093Z gpu_power_watts{card="1",vendor="amd"} = 28
2025-10-25T20:36:55.236905093Z gpu_temperature_celsius{card="1",vendor="amd"} = 28
2025-10-25T20:36:55.237853918Z gpu_temperature_celsius{card="0",vendor="amd"} = 39
2025-10-25T20:36:55.236905093Z gpu_fan_speed_rpm{card="1",vendor="amd"} = 0
2025-10-25T20:36:55.236905093Z gpu_fan_pwm_percent{card="1",vendor="amd"} = 0
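
For quick ad hoc analysis, the stdout lines shown above can be parsed with a short script. The following Python sketch assumes the exact line layout of this example output (timestamp, metric name, labels in braces, an equals sign, and a value) and derives a VRAM usage percentage per card. The names parse_line, Sample, and vram_usage_percent are hypothetical, and the stdout format is meant for inspection rather than as a stable interface.

```python
import re
from typing import Dict, Iterable, NamedTuple

# Matches lines such as:
# 2025-10-25T20:36:55.236905093Z gpu_power_watts{card="1",vendor="amd"} = 28
LINE_RE = re.compile(
    r'^(?P<ts>\S+)\s+(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+=\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


def parse_line(line: str) -> Sample:
    """Parse one stdout line into a metric name, its labels, and a numeric value."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return Sample(match.group("name"), labels, float(match.group("value")))


def vram_usage_percent(lines: Iterable[str]) -> Dict[str, float]:
    """Return VRAM usage per card as used / total * 100."""
    used: Dict[str, float] = {}
    total: Dict[str, float] = {}
    for line in lines:
        sample = parse_line(line)
        card = sample.labels.get("card", "")
        if sample.name == "gpu_memory_used_bytes":
            used[card] = sample.value
        elif sample.name == "gpu_memory_total_bytes":
            total[card] = sample.value
    # Skip cards with a missing or zero total to avoid dividing by zero.
    return {card: 100.0 * used[card] / total[card] for card in used if total.get(card)}
```

Applied to the example output, this gives roughly 9.2% VRAM usage for card 1 and roughly 1.2% for card 0.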

Configuration file

Append the following to your main configuration file. The first example uses the YAML format and the second uses the classic format:

pipeline:
  inputs:
    - name: gpu_metrics
      cards_exclude: "0"
      cards_include: "1"
      enable_power: true
      enable_temperature: true
      path_sysfs: /sys
      scrape_interval: 2

  outputs:
    - name: stdout
      match: '*'

[INPUT]
  Name                gpu_metrics
  Cards_Exclude       0
  Cards_Include       1
  Enable_Power        true
  Enable_Temperature  true
  Path_Sysfs          /sys
  Scrape_Interval     2

[OUTPUT]
  Name   stdout
  Match  *
