Incorrect values of disk_read_bytes and disk_write_bytes recorded in Linux #17511

@apurbagarwal-ag

Description

Relevant telegraf.conf

[agent]
  flush_interval = "60s"
  flush_jitter = "10s"
  collection_jitter = "10s"
  interval = "1m"
  metric_buffer_limit = 100000

[[outputs.influxdb_v2]]
  urls = // hidden
  bucket = // hidden
  token = // hidden
  organization = // hidden

# Read metrics about memory usage
[[inputs.mem]]

# Read metrics about disk IO by device
[[inputs.diskio]]

# Read metrics about cpu usage
[[inputs.cpu]]

# Read metrics about system load & uptime
[[inputs.system]]

# Read metrics about disk usage by mount point
[[inputs.disk]]
  interval = "10m"

# Get kernel statistics from /proc/vmstat
[[inputs.kernel_vmstat]]
  # no configuration

# Gathers huge pages measurements.
[[inputs.hugepages]]
# capturing only per_node stats as meminfo and root hugepage stats
# can be calculated from the per_numa stats if needed
types = ["per_node"]

# Hacky way to update procstat config by running a cmd
[[inputs.exec]]
  commands = ["python3 /etc/telegraf/scripts/update_procstat_config.py > /dev/null"]
  timeout = "1m"
[[inputs.net]]
[[inputs.nfsclient]]
  interval = "10m"

Logs from Telegraf

No relevant information is present in the logs.

System info

Red Hat Enterprise Linux 9.6 (Plow), Telegraf 1.33.0

Docker

No response

Steps to reproduce

We are recording statistics for multiple processes. However, the values of disk_read_bytes and disk_write_bytes are recorded incorrectly.
For example, these are the I/O stats the kernel reports for one process:

[aagarwal@ts-mum1-dsr54 ~]$ cat /proc/196649/io
rchar: 4508
wchar: 536
syscr: 10
syscw: 67
read_bytes: 12288
write_bytes: 0
cancelled_write_bytes: 0

Expected behavior

disk_read_bytes: 12288
disk_write_bytes: 0

Actual behavior

disk_read_bytes: 4508
disk_write_bytes: 536
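
To make the mismatch explicit: the expected values come from the read_bytes / write_bytes lines of /proc/<pid>/io, while the recorded values match the rchar / wchar lines. Below is a minimal sketch that parses the output shown above and prints both pairs side by side. The names `procIO`, `parseProcIO` and `sample` are made up for this illustration and are not part of Telegraf or gopsutil.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// procIO holds the four byte counters from /proc/<pid>/io discussed here.
// Hypothetical type, for illustration only.
type procIO struct {
	RChar, WChar          uint64 // rchar / wchar lines
	ReadBytes, WriteBytes uint64 // read_bytes / write_bytes lines
}

// parseProcIO parses the "key: value" lines of a /proc/<pid>/io dump.
// Keys other than the four above are ignored.
func parseProcIO(content string) (procIO, error) {
	var out procIO
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // skip blank or malformed lines
		}
		n, err := strconv.ParseUint(strings.TrimSpace(val), 10, 64)
		if err != nil {
			return out, fmt.Errorf("parsing %q: %w", key, err)
		}
		switch strings.TrimSpace(key) {
		case "rchar":
			out.RChar = n
		case "wchar":
			out.WChar = n
		case "read_bytes":
			out.ReadBytes = n
		case "write_bytes":
			out.WriteBytes = n
		}
	}
	return out, sc.Err()
}

// sample is the /proc/196649/io output quoted in this report.
const sample = `rchar: 4508
wchar: 536
syscr: 10
syscw: 67
read_bytes: 12288
write_bytes: 0
cancelled_write_bytes: 0
`

func main() {
	io, err := parseProcIO(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("expected disk_read_bytes/disk_write_bytes:", io.ReadBytes, io.WriteBytes)
	fmt.Println("recorded disk_read_bytes/disk_write_bytes:", io.RChar, io.WChar)
}
```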

Additional info

There was a change in gopsutil last year that altered how the fields of the /proc/<pid>/io file are mapped (before -> after):

read_bytes : ReadBytes -> read_bytes : DiskReadBytes
rchar : Nothing -> rchar : ReadBytes
// similarly for writes

However, from what I can see in the Telegraf source code, the corresponding change still needs to be made in plugins/inputs/procstat/process.go, which currently does the following:

	io, err := p.IOCounters()
	if err == nil {
		fields[prefix+"read_count"] = io.ReadCount
		fields[prefix+"write_count"] = io.WriteCount
		fields[prefix+"read_bytes"] = io.ReadBytes
		fields[prefix+"write_bytes"] = io.WriteBytes
	}

	// Linux fixup for gopsutils exposing the disk-only-IO instead of the total
	// I/O as for example on Windows
	if rc, wc, err := collectTotalReadWrite(p); err == nil {
		fields[prefix+"read_bytes"] = rc
		fields[prefix+"write_bytes"] = wc
		fields[prefix+"disk_read_bytes"] = io.ReadBytes
		fields[prefix+"disk_write_bytes"] = io.WriteBytes
	}
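
For discussion, here is a rough sketch of how the field assignment could look once the new gopsutil mapping is taken into account. It is not a patch against Telegraf: `ioCounters` is a local stand-in that only mirrors the field names described above (the pairing of ReadCount/WriteCount with syscr/syscw is my assumption), `ioFields` is a hypothetical helper, and the sketch only considers Linux, so whether the existing collectTotalReadWrite fixup is still needed is left open.

```go
package main

import "fmt"

// ioCounters is a local stand-in mirroring the field names this report
// attributes to gopsutil after its change. It is NOT the library type.
type ioCounters struct {
	ReadCount, WriteCount         uint64 // assumed: syscr, syscw
	ReadBytes, WriteBytes         uint64 // rchar, wchar (after the change)
	DiskReadBytes, DiskWriteBytes uint64 // read_bytes, write_bytes
}

// ioFields is a hypothetical helper that builds the I/O fields so that the
// disk_* fields come from the Disk* counters instead of ReadBytes/WriteBytes.
func ioFields(prefix string, c ioCounters) map[string]uint64 {
	return map[string]uint64{
		prefix + "read_count":       c.ReadCount,
		prefix + "write_count":      c.WriteCount,
		prefix + "read_bytes":       c.ReadBytes,
		prefix + "write_bytes":      c.WriteBytes,
		prefix + "disk_read_bytes":  c.DiskReadBytes,
		prefix + "disk_write_bytes": c.DiskWriteBytes,
	}
}

func main() {
	// Values taken from the /proc/196649/io output in this report.
	c := ioCounters{
		ReadCount: 10, WriteCount: 67,
		ReadBytes: 4508, WriteBytes: 536,
		DiskReadBytes: 12288, DiskWriteBytes: 0,
	}
	f := ioFields("", c)
	// prints: 12288 0
	fmt.Println(f["disk_read_bytes"], f["disk_write_bytes"])
}
```

With this assignment the disk_* fields would match the expected behavior above, while read_bytes and write_bytes would keep reporting the rchar/wchar totals.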

Any help is much appreciated. Thanks!

Metadata

Assignees

No one assigned

Labels

bug: unexpected problem or unintended behavior
