Monitoring: Part 2

In the previous post, Monitoring: Part 1, I showed off a monitoring solution that I created in order to examine the performance of various algorithms during test runs. However, I quickly noticed that the results from two different collectors, collectd and Prometheus' node_exporter, were inconsistent. That made me realize I really needed to dive into the details of instrumentation, collection, aggregation, and display. In this post, I'll take a look at how Linux exposes host metrics and, in particular, how the Prometheus agent, node_exporter, reads them.

Overview

I’ve broken this post out into subsections to make this more digestible:

Once you have digested the contents of those two posts, the rest of this will be pretty straightforward.

  1. Linux serves up kernel-level information, such as host stats, via sysfs and procfs
  2. The Prometheus agent, node_exporter, slurps up those metrics by parsing the text in the pseudo-file /proc/stat
  3. The Prometheus agent then serves those metrics via a URL: http://localhost:[PORT]/metrics (a toy version of steps 1-3 is sketched below)
  4. The Prometheus collector/aggregator makes an HTTP GET request to that URL to retrieve the metrics
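
To make the flow concrete, here is a minimal sketch of steps 1-3 (my own toy code with made-up names, not node_exporter's implementation): it parses the per-CPU idle counters out of /proc/stat, converts ticks to seconds assuming a USER_HZ of 100 (more on that in the Verification section), and serves the result in Prometheus text format at /metrics.

// toy_exporter.go - a hypothetical, illustrative exporter: read /proc/stat,
// convert idle ticks to seconds, and serve the result at /metrics.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
)

// idleSeconds returns per-CPU idle time in seconds, assuming USER_HZ is 100.
func idleSeconds() (map[string]float64, error) {
	f, err := os.Open("/proc/stat")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	idle := map[string]float64{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Per-CPU lines look like: cpu0 261911 2143 80436 17305381 ...
		if len(fields) < 5 || fields[0] == "cpu" || !strings.HasPrefix(fields[0], "cpu") {
			continue
		}
		ticks, err := strconv.ParseFloat(fields[4], 64) // idle is the 4th counter
		if err != nil {
			return nil, err
		}
		idle[strings.TrimPrefix(fields[0], "cpu")] = ticks / 100
	}
	return idle, scanner.Err()
}

func main() {
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		idle, err := idleSeconds()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, "# HELP toy_cpu_idle_seconds Idle CPU time read from /proc/stat.")
		fmt.Fprintln(w, "# TYPE toy_cpu_idle_seconds counter")
		for cpu, secs := range idle {
			fmt.Fprintf(w, "toy_cpu_idle_seconds{cpu=%q} %g\n", cpu, secs)
		}
	})
	http.ListenAndServe(":9101", nil) // node_exporter itself defaults to :9100
}

Running that and curling localhost:9101/metrics should return one toy_cpu_idle_seconds line per CPU; the real node_exporter does the same kind of work, just across dozens of collectors at once.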

Example

The serving part was essentially covered in 1 and 2, where we showed the metrics getting written to the response. To see this in action, fire up node_exporter and curl the listed URL to get a metrics report.

First, build the node_exporter…

$ cd node_exporter
$ make
...

Now, start the node_exporter…

 $ ./node_exporter
INFO[0000] Starting node_exporter (version=0.18.1, branch=master, revision=e972e38b423632a4f45326bd33eb127a4935f1f6)  source="node_exporter.go:156"
INFO[0000] Build context (go=go1.12.6, user=toddg@dino, date=20190628-15:55:33)  source="node_exporter.go:157"
INFO[0000] Enabled collectors:                           source="node_exporter.go:97"
INFO[0000]  - arp                                        source="node_exporter.go:104"
INFO[0000]  - bcache                                     source="node_exporter.go:104"
INFO[0000]  - bonding                                    source="node_exporter.go:104"
INFO[0000]  - conntrack                                  source="node_exporter.go:104"
INFO[0000]  - cpu                                        source="node_exporter.go:104"
INFO[0000]  - cpufreq                                    source="node_exporter.go:104"
INFO[0000]  - diskstats                                  source="node_exporter.go:104"
INFO[0000]  - edac                                       source="node_exporter.go:104"
INFO[0000]  - entropy                                    source="node_exporter.go:104"
INFO[0000]  - filefd                                     source="node_exporter.go:104"
INFO[0000]  - filesystem                                 source="node_exporter.go:104"
INFO[0000]  - hwmon                                      source="node_exporter.go:104"
INFO[0000]  - infiniband                                 source="node_exporter.go:104"
INFO[0000]  - ipvs                                       source="node_exporter.go:104"
INFO[0000]  - loadavg                                    source="node_exporter.go:104"
INFO[0000]  - mdadm                                      source="node_exporter.go:104"
INFO[0000]  - meminfo                                    source="node_exporter.go:104"
INFO[0000]  - netclass                                   source="node_exporter.go:104"
INFO[0000]  - netdev                                     source="node_exporter.go:104"
INFO[0000]  - netstat                                    source="node_exporter.go:104"
INFO[0000]  - nfs                                        source="node_exporter.go:104"
INFO[0000]  - nfsd                                       source="node_exporter.go:104"
INFO[0000]  - pressure                                   source="node_exporter.go:104"
INFO[0000]  - sockstat                                   source="node_exporter.go:104"
INFO[0000]  - stat                                       source="node_exporter.go:104"
INFO[0000]  - textfile                                   source="node_exporter.go:104"
INFO[0000]  - time                                       source="node_exporter.go:104"
INFO[0000]  - timex                                      source="node_exporter.go:104"
INFO[0000]  - uname                                      source="node_exporter.go:104"
INFO[0000]  - vmstat                                     source="node_exporter.go:104"
INFO[0000]  - xfs                                        source="node_exporter.go:104"
INFO[0000]  - zfs                                        source="node_exporter.go:104"
INFO[0000] Listening on :9100                            source="node_exporter.go:170"

Try curling localhost:9100…

$ curl localhost:9100
<html>
			<head><title>Node Exporter</title></head>
			<body>
			<h1>Node Exporter</h1>
			<p><a href="/metrics">Metrics</a></p>
			</body>
			</html>

That’s exactly what we saw in the code. Perfect. Append /metrics to the URL and curl again…

$ curl localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.12.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 906112
...

There’s a lot of output generated here, so see this gist for a full sample output.
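
Those go_* entries at the top come from the Go client library (client_golang) that node_exporter is built with: every registered metric is rendered as a # HELP line, a # TYPE line, and one sample per label set. As a rough illustration (hypothetical metric name, not node_exporter code), registering a single gauge and serving it with promhttp produces exactly that shape, alongside the library's default go_* runtime metrics:

// demo_gauge.go - a hypothetical sketch of how the text format above is produced.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Name and Help become the metric's # TYPE and # HELP lines respectively.
	idle := prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "demo_cpu_idle_seconds",
		Help: "Idle CPU time in seconds, read from /proc/stat.",
	}, []string{"cpu"})
	prometheus.MustRegister(idle)
	idle.WithLabelValues("0").Set(173053.7)

	// promhttp renders every registered metric in the text exposition format,
	// including the go_* runtime metrics registered by default.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9102", nil)
}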

Verification

Does all this work as expected? Let’s take a top-level view, knowing what we know now:

$ curl localhost:9100/metrics | grep cpu | grep idle; cat /proc/stat

node_cpu_seconds_total{cpu="0",mode="idle"} 173053.7
node_cpu_seconds_total{cpu="1",mode="idle"} 60628.78
node_cpu_seconds_total{cpu="2",mode="idle"} 60319.59
node_cpu_seconds_total{cpu="3",mode="idle"} 60668.45
node_cpu_seconds_total{cpu="4",mode="idle"} 60367.33
node_cpu_seconds_total{cpu="5",mode="idle"} 60750.17
node_cpu_seconds_total{cpu="6",mode="idle"} 60363.55
node_cpu_seconds_total{cpu="7",mode="idle"} 60724.35

// from the docs, 'idle' is the 4th time field on each cpu line of /proc/stat

cpu  1769022 21772 573346 59687683 52777 0 16659 0 0 0
cpu0 261911 2143 80436 17305381 16340 0 9367 0 0 0
cpu1 175662 1669 54010 6062890 9968 0 1779 0 0 0
cpu2 269207 3138 82611 6031970 7467 0 778 0 0 0
cpu3 172678 2143 60796 6066852 4373 0 208 0 0 0
cpu4 269848 2975 82810 6036745 3139 0 181 0 0 0
cpu5 175312 2104 60435 6075029 2152 0 386 0 0 0
cpu6 270884 4214 96732 6036367 5641 0 672 0 0 0
cpu7 173518 3383 55513 6072446 3693 0 3286 0 0 0

Let’s compare the results… The docs also say that “Time units are in USER_HZ (typically hundredths of a second)”, so normalizing by dividing the /proc/stat values by 100 gives us:

idle via          idle via          idle via
:9100/metrics     cat /proc/stat    cat /proc/stat / 100
(seconds)         (USER_HZ ticks)   (seconds)
--------------------------------------------------------
173053.7      17305381        173053.8
 60628.78      6062890         60628.90
 60319.59      6031970         60319.70
 60668.45      6066852         60668.52
 60367.33      6036745         60367.45
 60750.17      6075029         60750.29
 60363.55      6036367         60363.67
 60724.35      6072446         60724.46

So things look good down to the second. There is a small sub-second difference, most likely because the idle counters keep ticking during the fraction of a second between the scrape and the cat; a quick programmatic version of this check is sketched below.
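
For completeness, here is a sanity-check sketch (a hypothetical verify program of my own, again assuming USER_HZ is 100): it scrapes cpu0’s idle value from node_exporter and reads /proc/stat immediately afterwards, so the printed delta is essentially the idle time that accrues between the two reads.

// verify_idle.go - a hypothetical sketch: compare node_exporter's idle value
// for cpu0 against a direct read of /proc/stat, both expressed in seconds.
package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
	"strconv"
	"strings"
)

func main() {
	// 1. Scrape node_exporter and pull out cpu0's idle seconds.
	resp, err := http.Get("http://localhost:9100/metrics")
	if err != nil {
		panic(err)
	}
	body, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()

	var exporterIdle float64
	for _, line := range strings.Split(string(body), "\n") {
		if strings.HasPrefix(line, `node_cpu_seconds_total{cpu="0",mode="idle"}`) {
			fields := strings.Fields(line)
			exporterIdle, _ = strconv.ParseFloat(fields[len(fields)-1], 64)
		}
	}

	// 2. Read cpu0's raw idle ticks from /proc/stat and divide by USER_HZ,
	//    assumed here to be 100 as on most Linux systems.
	f, err := os.Open("/proc/stat")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	var procIdle float64
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 5 && fields[0] == "cpu0" {
			procIdle, _ = strconv.ParseFloat(fields[4], 64)
			procIdle /= 100
		}
	}

	fmt.Printf("exporter: %.2f s   /proc/stat: %.2f s   delta: %.2f s\n",
		exporterIdle, procIdle, procIdle-exporterIdle)
}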

Summary

The Prometheus node_exporter is doing the exact same thing I was doing by hand… cat’ing out a pseudo-file and scraping the results. The only difference is that node_exporter also exposes the metrics via a URL.