Jun 28, 2019

Prometheus

Prometheus is a common monitoring system and is a good place to start looking into the details of monitoring because:

Prometheus is implemented in Go, so the code is easy to follow
The source is commented, not well, but enough
Prometheus includes both a host agent and a collector/aggregator

Let’s take a look at the Prometheus agent.

Agent

Prometheus has a metrics collection agent, written in ‘Go’, called node_exporter.

The code in node_exporter can be thought of as broken into 3 separate chunks: startup, metrics collection, and metrics serving.

(The code listings below have all but the nuts and bolts elided. Look at the original source via the provided links for full context.)

Startup

We enter into all Go code via main:

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/node_exporter.go#L131-L131

func main() {
	http.Handle(*metricsPath, newHandler(!*disableExporterMetrics, *maxRequests))
	if err := http.ListenAndServe(*listenAddress, nil); err != nil { log.Fatal(err) }

So main is constructing an httpHandler via newHandler and then handing that off to the http server. Does this mean that this collector is only collecting metrics upon receiving a web request to serve up those metrics? Hmmmm….

newHandler is essentially creating an innerHandler. Let’s ignore the code path where we pass in a custom set of collection filters in the query string to the server and stick with the default behaviour. In this case, newHandler creates an innerHandler here:

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/node_exporter.go#L42

func newHandler(includeExporterMetrics bool, maxRequests int) *handler {
	h := &handler{... }
	if innerHandler, err := h.innerHandler(); err != nil {...}
	return h
}

innerHandler is where all the meat is. This is the function that actually set’s up the metrics collectors:

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/node_exporter.go#L83-L83

// innerHandler is used to create buth the one unfiltered http.Handler to be
// wrapped by the outer handler and also the filtered handlers created on the
// fly. The former is accomplished by calling innerHandler without any arguments
// (in which case it will log all the collectors enabled via command-line
// flags).
func (h *handler) innerHandler(filters ...string) (http.Handler, error) {
	nc := collector.NewNodeCollector(filters...)
	r := prometheus.NewRegistry()
	r.MustRegister(version.NewCollector("node_exporter"))
	r.Register(nc)
	handler := promhttp.HandlerFor(
		prometheus.Gatherers{h.exporterMetricsRegistry, r},
		promhttp.HandlerOpts{
			ErrorLog:            log.NewErrorLogger(),
			ErrorHandling:       promhttp.ContinueOnError,
			MaxRequestsInFlight: h.maxRequests,
		},
	)
	return handler, nil
}

The key takeaway above is that the innerHandler() method is constructing a list of collectors, registering the collectors into a new registry, and then passing the registry into a constructor for prometheus.Gatherers, and returning a handler that wraps all of this goodness. Now we know how the application is started and how the handler is constructed. The next question is what happens when the handler is invoked? So on to the next section…

Collection

What is the prometheus.Gatherers all about and what is inside the handler returned by promhttp.HondlerFor()?

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/vendor/github.com/prometheus/client_golang/prometheus/promhttp/http.go#L80-L80

// HandlerFor returns an uninstrumented http.Handler for the provided Gatherer. 
func HandlerFor(reg prometheus.Gatherer, opts HandlerOpts) http.Handler {
	h := http.HandlerFunc(func(rsp http.ResponseWriter, req *http.Request) {
		mfs := reg.Gather()
		w := io.Writer(rsp)
		enc := expfmt.NewEncoder(w, contentType)
		for _, mf := range mfs {
			if err := enc.Encode(mf); err != nil {
}

So HandlerFor returns a handlerFunc that invokes Gather() and then writes the encoded data onto the response. Pretty straight forward.

What does Gather() do? Well the Gatherer interface implements the Gather() method. And this is ultimately the method that get’s invoked by the registry here:

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/vendor/github.com/prometheus/client_golang/prometheus/registry.go#L405-L405

// Gather implements Gatherer.
func (r *Registry) Gather() ([]*dto.MetricFamily, error) {
	r.mtx.RLock()
	checkedCollectors := make(chan Collector, len(r.collectorsByID))
	for _, collector := range r.collectorsByID {
		checkedCollectors <- collector
	}
	r.mtx.RUnlock()

	collectWorker := func() {
            collector := <-checkedCollectors:
            collector.Collect(checkedMetricChan)
	}

	// Start the first worker now to make sure at least one is running.
	go collectWorker()
	return internal.NormalizeMetricFamilies(metricFamiliesByName), errs.MaybeUnwrap()
}

The Gatherer.Gather() function iterates over all the available collectors and pushes them onto a channel. Then it fires up go routine(s) to invoke the Collector.Collect() method on these collectors. When complete, the function returns a bunch of metrics data.

What is the Collector interface all about you ask? Here it is:

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/vendor/github.com/prometheus/client_golang/prometheus/collector.go#L16-L16

// Collector is the interface implemented by anything that can be used by
// Prometheus to collect metrics. 
type Collector interface {
	// Collect is called by the Prometheus registry when collecting
	// metrics. The implementation sends each collected metric via the
	// provided channel and returns once the last metric has been sent. 
	Collect(chan<- Metric)
}

So that’s the code at a high level. Now let’s deep dive into one particular collector, the CPU collector, to see how these metrics are being pulled from the operating system.

CPU Metrics

The source code related to CPU metrics is organized under the collector directory, with files named as {METRIC_NAME}_{OPERATING_SYSTEM}.go.

Here’s a listing of the cpu metrics collector go source files:

~/repos/external/prom $ tree node_exporter/collector/ -L 1 | grep cpu_
├── cpu_common.go
├── cpu_darwin.go
├── cpu_dragonfly.go
├── cpu_dragonfly_test.go
├── cpu_freebsd.go
├── cpu_linux.go
├── cpu_openbsd.go
├── cpu_solaris.go

As you can see, there is a different source code file for each OS. Since I’m testing on Linux platforms, let’s look at cpu_linux.go.

https://github.com/prometheus/node_exporter/blob/7e684f16aecf8121e41b39fa45a3ab969e9b1cd1/collector/cpu_linux.go#L156

// updateStat reads /proc/stat through procfs and exports cpu related metrics.
func (c *cpuCollector) updateStat(ch chan<- prometheus.Metric) error {
	stats, err := c.fs.Stat()
	if err != nil {
		return err
}

Q: What’s this c.fs object?

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/vendor/github.com/prometheus/procfs/fs.go#L20

// FS represents the pseudo-filesystem sys, which provides an interface to
// kernel data structures.
type FS struct {
	proc fs.FS
}

Q: I’ve heard of sys and procfs… what are these again?

I have a whole writeup on Linux’s Procfs.

Prometheus' Procfs Library

Q: So what’s c.fs.Stat() doing?
A: It’s calling into a separate prometheus library called procfs.

Here’s what the prometheus procfs library is all about:

https://github.com/prometheus/procfs#usage

The procfs library is organized by packages based on whether the gathered data
is coming from /proc, /sys, or both. Each package contains an FS type which
represents the path to either /proc, /sys, or both. For example, current cpu
statistics are gathered from /proc/stat and are available via the root procfs
package.

Back to the call to stats, err := c.fs.Stat():

https://github.com/prometheus/procfs/blob/master/stat.go#L164

// Stat returns information about current cpu/process statistics.
// See https://www.kernel.org/doc/Documentation/filesystems/proc.txt
func (fs FS) Stat() (Stat, error) {

	f, err := os.Open(fs.proc.Path("stat"))
	if err != nil {
		return Stat{}, err
	}
	defer f.Close()

stat := Stat{}

Cool! Not only can we see that it’s opening a standard file to read from, it’s also well documented code! Let’s take a look at the referenced docs at kernel.org.

Procfs Library Usage

The rest of the code at procfs/stat.go is basically reading the from the file, parsing the data into lines, then splitting the lines by whitespace, and returning a struct with UINTs parsed from the strings. It’s not that interesting… frankly the only thing that is interesting about it is how straight forward it is:

https://github.com/prometheus/procfs/blob/39e1aff1547e9a628f5714cc5f73d058ba44258e/stat.go#L176

        // ----------------------------------------------
        // [ME] read from file
        // ----------------------------------------------
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
                // ----------------------------------------------
                // [ME] break file into lines
                // ----------------------------------------------
		line := scanner.Text()
                // ----------------------------------------------
                // [ME] break lines into parts based on white space separation
                // ----------------------------------------------
		parts := strings.Fields(scanner.Text())
		// require at least <key> <value>
		if len(parts) < 2 {
			continue
		}
		switch {
		case parts[0] == "btime":
                        // ----------------------------------------------
                        // [ME] parse string into UINT
                        // ----------------------------------------------
			if stat.BootTime, err = strconv.ParseUint(parts[1], 10, 64); err != nil {
				return Stat{}, fmt.Errorf("couldn't parse %s (btime): %s", parts[1], err)

                ...

This stat object is then returned to the caller and the results written to a go channel here:

https://github.com/prometheus/node_exporter/blob/7e684f16aecf8121e41b39fa45a3ab969e9b1cd1/collector/cpu_linux.go#L163

	for cpuID, cpuStat := range stats.CPU {
		cpuNum := fmt.Sprintf("%d", cpuID)
                ch <- prometheus.MustNewConstMetric(c.cpu, prometheus.CounterValue, cpuStat.User, cpuNum, "user")
                ...

This bit is a little confusing. Why is this code calling prometheus.MustNewConstMetric() and not one of the standard emitters such as a Counter or Guage? We know from the kernel.org docs that the CPU values are cumulative totals since the machine was last booted. We also know that the values never decrease. Perhaps it’s because these values are pass-through and don’t quite follow the semantics of a Counter or Guage? Frankly, I’m not sure why these are defined as MustNewConstMetric() and not NewConstMetric() or NewGauge…

Perhaps the documentation can help:

https://godoc.org/github.com/prometheus/client_golang/prometheus#hdr-Custom_Collectors_and_constant_Metrics

If you already have metrics available, created outside of the Prometheus
context, you don't need the interface of the various Metric types. You
essentially want to mirror the existing numbers into Prometheus Metrics during
collection. An own implementation of the Collector interface is perfect for
that.

Since cpu_linux is simply mirroring (pass-through, whatever) the metrics that the OS has already collected, then we can use these purpose-built metrics for them. Perhaps they are faster or more memory efficient?

https://github.com/prometheus/node_exporter/blob/e972e38b423632a4f45326bd33eb127a4935f1f6/vendor/github.com/prometheus/client_golang/prometheus/value.go#L79-L79

// NewConstMetric returns an error if the length of
// labelValues is not consistent with the variable labels in Desc or if Desc is
// invalid.

I suspect the reason to use MustNewConstMetric is a simpler code path while still checking for naming consistency of the emitted metrics.

Summary

Prometheus agent reading CPU stats, on Linux, via Procfs. The call chain goes something like this:

node_exporter 
    -> http.Handler 
        -> onRequest 
            -> Gather 
                -> Collect 
                    -> CPU_Collector 
                        -> Procfs 
                            -> read from the linux procfs pseudo file system
                                -> prometheus.MustNewConstMetric(...)

Now that we know the pattern, it’s fairly easy to deconstruct the rest of the collectors, as needed.