StatsD & Graphite: Using StatsD & Graphite in OpsDash

The OpsDash Smart Agent can accept metrics via StatsD and Graphite interfaces. It contains StatsD and Graphite daemons that can be enabled and configured.

Why use StatsD?

StatsD provides an easy way to create custom metrics that can be monitored and alerted on in OpsDash. It's great for tracking important app metrics and is easy to set up and use. Typically, your app will use a StatsD client library configured to talk to the OpsDash Smart Agent, which listens at 127.0.0.1:8125 for StatsD connections. The agent then aggregates the metrics and forwards them to the OpsDash server.

With OpsDash, you can easily set up custom dashboards with graphs containing these metrics. You can do post-facto calculations (such as rate of change, min, max, average, sum, and rollups to minutes, hours or days) using the graphs. You can alert on these metrics in just a few clicks, in the same way as you would alert on any other metric in OpsDash. Want to keep an eye on the time that a SQL query takes? Add the metric to an OpsDash custom dashboard and set your alert threshold. You'll then know if anything is on its way out of bounds and should be looked into before it creates real problems. Here are some more examples of metrics that are well-suited to be monitored like this:

  • Time taken for a particular SQL query
  • Page load / render time for particular pages
  • Whether a cron job for database backup was successful or not, and the time it took
  • The number of files / bytes uploaded / downloaded per hour / day

StatsD is fairly popular, and you can easily find lots of examples, tutorials and client libraries.

StatsD

The StatsD daemon is built into each agent, and can be enabled in the agent configuration. Uncomment or edit the lines in /etc/opsdash/agent.cfg so that the statsd section looks like this:

statsd {
	# Set to 1 to enable the built-in daemon.
	enabled = 1

	# The interface and port that the built-in daemon listens on.
	# 8125 is the standard StatsD port.
	bind.udp = "127.0.0.1:8125"
	bind.tcp = "127.0.0.1:8125"

	# For timing metrics, percentiles are computed as per this list. Values
	# must be comma-separated integers in ascending order. Default value is
	# "90,95,99".
	percentiles = "90,95,99"
}

You'll need to restart the agent for changes to take effect:

sudo service opsdash-agent restart

OpsDash's StatsD aggregator supports the standard metric types:

  • Counters, with optional sample rate
  • Timers, with optional sample rate. As an extension, floating-point values are also accepted.
  • Gauges
  • Sets
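
As a rough illustration of these metric types, the sketch below formats one report of each kind using the standard StatsD text format (bucket:value|type, with an optional |@rate suffix). The function name statsd_line and the bucket names are hypothetical and are not part of OpsDash or of any particular client library; in practice a StatsD client library would do this for you.

```python
# Type codes from the standard StatsD text protocol.
COUNTER, TIMER, GAUGE, SET = "c", "ms", "g", "s"


def statsd_line(bucket, value, metric_type, sample_rate=None):
    """Format a single StatsD report, e.g. 'page.views:1|c|@0.1'.

    statsd_line is a hypothetical helper for illustration only.
    """
    line = f"{bucket}:{value}|{metric_type}"
    if sample_rate is not None:
        # The sample rate tells the aggregator what fraction of events was sent.
        line += f"|@{sample_rate}"
    return line


# One example report per metric type (bucket names are made up).
examples = [
    statsd_line("page.views", 1, COUNTER, sample_rate=0.1),
    statsd_line("sql.query_time", 12.5, TIMER),  # floating-point timer value
    statsd_line("queue.depth", 42, GAUGE),
    statsd_line("users.unique", "user-1001", SET),
]
```

Each string in examples is one report that can be sent to the agent's StatsD port.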

For timers, OpsDash calculates the mean, min, max, count and percentile thresholds. For a timer with bucket name T, the names of the metrics stored are T.mean, T.lower (the min value), T.upper (the max value), T.count (the count, taking the sample rate into account) and, for each configured percentile P, the threshold as T.upper_P.
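
To make the naming scheme concrete, here is a small sketch that lists the metric names you would expect to see for a given timer bucket and percentiles setting. It only reproduces the naming rules described above; the helper names parse_percentiles and timer_metric_names are hypothetical, and the sketch does not attempt to reproduce how the agent computes the values themselves.

```python
def parse_percentiles(setting):
    """Parse a percentiles setting such as "90,95,99" into a list of ints.

    Mirrors the documented constraint that values are comma-separated
    integers in ascending order. Hypothetical helper, not an OpsDash API.
    """
    values = [int(part) for part in setting.split(",")]
    if values != sorted(values):
        raise ValueError("percentiles must be in ascending order")
    return values


def timer_metric_names(bucket, percentiles="90,95,99"):
    """Return the stored metric names for a timer with the given bucket name."""
    names = [f"{bucket}.{suffix}" for suffix in ("mean", "lower", "upper", "count")]
    names += [f"{bucket}.upper_{p}" for p in parse_percentiles(percentiles)]
    return names
```

For example, timer_metric_names("sql.query_time") lists sql.query_time.mean, sql.query_time.lower, sql.query_time.upper and sql.query_time.count, followed by sql.query_time.upper_90, sql.query_time.upper_95 and sql.query_time.upper_99.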

  • Multiple reports in the same packet, separated by newlines, are accepted by both the TCP and UDP interfaces.
  • We recommend using the UDP interface rather than TCP where possible. This keeps the load on both the agent and the StatsD client machines low.
  • StatsD metrics will be reported into the same source as the server. For example, if your server has the hostname "node42", the StatsD metrics reported by the agent running on this node will be under "node42".
  • The flush interval is the same as the value of "interval" in /etc/opsdash/agent.cfg, which is 50 seconds by default.
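
The notes above can be sketched as follows: several reports are joined with newlines into a single packet and sent over UDP to the agent. This is a minimal sketch using only the Python standard library; build_packet and send_reports are hypothetical names, and the default address assumes the bind.udp value shown in the configuration above.

```python
import socket


def build_packet(reports):
    """Join several StatsD reports into one newline-separated packet."""
    return "\n".join(reports).encode("ascii")


def send_reports(reports, host="127.0.0.1", port=8125):
    """Send the reports to the agent in a single UDP datagram.

    Returns the number of bytes handed to the socket. UDP is
    fire-and-forget, so no error is raised if nothing is listening.
    """
    packet = build_packet(reports)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(packet, (host, port))
```

A call such as send_reports(["backup.success:1|c", "backup.duration:5321|ms"]) would report a successful backup and its duration in one packet. Keeping packets small enough to fit in a single datagram is a sensible precaution when batching.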

Graphite

The Graphite daemon is enabled by default on each agent, and listens on port 2003/tcp. Uncomment or edit the lines in the agent configuration file at /etc/opsdash/agent.cfg so that the graphite section looks like this:

graphite {
	# Set to 1 to enable the built-in daemon.
	enabled = 1

	# The interface and port to listen to. 2003 is the standard graphite port.
	bind.tcp = "127.0.0.1:2003"

	# Optionally, the agent can also listen on a UDP port that accepts the same
	# graphite text protocol as the TCP port. Uncomment to enable.
	#bind.udp = "127.0.0.1:2003"
}

You'll need to restart the agent for changes to take effect:

sudo service opsdash-agent restart

OpsDash accepts multiple lines separated by newlines. Only the plain-text protocol is supported currently.

As an extension, the agent can also accept the Graphite plain-text protocol over a UDP port. The UDP version uses fewer resources than the TCP one.
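
For reference, a line in the Graphite plain-text protocol has the form "path value timestamp", terminated by a newline. The sketch below builds such lines and sends them to the agent over TCP, or over UDP if that listener has been enabled. It uses only the Python standard library; graphite_line and send_graphite are hypothetical names, the metric path in the example is made up, and the default address assumes the bind.tcp value shown in the configuration above.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one plain-text Graphite line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = time.time()  # default to "now" in Unix seconds
    return f"{path} {value} {int(timestamp)}\n"


def send_graphite(lines, host="127.0.0.1", port=2003, use_udp=False):
    """Send already-formatted lines to the agent; returns the payload size."""
    payload = "".join(lines).encode("ascii")
    if use_udp:
        # Requires bind.udp to be uncommented in the graphite section.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
    else:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(payload)
    return len(payload)
```

For instance, send_graphite([graphite_line("app.uploads.bytes", 1024)]) would send a single data point stamped with the current time.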

  • Graphite metrics will be reported into the same source as the server. For example, if your server has the hostname "node42", the Graphite metrics reported by the agent running on this node will be under "node42".
  • The flush interval is the same as the value of "interval" in /etc/opsdash/agent.cfg, which is 50 seconds by default.