Interview with Canonical

Continuing our series of interviews with users of Prometheus, Canonical talks about how they are transitioning to Prometheus.

Can you tell us about yourself and what Canonical does?

Canonical is probably best known as the company that sponsors Ubuntu Linux. We also produce or contribute to a number of other open-source projects including MAAS, Juju, and OpenStack, and provide commercial support for these products. Ubuntu powers the majority of OpenStack deployments, with 55% of production clouds and 58% of large cloud deployments.

My group, BootStack, is our fully managed private cloud service. We build and operate OpenStack clouds for Canonical customers.

What was your pre-Prometheus monitoring experience?

We’d used a combination of Nagios, Graphite/statsd, and in-house Django apps. These did not offer us the level of flexibility and reporting that we need in both our internal and customer cloud environments.

Why did you decide to look at Prometheus?

We’d evaluated a few alternatives, including InfluxDB and extending our use of Graphite, but our first experiences with Prometheus proved it to have the combination of simplicity and power that we were looking for. We especially appreciate the convenience of labels, the simple HTTP protocol, and the out of box timeseries alerting. The potential with Prometheus to replace 2 different tools (alerting and trending) with one is particularly appealing.

Also, several of our staff have prior experience with Borgmon from their time at Google which greatly added to our interest!

How did you transition?

We are still in the process of transitioning, we expect this will take some time due to the number of custom checks we currently use in our existing systems that will need to be re-implemented in Prometheus. The most useful resource has been the prometheus.io site documentation.

It took us a while to choose an exporter. We originally went with collectd but ran into limitations with this. We’re working on writing an openstack-exporter now and were a bit surprised to find there is no good, working, example how to write exporter from scratch.

Some challenges we’ve run into are: No downsampling support, no long term storage solution (yet), and we were surprised by the default 2 week retention period. There's currently no tie-in with Juju, but we’re working on it!

What improvements have you seen since switching?

Once we got the hang of exporters, we found they were very easy to write and have given us very useful metrics. For example we are developing an openstack-exporter for our cloud environments. We’ve also seen very quick cross-team adoption from our DevOps and WebOps groups and developers. We don’t yet have alerting in place but expect to see a lot more to come once we get to this phase of the transition.

What do you think the future holds for Canonical and Prometheus?

We expect Prometheus to be a significant part of our monitoring and reporting infrastructure, providing the metrics gathering and storage for numerous current and future systems. We see it potentially replacing Nagios as for alerting.