Issue 140
When running a Kubernetes cluster, understanding the health of the services running on it is job number one. Thanks to Google and their SRE handbook, we have a pretty good idea of how to do this. So, without further ado, let’s jump into some ways to measure health using SLOs and SLIs.
SRE fundamentals 2021: SLIs vs SLAs vs SLOs
We will start with the experts on all things service level and get good definitions of SLIs, SLAs, and SLOs, including the nuances and the big differences between them. It can be a bit murky trying to understand and communicate the differences between these metrics, so this is a great place to refer back to.
Using observability tools to set SLOs for Kubernetes Applications
This practical guide dives into a few options and mental frameworks for thinking about your SLOs. It also gives a pretty good overview of Prometheus, Grafana, and even Jaeger (tracing), and how to use them for your service-level metrics.
SRE Practices for Kubernetes Platforms
This quick article gives you a hit list of the metrics you should (or at least could) monitor as SLIs for your platform. It really helped me wrap my head around where to start when planning out my SLIs and how to think about them.
Setting SLOs: a step-by-step guide
We head back to the experts for this in-depth playbook for how to set up your own SLOs. This one gets pretty deep pretty fast and gives you a great way to think about setting your service level metrics and how to measure them.
A guide to setting up Kubernetes Service Level Objectives (SLOs) with Prometheus and Linkerd
This is a great writeup on implementing your SLOs and setting up your Prometheus dashboards. It comes from the folks over at Buoyant, so the angle isn’t surprising, but they still make a good case for using a service mesh when setting up your service-level metrics.
Implementing SLI/SLO based Continuous Delivery Quality Gates using Prometheus
Knowing about problems before they make it to production is a path to happiness for SREs. In this writeup, they explain how to use Keptn along with Prometheus to shift things left using Quality Gates.
Tweet of the week
If you’re considering registering for the Contributor Summit at KubeCon, virtually or in person, register now so it doesn’t get cancelled!