Scaling Stories: Kubernetes 📈

Once you have a Kubernetes cluster, it’s fun to see what it can handle. This week we are going to take a look at some scaling stories from the CNCF ecosystem and get a feel for what people are accomplishing in the real world with K8s.

Issue #135

Spotify wrote their own scheduler, called Helios, before switching to Kubernetes. This article goes through how and why they switched and the impact it had. A very interesting read.

This article gets pretty technical around the challenges of running Istio at scale at HelloFresh. The author dives into all kinds of topics and covers a great number of gotchas to watch out for. 👀

Shopify runs one of the largest Kubernetes teams that we’ve seen, and they have a whole lot of unique Ingress challenges to contend with. In this article, the team at Shopify talks through their decision to use NGINX and the story of the dynamic configuration work they contributed to the project. ✅

The folks over at Red Hat share some of the lessons they learned from running large clusters at scale. TL;DR: get etcd right and be careful with your DaemonSets. ⚠️

This article talks through how three different teams (Bloomberg, News UK, and Amadeus) use Kubernetes and the challenges it’s solving for them. It offers all kinds of pro tips on your setup and on making use of both the offerings from the different clouds and the in-house solutions these teams have created. ☁️

The folks over at OpenAI have some pretty intense workloads to run, and they are doing it with K8s. This in-depth article goes through networking, logging, GPUs, health checks, and a whole list of other challenges and how they dealt with them. This is a good read if you are looking to run ML workloads on K8s. 🤔

This was an interesting read from Ably that talks about the reasons they aren’t running Kubernetes today. Worth a look as a counterpoint to the success stories in today’s issue.