
Problem Introduction:

In Part 1, we saw how to automate increasing the memory/CPU requests and limits of pods using the webhook alerts feature of Alertmanager and the webhook-triggered pipelines of Spinnaker.

In Part 2, we showed how to use the Vertical Pod Autoscaler (VPA) to get recommendations for pod memory and CPU requests as well as limits.

In this Part 3, we discuss using Prometheus metrics to tune pod memory and CPU. Parts 1 and 2 used pods with constant memory and CPU usage; in Part 3 we also use pods with more realistic usage patterns, tuning the resources of Spinnaker's clouddriver-caching pod. Pods with multiple containers and deployments with multiple replicas are tested as well.

Solution Introduction:

Prometheus provides the memory and CPU usage of a given container in a pod. It also provides functions such as avg_over_time and max_over_time to compute the average and maximum values of memory and CPU over a time window.

These values can then be used to tune the pod resources using Spinnaker pipelines with webhook, run job, and patch stages.

Prerequisites:

Spinnaker installed in the cluster, with Prometheus and Alertmanager installed to monitor the pods.

Details:

We used the same deployments as in Part 1; in addition, the clouddriver-caching pod of Spinnaker was used.

Since the pipeline must fetch metrics from the Prometheus HTTP API, a run job stage was used to query Prometheus with curl and extract the values from the JSON response with jq.

An example curl command is shown below.

    curl -g "http://<url>/api/v1/query?query=<query>" | jq
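To show how jq pulls the numeric value out of the response, here is a minimal sketch against a hand-written sample in the standard Prometheus instant-query JSON shape; the container name and sample value are illustrative, and in the run job stage the JSON would come from curl instead.

```shell
# Sample response in the shape returned by GET /api/v1/query.
# In the pipeline this string would be the output of the curl command above.
response='{"status":"success","data":{"resultType":"vector","result":[{"metric":{"container":"clouddriver-caching"},"value":[1700000000,"268435456"]}]}}'

# Each result's "value" is [timestamp, value-as-string];
# take the first result and its second element.
mem_bytes=$(echo "$response" | jq -r '.data.result[0].value[1]')
echo "$mem_bytes"   # 268435456
```

The same `.data.result[0].value[1]` path works for every query listed below, since instant queries all return the same vector shape.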

The following queries were used to calculate the resource specifications for the pod.

    memory request: avg_over_time(container_memory_working_set_bytes{container=\"$container\"}["$period"])
    memory limit:   max_over_time(container_memory_max_usage_bytes{container=\"$container\"}["$period"])
    cpu request:    avg_over_time(rate(container_cpu_usage_seconds_total{container=\"$container\"}[1m])["$period":1m])
    cpu limit:      max_over_time(rate(container_cpu_usage_seconds_total{container=\"$container\"}[1m])["$period":1m])

Here a period of 2h, i.e. 2 hours, was used; a different value could be chosen depending on the period for which data is available. The limits were also factored using a value of 0.8, i.e. the limit values returned by the above queries were divided by 0.8 to provide a conservative cushion.
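Assembling a query from the pipeline parameters and applying the 0.8 factor can be sketched in shell as follows; the container name, period, and observed peak value are illustrative, not taken from the actual pipeline runs.

```shell
# Assemble one of the queries above from pipeline parameters
# (container name and period are example values).
container="clouddriver-caching"
period="2h"
mem_limit_q="max_over_time(container_memory_max_usage_bytes{container=\"$container\"}[$period])"
echo "$mem_limit_q"

# Apply the 0.8 factor: dividing the observed peak by 0.8 sets the
# limit roughly 25% above the highest value seen in the window.
observed_max=800   # example: peak memory in MiB returned by the query
limit=$(awk -v v="$observed_max" 'BEGIN { printf "%.0f", v / 0.8 }')
echo "$limit"   # 1000
```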

Multi-container pods and multi-replica deployments were tested using Spinnaker pipelines with a webhook stage to trigger the pipeline, a run job stage to query Prometheus for the memory and CPU requests and limits, and finally a patch stage to patch the deployment.
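For the final step, a sketch of the kind of payload the patch stage applies is shown below; the container name and resource values are illustrative, assumed for the example rather than taken from the article's pipelines.

```shell
# Build a patch for one container's resources; because containers are
# matched by name, this works for multi-container pods too.
patch=$(cat <<'EOF'
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "clouddriver-caching",
            "resources": {
              "requests": {"memory": "512Mi", "cpu": "500m"},
              "limits":   {"memory": "1000Mi", "cpu": "1000m"}
            }
          }
        ]
      }
    }
  }
}
EOF
)
# Validate the payload is well-formed and read back one field.
echo "$patch" | jq -r '.spec.template.spec.containers[0].resources.limits.memory'
```

Outside Spinnaker, an equivalent would be `kubectl patch deployment <name> --patch "$patch"`; updating the pod template triggers a rollout across all replicas, which is why the approach extends to multi-replica deployments.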

Conclusion:

Four cases were tested: high CPU, low CPU, high memory, and low memory. In all cases the pipelines successfully adjusted the memory and CPU requests and limits so that the alerts were addressed.

This reduced wastage of computing resources (in the case of pods with low resource usage) and prevented pod crashes due to out-of-memory errors as well as CPU throttling. When a real-life pod, clouddriver-caching, was used, the pipelines also managed to stabilize its resource usage. The method worked even when there were multiple containers per pod or multiple replicas per deployment.

Thus it was demonstrated that Spinnaker pipelines can be used to tune Spinnaker's own pods to appropriate compute resource usage.

Acknowledgements:

I thank Sharief Shaik and Srinivas Kambhampati for their inputs.

Gopal Jayanthi

Gopal Jayanthi has 15+ years of experience in the software field in development, configuration management, build/release, and DevOps. He has worked at Cisco, AT&T (SBC), and IBM in the USA, and at Accenture, Bank of America, and Tech Mahindra in India. His expertise includes Kubernetes, Docker, Jenkins, SDLC management, version control, change management, and release management.
