
Problem Introduction:

In Part 1, we saw how to automate increasing the CPU/memory requests and limits of pods using Alertmanager's webhook alerts and Spinnaker's webhook-triggered pipelines.

But what value should we patch? Optimal CPU/memory values vary from application to application, so how do we decide? The solution below addresses exactly that.

Solution Introduction:

Vertical Pod Autoscaling provides recommendations for resource usage over time. We can configure it to produce recommended values for CPU and memory requests and limits, which we then use to manually update/patch the pods.

Prerequisites:

Spinnaker installed; Prometheus and Alertmanager installed to monitor pods in the cluster; and the VPA installed with spec.updatePolicy.updateMode set to 'Off'.
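A quick way to sanity-check the VPA prerequisite before proceeding (a sketch only; the component names below match the upstream autoscaler install scripts and may differ in your setup):

```
# The VPA CRD must exist and the recommender must be running.
kubectl get crd verticalpodautoscalers.autoscaling.k8s.io
kubectl get pods -n kube-system -l app=vpa-recommender
```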

Details:

In Part 1, we used the manifest from the example at the following link:

https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/

We converted that manifest into a Deployment, and now we are going to deploy a VerticalPodAutoscaler (VPA) alongside it.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mem-high
    for: feedback
  name: mem-too-high
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mem-high
  template:
    metadata:
      labels:
        app: mem-high
    spec:
      containers:
        - args:
            - '--vm'
            - '1'
            - '--vm-bytes'
            - 150M
            - '--vm-hang'
            - '1'
          command:
            - stress
          image: polinux/stress
          name: mem-high
          resources:
            limits:
              memory: 165Mi
            requests:
              memory: 100Mi

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: mem-too-high
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mem-too-high
  updatePolicy:
    updateMode: 'Off'

As you can see, the above Deployment creates a pod that continuously uses 150M of memory, which is more than 90% of the configured 165Mi limit. Once usage crosses that threshold, Prometheus Alertmanager fires a webhook alert that triggers the pipeline and patches the Deployment, as we saw in Part 1.
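To see why this trips the 90% alert, here is the arithmetic (a small sketch, assuming stress treats the M suffix as mebibytes):

```shell
# stress --vm-bytes 150M allocates 150 MiB; the container limit is 165Mi.
used=$((150 * 1024 * 1024))    # 157286400 bytes
limit=$((165 * 1024 * 1024))   # 173015040 bytes
# usage as tenths of a percent of the limit
echo $((used * 1000 / limit))  # prints 909, i.e. ~90.9% of the limit
```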

But what value should we patch? Before the patch stage, we check the VPA recommendations, extract the recommended memory/CPU value using Spinnaker SpEL expressions, and pass that value on to the patch stage. Note that the recommendations the VPA provides apply to this specific deployment. Also, we run the VPA in recommendation-only mode, meaning we use it only for recommendations and do not allow it to make the modifications by itself.
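As an illustration only (the stage name here is hypothetical and depends on how your pipeline is built), a SpEL expression that reads the target memory out of the context of a preceding stage that fetched the VPA object might look like:

```
${#stage('Fetch VPA Recommendation')['context']['vpa']['status']['recommendation']['containerRecommendations'][0]['target']['memory']}
```

The `#stage('name')['context']` helper is standard Spinnaker SpEL; the path under it mirrors the VPA object's structure.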

This is what the VPA object looks like; under status.recommendation we can see the values the VPA came up with for this deployment.

kubectl get vpa mem-too-high -n default -o yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    artifact.spinnaker.io/location: default
    artifact.spinnaker.io/name: mem-too-high
    artifact.spinnaker.io/type: kubernetes/VerticalPodAutoscaler.autoscaling.k8s.io
    artifact.spinnaker.io/version: ""
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling.k8s.io/v1","kind":"VerticalPodAutoscaler","metadata":{"annotations":{"artifact.spinnaker.io/location":"default","artifact.spinnaker.io/name":"mem-too-high","artifact.spinnaker.io/type":"kubernetes/VerticalPodAutoscaler.autoscaling.k8s.io","artifact.spinnaker.io/version":"","moniker.spinnaker.io/application":"feedback","moniker.spinnaker.io/cluster":"VerticalPodAutoscaler.autoscaling.k8s.io mem-too-high"},"labels":{"app.kubernetes.io/managed-by":"spinnaker","app.kubernetes.io/name":"feedback"},"name":"mem-too-high","namespace":"default"},"spec":{"targetRef":{"apiVersion":"extensions/v1beta1","kind":"Deployment","name":"mem-too-high"},"updatePolicy":{"updateMode":"Off"}}}
    moniker.spinnaker.io/application: feedback
    moniker.spinnaker.io/cluster: VerticalPodAutoscaler.autoscaling.k8s.io mem-too-high
  creationTimestamp: "2022-06-23T07:05:32Z"
  generation: 886
  labels:
    app.kubernetes.io/managed-by: spinnaker
    app.kubernetes.io/name: feedback
  name: mem-too-high
  namespace: default
  resourceVersion: "63129475"
  uid: 739278df-9bb7-490f-b54e-d04108e68f7e
spec:
  targetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: mem-too-high
  updatePolicy:
    updateMode: "Off"
status:
  conditions:
  - lastTransitionTime: "2022-06-27T12:32:29Z"
    message: Some containers have a small number of samples
    reason: mem-high
    status: "True"
    type: LowConfidence
  - lastTransitionTime: "2022-06-27T12:32:29Z"
    status: "True"
    type: RecommendationProvided
  recommendation:
    containerRecommendations:
    - containerName: mem-high
      lowerBound:
        cpu: 70m
        memory: "119537664"
      target:
        cpu: 125m
        memory: "183500800"
      uncappedTarget:
        cpu: 125m
        memory: "183500800"
      upperBound:
        cpu: 30210m
        memory: "44207964160"

We take the recommendation.target value and patch it into the Deployment. Once the patching is done, we can see that the previous memory limit of 165Mi has changed to the value recommended by the VPA (183500800 bytes, i.e. 175Mi).
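For reference, the same target value can be read directly from the cluster with jsonpath, outside of any pipeline:

```
kubectl get vpa mem-too-high -n default \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target.memory}'
```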

spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mem-high
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        artifact.spinnaker.io/location: default
        artifact.spinnaker.io/name: mem-too-high
        artifact.spinnaker.io/type: kubernetes/deployment
        artifact.spinnaker.io/version: ""
        moniker.spinnaker.io/application: feedback
        moniker.spinnaker.io/cluster: deployment mem-too-high
      creationTimestamp: null
      labels:
        app: mem-high
        app.kubernetes.io/managed-by: spinnaker
        app.kubernetes.io/name: feedback
    spec:
      containers:
      - args:
        - --vm
        - "1"
        - --vm-bytes
        - 150M
        - --vm-hang
        - "1"
        command:
        - stress
        image: polinux/stress
        imagePullPolicy: Always
        name: mem-high
        resources:
          limits:
            memory: "183500800"
          requests:
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
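Outside of Spinnaker, the effect of the patch stage can be reproduced manually with a JSON patch (a sketch; the pipeline does this declaratively through its patch stage):

```
kubectl -n default patch deployment mem-too-high --type=json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/resources/limits/memory",
   "value": "183500800"}
]'
```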

Due to Kubernetes limitations, the only way for the VPA to modify the resource requests of a running Pod is to recreate the Pod. If you create a VerticalPodAutoscaler object with updateMode set to Auto, the VPA evicts a Pod whenever it needs to change that Pod's resource requests.

https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler

Conclusion:

If a pod gets evicted, there is a service disruption until a new pod comes up. With this method, we patch the deployment before the pod reaches the eviction state, ensuring the application is available at all times.

Acknowledgements:

I thank Gopal Jayanthi Vithal and Srinivas Kambhampati for their inputs.

Sharief Shaik

Sharief is a DevOps Engineer at OpsMx. He has broad experience in Cloud, DevOps, Kubernetes, Helm, GitHub Actions, GitOps, Argo, Spinnaker, Prometheus, and Datadog. He helps the team support, automate, and leverage CI/CD and DevOps processes. He is a shutterbug and loves to take pictures in his spare time.
