Argo Rollouts Just Got Much Smarter with OpsMx Autopilot

Autopilot complements Argo CD

GitOps has become a popular operational framework for application and infrastructure automation. In the GitOps model, the intended state of applications and infrastructure is described declaratively and then achieved automatically. All of the definitions are stored as code in a Git repository, where they can be version-controlled and audited. GitOps also makes it possible to roll back to an intended state quickly and predictably.

The Argo project is one of the most popular sets of GitOps tools for declarative continuous delivery on Kubernetes. Argo is fully open source, and its development has been driven largely by Intuit. The Argo project consists of the following major sub-projects:

  • Argo Workflows – Kubernetes-native workflow engine supporting step-based workflows
  • Argo CD – Declarative GitOps Continuous Delivery tool for Kubernetes
  • Argo Rollouts – Progressive Delivery tool (including Blue-Green and Canary deployment strategies) for Kubernetes
  • Argo Events – Event-based dependency management for Kubernetes
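To make the GitOps model concrete, the following is a minimal sketch of an Argo CD Application that keeps a cluster in sync with manifests stored in Git. The repository URL, path, application name, and namespaces are hypothetical placeholders, not values from any OpsMx or Argo sample.

```yaml
# Minimal sketch of an Argo CD Application (all names and URLs are placeholders).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sample-app
  namespace: argocd
spec:
  project: default
  source:
    # Git repository that holds the declared, intended state
    repoURL: https://github.com/example-org/sample-app-config.git
    targetRevision: main
    path: k8s
  destination:
    # Deploy into the same cluster that Argo CD runs in
    server: https://kubernetes.default.svc
    namespace: sample
  syncPolicy:
    automated:
      prune: true     # delete resources that were removed from Git
      selfHeal: true  # revert manual changes that drift from Git
```

With automated sync enabled, reverting a commit in Git is enough to roll the cluster back to the previous intended state.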

In this blog, we will focus primarily on Argo Rollouts and how integration with OpsMx Autopilot enables safer and more reliable deployments through machine learning-based automated verification.

How does Argo Rollouts work?

A Rollout is a Kubernetes workload resource that is equivalent to a Kubernetes Deployment object. It is intended to replace a Deployment when more advanced deployment or progressive delivery functionality is needed. Argo Rollouts provides deployment strategies such as blue-green and canary deployments.

First, it integrates with ingress controllers and service meshes for advanced traffic routing to gradually shift traffic to the new version during an update. 

Second, for blue-green and canary analysis, it can query and interpret metrics from various providers to verify KPIs and drive automated promotion or rollback during an update.
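The two capabilities above can be sketched in a single Rollout manifest. This is an illustrative example rather than a configuration from the OpsMx integration: the application name, image, Services, Ingress, and the `success-rate` AnalysisTemplate are all assumed to exist and are hypothetical names.

```yaml
# Minimal sketch of a canary Rollout (all names and the image are placeholders).
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: sample-rollout
spec:
  replicas: 4
  selector:
    matchLabels:
      app: sample
  template:
    metadata:
      labels:
        app: sample
    spec:
      containers:
        - name: sample
          image: example/sample:2.0
          ports:
            - containerPort: 8080
  strategy:
    canary:
      # Services and Ingress used for traffic shifting; they must already exist
      canaryService: sample-canary
      stableService: sample-stable
      trafficRouting:
        nginx:
          stableIngress: sample-ingress
      steps:
        - setWeight: 25            # send 25% of traffic to the new version
        - pause: {duration: 2m}
        - analysis:                # abort the rollout if the analysis fails
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: sample-canary
        - setWeight: 50
        - pause: {duration: 2m}
```

After the last step completes, the Rollout promotes the new version to 100% of traffic; if the analysis step fails, the Rollout aborts and traffic returns to the stable version.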

Figure 1:  Example of Argo Rollouts performing Blue-Green and Canary Deployments

 

Figure 2: Example of Argo Rollout – Progressive Canary Delivery

 

Figure 3: Example of a Rollout and AnalysisTemplate showing how Argo Rollouts calls Prometheus for canary analysis.

Argo Rollouts can query monitoring tools such as Prometheus, Datadog, New Relic, and Wavefront for verification metrics. For metric analysis, it can also call Kayenta, an open-source automated canary analysis service.
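As a rough illustration of metric-based verification, the sketch below shows an AnalysisTemplate that queries Prometheus for a request success rate. The Prometheus address, the `http_requests_total` metric and its labels, and the 95% threshold are assumptions chosen for the example, not values from the original figure.

```yaml
# Minimal sketch of a Prometheus-backed AnalysisTemplate (address, metric
# names, and threshold are hypothetical).
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m        # take one measurement per minute
      count: 5            # take five measurements in total
      failureLimit: 1     # tolerate at most one failed measurement
      # The measurement passes when at least 95% of requests are non-5xx
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus.example.com:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[2m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))
```

Note that the pass/fail decision here rests entirely on a fixed threshold that the author of the template has to choose in advance.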

However, this architecture may not be sufficient. First, threshold-based metric checks alone may not identify quality regressions in a release.

Second, it does not help you triage the risk of a release or resolve issues in production.

Third, there is no provision to check for abnormalities in production after deployment.

How Autopilot Adds Deploy Intelligence to Argo Rollouts

OpsMx Autopilot provides an intelligence layer for the CI/CD pipeline. Autopilot uses AI/ML to analyze logs, metrics, and other data sources to identify the risk of every change, automatically determining the confidence that an update can be promoted to the next pipeline stage without introducing errors. Autopilot also automates policy compliance, ensuring that your governance rules and best practices are followed. Autopilot reduces errors in production, increases release velocity, and improves security, quality, and compliance.

After analyzing a release, Autopilot can use Argo Rollouts to either abort or promote the release (refer to the figures below). Apart from automated canary analysis, Autopilot can fetch logs and metrics from tools like Splunk, Sumo Logic, and AppDynamics and highlight the risk of a release in production. If an application in production shows problems such as latency or SQL connection issues, the release can be rolled back quickly.

Figure 4: OpsMx Autopilot Integration with Argo Rollouts Architecture

The primary benefit of integrating Autopilot with Argo Rollouts is that it helps you catch errors as early as possible, before customers notice, and quickly revert to the previous version.

Second, it helps SREs by providing visibility and insight into the most probable cause of release errors and by helping to resolve them.

Identify and Diagnose Risks in Application Logs with Autopilot

Figure 5: Natural Language Processing (NLP) and Machine Learning to Identify and Diagnose Risks in Application Logs

Figure 6: Machine Learning-based Automated Metric Analysis and Risk Identification

Autopilot supports many APM and log analysis tools, including Elasticsearch, Sumo Logic, Splunk, Stackdriver, AppDynamics, Datadog, Prometheus, New Relic, and Graphite.

How to Integrate Autopilot and Argo Rollout in less than 5 minutes

It takes less than 5 minutes to integrate Autopilot with Argo Rollouts through a custom Kubernetes job. Below are the Experiment and the AnalysisTemplate used for the integration.

apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: opsmx-experiment
spec:
  # Duration of the experiment, beginning from when all ReplicaSets became healthy (optional)
  # If omitted, will run indefinitely until terminated, or until all analyses which were marked
  # `requiredForCompletion` have completed.
  duration: 6m

  # Deadline in seconds in which a ReplicaSet should make progress towards becoming available.
  # If exceeded, the Experiment will fail.
  progressDeadlineSeconds: 120

  # List of pod template specs to run in the experiment as ReplicaSets
  templates:
    - name: baseline
      # Number of replicas to run (optional). If omitted, will run a single replica
      replicas: 1
      selector:
        matchLabels:
          app: samplebaseline
      template:
        metadata:
          labels:
            app: samplebaseline
        spec:
          containers:
            - name: rollouts-baseline
              image: opsmxdev/issuegen:gradle-issugen-22
              imagePullPolicy: Always
              ports:
                - name: http
                  containerPort: 8080
                  protocol: TCP
    - name: canary
      replicas: 1
      minReadySeconds: 10
      selector:
        matchLabels:
          app: samplecanary
      template:
        metadata:
          labels:
            app: samplecanary
        spec:
          containers:
            - name: rollouts-canary
              image: opsmxdev/issuegen:gradle-issugen-23
              imagePullPolicy: Always
              ports:
                - name: http
                  containerPort: 8080
                  protocol: TCP

  # List of AnalysisTemplate references to perform during the experiment
  analyses:
    - name: verify-job
      templateName: verify-job
      args:
        - name: duration
          value: 360s
        - name: start-time
          value: "{{experiment.availableAt}}"
        - name: end-time
          value: "{{experiment.finishedAt}}"
---
kind: AnalysisTemplate
apiVersion: argoproj.io/v1alpha1
metadata:
  name: verify-job
spec:
  args:
    - name: duration
      value: 180s
    - name: exit-code
      value: "0"
    - name: start-time
    - name: end-time
  metrics:
    - name: verify-job
      count: 1
      provider:
        job:
          spec:
            template:
              spec:
                containers:
                  - name: verify-job
                    image: opsmx11/verify:v3
                restartPolicy: Never
            backoffLimit: 0

Benefits of Autopilot with Argo Rollouts

  • Automated, reliable risk assessment during progressive delivery with Argo Rollouts, minimizing production failures
  • Automated decisions for faster roll-back or roll-forward
  • Automated advanced diagnostics and triage of the risks found, reducing the most time-consuming portion of remediation
  • Machine learning-based analysis that does not depend on ad-hoc or primitive threshold-based, metric-only analysis
  • Integration with a wide range of data sources for logs (Splunk, Elastic, Sumo Logic, etc.) and metrics (Prometheus, Datadog, Stackdriver, etc.)

Summary:

By using Autopilot as an intelligence layer on top of Argo Rollouts, enterprises can prevent errors in production, increase release velocity, and improve security, quality, and compliance.

For a demo of OpsMx Autopilot integration with Argo Rollouts and the Argo projects in general, please contact us.
