Extend Splunk with Autopilot


Introduction

This blog will describe the power of combining Splunk and Autopilot. Autopilot is a powerful add-on to any CI/CD process running on any platform. We’ll specifically cover how OpsMx uses Splunk and Autopilot together to accelerate a CI/CD pipeline.

Most software organizations run CI/CD pipelines to move updates from dev to test to staging to prod. This CI/CD process is an intricate set of interconnected steps that deploy software to the target platforms. During testing, especially the performance, scalability, and resiliency tests, each application typically generates a large amount of data through log entries. This data is complex and unstructured, and there is far too much of it for a person to analyze by hand. But a business can gain a tremendous advantage if this data can be properly exploited.

OpsMx Autopilot uses machine learning (ML) and natural language processing to analyze the data for you automatically, so you can quickly and accurately decide whether an update should move forward in the pipeline. This blog will cover the ways that Autopilot can extend the value of Splunk in your continuous delivery initiatives.

Why do you need Autopilot?

Faster and more resilient CI/CD pipelines are a top priority in software organizations because they help them stay ahead of the competition. Splunk has helped a great deal to accelerate software delivery, but teams need to go faster still. Autopilot extends Splunk in key ways and is designed specifically for the CD process.

Autopilot, which comes with natural language processing and ML algorithms, is designed to speed up the CD process. Autopilot understands application and system logs, performs a risk assessment, and can control the CI/CD pipeline through an approval or rejection decision. Not satisfied? Read on.

Can my SREs keep up with the scale and speed of deployment?

When running a small number of pipelines, assessing deployment risk by hand is manageable. There may be only a small number of updates per month, and a competent team of SREs can tackle the risk assessment.

However, for many OpsMx customers, the scale is much different. There may be hundreds or thousands of updates every day. The pipelines may throw hundreds of exceptions every day, each one requiring manual analysis and a judgment on whether to proceed. Manual analysis and judgment, even with the help of scripting, is tedious, time-consuming, and prone to errors. SREs are costly resources, and we don’t want them occupied with risk assessments all the time. At the same time, deploying an update that contains errors can be a very costly mistake.

Autopilot integrates with Splunk to automate risk assessment, saving a tremendous amount of time and improving the accuracy of promotion decisions. Autopilot can also automatically approve or reject deployments in your CI/CD pipeline when the decision is clear, and you set the level of clarity needed. Autopilot simplifies and improves time-consuming and error-prone processes.

What is Autopilot and where does it fit in?

Software delivery has moved from monolithic architectures to container-based architectures in which applications may be updated every hour, or even every few minutes. At that pace, it becomes nearly impossible to monitor and analyze every report generated by every update. Acting manually on each of those reports only lengthens the time from code to production.

This is where OpsMx Autopilot comes in. Autopilot is an intelligence layer for any Continuous Delivery platform and log analyzer. It can be added to your Splunk platform to perform advanced data analytics and predictive modeling that speed up business decisions. Using its built-in verification gates, Autopilot computes risk assessment scores for the quality of a deployment. These assessments take a matter of seconds and provide an in-depth view into any problems arising in the update. Autopilot integrates not only with the machine log analyzer but also with Jenkins or any other CD platform, giving it a 360-degree view of the overall process. With a feedback loop, the risk assessment improves over time and reduces the need for an SRE to monitor and analyze every report that comes their way.

Do I need Splunk to Run Autopilot?

Autopilot integrates seamlessly with Splunk. The integration is configured through the GUI and can be completed in minutes.

Autopilot Integration

Once the integration is configured, Splunk performs a series of preliminary steps that are crucial to Autopilot and the release verification and approval process, including collecting, collating, and filtering the log entries to just those related to this update.

Autopilot and Splunk combination

How Autopilot transcends your CI/CD Log Report analysis

Autopilot can be configured to fetch application logs from Splunk and apply domain-specific ML models to them. Autopilot’s intelligent analysis is central to automated canary deployment in production. Based on the results of the analysis, i.e., a comparison of the risk scores of a new release against a baseline run, Autopilot determines whether to promote the new update fully to production. The log analysis and risk assessment are completed in a matter of seconds and provide an automated decision to a pipeline run.
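To make the baseline comparison above concrete, here is a minimal sketch in Python. It is not Autopilot’s actual model: the severity weights, the function names `issue_rate` and `risk_score`, and the scoring formula are all assumptions chosen for illustration. It only shows the general idea of scoring a new release by how much worse its logs look than the baseline’s.

```python
# Hypothetical severity weights for illustration; not Autopilot's real model.
SEVERITY_WEIGHTS = {"CRITICAL": 10, "ERROR": 5, "WARN": 1}
MAX_WEIGHT = max(SEVERITY_WEIGHTS.values())


def issue_rate(log_lines):
    """Average severity weight per log line (0.0 for an empty log)."""
    if not log_lines:
        return 0.0
    total = 0
    for line in log_lines:
        # Count each line once, using the first (most severe) tag it contains.
        for level, weight in SEVERITY_WEIGHTS.items():
            if level in line:
                total += weight
                break
    return total / len(log_lines)


def risk_score(baseline_lines, canary_lines):
    """Score from 0 to 100; 100 means the canary is no worse than the baseline."""
    regression = max(0.0, issue_rate(canary_lines) - issue_rate(baseline_lines))
    return 100.0 * (1.0 - regression / MAX_WEIGHT)
```

A real system would work on parsed and clustered log events rather than substring matches, but the shape of the decision is the same: the score drops as the new release produces more, or more severe, issues than the baseline did.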

The AI/ML-enabled intelligence layer in Autopilot uses supervised learning to improve its judgment over time. As SREs evaluate the confidence score of a given release, they can modify Autopilot’s assessment of the impact of errors and warnings. These inputs act as feedback to Autopilot and help it develop a contextual understanding of specific applications and pipelines.

Risk Assessment Example

The screenshot below shows Autopilot’s Risk Assessment Dashboard, where logs from deployment updates were analyzed for risk. As configured by the SRE, a success score of 83 causes Autopilot to automatically trigger the pipeline to move forward with the deployment.

AI-driven Risk Assessment

If the risk assessment fails because of a critical error, Autopilot automatically blocks the deployment pipeline from executing and notifies the SRE so that the issue can be resolved. All of this happens in real time.

Risk Assessment Dashboard

In exceptional cases where Autopilot encounters a metric score or log pattern it has never seen before, the SRE can perform due diligence on the case and assign the relevant action back to Autopilot so that it is handled automatically in the future.

Continuous Risk Verification with Autopilot

Autopilot provides “Approval Gates,” which place control over your CI/CD pipeline at critical points. Autopilot collects and presents all data that is relevant to the approval decision, including its assessment of the confidence level. SREs can evaluate this information and make a decision. Alternatively, the SRE can automate the approval decision whenever the confidence score is above a configurable threshold. This frees them from the repetitive task of analyzing updates that are clearly either failures or successes.
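The threshold-based gate described above can be sketched as a small Python function. This is an illustrative assumption, not Autopilot’s API: the `Decision` enum, the `approval_gate` function, and the default thresholds of 80 and 50 are hypothetical. The point is that clear successes and clear failures are decided automatically, while everything in between goes to a person.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MANUAL_REVIEW = "manual_review"


def approval_gate(score, approve_at=80, reject_below=50):
    """Map a 0-100 confidence score to a gate decision.

    approve_at and reject_below are hypothetical, SRE-configurable thresholds.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if reject_below > approve_at:
        raise ValueError("reject_below must not exceed approve_at")
    if score >= approve_at:
        return Decision.APPROVE        # clearly a success: promote automatically
    if score < reject_below:
        return Decision.REJECT         # clearly a failure: block the pipeline
    return Decision.MANUAL_REVIEW      # unclear: leave the decision to an SRE
```

Narrowing the gap between the two thresholds automates more decisions; widening it sends more releases to manual review.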

Risk Verification

Observability of enterprise-wide software verification

Autopilot provides an enterprise-wide historical analysis of risk scores for past deployments. For example, it provides an application-centric time-series view of risk scores along with their respective canary IDs. It also provides a service-centric drill-down analysis of each risk, showing the number of critical errors, warnings, and exceptions in the logs across the chosen time period.
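As a rough illustration of the service-centric counts described above, the following Python sketch tallies log events per service and severity over a chosen date range. The event tuple layout `(day, service, level)`, the level names, and the `summarize` function are assumptions made for this example and do not reflect Autopilot’s data model.

```python
from collections import Counter
from datetime import date

# Hypothetical level names for the categories mentioned in the text.
LEVELS = ("CRITICAL", "WARNING", "EXCEPTION")


def summarize(events, start, end):
    """Count events per level for each service between start and end (inclusive).

    events is an iterable of (day, service, level) tuples, where day is a date.
    """
    summary = {}
    for day, service, level in events:
        if start <= day <= end and level in LEVELS:
            summary.setdefault(service, Counter())[level] += 1
    return summary


events = [
    (date(2024, 1, 1), "checkout", "CRITICAL"),
    (date(2024, 1, 2), "checkout", "CRITICAL"),
    (date(2024, 1, 2), "search", "WARNING"),
    (date(2024, 2, 1), "checkout", "EXCEPTION"),  # outside the January window
]
january = summarize(events, date(2024, 1, 1), date(2024, 1, 31))
```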

Visibility Dashboard

Real-time approvals with policy automation

With the increasing scale and speed of deployment, it is crucial that every update moving through a CI/CD pipeline passes your organization’s policy checks. For example, suppose logs for a new release have been gathered from Splunk after deployment to the staging environment. Before the release moves into production, the CD pipeline can be configured to perform runtime policy checks, such as whether the deployment falls inside a blackout window or whether the release has the proper approvals from the right stakeholders. Autopilot lets policy managers define such policies and enforce them in the deployment pipeline through policy gates.
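The two policy checks mentioned above, a blackout window and required approvals, can be sketched in Python as follows. Everything here is hypothetical and for illustration only: the `Release` class, the `policy_violations` function, the 09:00 to 17:00 blackout window, and the approver role names are assumptions, not Autopilot’s policy language.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class Release:
    name: str
    approvals: set = field(default_factory=set)  # roles that have signed off


def in_blackout(now, start, end):
    """True if now falls in [start, end); handles windows that wrap past midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def policy_violations(
    release,
    now,
    blackout=(time(9, 0), time(17, 0)),                      # hypothetical window
    required=frozenset({"qa-lead", "release-manager"}),      # hypothetical roles
):
    """Return a list of violated policies; an empty list means the gate passes."""
    violations = []
    if in_blackout(now, *blackout):
        violations.append("deployment falls inside the blackout window")
    missing = sorted(required - release.approvals)
    if missing:
        violations.append("missing approvals: " + ", ".join(missing))
    return violations
```

A policy gate would let the pipeline proceed only when the returned list is empty, and could surface the violation messages to the release owner otherwise.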

Policy Management

Learn more in the compliance and audit video here.

Risk Verification Audit Trail with Autopilot

Audit Trail

In addition to release verification, Autopilot provides detailed audit capabilities. These allow SREs and SecOps teams to view all related deployment activities. This reduces the time and cost of required audits and speeds up troubleshooting of any deployment-related issue by showing the who, what, where, and when of each deployment step.

Conclusion

With Autopilot, your organization can analyze the risk of a deployment based on the logs and metrics generated in the pre-production stages. This helps you avoid promoting risky updates into the production environment, saving the organization time, effort, and cost. With the help of Autopilot, the CI/CD pipeline is accelerated without compromising quality.


OpsMx is a leading provider of Continuous Delivery solutions that help enterprises safely deliver software at scale and without any human intervention. We help engineering teams take the risk and manual effort out of releasing innovations at the speed of modern business. For additional information, contact us.
