This blog describes the power of combining Splunk and OpsMx Autopilot. Autopilot is a powerful add-on to any CI/CD process running on any platform. Specifically, we’ll cover how OpsMx uses Splunk and Autopilot together to speed up a CI/CD pipeline.
Most software organizations run CI/CD pipelines to move updates from dev to test to staging to prod. A CI/CD process is an intricate set of interconnected steps that deploy software to the target platforms. During testing, especially performance, scalability, and resiliency testing, each application typically generates a large volume of log data. This data is complex and unstructured, and at that volume it is practically impossible for people to analyze by hand. Yet a business can gain a tremendous advantage if it can properly exploit this data.
OpsMx Autopilot uses machine learning (ML) and natural language processing to analyze this data automatically, so you can quickly and accurately decide whether an update should move forward in the pipeline. This blog covers how Autopilot extends the value of Splunk in your continuous delivery initiatives.
Why do you need Autopilot?
Faster and more resilient CI/CD pipelines are a top priority for software organizations, because they help them stay ahead of the competition. Splunk has done a great deal to speed up software delivery, but teams need to go faster still. Autopilot extends Splunk in key ways and is designed specifically for the CD process.
Autopilot, which combines natural language processing and ML algorithms, speeds up the CD process. It understands application and system logs, performs a risk assessment, and can control the CI/CD pipeline through an approve or reject decision.
Can my SREs keep up with the scale and speed of deployment?
When an organization runs only a few pipelines, deployment risk can be handled manually. There may be only a few updates per month, and a competent team of SREs can take on the risk assessment.
However, for many OpsMx customers, the scale is very different. There may be hundreds or thousands of updates every day, and the pipelines may throw hundreds of exceptions daily, each requiring manual analysis and a judgment on whether to proceed. Manual analysis and judgment, even with the help of scripting, are tedious, time-consuming, and error-prone. SREs are a costly resource, and we don’t want them occupied with risk assessments all the time. Yet deploying an update that contains errors can be a very costly mistake.
Autopilot integrates with Splunk to automate risk assessment, saving a tremendous amount of time and improving the accuracy of promotion decisions. Autopilot can also automatically approve or reject deployments in your CI/CD pipeline when the decision is clear, and you set how clear it needs to be. In short, Autopilot simplifies and improves processes that are otherwise time-consuming and error-prone.
What is Autopilot, and where does it fit in?
Software delivery has moved from monolithic architectures to container-based architectures, where applications can be updated every hour or even every few minutes. At that speed, it becomes nearly impossible to monitor and analyze every report generated by every update. Acting manually on each of these reports only lengthens the time it takes for code to reach production.
This is where OpsMx Autopilot comes in. Autopilot is an intelligence layer for any continuous delivery platform and log analyzer. Added to your Splunk platform, it performs advanced data analytics and predictive modeling to speed up business decisions. With built-in verification gates, Autopilot assigns risk assessment scores to the quality of each deployment. These assessments take only seconds and give an in-depth view of any problems an update introduces. Autopilot integrates not only with log analyzers such as Splunk but also with Jenkins or any other CD platform, giving it a 360-degree view of the overall process. Through a feedback loop, its risk assessment improves over time, reducing the need for an SRE to monitor and analyze every report that comes his or her way.
Do I need Splunk to run Autopilot?
Autopilot integrates seamlessly with Splunk: the integration is set up through the GUI and can be completed in minutes.
Once the integration is configured, Splunk performs a series of preliminary steps that are crucial to Autopilot and to the release verification and approval process, including collecting, collating, and filtering the log entries down to just those related to the update under review.
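Splunk performs this filtering with its own search queries. Purely to illustrate the idea, the Python sketch below applies the same kind of filter to in-memory records. It is not Splunk or Autopilot code: the `LogEntry` fields and the `filter_update_logs` helper are hypothetical, and the sketch assumes each log line can be tagged with the service and release version that emitted it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    # Hypothetical shape of a parsed log line.
    timestamp: datetime
    service: str
    version: str
    level: str
    message: str


def filter_update_logs(entries, service, version, start, end):
    """Keep only entries emitted by `service` at `version` within [start, end)."""
    return [
        e
        for e in entries
        if e.service == service and e.version == version and start <= e.timestamp < end
    ]
```

Filtering on both the version and the time window keeps log lines from the previous release, or from unrelated services on the same hosts, out of the risk assessment.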
How Autopilot transcends your CI/CD log report analysis
Autopilot can be configured to fetch application logs from Splunk and apply domain-specific ML models to them. Autopilot’s intelligent analysis is central to automated deployments in production. Based on the results of that analysis, that is, by comparing the risk score of a new release against a baseline run, Autopilot determines whether to promote the update fully to production. The log analysis and risk assessment take only seconds and deliver an automated decision to the pipeline run.
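Autopilot’s actual ML models are not described here, so the sketch below substitutes a much simpler, count-based baseline comparison just to show the shape of the idea: log signatures that are new in the canary (new release) run, or that occur more often than in the baseline run, lower a 0 to 100 score. The `WEIGHTS` table, the `risk_score` function, and the scoring formula are all hypothetical.

```python
from collections import Counter

# Hypothetical penalty per extra occurrence, by log level (not Autopilot's values).
WEIGHTS = {"CRITICAL": 20, "ERROR": 5, "WARN": 1}


def risk_score(baseline, canary):
    """Return a 0-100 score (higher is safer) for a canary run vs. a baseline run.

    `baseline` and `canary` are lists of (level, signature) tuples. Each
    signature that appears more often in the canary than in the baseline
    (including signatures never seen in the baseline) adds a penalty.
    """
    base_counts = Counter(baseline)
    canary_counts = Counter(canary)
    penalty = 0
    for (level, signature), count in canary_counts.items():
        extra = count - base_counts.get((level, signature), 0)
        if extra > 0:
            penalty += WEIGHTS.get(level, 0) * extra
    return max(0, 100 - penalty)
```

A real system would first normalize raw log lines into signatures (for example, by stripping timestamps and request IDs) before counting them; that step is omitted here.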
The AI/ML-enabled intelligence layer in Autopilot uses supervised learning to improve its judgment over time. As SREs evaluate the confidence score of a given release, they can adjust Autopilot’s assessment of the impact of errors and warnings. These adjustments act as feedback that helps Autopilot develop a contextual understanding of specific applications and pipelines.
Continuous Risk Assessment Example
The screenshot below is from Autopilot’s Risk Assessment Dashboard, where logs from deployment updates were analyzed for risk. The release scored 83, which met the success criteria defined by the SRE, so Autopilot automatically triggered the pipeline to proceed with deployment.
If the risk assessment fails because of a critical error, Autopilot automatically blocks the deployment pipeline and notifies the SRE so they can take proactive action to resolve the issue. All of this happens in real time.
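The promote-or-block rule in the two paragraphs above can be sketched as a small function. This is a hypothetical illustration, not Autopilot’s implementation: the function name is invented, and the threshold of 80 in the usage line is an assumed value, since the article only says that a score of 83 met the SRE’s criteria.

```python
def gate_decision(score, critical_errors, threshold):
    """Return "promote" or "block" for a pipeline stage (hypothetical rule).

    Any critical error blocks the pipeline regardless of score; otherwise
    the release is promoted only when its score meets the SRE-defined threshold.
    """
    if critical_errors > 0:
        return "block"
    return "promote" if score >= threshold else "block"


# Assumed threshold of 80; the score of 83 comes from the example above.
print(gate_decision(83, critical_errors=0, threshold=80))  # prints "promote"
```

Checking critical errors before the score keeps a single severe failure from being averaged away by an otherwise healthy run.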
In exceptional cases, where Autopilot encounters a metric score or log pattern it has never seen before, the SRE can investigate and assign the appropriate action back to Autopilot so that similar cases are handled automatically in the future.
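The feedback loop described above can be illustrated with a simple lookup table of SRE decisions. Note that this is not supervised learning: the hypothetical `SeverityRegistry` class below only shows how an SRE’s verdict on a never-before-seen log signature can be stored and then reused automatically in later runs.

```python
class SeverityRegistry:
    """Hypothetical store of SRE decisions about log signatures.

    Signatures the registry has never seen are reported as "UNKNOWN" so they
    can be routed to an SRE; once a severity is assigned, later runs reuse it.
    """

    def __init__(self):
        self._severity = {}

    def classify(self, signature):
        return self._severity.get(signature, "UNKNOWN")

    def assign(self, signature, severity):
        # Record the SRE's decision for future runs.
        self._severity[signature] = severity

    def needs_review(self, signatures):
        # Unknown signatures, in first-seen order, without duplicates.
        pending = []
        for s in signatures:
            if s not in self._severity and s not in pending:
                pending.append(s)
        return pending
```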
Continuous Risk Verification with Autopilot
Autopilot provides “Approval Gates,” which place control over your CI/CD pipeline at critical points. At each gate, Autopilot collects and presents all the data relevant to the approval decision, including its confidence-level assessments, so the SRE can evaluate the information and decide. Alternatively, the SRE can automate the approval decision whenever the confidence score is above a configurable threshold. This frees SREs from the repetitive task of analyzing updates that are clearly either failures or successes.
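One way to realize “automate the clear cases, escalate the rest” is a gate with two configurable thresholds. The sketch below assumes that design; the `approval_gate` function and its parameter names are hypothetical and are not part of Autopilot’s documented configuration.

```python
def approval_gate(score, auto_approve_at, auto_reject_below):
    """Hypothetical three-way approval gate.

    Scores at or above `auto_approve_at` are approved automatically, scores
    below `auto_reject_below` are rejected automatically, and everything in
    between is routed to an SRE for a manual decision.
    """
    if auto_reject_below > auto_approve_at:
        raise ValueError("auto_reject_below must not exceed auto_approve_at")
    if score >= auto_approve_at:
        return "approve"
    if score < auto_reject_below:
        return "reject"
    return "manual_review"
```

Widening the gap between the two thresholds sends more releases to a human; narrowing it automates more decisions.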
Observability of enterprise-wide continuous verification
Autopilot provides an enterprise-wide historical analysis of risk scores for past deployments. For example, it provides an application-centric time-series view of risk scores along with their respective canary IDs, as well as a service-centric, drill-down analysis of each risk. It also shows the number of critical errors, warnings, and exceptions in the logs across the chosen time period.
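The two historical views described above, a per-application timeline of scores by canary ID and totals of errors, warnings, and exceptions over a period, come down to filtering and aggregation. The `Analysis` record and both helper functions in this sketch are hypothetical and assume each past verification run is stored as one record.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Analysis:
    # Hypothetical record of one past verification run.
    application: str
    canary_id: str
    day: date
    score: int
    critical_errors: int
    warnings: int
    exceptions: int


def score_timeline(records, application, start, end):
    """(day, canary_id, score) tuples for one application, oldest first."""
    rows = [r for r in records if r.application == application and start <= r.day <= end]
    return [(r.day, r.canary_id, r.score) for r in sorted(rows, key=lambda r: r.day)]


def totals(records, start, end):
    """Sum of critical errors, warnings, and exceptions across the period."""
    in_range = [r for r in records if start <= r.day <= end]
    return {
        "critical_errors": sum(r.critical_errors for r in in_range),
        "warnings": sum(r.warnings for r in in_range),
        "exceptions": sum(r.exceptions for r in in_range),
    }
```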
Real-time approvals with policy automation
With the increasing scale and speed of deployment, it is crucial that every update moving through a CI/CD pipeline passes your organization’s policy checks. For example, suppose we have gathered logs of a new release from Splunk after deploying it to the staging environment. Before the release moves into production, the CD pipeline can be configured to perform runtime policy checks, such as whether the deployment falls inside a blackout window or whether the release has the proper approvals from the right stakeholders. Autopilot empowers policy managers to define such policies and enforce them in the deployment pipeline through policy gates.
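A runtime policy gate like the one described above can be sketched as a function that returns a list of violations, where an empty list means the release may proceed. The blackout-window representation, the approver sets, and the `policy_check` name are assumptions for illustration; they do not reflect Autopilot’s actual policy language.

```python
from datetime import datetime


def policy_check(deploy_time, blackout_windows, approvals, required_approvers):
    """Return a list of policy violations (an empty list means the gate passes).

    blackout_windows: list of (start, end) datetimes during which deploys are frozen.
    approvals: set of stakeholders who have approved the release.
    required_approvers: set of stakeholders whose approval is mandatory.
    """
    violations = []
    for start, end in blackout_windows:
        if start <= deploy_time < end:
            violations.append(
                f"inside blackout window {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}"
            )
    missing = sorted(required_approvers - approvals)
    if missing:
        violations.append("missing approvals: " + ", ".join(missing))
    return violations
```

Returning every violation rather than stopping at the first one lets a policy manager see all the reasons a release was held in a single pass.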
Risk Verification Audit Trail with Autopilot
Besides release verification, Autopilot provides detailed audit capabilities. These allow SREs and SecOps teams to view all related deployment activities. This reduces the time and cost of required audits and speeds troubleshooting of any deployment-related issue by showing the who, what, where, and when of deployment steps.
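The who, what, where, and when of each deployment step maps naturally onto a small record type. The sketch below is a hypothetical, in-memory, append-only trail; the class names are invented, and a production audit log would typically live in durable storage rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    who: str        # user or service account
    what: str       # action, e.g. "approved" or "deployed"
    where: str      # pipeline and environment
    when: datetime  # time of the action


class AuditTrail:
    """Hypothetical append-only audit log: events can be added, never edited."""

    def __init__(self):
        self._events = []

    def record(self, who, what, where, when):
        self._events.append(AuditEvent(who, what, where, when))

    def events_for(self, where):
        # Return a new list so callers cannot reorder or drop stored events.
        return [e for e in self._events if e.where == where]
```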
With Autopilot, your organization can analyze the risk of a deployment based on the logs and metrics generated in pre-production stages. This helps you avoid promoting risky updates into production, saving the organization time, effort, and cost. Autopilot accelerates the CI/CD pipeline without compromising quality.