by Shashank Srivastava | last updated on July 26, 2022

About Symphony

Symphony is a unicorn that offers a highly secure and scalable cloud-based messaging and collaboration platform for financial institutions of all sizes, with bots and automation to improve everyday workflows.

Challenge

Accelerate Innovation by Reducing Software Delivery Time

Faced with stiff competition, the organization needed to increase the speed of delivering enhancements to their end customers while simultaneously reducing production failures caused by defective updates. 

The primary bottleneck they faced was a lengthy manual approval process to move updates from staging to production. However, shortening the approval process had been shown to increase problems in production.

Their IT architecture is complex, aggravating the problem. They deploy a broad range of microservices-based applications on Kubernetes, as well as a large number of monolithic applications. Their CI/CD system was built using Jenkins, plugins, and custom scripts. 

Moving more quickly was a key goal. They could process only 50 to 100 updates per month, and their goal was many hundreds of updates per month. Like all companies, they were also under pressure to reduce costs. They estimated that verifying updates carried annual direct costs of more than $1M.

They evaluated every significant deployment as it moved to production. The analysis required three expert engineers, including at least one technical lead and one product engineer, and it took an hour or more to decide whether to move the deployment forward.
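To see why this review step could not scale, the figures above can be turned into a rough lower bound on review effort. This is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes every update counted in the monthly totals goes through the full three-engineer, one-hour review, which the case study does not state explicitly.

```python
def review_hours_per_month(
    updates_per_month: int,
    reviewers: int = 3,             # three expert engineers per review (from the text)
    hours_per_review: float = 1.0,  # "an hour or more", so this is a lower bound
) -> float:
    """Minimum engineer-hours spent on manual approvals in one month."""
    return updates_per_month * reviewers * hours_per_review


# Assumption: each counted update receives a full manual review.
for updates in (50, 100, 1000):
    hours = review_hours_per_month(updates)
    print(f"{updates} updates/month -> at least {hours:.0f} engineer-hours")
```

Under these assumptions, the manual process consumes at least 150 to 300 senior engineer-hours per month at the original volume, and would need roughly ten times that at 1000 updates per month.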

The analysis process for every update was time-consuming because of the mountains of data generated. With hundreds of thousands of concurrent users, a tremendous volume of metrics and logs is created. Consistently finding the “needle in the haystack” that reveals a potential problem is complicated, even for experienced engineers. As the team’s frequency of updates increased, the severity of the problem grew until they were nearly at a breaking point.

1. The analysis and decision process is a bottleneck

Gathering and filtering the metrics and logs was monotonous and time-consuming. The data analysis was slow and problematic because it depended on subtle differences that were hard to find. And many times, the final approval decision was hard because conflicting data would point to both promoting and rejecting the update.

The leader of the developer productivity team, who improved this situation, said it best. “Even though most updates should be moved to production, you can’t assume that everything works all the time. We strive for uniformly fast AND reliable releases.” 

2. Too many errors

Any degradation in the availability or performance of the customer-facing applications has a direct impact on Symphony’s revenue and a large indirect impact on the image and reputation of the brand. Too many updates were being approved incorrectly. Worse still, errors that should have been caught – because they had occurred before – continued to be made.

3. Requires expensive experts

In order to reduce the chance of an incorrect decision, the most senior engineers conducted most reviews and analyses. It was challenging to train inexperienced engineers because of the time pressure, the complexity and subtlety of the analysis, and the limited number of people who were qualified to train. 

Solution

OpsMx ISD Brings Intelligence to Deployments with Jenkins

The best way to increase the speed and reliability of a process is to automate it using machine intelligence. The team had the vision to improve productivity: use ML to automate the verification and approval process in the deployment cycle. 

To enable this vision, the solution needed to be as accurate and consistent as a human team of experienced experts. Any errors – either rejecting an update that should be approved or accepting an update that was later determined to be faulty – would have large consequences, so any solution needed to perform better than the human experts.

"ISD provided us with a layer of intelligence that makes continuous delivery effective"
Ravi Varanasi
SVP Engineering

After a thorough evaluation of potential solutions, including trying to build the solution on their own, they worked with OpsMx and implemented the Data Intelligence module of the OpsMx ISD platform. ISD adds an intelligence layer for software delivery and integrates with any CI/CD platform. It uses AI/ML to automate verification (refer to the screenshot below) and approvals, provide continual governance, and create visibility and insights into operations and best practices.

Here, the ISD Data Intelligence module gathers and evaluates logs stored in Elasticsearch and metrics from Datadog and other sources. Using natural language processing, statistical analysis, and machine learning algorithms, it analyzes every deployment and assigns a confidence score. The pipelines are configured to automatically promote updates to production when the score indicates they are very likely to succeed, and to reject them and return them for rework when the score is too low.
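The promotion gate described above can be sketched roughly as follows. This is a minimal illustration, not the OpsMx ISD API: the function name `gate_deployment`, the 0 to 100 score scale, and both threshold values are assumptions made for this example. The middle "review" band is also an assumption, included because some decisions still go to an engineer for review.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"  # push the update to production automatically
    REVIEW = "review"    # hand the pre-gathered analysis to an engineer
    REJECT = "reject"    # return the update for rework


def gate_deployment(
    score: float,
    promote_at: float = 90.0,    # hypothetical threshold
    reject_below: float = 60.0,  # hypothetical threshold
) -> Decision:
    """Map a confidence score (assumed to be on a 0-100 scale) to a decision."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if reject_below > promote_at:
        raise ValueError("reject_below must not exceed promote_at")
    if score >= promote_at:
        return Decision.PROMOTE
    if score < reject_below:
        return Decision.REJECT
    return Decision.REVIEW


if __name__ == "__main__":
    for s in (97.5, 72.0, 41.0):
        print(s, gate_deployment(s).value)
```

Keeping the thresholds as parameters lets a team tune how conservative the gate is for each application, for example a stricter `promote_at` for mission-critical services.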

Results

Faster and More Reliable CI/CD Pipelines

Since the deployment of OpsMx ISD, Symphony has seen significant improvements in software delivery velocity. Most production approvals now require zero time from an engineer; even decisions that need to be reviewed are completed more quickly because the data is gathered and initial analysis is completed automatically. The history of similar errors is automatically retrieved, along with the corrective action, speeding the resolution of issues. 

With the ISD Data Intelligence module, the number of updates has increased from 100 per month to over 1000, and errors in production have decreased as well.

The system has also improved the quality of the approval decision, both approving acceptable updates more quickly and rejecting more errors before they reach production. This improvement in accuracy is especially important in their most mission-critical applications – some applications run the OpsMx ISD verification and approval process more than five times a day.

Overall, they have been able to increase update velocity thanks in large part to the shorter approval cycle, enabling them to respond more quickly to their customers.

The leader of the developer productivity team says, “OpsMx ISD has really helped us by automating the analysis of our deployments. It is very reliable in finding potential issues and has proven itself to be better than our experts at evaluating risk. Because it is automated, it is very consistent – we don’t worry that it will have a bad day and miss an issue.”

Because the system learns continually, expert engineers can train the OpsMx ISD Data Intelligence module. Over time, this dramatically reduces the time they spend analyzing updates and frees them to work on higher-value activities.

“OpsMx ISD is more effective than our experts at evaluating updates.”  – Director of Developer Productivity

Overall, the new system has improved production reliability and has enabled faster development of new capabilities, adding the equivalent of over six full-time senior engineers to the team. 

The deployment of OpsMx ISD is now moving to its second phase: automatic policy checking. For example, to more easily meet SOX regulatory compliance, the person implementing any change cannot approve moving the change into production. Similarly, a QA manager must approve all significant updates. These policies and many others can be validated before an update is considered for promotion to production, saving even more time.
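The two example policies above can be expressed as simple pre-promotion checks. The sketch below is illustrative only and does not reflect how OpsMx ISD defines or evaluates policies: the `ChangeRequest` fields, the `policy_violations` function, and the example names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    author: str                # person who implemented the change
    approvers: frozenset[str]  # people who approved promotion
    significant: bool          # whether the update counts as significant


def policy_violations(change: ChangeRequest, qa_managers: set[str]) -> list[str]:
    """Return the policies this change violates (an empty list means it passes)."""
    violations = []
    # Separation of duties: the implementer cannot approve their own change.
    if change.author in change.approvers:
        violations.append("author approved their own change")
    # Significant updates need sign-off from at least one QA manager.
    if change.significant and not (change.approvers & qa_managers):
        violations.append("significant update lacks QA manager approval")
    return violations


if __name__ == "__main__":
    qa = {"priya"}
    ok = ChangeRequest("alex", frozenset({"priya", "sam"}), significant=True)
    bad = ChangeRequest("alex", frozenset({"alex"}), significant=True)
    print(policy_violations(ok, qa))
    print(policy_violations(bad, qa))
```

Running checks like these before an update is even considered for promotion means a non-compliant change is stopped without consuming any verification or reviewer time.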

These policy checks will pay off in faster releases and better compliance, which in turn lead to higher-quality releases. The productivity team leader concluded, “We’re glad to be partnering with OpsMx and believe that ISD Data Intelligence layer makes our continuous software delivery system effective.”


Shashank Srivastava

As Country Manager, Sales & Marketing (ROW) at OpsMx, Shashank is responsible for revenue across Europe, the Middle East and Asia Pacific. He is also responsible for Product Marketing and Strategic Partnerships. Shashank brings over 20 years of experience in selling and marketing technology and software solutions. Over these years he has led teams in marketing, sales, business development and field operations. He has successfully driven several strategic initiatives within startup environments.
