In our previous blog on Achieving development velocity, we established how organizations can scale their development velocity without compromising on security.
What is a software release, and why measure release quality?
A software release is a new or enhanced version of existing software. A release aims to improve security, features, and functionality. It is essential to measure release quality because a new release can make or break the existing system. And when releases are carried out at a larger scale, things can quickly go south in an end-to-end release management process.
Release quality correlates directly with one of the maturity index parameters of an organization's DevOps practices. Measuring the release quality of an application or service helps teams determine which change will degrade that application or service, so that the change can be troubleshot and fixed.
Even when a platform that can verify release quality is in place, the chance of failure should not be ignored. Teams still need to set up remediation practices for that unlikely event.
“Release quality: A Key Indicator of good Software Delivery and Operational Performance”
Per the annual state of DevOps Report, one of the most efficient ways to compare your DevOps maturity and overall software delivery performance is with Release Quality.
Google categorizes teams as low, high, and elite according to their DevOps practices. The latest findings show that top performers keep their failure rate below 15% of total releases.
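To make the 15% figure concrete, here is a minimal sketch of how a team might compute its own failure rate from a release log. The `Release` record and the sample history are hypothetical, and what counts as a "failed" release (rollback, hotfix, incident) is an assumption each team has to define for itself.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """One production release and whether it needed remediation."""
    version: str
    failed: bool  # assumption: True if the release caused a rollback, hotfix, or incident


def change_failure_rate(releases: list[Release]) -> float:
    """Return failed releases as a fraction of all releases (0.0 if there are none)."""
    if not releases:
        return 0.0
    failures = sum(1 for r in releases if r.failed)
    return failures / len(releases)


# Hypothetical release history: 2 failures out of 20 releases.
history = [Release(f"1.{i}", failed=(i in (4, 11))) for i in range(20)]
rate = change_failure_rate(history)
print(f"Change failure rate: {rate:.0%}")
print("Below the 15% mark" if rate < 0.15 else "At or above the 15% mark")
```

Tracking this number per application over time is usually more informative than a single snapshot, since a rising trend shows up well before the threshold is crossed.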
The impact of not prioritizing Release Quality
Velocity isn’t everything. Quality is equally important. Under pressure to improve release velocity, releases may become prone to errors. A successful product release involves several interdependencies, multiple teams, and complex technological challenges. This has led stakeholders to create point solutions that address issues as and when they arise. The result is an interconnected mesh of multiple point products.
Product releases that happen in silos are bound to fail, as they will neither meet customer demand nor adhere to benchmarks. Release velocity should not come at the expense of quality and security.
Some key issues when not prioritising release quality:
- Bugs in the software
- Decreased end-user experience
- Inefficient and delayed software release cycles because of multiple rework
- Inflated resource costs both in terms of infrastructure and personnel
- Substandard releases
Let us understand some key issues that lead to a drop in release quality.
1. Not validating assumptions
Developers can put too much effort into building software that becomes overly complex and inefficient. This is often called gold plating. Teams need to continuously validate with customers what is working and what is not.
2. Prioritizing performance over quality
Application performance may differ from customer to customer for reasons such as location or type of connection. A release must always prioritize quality over raw performance.
3. Not taking a customer centric approach
User experience is key to a happy customer. Not building software that is customer centric will lead to loss of revenue, competitive disadvantage and a tarnished brand reputation. Teams need to avoid building something that is completely dissonant with what customers need.
4. Not documenting acceptance criteria
Documentation sets clear acceptance criteria, which are critical for a release. Teams that ignore this process and rush to complete a release end up making errors and unnecessary changes, which may cause surprises at later stages, when errors are difficult to correct.
5. Not Battle Testing
Organizations need to invest in tools like the OpsMx Intelligent Software Delivery platform that can recreate production issues inside the four walls of the testing environment. When a new feature is introduced, customers might misunderstand the intended functionality and use it in a completely different way.
6. Relying heavily on Scripting
Maintaining and executing test scripts is a cumbersome task. It is resource intensive and prone to errors, which increases the cost of testing but not its efficacy. Organisations must invest in platforms that help them automate the entire value chain, such as the OpsMx ISD platform.
How can organisations improve release quality?
We all face pressure to improve release quality and reduce costs. When you are deploying software at scale and speed, any inefficiencies in the process are instantly visible and increase the overall costs.
At OpsMx, we understand pain points of organizations that want to improve their release quality. We help them pinpoint key issues that plague them and prevent them from optimizing operational efficiency and release quality.
Deploying a brand new process and technology can sometimes be counterproductive. OpsMx ensures that the transformation an organization seeks aligns with its business vision and goals. The following are three main issues that an organization needs to address to improve release velocity and operational efficiency at the same time:
- Demand for heavy infrastructure
- Lagging developer and engineer productivity
- Errors in production
Let us drill down on these three bottlenecks.
1. Elimination of idling infrastructure (affecting throughput)
Infrastructure costs are a major chunk of operating expenses. A poor release process means badly managed infrastructure, which inflates costs. Testing or staging systems waste money when they sit idle. Creating a system that automatically provisions and tears down infrastructure has saved OpsMx customers millions of dollars.
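As a rough illustration of automatic teardown, the sketch below picks out non-production environments that have been unused for longer than a threshold. The `Environment` record, the four-hour threshold, and the sample data are all hypothetical, and the actual teardown call depends on whichever provisioning tool is in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Environment:
    name: str
    kind: str            # e.g. "test", "staging", "production"
    last_used: datetime  # last time a pipeline or user touched it


def find_idle(envs, now, max_idle=timedelta(hours=4)):
    """Return non-production environments unused for longer than max_idle."""
    return [
        e for e in envs
        if e.kind != "production" and now - e.last_used > max_idle
    ]


now = datetime(2024, 1, 1, 12, 0)
envs = [
    Environment("test-a", "test", now - timedelta(hours=9)),
    Environment("staging-b", "staging", now - timedelta(minutes=30)),
    Environment("prod", "production", now - timedelta(days=2)),
]
for env in find_idle(envs, now):
    # A real system would call its provisioning API here instead of printing.
    print(f"tear down {env.name}")
```

Excluding production explicitly keeps the idle check from ever selecting a live system, whatever its last-used timestamp says.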
Intelligent Software Delivery (ISD) on-premise or SaaS
Teams must not manually craft servers, for two reasons: (i) any manual process is error prone and (ii) manual steps are hard to track. There may also be small steps that never get documented at all. Without a complete audit process, troubleshooting is very difficult.
ISD ensures that your release process does not suffer configuration drift as it evolves. Over time, configuration drift results in snowflake servers: servers so unique in configuration that they are impossible to reproduce exactly. Destroying and recreating servers on purpose avoids this, and it also improves confidence in the release process.
Infrastructure as code (IaC) is the practice of describing your servers in source files that you check into version control and apply automatically. OpsMx ISD integrates with any configuration management tool to achieve this. These tools let you declare, in code, what your server should look like. The tool then automatically applies the changes to your servers.
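The declare-then-apply idea can be sketched in a few lines. This is a toy illustration only, not how any particular configuration management tool works internally: the `plan_changes` function and the settings shown are hypothetical. It compares a declared state with an observed state and lists the changes a tool would need to apply, which is also how drift becomes visible.

```python
def plan_changes(desired: dict, actual: dict) -> list[str]:
    """Compare declared state with observed state and list the changes needed."""
    changes = []
    for key, want in desired.items():
        if key not in actual:
            changes.append(f"add {key}={want}")
        elif actual[key] != want:
            changes.append(f"change {key}: {actual[key]} -> {want}")
    for key in actual:
        if key not in desired:
            # Anything not declared in code is drift and gets removed.
            changes.append(f"remove {key}")
    return changes


# Hypothetical declared (version-controlled) state and observed server state.
desired = {"nginx": "1.24", "max_connections": 512, "tls": "enabled"}
actual = {"nginx": "1.22", "max_connections": 512, "debug_port": 9000}
for change in plan_changes(desired, actual):
    print(change)
```

Because the declared state lives in version control, every entry in the plan can be traced back to a commit, which supports the audit requirement mentioned earlier.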
ISD offers four key advantages for your infrastructure:
- More reliable releases. Automating infrastructure reduces the room for errors.
- Disaster recovery becomes simpler because many parameters will never change, so engineers can narrow down their search.
- Auditability and traceability give you fine-grained control over your pipeline.
- Destroy idle or unused servers, thus reducing operating costs and mitigating the risk of configuration drift.
2. Reduction of errors in production (affecting throughput)
Issue 1 :
Errors in production are a common occurrence, and DevOps practices account for them. But when errors happen for a trivial reason such as “it works perfectly on my machine”, efficiency takes a major hit. This happens because the developer’s machine generally has a different configuration from the production machines.
Issue 2 :
A message of “scheduled downtime” implies that the organisation does not have a deployment strategy in place. Also, making your team work in the middle of the night to ship a release only reduces morale.
Another issue is not having a disaster recovery plan. A risky deployment is one where no one on the development team is confident that the software will work smoothly. Is the code optimized enough to behave as expected? Can the product handle the load? Usually, developers don’t have answers to these questions, so they push the release into a quiet environment and wait to see when it falls over.
Intelligent Software Delivery (ISD) + Autopilot
Teams must find a way to release software without worrying about errors in production. This not only allows developers to innovate but also lets them push releases faster, because they know that things can be reverted to normal at the slightest hint of a production error.
A simple configuration management module can solve the age-old problem of “it worked fine on my system”.
OpsMx ISD allows organisations to release software without affecting end users. With pre-defined templates to incorporate deployment strategies, such as Canary or Blue-Green, OpsMx can enable disaster recovery in pipelines within a few minutes. ISD can also integrate with any configuration management platform to make the problem of “it worked fine in my system” a thing of the past.
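To show the kind of decision a Canary strategy automates, here is a minimal sketch that compares the error rate of a canary against the baseline and returns a verdict. This is a simplified stand-in, not the analysis OpsMx performs: the `canary_verdict` function, the 1% tolerance, and the sample numbers are all assumptions made for illustration.

```python
def canary_verdict(baseline_errors: int, baseline_requests: int,
                   canary_errors: int, canary_requests: int,
                   tolerance: float = 0.01) -> str:
    """Return "promote" or "rollback" by comparing error rates.

    tolerance is the extra error rate the canary may show over the baseline.
    """
    if canary_requests == 0:
        return "rollback"  # no traffic means no evidence, so stay on the safe side
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    return "promote" if canary_rate <= baseline_rate + tolerance else "rollback"


print(canary_verdict(50, 10_000, 6, 1_000))   # canary at 0.6% vs baseline at 0.5%
print(canary_verdict(50, 10_000, 40, 1_000))  # canary at 4.0% vs baseline at 0.5%
```

Because only a small share of traffic reaches the canary, a "rollback" verdict affects few end users, which is what makes reverting at the first hint of trouble practical.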
3. Improving developer and engineer productivity (affecting stability)
One hidden but real issue is the productivity lost to release processes. At most companies, senior engineers, SREs, and DevOps staff are still called upon to perform tasks that can now be automated. Automating release processes improves productivity, which can be the equivalent of gaining multiple additional engineers who would otherwise spend their time reviewing logs and metrics manually.
Intelligent Software Delivery (ISD) + Autopilot
The ISD platform integrates testing with development workflows. Automating everything in the release cycle increases the efficiency of the team, procedures, and technology. Human error is diminished and everyday operations become stress free.
ISD + Autopilot leverages automation and AI so that teams can devote more time to strategic thinking and less time to daily errands. As a result, organizations can continually deliver consistent services to end users.
Additional benefits:
- No more stressed-out teams
- Releases performed on demand, even during peak hours
- No more dependency on a critical resource
- Releases deployed with confidence
ISD helps organizations meet their availability targets (Reliability), maximize feature velocity (Velocity), reduce and eliminate human toil through automation (Maintainability), and optimize engineering time by matching resources efficiently (Efficiency). Organizations must balance velocity with reliability and engage specialists such as SREs to design and run systems.
Release Quality improved for customers
Success story of a leading financial SaaS provider:
The system infrastructure and deployment processes for this organization were both extremely complex. Due to the complexity, development and test environments were infrequently de-provisioned. As a result, they sat idle, creating extra infrastructure cost. The overall complexity also led to decreased productivity.
Automated provisioning and de-provisioning of infrastructure led to a dramatic reduction in overall infrastructure spend and demand. Additionally, both developer and SRE teams saw an immediate increase in productivity. Developers are now able to independently manage pipeline execution, so they wait less and develop more. SREs can now focus on their specific tasks rather than supporting development.
Automated Provisioning and infrastructure management.
Success story of a leading online destination website:
Customer satisfaction is the most important parameter for success for this organisation. Any production errors can lead to customer defection and revenue loss for the company. Senior engineers spent hours reviewing and analyzing logs and metrics generated during testing. The company wanted to reduce the error rate even further, without increasing staff costs.
OpsMx Autopilot automated the release verification process, nearly eliminating the time that senior engineers must manually invest. Autopilot adds a layer of intelligence to the CD solution, using machine learning to analyze large amounts of testing data. Autopilot uses both supervised and unsupervised learning. As time goes on, the software gets smarter and identifies more errors automatically.
>$1M per year of savings from reduced errors and improved productivity.
Success story of a multinational technology conglomerate:
Errors increased as the number of releases increased. Too many errors were being discovered in production because the agreed-upon processes were not always followed.
Developers and engineers were often forced to wait for infrastructure to become available because test environments were provisioned manually, or to wait for an SRE to complete a release process.
Both of these challenges – too many production errors and decreased productivity – have been resolved with OpsMx ISD. Through automation, the company has essentially eliminated errors in production due to release errors. Nearly 2000 developers use the automated system. Their productivity has soared.
Exponential increase in release quality.
Success story of a leading Cybersecurity provider:
The organisation needed quick and frequent releases to keep pace with security attacks. Its release velocity is as high as a thousand releases per day. Inevitably, a small percentage of the builds fail. Senior engineers were manually correcting the build issues, which numbered from tens to hundreds a day. At enterprise scale, this was not sustainable.
Autopilot works with any build system. The customer relied on Jenkins, with which Autopilot integrated seamlessly to gather the relevant data automatically and use machine learning to evaluate every build. It determined failures and identified the probable root cause of each failure. The automated release verification was able to perform swift analysis of release errors and advised SREs on hard-to-diagnose cases.
Automated release verification saved the equivalent of 5 senior engineers per team.