What is Site Reliability Engineering (SRE)?

Site Reliability Engineering (SRE) treats operations problems as software problems. It is a software engineering approach in which site reliability engineers use software as a tool to manage systems, solve issues, and automate repetitive or mundane tasks.

The primary aim of this practice is to build a reliable and scalable software delivery system. The practice originated at Google, where it was created by Ben Treynor Sloss.

What is the role of a Site Reliability Engineer?

    • A Site Reliability Engineer (SRE) oversees how code is deployed, configured, and monitored. SREs are responsible for the availability and latency of services and for change management. While code is running in production, they keep servers healthy and handle emergencies.
    • To perform these activities well, engineers need both development and operations experience. This enables them to understand the value stream of the software development lifecycle.
    • Site reliability engineers need to enforce service level agreements (SLAs). SLAs help prioritize product updates. SREs enforce them through a system of service level indicators (SLIs) and service level objectives (SLOs).
    • An SLI is a defined measure of a specific aspect of the service level provided. Key SLIs include request latency, availability, error rate, and system throughput. An SLO is the target value or range for a service level, as measured by an SLI.
    • Per Google, best practice is that SREs spend at most 50% of their time on operations, and this share should be monitored to ensure it is not exceeded. The rest of the time should be spent on development tasks like creating new features, scaling the system, and implementing automation.
    • Excess operational work and poorly performing services can be handed back to the development team to run, so that the site reliability engineer does not spend too much time operating an application or service.
    • Automation is an important part of the site reliability engineer’s role. If they deal with the same problem repeatedly, they should automate a solution. This also helps keep operations work to no more than half of their workload.
    • Maintaining the balance between operations and development work is a key component of SRE. 
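
To make the relationship between SLIs and SLOs in the list above concrete, here is a minimal Python sketch. It computes two SLIs (availability and the share of requests answered within a latency threshold) from a batch of request records and checks each against an SLO target. The Request type, the target values, and the sample data are all assumptions made up for illustration; real targets are agreed per service.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool  # True if the request was served successfully


def availability(requests):
    """SLI: fraction of requests served successfully."""
    return sum(r.ok for r in requests) / len(requests)


def latency_within(requests, threshold_ms):
    """SLI: fraction of requests answered within threshold_ms."""
    return sum(r.latency_ms <= threshold_ms for r in requests) / len(requests)


# Hypothetical SLO targets (assumed values, not taken from any real SLA).
SLO_TARGETS = {"availability": 0.99, "latency_under_300ms": 0.97}


def evaluate(requests):
    """Return {sli_name: (measured_value, meets_slo)}."""
    measured = {
        "availability": availability(requests),
        "latency_under_300ms": latency_within(requests, 300),
    }
    return {
        name: (value, value >= SLO_TARGETS[name])
        for name, value in measured.items()
    }


# 1,000 made-up requests: 945 fast successes, 50 slow successes, 5 failures.
sample = (
    [Request(120, True)] * 945
    + [Request(450, True)] * 50
    + [Request(120, False)] * 5
)
report = evaluate(sample)
for name, (value, met) in report.items():
    print(f"{name}: {value:.3f} -> {'meets SLO' if met else 'misses SLO'}")
```

In this made-up sample the availability SLI meets its target while the latency SLI falls short, which is the kind of signal that would help decide where to spend engineering time next.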
SRE roles

For large organizations such as Google and Netflix, the practice of SRE is indispensable. SRE identifies repetitive and manual activities and automates them, which gives developers more time to innovate.

Which tools are used in SRE?

Site reliability engineers’ foremost goal is to automate repetitive tasks. They also need to standardize processes across the software’s lifecycle.

    • Containers: package the separate microservices that make up a cloud-native application (e.g., Docker).
    • Kubernetes: container orchestration for managing multiple containers.
    • Cloud platforms: simplify the use of container environments and provide scalability (e.g., AWS, Azure).
    • Project planning & management tools: manage IT operations across distributed teams (e.g., JIRA and Pivotal Tracker).
    • Source control tools: help erase boundaries between developers and operators, allowing for seamless collaboration and application releases (e.g., Subversion and GitHub).

What Is DevOps?

The term DevOps is a blend of two terms: development and operations. It is meant to present a combined front for the development and operations teams.

In the past, development and operations teams passed blame to each other for failures. DevOps is a collaborative approach to the tasks and responsibilities shared by application development and operations teams. The goal of DevOps is to merge the daily tasks involved in the development, quality control, deployment, and integration of software into a single, continuous set of processes. Teams develop best practices and principles that make development cycles shorter and help them continuously deliver high-quality software.

DevOps achieves efficiency by following an iterative software development process in which repetitive and simple activities are automated and infrastructure can be managed using pieces of code. To work proactively in this way, organizations need to implement the right tools and automation. They also need to adopt cultural changes such as building trust between developers and systems administrators and aligning technology projects with business requirements. DevOps can change the software delivery chain, services, job roles, IT tools and best practices.
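
As an illustration of infrastructure being managed using pieces of code, the Python sketch below compares a desired state (the kind of description a team would keep in version control) with the state that is actually running and lists the changes needed to reconcile them. The plan function, the resource names, and the fields are hypothetical and do not represent the API of any particular tool.

```python
def plan(desired, actual):
    """Compare desired and actual infrastructure and list the changes needed."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    # Anything running that is no longer described should be removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, actual[name]))
    return actions


# Desired state, kept in version control next to the application code.
# Resource names and fields are made up for illustration.
desired = {
    "web": {"image": "shop-web:1.4", "replicas": 3},
    "worker": {"image": "shop-worker:1.4", "replicas": 2},
}
# What is currently running (in practice this would be queried from the platform).
actual = {
    "web": {"image": "shop-web:1.3", "replicas": 3},
    "cache": {"image": "redis:7", "replicas": 1},
}

changes = plan(desired, actual)
for action, name, spec in changes:
    print(action, name, spec)
```

Because the desired state is plain data, it can be reviewed, versioned, and rolled back like any other code change.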


What are the responsibilities of a DevOps Engineer?

The most important outcomes that DevOps strives to achieve in an organization are automating repetitive processes, continuously delivering high-quality systems and reacting quickly to feedback to consistently improve processes.


Even inside an agile environment, developers and system admins can work in silos. Such a system is not fast or nimble enough to incorporate customer feedback and improve value to customers. DevOps engineers reduce that complexity by closing the gap between the actions needed to change an application and the tasks that maintain its reliability.

A DevOps engineer works with diverse teams and departments to create and implement software systems. Some key responsibilities of a DevOps engineer are:

    • Documenting the expected specifications and features of the product and sharing them with stakeholders.
    • Project planning to communicate operational requirements and development forecasts.
    • Performing timely performance assessments through gap analysis. Identifying alternative solutions to meet demands of both developers and operation teams. 
    • Performing routine application maintenance to ensure that the production environment runs smoothly. 
    • Understanding technological advancements and implementing process improvements and expansions when needed. 
    • Identifying problems or bottlenecks in everyday processes and procedures. They test code, processes, and deployments to identify ways to streamline and minimize errors.
    • Working with numerous people across different teams. Strong verbal and written communication skills are very important.

Read the CICD Blog to discover which CICD tools you can leverage.


What are the similarities and differences between DevOps and SRE?



SRE: More operationally driven, from the top down, and governed by the developer or development team instead of the operations team.
DevOps: Aims to bridge the gap between development and operations by culturally aligning their tasks, objectives, and initiatives.

SRE: More pragmatic.
DevOps: Adoption is more of a cultural and philosophical shift.

SRE: Developers have more control over the software monitoring and maintenance processes.
DevOps: The operations team, together with DevOps engineers, manages software monitoring and maintenance.

SRE: Encompasses all the DevOps factors but delves deeper into each one, adding criteria that make for a more detailed recipe for success.
DevOps: Defines these factors as breaking down silos or walls between groups within the organization to encourage more efficient collaboration, blameless failures, automation, monitoring, and observability.


While DevOps and SRE may sound like they are on opposite sides of the spectrum, both approaches share the same end goals, listed here. The paragraphs after the list show how SRE supports DevOps in reaching them.

    • Make incremental changes fast and efficient
    • Reduce the number of organization silos
    • Have a flexible, open-minded, and adaptable work culture
    • Use automation wherever possible
    • Monitor performance and improve when necessary

Elimination of silos: DevOps is all about collaboration and ensuring accountability. SRE achieves this by using software platforms to enforce both.

Implementing feedback: Change is considered the most difficult thing in an organization. DevOps encourages constant feedback and gradual change to improve software and delivery. SRE makes this happen by implementing CI/CD platforms that let teams ship small, incremental changes.

Accept failures: We learn only through failures. DevOps is all about moving fast and breaking things, but DevOps also preaches that broken code is a shared responsibility. SRE ensures that failures are controlled and accounted for. It also ensures that things can quickly move to Plan B and can be reverted to normal as soon as possible.
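
The idea of moving quickly to Plan B and reverting to normal can be sketched as a deployment that is checked right after release and rolled back if the check fails. This is a minimal Python sketch under stated assumptions: the deploy and is_healthy hooks are hypothetical stand-ins for whatever deployment and monitoring tooling a team uses, and the version numbers are made up.

```python
def deploy_with_rollback(new_version, current_version, deploy, is_healthy):
    """Deploy new_version; revert to current_version if the health check fails.

    `deploy` and `is_healthy` are caller-supplied callables (hypothetical hooks
    into the team's deployment and monitoring tooling).
    Returns the version left running.
    """
    deploy(new_version)
    if is_healthy(new_version):
        return new_version
    deploy(current_version)  # Plan B: go back to the last known-good version
    return current_version


# Stand-ins for real tooling, for illustration only.
history = []


def fake_deploy(version):
    history.append(version)


def fake_health_check(version):
    return version != "2.0.0"  # pretend 2.0.0 is a bad release


running = deploy_with_rollback("2.0.0", "1.9.3", fake_deploy, fake_health_check)
print(running, history)
```

Keeping the rollback path in code means the failure is controlled and accounted for: the same steps run every time, and the record of what was deployed shows exactly what happened.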

Shift left: DevOps requires developers to write containerized code and use APIs for smooth communication between tools and services. SRE ensures that this approach is put into practice with automation tools and technologies.

Monitoring: With automation, every step needs to be monitored. Every log and metric needs to be verified. SRE ensures that these metrics and logs stay within defined ranges to ensure the reliability of the end product.


Jyoti Sahoo

Jyoti is a product marketer and an educator. What sets him apart as a PM is that he can create delightful, rich and engaging content for business leaders and technology experts. Previously he has delivered projects in Artificial Intelligence for Governments in the EU and clients in ANZ regions. On the sidelines he is a 3D printing enthusiast and a solar energy advocate.

