Automate Application Reliability Assessment with Chaos Monkey - OpsMx Blog

Public cloud adoption has become the norm in enterprises, alongside the growing trend of building container-based, cloud-native applications. Given this shift in IT infrastructure, developers need to ensure that their applications remain reliable during unplanned outages in the cloud infrastructure. In this blog, we explain how to set up Chaos Monkey with the Spinnaker platform and how to analyze application reliability with it.

Reliability Testing of Applications

Chaos Monkey, a tool built by the Netflix OSS team, creates random disruptions in your application by terminating instances, which helps you test the reliability of your services. It is an example of a tool that follows the Principles of Chaos Engineering.

Chaos Monkey is fully integrated with Spinnaker, the continuous delivery platform increasingly used by enterprises such as Intuit, Target, and Waze. It works with any backend that Spinnaker supports (AWS, GCP, Azure, Kubernetes, etc.).

Enabling Chaos Monkey in Spinnaker

To enable Chaos Monkey in Spinnaker, run the following hal command. (If you need help setting up Chaos Monkey itself, check this documentation.)

Ubuntu# hal config features edit --chaos true

Enable Chaos Monkey for an Application

Once Chaos Monkey is enabled for the Spinnaker instance, it is enabled for all new applications by default. You can still enable or disable it for any individual application by clicking the Chaos Monkey radio button on the application configuration page.

[Figure: Enabling Chaos Monkey for a New Application]

The following figure shows the detailed Chaos Monkey configuration for an application.

[Figure: Configuring Chaos Monkey for an Application]

Termination Frequency lets the application owner specify how often instances are terminated (the terminations themselves are scheduled randomly). Currently, it is not possible to schedule more than one termination per day per grouping. If you need to test multiple terminations across the application's services, choose a narrower grouping (cluster).

Grouping configuration allows the owner to select the grouping to be used to terminate an instance for the application.

Application grouping results in one instance termination (max 1 per day) for the entire application (including all pipelines and stacks), per region if “Regions are independent” is checked.
Stack grouping results in one instance termination (max 1 per day) for each stack, per region. Stacks here are the vertical stacks of dependent services used for integration testing, which can be assigned when a cluster is created.
Cluster grouping results in the most terminations, because Chaos Monkey terminates an instance for each cluster configured across all the pipelines of the application.
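
The effect of the grouping choice can be illustrated with a small sketch. This is not how Chaos Monkey is implemented; the `Instance` record, its field names, the `termination_groups` helper, and the sample fleet are all hypothetical. The sketch only counts how many distinct groups, and therefore at most how many terminations per day, each grouping would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical record; the field names are assumptions for illustration.
    app: str
    stack: str
    cluster: str
    region: str


def termination_groups(instances, grouping, regions_independent=True):
    """Return the set of groups that would each get at most one termination per day."""
    if grouping not in ("app", "stack", "cluster"):
        raise ValueError(f"unknown grouping: {grouping}")
    groups = set()
    for inst in instances:
        if grouping == "app":
            key = (inst.app,)
        elif grouping == "stack":
            key = (inst.app, inst.stack)
        else:
            key = (inst.app, inst.stack, inst.cluster)
        # When regions are independent, each region forms its own group.
        if regions_independent:
            key += (inst.region,)
        groups.add(key)
    return groups


# Made-up fleet: one application, two stacks, three clusters, two regions.
fleet = [
    Instance("shop", "prod", "shop-prod-api", "us-west-2"),
    Instance("shop", "prod", "shop-prod-api", "us-east-1"),
    Instance("shop", "prod", "shop-prod-web", "us-west-2"),
    Instance("shop", "staging", "shop-staging-api", "us-west-2"),
]

for grouping in ("app", "stack", "cluster"):
    count = len(termination_groups(fleet, grouping))
    print(f"{grouping}: up to {count} termination(s) per day")
```

For this sample fleet, the narrower the grouping, the more groups there are, which matches the ordering described above.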

It is also possible to configure exceptions to exclude Chaos Monkey from terminating instances for specific regions or stacks for business reasons.
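
As a rough illustration of how such exceptions behave, the sketch below filters candidate instances against a list of exclusion rules. The rule format (dictionaries with optional `stack` and `region` keys, where a missing key matches anything) and the `is_excluded` helper are assumptions made for this example, not Chaos Monkey's actual configuration schema.

```python
def is_excluded(instance, exceptions):
    """Return True if any exception rule matches the instance.

    A rule matches when every field it specifies equals the instance's value;
    fields the rule omits are treated as wildcards.
    """
    return any(
        all(rule.get(field, "*") in ("*", instance[field])
            for field in ("stack", "region"))
        for rule in exceptions
    )


# Hypothetical rules: never touch eu-west-1, nor the payments stack in us-west-2.
exceptions = [
    {"region": "eu-west-1"},
    {"stack": "payments", "region": "us-west-2"},
]

candidates = [
    {"stack": "payments", "region": "us-west-2"},
    {"stack": "web", "region": "us-west-2"},
    {"stack": "web", "region": "eu-west-1"},
]

eligible = [c for c in candidates if not is_excluded(c, exceptions)]
print(eligible)
```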

For information about configuration options of Chaos Monkey, check out the documentation.

Check the terminations scheduled by Chaos Monkey as shown below.

ubuntu:/apps/chaosmonkey$ cat /etc/cron.d/chaosmonkey-daily-terminations

2 17 7 12 4 root /apps/chaosmonkey/chaosmonkey-terminate.sh openshiftapp my-k8s-chaos-account --cluster=openshiftapp-chaos-demo --region=default
42 21 7 12 4 root /apps/chaosmonkey/chaosmonkey-terminate.sh chaosmonkeyapp my-aws-account --cluster=chaosmonkeyapp --region=us-west-2
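
To make the schedule easier to read, entries in this file can be parsed with a short script. The following is a minimal sketch that assumes every entry follows the layout of the two lines shown above (five cron time fields, a user, the script path, the application, the account, then `--key=value` options). The `ScheduledTermination` type and `parse_cron_entry` function are names invented for this example.

```python
from typing import NamedTuple


class ScheduledTermination(NamedTuple):
    minute: int
    hour: int
    day: int
    month: int
    app: str
    account: str
    options: dict


def parse_cron_entry(line):
    """Parse one entry laid out as:
    minute hour day month weekday user script app account [--key=value ...]
    """
    fields = line.split()
    if len(fields) < 9:
        raise ValueError("not a termination entry: " + line)
    minute, hour, day, month = (int(f) for f in fields[:4])
    # fields[4] is the weekday, fields[5] the user, fields[6] the script path.
    app, account = fields[7], fields[8]
    options = {}
    for arg in fields[9:]:
        key, _, value = arg.lstrip("-").partition("=")
        options[key] = value
    return ScheduledTermination(minute, hour, day, month, app, account, options)


entry = parse_cron_entry(
    "42 21 7 12 4 root /apps/chaosmonkey/chaosmonkey-terminate.sh "
    "chaosmonkeyapp my-aws-account --cluster=chaosmonkeyapp --region=us-west-2"
)
print(f"{entry.app} ({entry.options['cluster']}, {entry.options['region']}): "
      f"{entry.month:02d}-{entry.day:02d} at {entry.hour:02d}:{entry.minute:02d}")
```

The times are whatever the host's cron daemon interprets them as; the sketch does not attempt any time zone handling.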

Enabling Automatic Application Reliability Analysis

[Figure: Enabling Analysis of Reliability of Application]

Now that Chaos Monkey is enabled and application instances are being terminated, you need to analyze and measure the reliability of your applications. You can analyze failures in both pre-production and production environments. Because the terminations occur at random times, the analysis has to run automatically whenever one happens.

The OpsMx Continuous Risk Assessment platform integrates with Spinnaker and Chaos Monkey to trigger an automatic application risk assessment as soon as a Chaos Monkey event occurs, and it provides a detailed evaluation of the application's reliability and behavior each time.

If you are interested in piloting the OpsMx solution for Chaos Monkey, please email us at info@opsmx.com to get started.

Tags : Chaos Monkey, Spinnaker

Vardhan NS

Vardhan is a technologist and a marketing professional, currently working as a Sr. PMM at OpsMx. His strength lies in understanding complex technologies and explaining them in uncomplicated ways. Vardhan is a passionate product marketer with a keen focus on content, helping brands position themselves uniquely with clear messaging and competitive differentiation. Outside of work, he is an athlete who is passionate about football, swimming, and surfing.
