Three ways to tackle operations complexity by applying modern DevOps processes to hybrid environments

Peter Kreslins
4 min read · Nov 18, 2020

[Originally published at DevOps.com; read the full version here]

At Digibee, we developed a platform that connects applications and systems, helping companies accelerate their digital transformation. Because of our customers’ complexity, we had to deliver the platform both as SaaS and as managed environments, running on top of three cloud providers and even on-prem.

We had all the ingredients of an operations nightmare if we approached this the traditional way. From day one, we chose a cloud-native approach so our architecture could overcome these challenges.

In this article, I explore three ways to tackle this operations complexity:

  • Cloud Native Architecture — Kubernetes
  • Deployment as code — GitOps
  • Normalize hybrid environments — Run anywhere

Cloud Native Architecture

The first way relates to how our platform was architected from day one. We chose Kubernetes as the underlying architecture component to support our core microservices and every integration built with the Digibee platform.

Why do we use Kubernetes?

Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. Because our main requirement was to isolate every single integration that runs on top of our platform, we had to select an architecture component that could give us a head start.

A well-designed application based on microservices and running on containers can benefit heavily from a technology such as Kubernetes in many different ways:

  • Resilience & Fault Tolerance: These are key ingredients of today’s highly available, always-on systems. With the help of liveness and readiness probes, we set up an environment where any misbehaving container is automatically recycled, with a new instance ready to use (see the manifest sketch after this list).
  • Multizone Cluster: A multizone cluster makes it possible to build hard-to-achieve fault-tolerant architectures in which a container running in one zone is automatically rescheduled in another zone after a failure. We coupled that with GCP’s ability to replicate volumes across zones to deliver a truly fault-tolerant architecture.
  • Millisecond Recovery: A restarted container takes milliseconds to become ready again, so service interruption is negligible.
  • Microservice Isolation: Misbehaving containers won’t affect others, as CPU and memory limits are enforced.
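
To make these points concrete, here is a minimal sketch of a Deployment manifest combining liveness and readiness probes, zone spreading, and enforced resource limits. The image, endpoints, and names are hypothetical placeholders, not Digibee’s actual configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: integration-pipeline      # hypothetical name for one isolated integration
spec:
  replicas: 3
  selector:
    matchLabels:
      app: integration-pipeline
  template:
    metadata:
      labels:
        app: integration-pipeline
    spec:
      # Spread replicas across zones so one zone failure does not
      # take down the whole workload (multizone fault tolerance).
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: integration-pipeline
      containers:
        - name: pipeline
          image: example.com/pipeline:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          # Liveness probe: a failing container is restarted automatically.
          livenessProbe:
            httpGet:
              path: /healthz                  # assumed health endpoint
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          # Readiness probe: traffic is withheld until the pod is ready.
          readinessProbe:
            httpGet:
              path: /ready                    # assumed readiness endpoint
              port: 8080
            periodSeconds: 5
          # Enforced CPU/memory limits isolate misbehaving workloads.
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
```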

Deployment as code

GitOps is our mantra. GitOps is a way of implementing continuous deployment for cloud-native applications. It focuses on a developer-centric experience when operating infrastructure, using tools developers are already familiar with: Git and continuous deployment tooling.

The technique is to follow the regular Git Flow methodology and have automated “operators” apply changes to a Kubernetes cluster. These operators are regular containers running in the cluster that constantly monitor Git source repositories for changes. Once a change is detected, the operators automatically trigger an update.

For the automated installations, we use Helm as our deployment engine. As we commit changes to the Helm configuration in Git, the operators automatically apply those changes.
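
The article doesn’t name a specific GitOps operator, so here is a minimal sketch of what this pattern can look like, using Argo CD’s Application resource purely as one illustration; the repository URL, chart path, and names are hypothetical placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: digibee-platform          # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform-config.git  # placeholder repo
    targetRevision: main
    path: charts/platform         # assumed location of the Helm chart
    helm:
      valueFiles:
        - values-prod.yaml        # assumed per-environment values file
  destination:
    server: https://kubernetes.default.svc
    namespace: platform
  syncPolicy:
    automated:
      prune: true                 # remove resources deleted from Git
      selfHeal: true              # revert manual drift to the declared state
```

With automated sync enabled, the operator continuously compares the cluster against the chart in Git and reconciles any difference.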

This approach essentially inverts the classical imperative deployment model, in which actions are performed against the environment, into a declarative one, in which the desired state of the environment is defined by a set of rules and kept in sync by these operators.

We also achieved outstanding levels of governance across all our clusters from a single source of truth. If a manual change was introduced in any environment, the operator would detect it and revert the environment to the declared state (the behavior the selfHeal flag enables in the sketch above).

Normalize hybrid environments

Kubernetes is powerful, but that power comes at a price. Maintaining different versions and distributions is very hard, especially across different cloud providers and on-prem environments.

Because of that, we had to find a technology that would abstract that complexity away and normalize the operation.

Our choice was Google Anthos, a full Kubernetes distribution based on Google Kubernetes Engine and available to Google Cloud Platform customers. Over the years, Google refined its Kubernetes distribution and eventually released it as a product on its cloud platform. It is a well-designed, solid platform.

Given our experience running our SaaS platform on GKE, we decided to select Anthos as the basis for all our implementations: SaaS, dedicated SaaS, and on-prem.

These are the main advantages of using a technology such as Anthos:

  • Standard Kubernetes installation for on-prem, VMware-based environments
  • Connectors for Kubernetes Engines in AWS and Azure cloud providers
  • Built-in Configuration Management capabilities allowing us to declare our clusters and keep them in sync (see the sketch after this list)
  • Remote controlled Kubernetes environments from a single admin portal so all Cluster resources are easily accessible
  • Monitoring metrics consolidated in Google’s Stackdriver
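
As an illustration of the configuration-management point above, here is a minimal sketch of an Anthos Config Management resource that points a cluster at a Git repository of declared configs, assuming the Config Management operator is installed on the cluster. The repository, branch, and names are hypothetical placeholders:

```yaml
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  clusterName: on-prem-cluster-1                         # hypothetical cluster name
  git:
    syncRepo: git@git.example.com:org/cluster-config.git # placeholder repo
    syncBranch: main
    secretType: ssh                                      # assumed auth method
    policyDir: clusters                                  # assumed directory of declared configs
```

Applying the same resource, with a different clusterName, to every cluster keeps SaaS, dedicated SaaS, and on-prem environments in sync from one repository.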

Conclusion

The three ways are more than recipes for greater scalability, governance, and control of hybrid environments. They are a new mindset for application design and operations. We truly believe they can help companies deliver better software while acknowledging the complexity and challenges of dealing with enterprise applications.
