Secret management, meaning the ability to consistently store, manage, and access user, application, and infrastructure-level secrets (e.g., credentials, tokens, and API keys) in dynamic environments, is critical to the success of any platform. It also makes microservices-based architectures easier to handle. With CI/CD cycles becoming shorter, maintaining the ability to develop, test, and deploy microservices is critical. Better secrets management allows the entire cloud infrastructure to remain flexible and scalable without sacrificing security in the process.
But secret management in a multi-cloud platform like Archera can be challenging. We must also navigate the added complexity of managing secrets across dev, staging, and production environments. For example, our Tilt integration, which spawns local development environments, requires development secrets, whereas our CI/CD pipelines require staging secrets and our main app requires production secrets.
Another important consideration for us when shortlisting a secret management solution was the ability to deploy, or reuse an existing, on-prem secrets manager, which enables smoother implementation for many customer use cases. We also wanted a platform that can be deployed not just on different cloud platforms but also across multiple tenants and regions, which is imperative to the success of enterprise-grade platforms. Another important factor for many organizations is support for Kubernetes deployments, so that they can take advantage of existing infrastructure and scale with it.
Secret management is one of the central components of any scalable platform. The following are some key areas across which we evaluated different approaches to secret management when we were looking for a solution to integrate into our platform:
When it comes to secret management, there is a plethora of tools to choose from. We will restrict our discussion to comparing AWS Secrets Manager, GCP Secret Manager, and Azure Key Vault (which provide more or less similar functionality despite being from different cloud providers) with HashiCorp's Vault and Consul.
All tools in the first category are great solutions and are well integrated with other services in their respective ecosystems. User permission management, automatic key rotation, and API/library access are common to all of them. These tools are also built on highly scalable platforms. The biggest downside is that all of them are cloud-provider specific. Developers also have limited visibility into these tools’ storage backend architecture and hence limited ability to configure it for their use case. That said, this customizability may not be a requirement for many organizations.
The second category consists of open source tools that are highly configurable and can work with on-prem deployments. Vault is a secret management solution, and Consul is a popular storage backend that is fault-tolerant and highly scalable. Both are offered by HashiCorp. Together, they are a promising solution for teams running and managing other services on Kubernetes, because they can be deployed on a Kubernetes cluster as Helm charts. Though there are many more features, here are some we wanted to highlight:
Since most of our services run on Kubernetes, it was the obvious choice of deployment medium. We used the Vault and Consul Helm charts for this purpose. Here is a reference architecture for Archera’s Vault/Consul deployment, describing the high-level setup and its interaction with our internal and external services via an ingress controller:
Our initial deployment consisted of the out-of-the-box configuration with a file storage backend. Although that is fine for experimentation, it is not well suited to production-grade deployments, so we eventually moved to Consul. Here is how it works: the client talks to the Vault server over HTTPS, and the Vault server processes the request and forwards it to the Consul client agent on a loopback address. The Consul client agents serve as an interface to the Consul servers. They are very lightweight and maintain very little state of their own. The Consul servers store the secrets encrypted at rest.
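To make the first hop of that request path concrete, here is a minimal Python sketch of a client reading a secret from Vault over HTTPS. It is an illustration, not our production code: it assumes a KV version 2 secrets engine mounted at `secret` and token authentication, and the address `https://vault.example.com` and the path `dev/database` are hypothetical.

```python
import json
import urllib.request


def build_read_request(vault_addr, token, path, mount="secret"):
    """Build a GET request for a KV v2 secret (assumed mount: 'secret')."""
    # KV v2 serves secret payloads under /v1/<mount>/data/<path>.
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path.strip('/')}"
    # Vault accepts the client token in the X-Vault-Token header.
    return urllib.request.Request(
        url, headers={"X-Vault-Token": token}, method="GET"
    )


def extract_secret(response_body):
    """Return the key/value pairs from a KV v2 read response."""
    # KV v2 nests the payload as {"data": {"data": {...}, "metadata": {...}}}.
    return json.loads(response_body)["data"]["data"]


def read_secret(vault_addr, token, path, mount="secret"):
    """Fetch and decode a secret; needs a reachable Vault server."""
    request = build_read_request(vault_addr, token, path, mount)
    with urllib.request.urlopen(request, timeout=5) as response:
        return extract_secret(response.read())
```

A call such as `read_secret("https://vault.example.com", token, "dev/database")` would return the stored key/value pairs as a dictionary. The Consul agent and server cluster behind Vault stay invisible to the client, which only ever speaks to the Vault API.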
This setup laid the groundwork for adding an ingress that load balances the L7 traffic and is in turn used by our various environments (development, staging, and production), CI/CD systems, and Argo workflow pipelines. Secrets can now be segmented across different environments and systems, which lets us set specific permissions for each. It also enables central management with minimal disruption to the application.
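One way to picture this segmentation is a read-only Vault policy per consumer, scoped to the secrets of a single environment. The Python sketch below renders such policies in Vault's HCL policy syntax. The path layout `secret/data/<environment>/*` and the consumer names are hypothetical and chosen only for illustration; the consumer-to-environment mapping follows the Tilt, CI/CD, and main app example from earlier in this post.

```python
ENVIRONMENTS = ("development", "staging", "production")

# Hypothetical consumer names, mapped to the environment whose secrets they need.
CONSUMER_ENVIRONMENT = {
    "tilt": "development",
    "ci-cd": "staging",
    "main-app": "production",
}


def secret_prefix(environment, mount="secret"):
    """Return the KV v2 path glob holding one environment's secrets."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    # Assumed layout: one subtree per environment under a KV v2 mount.
    return f"{mount}/data/{environment}/*"


def render_read_policy(consumer):
    """Render a read-only Vault policy (HCL) for a consumer's environment."""
    prefix = secret_prefix(CONSUMER_ENVIRONMENT[consumer])
    return (
        f'path "{prefix}" {{\n'
        '  capabilities = ["read"]\n'
        '}\n'
    )


if __name__ == "__main__":
    for name in CONSUMER_ENVIRONMENT:
        print(f"# policy for {name}")
        print(render_read_policy(name))
```

Because each policy names only one environment's subtree, a token issued to the CI/CD system under this scheme could read staging secrets but not production ones.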
We used the above reference architecture (image from HashiCorp): a three-node Vault cluster with one active node and two standby nodes, each with a Consul agent sidecar that talks to the five-node Consul server cluster on behalf of its Vault node. The architecture can also be extended across multiple availability zones, making the cluster highly fault-tolerant.
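The choice of five Consul servers can be motivated with a little arithmetic. Consul servers replicate state using the Raft consensus protocol, which needs a majority (quorum) of servers to be available. The short Python sketch below computes the quorum size and the number of server failures a cluster can absorb, assuming every server is a voting member.

```python
def quorum_size(servers):
    """Smallest majority of a Raft cluster with the given number of voters."""
    if servers < 1:
        raise ValueError("a cluster needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers):
    """Number of servers that can fail while a quorum is still available."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, quorum_size(n), failure_tolerance(n))
```

For a five-node cluster this gives a quorum of 3, so the cluster keeps working with up to 2 servers down; a three-node cluster would tolerate only 1 failure.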
We were able to check off most, if not all, of our requirements by using Vault and Consul. Here is a list of some key objectives that we achieved:
Implementing a secret management solution in a platform can be a hard problem, since the changes have to be propagated at many levels, ranging from application to infrastructure. However, having the right set of tools can lay the groundwork and make the transition easier. Here are some of the lessons we learned while arriving at our existing architecture:
There are many great tools out there for secret management. Every project has a different set of requirements and objectives; at the end of the day, the choice of tool will depend on your use case. If you are already tied to a particular cloud provider and don't have much need for flexibility in configuring your secret store, then using that provider's secret manager might be a good option. However, if you are looking for a more generic and configurable secret store with a robust and well-proven design, then Vault might be the way to go.
We will be talking more about Kubernetes and Infrastructure Deployments in future posts. Reach out to us on Twitter about this article or topics you'd like us to cover @bikramnehra and @ArcheraAi!