Multi-Customer Support with Dedicated Kubernetes Deployments and Isolated Channels

4 min read · Jan 29, 2025

Problem Statement

Organizations providing multi-customer support often face challenges in ensuring security, resource isolation, and scalability. A shared Kubernetes environment can lead to resource contention, security vulnerabilities, and operational complexities when managing multiple tenants.

To address these issues, a solution is needed where each customer has a dedicated Kubernetes deployment with isolated communication channels. This approach ensures enhanced security, performance optimization, and compliance with data privacy regulations while maintaining operational efficiency. However, managing multiple deployments introduces challenges in automation, cost optimization, and lifecycle management.

How can organizations efficiently deploy and maintain dedicated Kubernetes environments for each customer while ensuring seamless support operations and minimizing infrastructure overhead?

This article addresses this challenge using AWS and Kubernetes, specifically Amazon EKS. To illustrate the solution, we will consider an example product that uses an Amazon RDS database, EFS for file storage, and ElastiCache for Redis for caching, with its API services deployed inside EKS.

How Can Customers Be Isolated?

This article will not cover the approach of creating dedicated VPCs or EKS clusters. Instead, the focus is on achieving customer isolation within the same EKS cluster and VPC while maintaining security and operational efficiency. The suggested approach is to create a dedicated namespace for each customer. This logically isolates each customer's services, enabling better security, per-customer resource allocation, and easier management of their respective workloads.
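As a minimal sketch of this layout (the customer name "acme" and the quota values are hypothetical), each customer gets its own namespace, and a ResourceQuota can back the per-customer resource allocation mentioned above:

    # Namespace per customer; the label makes it selectable by policies later.
    apiVersion: v1
    kind: Namespace
    metadata:
      name: customer-acme
      labels:
        customer: acme
    ---
    # Optional: cap the customer's resource usage inside its namespace.
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: customer-acme-quota
      namespace: customer-acme
    spec:
      hard:
        requests.cpu: "4"
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi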

Solution 1

The first approach is to use EKS Pod Security Groups [1]. This feature, backed by the EKS pod ENI capability, assigns each pod its own branch Elastic Network Interface (ENI) and IP address. Each pod can then be attached to a specific security group via its ENI. This lets us tightly control access to the data components, ensuring they are reachable only through the defined security groups, and only by the appropriate pods.
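Concretely, this is configured with the SecurityGroupPolicy custom resource described in the EKS documentation [1]. A minimal sketch, where the security group ID and pod labels are placeholders:

    # Attach a customer-specific security group to matching pods.
    # Requires the VPC CNI with the pod ENI feature enabled (ENABLE_POD_ENI=true).
    apiVersion: vpcresources.k8s.aws/v1beta1
    kind: SecurityGroupPolicy
    metadata:
      name: acme-api-sgp
      namespace: customer-acme
    spec:
      podSelector:
        matchLabels:
          app: api
      securityGroups:
        groupIds:
          - sg-0123456789abcdef0  # placeholder: grants access to acme's RDS/EFS/Redis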

However, this approach comes with a caveat that can surface down the line.

If your workloads run as jobs that frequently create and delete pods in response to traffic (for example, queue-driven jobs scaled by KEDA [2]), you might encounter delays in pod scheduling due to IP assignment issues. Pods can remain in the "ContainerCreating" state for several minutes while waiting for an IP, and a warning like the following appears when you describe the pod:

    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1ef50790f21342bbcd88fc5456b480119e162ca3755f05b0bb5ac4a7432f1fb8": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

The reason for this delay is how the AWS VPC CNI operates when the pod ENI feature is enabled. When a pod is terminated, the CNI does not immediately release its allocated IPs. Instead, it holds them for approximately 60 seconds before making them available again. As a result, new pods may have to wait for previously assigned IPs to be released, causing delays in pod startup.

Pod Security Groups are not well-suited for workloads with frequent pod creation and deletion. Since IPs are dynamically associated with pods, the continuous provisioning and termination of pods can lead to delays impacting performance and reliability.
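For context, the bursty pattern that triggers this problem often looks like the following KEDA [2] ScaledJob, which spawns a short-lived pod per batch of queue messages (the queue URL, image, and thresholds are hypothetical):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledJob
    metadata:
      name: acme-worker
      namespace: customer-acme
    spec:
      jobTargetRef:
        template:
          spec:
            containers:
              - name: worker
                image: registry.example.com/acme-worker:latest  # placeholder image
            restartPolicy: Never
      pollingInterval: 10   # check the queue every 10 seconds
      maxReplicaCount: 50   # each replica is a new pod needing a branch-ENI IP
      triggers:
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/acme-jobs
            queueLength: "5"
            awsRegion: us-east-1
          # authentication (IRSA / TriggerAuthentication) omitted for brevity

Every scale-up here requests a fresh branch-ENI IP, so the roughly 60-second cooldown described above compounds quickly.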

Alternatives for Using Security Groups with Frequent Pod Creation

If you want to continue using Security Groups while addressing the challenges of frequent pod creation and deletion, AWS Fargate can be a viable solution. Because each Fargate pod runs on its own dedicated compute, the per-node ENI limits that constrain branch ENIs on EC2 nodes no longer apply, and jobs can run efficiently with security groups attached.

Possible Approaches:

  1. AWS Batch with Fargate and ECS [3] — Leverage AWS Batch to run jobs on Fargate-backed ECS, which automatically scales and manages compute resources while enforcing security group rules.
  2. Fargate with EKS [4] — Run Kubernetes workloads on EKS with Fargate, ensuring that each pod runs in an isolated environment with dedicated security groups, avoiding IP allocation delays.

These options help maintain security group-based network isolation while improving scalability and reducing the overhead of managing pod IP assignments.
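For the EKS route (option 2 above) [4], a Fargate profile that matches the customer's namespace is enough to move those pods off EC2 nodes. A sketch using an eksctl configuration, where the cluster name, region, and namespace are placeholders:

    # eksctl ClusterConfig fragment: schedule customer-acme pods on Fargate.
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: multi-customer-cluster
      region: us-east-1
    fargateProfiles:
      - name: customer-acme
        selectors:
          - namespace: customer-acme  # all pods in this namespace run on Fargate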

Solution 2

The next option combines dedicated data subnets with Kubernetes Network Policies. In this approach, each customer is assigned a dedicated data subnet within the VPC, hosting that customer's data components (RDS, EFS, ElastiCache). Within each customer's namespace, network policies then restrict egress traffic exclusively to that customer's specific data CIDR range. This keeps traffic between customer services isolated, providing an additional layer of security and control over data access.
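A minimal sketch of such a policy follows; the CIDR is a hypothetical per-customer data subnet, and enforcement requires a CNI with network policy support (for example, the VPC CNI's network policy feature or Calico):

    # Default-deny egress for the customer namespace, then allow only
    # the customer's data subnet plus in-cluster DNS.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: acme-egress-to-data-subnet
      namespace: customer-acme
    spec:
      podSelector: {}        # applies to every pod in the namespace
      policyTypes:
        - Egress
      egress:
        - to:
            - ipBlock:
                cidr: 10.0.10.0/24   # placeholder: acme's dedicated data subnet
        - to:                        # keep cluster DNS working
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
          ports:
            - protocol: UDP
              port: 53
            - protocol: TCP
              port: 53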

The architecture diagram for this approach shows the setup for a single availability zone (AZ). For a multi-AZ deployment, this architecture needs to be replicated across all availability zones in use; distributing the services and data subnets across multiple AZs provides high availability and fault tolerance in case of an AZ failure.

Conclusion

For workloads that run as jobs, Pod Security Groups may not be the best approach to isolate communication channels. Placing the external data components in dedicated per-customer subnets makes it straightforward to isolate those channels with Kubernetes Network Policies, while still deploying all services within the same cluster. For this type of solution, that setup is both more cost-effective and more secure.

References

[1] https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html

[2] https://keda.sh/

[3] https://docs.aws.amazon.com/batch/latest/userguide/fargate.html

[4] https://docs.aws.amazon.com/eks/latest/userguide/fargate.html
