Local clusters
How do you build a solution that has not just one, but three Kubernetes clusters, with each one provisioned in a different cloud provider?
This series is about building a three-cluster Kubernetes platform: one cluster in AWS, one in GCP, and one in Azure, operated as a coherent system.
By the end, the platform should have:
- three managed Kubernetes clusters: EKS, GKE, and AKS
- reproducible infrastructure
- GitOps-based delivery for platform components and applications
- the same sample application deployed into all three clouds
- a single public entry point with health-aware traffic routing
- shared observability across clusters: metrics, logs, and traces
- secrets managed outside the repository and synced into Kubernetes
- basic policy enforcement for unsafe or incomplete workloads
- a documented failure test, rollback path, and operating runbook
The beginning
The first decision is to make the clusters boring.
That matters because the useful shape is one managed cluster per provider: EKS in AWS, GKE in Google Cloud, and AKS in Azure. Each provider keeps its own control plane, networking model, identity system, and operational boundary. The platform work sits above those boundaries and makes deployment, routing, observability, secrets, and recovery feel consistent.
Each cluster should be able to stand on its own. It should have its own networking, its own ingress path, its own credentials, and enough platform services to keep running if another provider is having a bad day. The operating model is the shared layer: the same way to provision, deploy, observe, secure, and recover.
So the platform starts as three separate Kubernetes clusters that are made consistent from the outside. Infrastructure is described in code. Later, cluster configuration will be reconciled from Git. Applications should be packaged the same way in each environment, with cloud-specific differences captured in overlays or values files.
That is the goal: three ordinary clusters that feel like one platform when you deploy, observe, route traffic, rotate secrets, and recover from failure.
The first slice, then, is the foundation: create the three managed clusters and prove that a small application can run in each of them.
The target: three kubeconfigs and one service responding from each cloud.
Phase 1 - Foundations
There are multiple tools for infrastructure as code. I wanted one workflow across all three providers, so I chose Pulumi.
The first version proves one thing: I can create one managed Kubernetes cluster in each cloud from a clean checkout. AWS, Google Cloud, and Azure expose different knobs, defaults, and naming conventions, so each cloud gets its own entry point. The repo should still make the three builds feel related.
The initial shape is small:
infra/
  pulumi/
    aws/
    gcp/
    azure/
    components/
The cloud folders hold the stack entry points. The shared components folder is where common naming, tags, Kubernetes provider setup, and small reusable pieces can live once repetition appears.
For now Pulumi should stay close to the provider APIs. EKS, GKE, and AKS have different opinions about node pools, identity, networking, and cluster access. Keeping those differences visible makes the code easier to trust. The first pass should be explicit enough that I can read the AWS stack and see AWS decisions, then read the Google Cloud and Azure stacks and see the same platform shape expressed in their terms.
The exit criteria for this phase are deliberately plain:
- one EKS cluster
- one GKE cluster
- one AKS cluster
- kubeconfig access for each cluster
- one small service responding from each cloud
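One way to keep those criteria honest is to write them down as a check. The following is only a sketch: the `ClusterCheck` shape and the `unmetCriteria` function are hypothetical names, not part of the repo, and the observations would come from the kubectl and curl checks run by hand later in this phase.

```typescript
type Cloud = "aws" | "gcp" | "azure";

// Hypothetical record of what was observed for one cloud after `pulumi up`.
interface ClusterCheck {
  cloud: Cloud;
  kubeconfigWorks: boolean; // e.g. `kubectl get nodes` succeeded with the exported kubeconfig
  readyNodes: number; // nodes reporting Ready
  httpStatus: number | null; // status from the hello service, null if unreachable
}

const REQUIRED_CLOUDS: Cloud[] = ["aws", "gcp", "azure"];

// Returns every unmet criterion; an empty list means the phase exit is met.
function unmetCriteria(checks: ClusterCheck[]): string[] {
  const problems: string[] = [];
  for (const cloud of REQUIRED_CLOUDS) {
    const check = checks.find((c) => c.cloud === cloud);
    if (!check) {
      problems.push(`${cloud}: no cluster checked`);
      continue;
    }
    if (!check.kubeconfigWorks) problems.push(`${cloud}: kubeconfig access failed`);
    if (check.readyNodes < 1) problems.push(`${cloud}: no ready nodes`);
    if (check.httpStatus !== 200) problems.push(`${cloud}: service did not return 200`);
  }
  return problems;
}
```

Returning a list of problems instead of a single boolean keeps the failure readable: one missing cloud or one unreachable service shows up by name.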
Pulumi project setup
I use TypeScript for the Pulumi code. The provider support is good, the examples are easy to read, and the project structure stays light. I also want normal programming language tools once the platform starts to grow: functions for naming, small shared components, typed config, and tests around the pieces that become reusable.
Each cloud gets its own Pulumi project to start with:
infra/
  pulumi/
    aws/
      Pulumi.yaml
      Pulumi.dev.yaml
      index.ts
    gcp/
      Pulumi.yaml
      Pulumi.dev.yaml
      index.ts
    azure/
      Pulumi.yaml
      Pulumi.dev.yaml
      index.ts
    components/
      config.ts
      hello.ts
      naming.ts
Each project owns the provider setup for its cloud. The AWS project configures AWS and creates EKS. The Google Cloud project configures GCP and creates GKE. The Azure project configures Azure Native and creates AKS. That keeps the first pass readable.
The components folder starts small. The first shared pieces are config, naming, and the tiny hello application. Config keeps the stack inputs consistent. Naming keeps resource names, tags, and labels predictable. The hello component gives each cluster the same Kubernetes proof: a namespace, an nginx deployment, and a public LoadBalancer service.
Conceptually, the workload is just this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
  namespace: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:1.27-alpine
          ports:
            - name: http
              containerPort: 80
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 250m
              memory: 128Mi
And a public service in front of it:
apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: hello
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
    - name: http
      port: 80
      targetPort: http
Pulumi creates those Kubernetes objects through the shared hello component. The point is to prove the cluster and cloud load balancer path with the smallest useful workload.
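The shared hello component itself is not shown in this post. As a rough sketch of the idea, the two manifests above can be built as plain TypeScript objects. `helloManifests` is a hypothetical name, and the real component may be organized differently; objects of this shape are the kind of input a Kubernetes Deployment and Service resource would take.

```typescript
// Hypothetical helper: builds the Deployment and Service shown above as plain objects.
function helloManifests() {
  const labels = { app: "hello" };
  const namespace = "hello";

  const deployment = {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name: "hello", namespace },
    spec: {
      replicas: 1,
      selector: { matchLabels: labels },
      template: {
        metadata: { labels },
        spec: {
          containers: [
            {
              name: "hello",
              image: "nginx:1.27-alpine",
              ports: [{ name: "http", containerPort: 80 }],
              resources: {
                requests: { cpu: "50m", memory: "64Mi" },
                limits: { cpu: "250m", memory: "128Mi" },
              },
            },
          ],
        },
      },
    },
  };

  const service = {
    apiVersion: "v1",
    kind: "Service",
    metadata: { name: "hello", namespace },
    spec: {
      type: "LoadBalancer",
      // Must match the pod template labels, or the load balancer has no backends.
      selector: labels,
      // targetPort refers to the named container port rather than a number.
      ports: [{ name: "http", port: 80, targetPort: "http" }],
    },
  };

  return { deployment, service };
}
```

Sharing one `labels` object between the selector and the pod template removes the most common way for this pair of manifests to drift apart.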
The TypeScript config also needs to stay current with the toolchain. TypeScript 6 warns on the old moduleResolution: "Node" setting, so the project uses the Node 16 resolver and module mode. That keeps npm run check clean on the current compiler.
The first pulumi up in each folder should create the smallest viable cluster for that provider and deploy the same small service into it. Autoscaling, private endpoints, workload identity, and ingress come later. At this point the cluster needs to exist, accept a kubeconfig, schedule a pod, and put a cloud load balancer in front of it.
Naming and stack config
Before creating cloud resources, the names should settle down.
Cloud examples often start with inline strings. Across three providers, a small vocabulary makes the code easier to read: application name, environment, cloud, region, and component.
For this platform the application name is trinity. The first environment is dev. Each cloud stack carries its own region and Kubernetes version:
# infra/pulumi/aws/Pulumi.dev.yaml
config:
  trinity:environment: dev
  trinity:region: us-east-1
  trinity:kubernetesVersion: "1.33"

# infra/pulumi/gcp/Pulumi.dev.yaml
config:
  trinity:environment: dev
  trinity:region: us-central1
  trinity:kubernetesVersion: "1.33"

# infra/pulumi/azure/Pulumi.dev.yaml
config:
  trinity:environment: dev
  trinity:region: eastus
  trinity:kubernetesVersion: "1.33"
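The config component that reads these values is not shown here. A minimal sketch of what it could look like follows, written against a small `ConfigSource` interface so it can be exercised without a Pulumi runtime. `ConfigSource` and `readTrinityConfig` are hypothetical names; I would expect an object like `new pulumi.Config("trinity")` to fit the interface, but that is an assumption. The version check is also an addition of mine: it guards the reason the YAML value is quoted, since an unquoted 1.30 would be read as a number and lose its trailing zero.

```typescript
// Minimal stand-in for the part of a config object this sketch needs.
interface ConfigSource {
  require(key: string): string;
}

interface TrinityConfig {
  environment: string;
  region: string;
  kubernetesVersion: string;
}

// Hypothetical reader: pulls the three stack inputs and validates the version format.
function readTrinityConfig(source: ConfigSource): TrinityConfig {
  const kubernetesVersion = source.require("kubernetesVersion");
  // Expect a "major.minor" string such as "1.33".
  if (!/^\d+\.\d+$/.test(kubernetesVersion)) {
    throw new Error(`kubernetesVersion must look like "1.33", got "${kubernetesVersion}"`);
  }
  return {
    environment: source.require("environment"),
    region: source.require("region"),
    kubernetesVersion,
  };
}
```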
The naming helper takes those pieces and produces predictable names:
export type Cloud = "aws" | "gcp" | "azure";

export function resourceName(
  cloud: Cloud,
  environment: string,
  component: string,
) {
  return `trinity-${environment}-${cloud}-${component}`;
}
That gives names like trinity-dev-aws-cluster, trinity-dev-gcp-nodepool, and trinity-dev-azure-rg. They are plain, easy to search for in cloud consoles, and easy to match back to the Pulumi code.
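Matching a name back to its parts can be sketched as the inverse of the helper. `parseResourceName` is a hypothetical function, not something in the repo, and it assumes the environment never contains a hyphen (true for dev, but not guaranteed by the helper itself).

```typescript
type Cloud = "aws" | "gcp" | "azure";

// Same shape as the naming helper above, repeated so the sketch is self-contained.
function resourceName(cloud: Cloud, environment: string, component: string): string {
  return `trinity-${environment}-${cloud}-${component}`;
}

interface ParsedName {
  environment: string;
  cloud: Cloud;
  component: string;
}

// Hypothetical inverse of resourceName. Assumes the environment has no hyphens;
// the component may contain them (for example "node-role").
function parseResourceName(name: string): ParsedName | undefined {
  const match = /^trinity-([a-z0-9]+)-(aws|gcp|azure)-(.+)$/.exec(name);
  if (!match) return undefined;
  const [, environment, cloud, component] = match;
  return { environment, cloud: cloud as Cloud, component };
}

console.log(resourceName("aws", "dev", "cluster")); // prints "trinity-dev-aws-cluster"
console.log(parseResourceName("trinity-dev-azure-rg")?.cloud); // prints "azure"
```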
Tags and labels follow the same shape:
export function commonLabels(cloud: Cloud, environment: string) {
  return {
    app: "trinity",
    environment,
    cloud,
    "managed-by": "pulumi",
  };
}
This small helper pays off immediately. Every provider has its own tagging or labeling model, and a shared helper keeps the intent stable while the provider-specific code handles the local syntax.
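As one example of a provider-specific rule, Google Cloud labels are stricter than AWS or Azure tags. The sketch below checks the shared labels against the GCP constraints as I understand them: keys and values limited to lowercase letters, digits, underscores, and hyphens, at most 63 characters, with keys starting with a lowercase letter. `invalidGcpLabels` is a hypothetical helper and the patterns are my reading of those rules, so treat them as an assumption to verify against the GCP documentation.

```typescript
type Cloud = "aws" | "gcp" | "azure";

// Same helper as above, repeated so the sketch is self-contained.
function commonLabels(cloud: Cloud, environment: string): Record<string, string> {
  return {
    app: "trinity",
    environment,
    cloud,
    "managed-by": "pulumi",
  };
}

// Assumed GCP label rules (verify before relying on them).
const GCP_LABEL_KEY = /^[a-z][a-z0-9_-]{0,62}$/;
const GCP_LABEL_VALUE = /^[a-z0-9_-]{0,63}$/;

// Hypothetical check: returns the keys of labels that GCP would likely reject.
function invalidGcpLabels(labels: Record<string, string>): string[] {
  return Object.keys(labels).filter(
    (key) => !GCP_LABEL_KEY.test(key) || !GCP_LABEL_VALUE.test(labels[key]),
  );
}

console.log(invalidGcpLabels(commonLabels("gcp", "dev"))); // prints []
```

A check like this would catch an environment named "Dev" or "dev.eu" before the GKE stack fails on it, while the AWS and Azure stacks could keep using the labels unchanged.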
AWS - first EKS cluster
The AWS stack stays small and makes the node path explicit:
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";
import { getTrinityConfig } from "../components/config";
import { deployHelloApp } from "../components/hello";
import { commonLabels, resourceName } from "../components/naming";

const { environment, region, kubernetesVersion } = getTrinityConfig();

const provider = new aws.Provider("aws-provider", {
  region: region as aws.Region,
});

const clusterName = resourceName("aws", environment, "cluster");
const labels = commonLabels("aws", environment);

const eksSupportedAvailabilityZones =
  region === "us-east-1"
    ? ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"]
    : undefined;

const defaultVpc = aws.ec2.getVpc({ default: true }, { provider });

const subnetIds = eksSupportedAvailabilityZones
  ? defaultVpc.then((vpc) =>
      aws.ec2
        .getSubnets(
          {
            filters: [
              { name: "vpc-id", values: [vpc.id] },
              {
                name: "availability-zone",
                values: eksSupportedAvailabilityZones,
              },
            ],
          },
          { provider },
        )
        .then((subnets) => subnets.ids),
    )
  : undefined;

const nodeRole = new aws.iam.Role(
  `${clusterName}-node-role`,
  {
    assumeRolePolicy: JSON.stringify({
      Version: "2012-10-17",
      Statement: [
        {
          Action: "sts:AssumeRole",
          Effect: "Allow",
          Principal: { Service: "ec2.amazonaws.com" },
        },
      ],
    }),
    tags: labels,
  },
  { provider },
);

const cluster = new eks.Cluster(clusterName, {
  name: clusterName,
  version: kubernetesVersion,
  subnetIds,
  skipDefaultNodeGroup: true,
  instanceRoles: [nodeRole],
  tags: labels,
}, { providers: { aws: provider } });

const nodeRolePolicyAttachments = [
  "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
  "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
  "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
].map(
  (policyArn) =>
    new aws.iam.RolePolicyAttachment(
      `${clusterName}-${policyArn.split("/").pop()}`,
      {
        role: nodeRole.name,
        policyArn,
      },
      { provider },
    ),
);

const nodeGroup = new eks.ManagedNodeGroup(
  `${clusterName}-nodes`,
  {
    cluster,
    nodeRole,
    subnetIds,
    instanceTypes: ["t3.medium"],
    scalingConfig: {
      desiredSize: 2,
      minSize: 1,
      maxSize: 3,
    },
    tags: labels,
  },
  {
    dependsOn: nodeRolePolicyAttachments,
    providers: { aws: provider },
  },
);

export const name = cluster.eksCluster.name;
export const kubeconfig = cluster.kubeconfig;
export const nodeGroupName = nodeGroup.nodeGroup.nodeGroupName;

const hello = deployHelloApp("aws", environment, cluster.provider, [
  cluster,
  nodeGroup,
]);

export const helloNamespace = hello.namespace;
export const helloServiceName = hello.serviceName;
export const helloServiceIp = hello.serviceIp;
export const helloServiceHostname = hello.serviceHostname;
EKS has enough moving parts to test the project shape. Even a small cluster needs networking, IAM, a control plane, worker nodes, and kubeconfig output. Pulumi's EKS package gives me a useful first pass, and the AWS stack still needs a few explicit decisions.
The first attempt used the default Pulumi EKS worker node group. That got as far as the control plane and exposed two AWS-specific edges.
EKS in us-east-1 rejected one of the default availability zones selected for the control plane. The failed zone was us-east-1e. The fix is to pass explicit default-VPC subnets filtered to the availability zones EKS supports in that region.
The default self-managed Auto Scaling Group path failed launching worker nodes. The stack now creates an AWS managed node group directly. The node IAM role is explicit, its standard EKS policies are attached, and the role is registered with the cluster through instanceRoles, which Pulumi EKS requires before creating a managed node group.
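The availability zone fix is easier to see without the AWS calls around it. This sketch shows the same allow-list filter over an in-memory list of subnets. The `SubnetInfo` shape, the `subnetIdsInZones` function, and the subnet IDs are all made up for illustration; in the stack, the filtering is done by the `getSubnets` lookup.

```typescript
// Hypothetical, simplified view of a subnet: just the fields the filter needs.
interface SubnetInfo {
  id: string;
  availabilityZone: string;
}

// Keeps only subnets whose zone is on the allow-list.
function subnetIdsInZones(subnets: SubnetInfo[], allowedZones: string[]): string[] {
  const allowed = new Set(allowedZones);
  return subnets
    .filter((subnet) => allowed.has(subnet.availabilityZone))
    .map((subnet) => subnet.id);
}

// Illustrative default-VPC subnets; the IDs are invented.
const defaultVpcSubnets: SubnetInfo[] = [
  { id: "subnet-aaa", availabilityZone: "us-east-1a" },
  { id: "subnet-bbb", availabilityZone: "us-east-1b" },
  { id: "subnet-eee", availabilityZone: "us-east-1e" },
];

const eksZones = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1f"];

console.log(subnetIdsInZones(defaultVpcSubnets, eksZones)); // prints [ 'subnet-aaa', 'subnet-bbb' ]
```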
t3.medium is the first node size.
A t3.small is tempting for cost. In practice, 2 GiB of memory disappears quickly once Kubernetes system pods, the AWS CNI, DNS, and basic platform agents are running. t3.medium keeps the cluster small while leaving enough headroom for the next few phases.
This creates an EKS control plane, an AWS managed node group, exports the kubeconfig, and deploys the shared hello service. It is enough for the first proof: run pulumi up, write the kubeconfig to disk, and ask Kubernetes what came back.
pulumi -C infra/pulumi/aws stack output kubeconfig --show-secrets > kubeconfig.aws.yaml
KUBECONFIG=./kubeconfig.aws.yaml kubectl get nodes
The first successful output was boring in the right way:
NAME                            STATUS   ROLES    AGE   VERSION
ip-172-31-22-77.ec2.internal    Ready    <none>   20m   v1.33.10-eks-bbe087e
ip-172-31-38-248.ec2.internal   Ready    <none>   20m   v1.33.10-eks-bbe087e
Then the Pulumi-managed Kubernetes resources showed up in the hello namespace:
KUBECONFIG=./kubeconfig.aws.yaml kubectl -n hello get deployment,service,pods
NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trinity-dev-aws-hello-015f656a   1/1     1            1           20m

NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP                                                               PORT(S)        AGE
service/trinity-dev-aws-hello-3150362b   LoadBalancer   10.100.21.46   a0121e62eb08b431383f417fa4dc14e9-1244336922.us-east-1.elb.amazonaws.com   80:30961/TCP   20m

NAME                                                  READY   STATUS    RESTARTS   AGE
pod/trinity-dev-aws-hello-015f656a-5f5fddb5f8-t86wd   1/1     Running   0          20m
The final check is outside Kubernetes, through the AWS load balancer:
curl -I "http://$(pulumi -C infra/pulumi/aws stack output helloServiceHostname)"
HTTP/1.1 200 OK
Server: nginx/1.27.5
Content-Type: text/html
Content-Length: 615
For Phase 1 that is the concrete signal I wanted from the AWS leg of the platform: Pulumi can create the cluster, Kubernetes accepts workloads, AWS can provision worker nodes, and AWS can put a public endpoint in front of a pod.
GCP - first GKE cluster
The Google Cloud version should prove the same foundation through GKE's own model. GKE has its own defaults, especially around node pools, cluster access, and release channels. For the first pass I want explicit control over the node pool, because the node pool is where sizing, labels, and later workload settings will live.
The first GCP stack has the same shape: read the shared config, create the cluster, create a small node pool, and export a kubeconfig.
import * as gcp from "@pulumi/gcp";
import * as k8s from "@pulumi/kubernetes";
import * as pulumi from "@pulumi/pulumi";
import { getTrinityConfig } from "../components/config";
import { deployHelloApp } from "../components/hello";
import { commonLabels, resourceName } from "../components/naming";

const { environment, region, kubernetesVersion } = getTrinityConfig();

const gcpConfig = new pulumi.Config("gcp");
const project = gcpConfig.require("project");

const provider = new gcp.Provider("gcp-provider", {
  project,
  region,
});

const clusterName = resourceName("gcp", environment, "cluster");
const nodePoolName = resourceName("gcp", environment, "nodepool");
const labels = commonLabels("gcp", environment);
const nodeLocations = region === "us-central1" ? ["us-central1-a"] : undefined;

const cluster = new gcp.container.Cluster(
  clusterName,
  {
    name: clusterName,
    location: region,
    nodeLocations,
    minMasterVersion: kubernetesVersion,
    initialNodeCount: 1,
    removeDefaultNodePool: true,
    nodeConfig: {
      diskSizeGb: 15,
      diskType: "pd-standard",
    },
    deletionProtection: false,
    resourceLabels: labels,
  },
  { provider },
);

const nodePool = new gcp.container.NodePool(
  nodePoolName,
  {
    name: nodePoolName,
    cluster: cluster.name,
    location: cluster.location,
    nodeCount: 1,
    nodeConfig: {
      machineType: "e2-standard-2",
      diskSizeGb: 20,
      diskType: "pd-standard",
      labels,
      oauthScopes: ["https://www.googleapis.com/auth/cloud-platform"],
    },
  },
  { provider },
);

export const name = cluster.name;
export const nodePoolNameOutput = nodePool.name;

const clusterInfo = gcp.container.getClusterOutput(
  {
    name: cluster.name,
    location: cluster.location,
    project,
  },
  {
    dependsOn: [cluster],
    provider,
  },
);
export const kubeconfig = pulumi
  .all([clusterInfo.name, clusterInfo.endpoint, clusterInfo.masterAuths])
  .apply(([name, endpoint, masterAuths]) => `apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${masterAuths[0].clusterCaCertificate}
    server: https://${endpoint}
  name: ${name}
contexts:
- context:
    cluster: ${name}
    user: ${name}
  name: ${name}
current-context: ${name}
kind: Config
users:
- name: ${name}
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: gke-gcloud-auth-plugin
      installHint: Install gke-gcloud-auth-plugin for kubectl authentication.
      provideClusterInfo: true
`);
There are a few deliberate choices here. The explicit provider includes project; the node pool preview started working after that value moved into the provider. The default node pool is removed so the real node pool is visible in code. GKE still creates an initial node while building the cluster, and that temporary node uses a 15 GB standard persistent disk so it is large enough for the GKE node image while staying within regional SSD quota. The regional cluster is also limited to one worker zone for this setup; a regional node pool creates nodes in multiple zones by default. The real node pool uses one e2-standard-2 node with a 20 GB standard disk, which is small and gives the cluster enough CPU and memory for Kubernetes system workloads plus the early platform agents. Deletion protection is disabled for this setup so the cluster can be torn down cleanly while iterating.
The kubeconfig uses the gke-gcloud-auth-plugin, so local access goes through Google Cloud credentials and the auth plugin. That works for this phase. Later, CI and GitOps access need their own identity story.
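The kubeconfig template is whitespace-sensitive YAML inside a TypeScript string, which makes it easy to break. One option, sketched here, is to pull it into a plain function that can be tested without a cluster. `buildGkeKubeconfig` is a hypothetical name and not how the stack above is organized; the inputs stand in for the cluster name, endpoint, and CA certificate that the stack reads from the cluster lookup.

```typescript
// Hypothetical pure helper: renders the same kubeconfig as the inline template.
function buildGkeKubeconfig(name: string, endpoint: string, caCertificate: string): string {
  return [
    "apiVersion: v1",
    "clusters:",
    "- cluster:",
    `    certificate-authority-data: ${caCertificate}`,
    `    server: https://${endpoint}`,
    `  name: ${name}`,
    "contexts:",
    "- context:",
    `    cluster: ${name}`,
    `    user: ${name}`,
    `  name: ${name}`,
    `current-context: ${name}`,
    "kind: Config",
    "users:",
    `- name: ${name}`,
    "  user:",
    "    exec:",
    "      apiVersion: client.authentication.k8s.io/v1beta1",
    "      command: gke-gcloud-auth-plugin",
    "      installHint: Install gke-gcloud-auth-plugin for kubectl authentication.",
    "      provideClusterInfo: true",
    "",
  ].join("\n");
}

// Example values are placeholders, not real cluster details.
console.log(buildGkeKubeconfig("trinity-dev-gcp-cluster", "203.0.113.10", "BASE64CA"));
```

Building the file line by line keeps each level of indentation explicit, so a formatter that reflows the surrounding TypeScript cannot change the YAML.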
The stack then creates a Kubernetes provider from that kubeconfig and uses the shared hello component:
const k8sProvider = new k8s.Provider(`${clusterName}-k8s-provider`, {
  kubeconfig,
});

const hello = deployHelloApp("gcp", environment, k8sProvider, [nodePool]);

export const helloNamespace = hello.namespace;
export const helloServiceName = hello.serviceName;
export const helloServiceIp = hello.serviceIp;
export const helloServiceHostname = hello.serviceHostname;
Validation mirrors AWS through Pulumi outputs and kubeconfig checks:
pulumi -C infra/pulumi/gcp stack output kubeconfig --show-secrets > kubeconfig.gcp.yaml
KUBECONFIG=./kubeconfig.gcp.yaml kubectl get nodes
KUBECONFIG=./kubeconfig.gcp.yaml kubectl -n hello get deployment,service,pods
The stack outputs showed the GCP shape:
helloNamespace      hello
helloServiceIp      34.66.142.73
helloServiceName    trinity-dev-gcp-hello-60c4e021
name                trinity-dev-gcp-cluster
nodePoolNameOutput  trinity-dev-gcp-nodepool
The node came up cleanly:
NAME                                                  STATUS   ROLES    AGE     VERSION
gke-trinity-dev-gcp--trinity-dev-gcp--c5feb9c8-mxjt   Ready    <none>   8m44s   v1.33.9-gke.1060000
The hello resources landed in Kubernetes:
NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trinity-dev-gcp-hello-a48e4a9a   1/1     1            1           8m52s

NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
service/trinity-dev-gcp-hello-60c4e021   LoadBalancer   34.118.228.59   34.66.142.73   80:30812/TCP   8m32s

NAME                                                  READY   STATUS    RESTARTS   AGE
pod/trinity-dev-gcp-hello-a48e4a9a-5884c8c544-62szg   1/1     Running   0          8m52s
And the public IP responded:
curl -I "http://$(pulumi -C infra/pulumi/gcp stack output helloServiceIp)"
HTTP/1.1 200 OK
Server: nginx/1.27.5
Content-Type: text/html
Content-Length: 615
The result is just as dull as the AWS result: a ready node and a public load balancer in front of a pod. The important part is that the GCP cluster is provisioned from the same repo, named the same way, labeled the same way, and validated with the same operational motion.
Azure - first AKS cluster
AKS completes the triangle and adds the third cloud's opinions. Azure wants a resource group boundary, AKS has its own managed identity model, and node pools are expressed through agent pool profiles. Again, the goal is to make those differences readable.
The first Azure stack creates a resource group and a small AKS cluster:
import * as containerservice from "@pulumi/azure-native/containerservice";
import * as k8s from "@pulumi/kubernetes";
import * as resources from "@pulumi/azure-native/resources";
import { getTrinityConfig } from "../components/config";
import { deployHelloApp } from "../components/hello";
import { commonLabels, resourceName } from "../components/naming";

const { environment, region, kubernetesVersion } = getTrinityConfig();

const resourceGroupName = resourceName("azure", environment, "rg");
const clusterName = resourceName("azure", environment, "cluster");
const labels = commonLabels("azure", environment);

const resourceGroup = new resources.ResourceGroup(resourceGroupName, {
  resourceGroupName,
  location: region,
  tags: labels,
});

const cluster = new containerservice.ManagedCluster(clusterName, {
  resourceGroupName: resourceGroup.name,
  resourceName: clusterName,
  location: resourceGroup.location,
  dnsPrefix: clusterName,
  kubernetesVersion,
  identity: {
    type: containerservice.ResourceIdentityType.SystemAssigned,
  },
  agentPoolProfiles: [
    {
      name: "systempool",
      count: 2,
      vmSize: "Standard_B2s",
      mode: "System",
    },
  ],
  tags: labels,
});

const credentials =
  containerservice.listManagedClusterUserCredentialsOutput({
    resourceGroupName: resourceGroup.name,
    resourceName: cluster.name,
  });

export const kubeconfig = credentials.kubeconfigs.apply((kubeconfigs) =>
  Buffer.from(kubeconfigs[0].value, "base64").toString(),
);
The resource group is part of the platform foundation. It gives the Azure leg a clear lifecycle boundary. When I destroy the Azure stack, I want to know which resources belong to this platform and which resources belong elsewhere.
Standard_B2s keeps the first cluster cheap enough to run while still leaving room for the same basic workload. The system-assigned identity is enough for the first cluster to exist. Later phases can tighten this up with workload identity, Key Vault access, and more explicit role assignments.
The Azure stack also hands the kubeconfig to the Kubernetes provider and deploys the shared hello resources:
const k8sProvider = new k8s.Provider(`${clusterName}-k8s-provider`, {
  kubeconfig,
});

const hello = deployHelloApp("azure", environment, k8sProvider, [cluster]);

export const helloNamespace = hello.namespace;
export const helloServiceName = hello.serviceName;
export const helloServiceIp = hello.serviceIp;
export const helloServiceHostname = hello.serviceHostname;
Validation is the same again:
pulumi -C infra/pulumi/azure stack output kubeconfig --show-secrets > kubeconfig.azure.yaml
KUBECONFIG=./kubeconfig.azure.yaml kubectl get nodes
KUBECONFIG=./kubeconfig.azure.yaml kubectl -n hello get deployment,service,pods
The stack outputs showed the Azure shape:
helloNamespace           hello
helloServiceIp           40.88.218.90
helloServiceName         trinity-dev-azure-hello-164d5ea9
name                     trinity-dev-azure-cluster
resourceGroupNameOutput  trinity-dev-azure-rg
The two AKS nodes came up ready:
NAME                                 STATUS   ROLES    AGE     VERSION
aks-systempool-19386533-vmss000000   Ready    <none>   6m39s   v1.33.10
aks-systempool-19386533-vmss000001   Ready    <none>   6m30s   v1.33.10
The hello namespace had the same shape as the other clouds: one deployment, one public load balancer service, and one running pod.
NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trinity-dev-azure-hello-734d06b9   1/1     1            1           5m41s

NAME                                       TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)        AGE
service/trinity-dev-azure-hello-164d5ea9   LoadBalancer   10.0.72.24   40.88.218.90   80:32744/TCP   5m37s

NAME                                                   READY   STATUS    RESTARTS   AGE
pod/trinity-dev-azure-hello-734d06b9-6bc55c5b9-bgm28   1/1     Running   0          5m42s
And the Azure load balancer answered through the public IP:
curl -I "http://$(pulumi -C infra/pulumi/azure stack output helloServiceIp)"
HTTP/1.1 200 OK
Server: nginx/1.27.5
Content-Type: text/html
Content-Length: 615
At this point all three clouds answer the same basic question: can I create a managed Kubernetes cluster from code, authenticate to it, schedule a pod, and get traffic to it? Yes.
Exit
Phase 1 is now complete. It proves the substrate:
- Pulumi can create one managed cluster in each cloud.
- The three stacks share naming and labels while preserving provider-specific details.
- Each cluster has working kubeconfig access.
- Each cluster can schedule the same Pulumi-managed nginx workload.
- Each cloud can expose that workload through its own load balancer.
It is the first checkpoint in the larger platform series: the three Kubernetes foundations exist and can run the same public workload.
For the foundations phase, that means:
- create cloud accounts, projects, and subscriptions
- establish the Pulumi project structure
- provision base Kubernetes clusters
- deploy a simple hello-world app to each cluster
The final validation pass before teardown showed ready worker nodes in each cluster. AWS and Azure had two workers; the GKE node pool for this checkpoint had one worker:
cloud   nodes   version
aws     2       v1.33.11-eks-7fcd7ec
gcp     1       v1.33.10-gke.1176000
azure   2       v1.33.11
The next phase is GitOps. The Phase 1 application is already managed declaratively by Pulumi. Application and platform add-on delivery should move to a reconciler running in the clusters. That means choosing Argo CD or Flux, moving the hello workload out of Pulumi-managed Kubernetes resources, and giving each cloud an overlay or values file for the differences that should remain explicit.