Local clusters

Multi-cloud foundation flow

How do you build a solution that has not just one, but three Kubernetes clusters, with each one provisioned in a different cloud provider?

This series is about building a three-cluster Kubernetes platform: one cluster in AWS, one in GCP, and one in Azure, operated as a coherent system.

By the end, the platform should have:

  • three managed Kubernetes clusters: EKS, GKE, and AKS
  • reproducible infrastructure
  • GitOps-based delivery for platform components and applications
  • the same sample application deployed into all three clouds
  • a single public entry point with health-aware traffic routing
  • shared observability across clusters: metrics, logs, and traces
  • secrets managed outside the repository and synced into Kubernetes
  • basic policy enforcement for unsafe or incomplete workloads
  • a documented failure test, rollback path, and operating runbook

The beginning

The first decision is to make the clusters boring.

In practice, that means one managed cluster per provider: EKS in AWS, GKE in Google Cloud, and AKS in Azure. Each provider keeps its own control plane, networking model, identity system, and operational boundary. The platform work sits above those boundaries and makes deployment, routing, observability, secrets, and recovery feel consistent.

Each cluster should be able to stand on its own. It should have its own networking, its own ingress path, its own credentials, and enough platform services to keep running if another provider is having a bad day. The operating model is the shared layer: the same way to provision, deploy, observe, secure, and recover.

So the platform starts as three separate Kubernetes clusters that are made consistent from the outside. Infrastructure is described in code. Later, cluster configuration will be reconciled from Git. Applications should be packaged the same way in each environment, with cloud-specific differences captured in overlays or values files.

That gives the goal. Three ordinary clusters should feel like one platform when you deploy, observe, route traffic, rotate secrets, and recover from failure.

The first slice, then, is the foundation: create the three managed clusters and prove that a small application can run in each of them.

The target: three kubeconfigs and one service responding from each cloud.

Phase 1 - Foundations

There are multiple tools for infrastructure as code. I wanted one workflow across all three providers, so I chose Pulumi.

The first version proves one thing: I can create one managed Kubernetes cluster in each cloud from a clean checkout. AWS, Google Cloud, and Azure expose different knobs, defaults, and naming conventions, so each cloud gets its own entry point. The repo should still make the three builds feel related.

The initial shape is small:

infra/
  pulumi/
    aws/
    gcp/
    azure/
    components/

The cloud folders hold the stack entry points. The shared components folder is where common naming, tags, Kubernetes provider setup, and small reusable pieces can live once repetition appears.

For now Pulumi should stay close to the provider APIs. EKS, GKE, and AKS have different opinions about node pools, identity, networking, and cluster access. Keeping those differences visible makes the code easier to trust. The first pass should be explicit enough that I can read the AWS stack and see AWS decisions, then read the Google Cloud and Azure stacks and see the same platform shape expressed in their terms.

The exit criteria for this phase are deliberately plain:

  • one EKS cluster
  • one GKE cluster
  • one AKS cluster
  • kubeconfig access for each cluster
  • one small service responding from each cloud

Pulumi project setup

TypeScript for the Pulumi code. The provider support is good, the examples are easy to read, and the project structure stays light. I also want normal programming language tools once the platform starts to grow: functions for naming, small shared components, typed config, and tests around the pieces that become reusable.

One Pulumi project per cloud to start with:

infra/
  pulumi/
    aws/
      Pulumi.yaml
      Pulumi.dev.yaml
      index.ts
    gcp/
      Pulumi.yaml
      Pulumi.dev.yaml
      index.ts
    azure/
      Pulumi.yaml
      Pulumi.dev.yaml
      index.ts
    components/
      config.ts
      hello.ts
      naming.ts

Each project owns the provider setup for its cloud. The AWS project configures AWS and creates EKS. The Google Cloud project configures GCP and creates GKE. The Azure project configures Azure Native and creates AKS. That keeps the first pass readable.

The components folder starts small. The first shared pieces are config, naming, and the tiny hello application. Config keeps the stack inputs consistent. Naming keeps resource names, tags, and labels predictable. The hello component gives each cluster the same Kubernetes proof: a namespace, an nginx deployment, and a public LoadBalancer service.

Conceptually, the workload is just this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
  namespace: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:1.27-alpine
          ports:
            - name: http
              containerPort: 80
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 250m
              memory: 128Mi

And a public service in front of it:

apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: hello
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
    - name: http
      port: 80
      targetPort: http
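Two pieces of wiring make these manifests work: the Deployment selector and the Service selector must both match the pod template labels, and the Service's named targetPort (http) must match a named container port. A small sketch of those checks in plain TypeScript, with the values copied from the YAML above. The helper names are hypothetical and not part of the shared hello component.

```typescript
// Hypothetical sketch: the label and port wiring of the hello manifests,
// reduced to plain objects so it can be checked without a cluster.
type Labels = Record<string, string>;

interface ContainerPort {
  name: string;
  containerPort: number;
}

// True when every key/value in the selector is present on the labels.
// Deployment and Service selectors both match pods this way.
export function selectorMatches(selector: Labels, labels: Labels): boolean {
  return Object.entries(selector).every(([key, value]) => labels[key] === value);
}

// A named targetPort is resolved against the pod's named container ports.
// If no port carries that name, the pod is not used as a Service endpoint.
export function namedPortResolves(targetPort: string, ports: ContainerPort[]): boolean {
  return ports.some((port) => port.name === targetPort);
}

// Values copied from the Deployment and Service manifests above.
const podTemplateLabels: Labels = { app: "hello" };
const deploymentSelector: Labels = { app: "hello" };
const serviceSelector: Labels = { app: "hello" };
const containerPorts: ContainerPort[] = [{ name: "http", containerPort: 80 }];
const serviceTargetPort = "http";

console.log(selectorMatches(deploymentSelector, podTemplateLabels)); // true
console.log(selectorMatches(serviceSelector, podTemplateLabels)); // true
console.log(namedPortResolves(serviceTargetPort, containerPorts)); // true
```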

Pulumi creates those Kubernetes objects through the shared hello component. The point is to prove the cluster and cloud load balancer path with the smallest useful workload.

The TypeScript config also needs to stay current with the toolchain. TypeScript 6 warns about the old moduleResolution: "Node" setting, so the project sets both module and moduleResolution to "Node16". That keeps npm run check clean on the current compiler.

The first pulumi up in each folder should create the smallest viable cluster for that provider and deploy the same small service into it. Autoscaling, private endpoints, workload identity, and ingress come later. At this point the cluster needs to exist, accept a kubeconfig, schedule a pod, and put a cloud load balancer in front of it.

Naming and stack config

Before creating cloud resources, the names should settle down.

Cloud examples often start with inline strings. Across three providers, a small vocabulary makes the code easier to read: application name, environment, cloud, region, and component.

For this platform the application name is trinity. The first environment is dev. Each cloud stack carries its own region and Kubernetes version:

# infra/pulumi/aws/Pulumi.dev.yaml
config:
  trinity:environment: dev
  trinity:region: us-east-1
  trinity:kubernetesVersion: "1.33"
# infra/pulumi/gcp/Pulumi.dev.yaml
config:
  trinity:environment: dev
  trinity:region: us-central1
  trinity:kubernetesVersion: "1.33"
# infra/pulumi/azure/Pulumi.dev.yaml
config:
  trinity:environment: dev
  trinity:region: eastus
  trinity:kubernetesVersion: "1.33"
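A minimal sketch of what components/config.ts could do with these values, assuming the stack settings have already been read into a plain key/value map. The StackConfig and parseStackConfig names, and the version-format check, are hypothetical rather than the repository's code.

```typescript
// Hypothetical components/config.ts sketch: a typed view of the stack settings.
export interface StackConfig {
  environment: string;
  region: string;
  kubernetesVersion: string;
}

// Reads the "trinity:*" keys from a plain map and fails fast on bad input,
// before any cloud resources are planned.
export function parseStackConfig(raw: Record<string, string | undefined>): StackConfig {
  const required = (key: string): string => {
    const value = raw[`trinity:${key}`];
    if (value === undefined || value.trim() === "") {
      throw new Error(`missing required config trinity:${key}`);
    }
    return value.trim();
  };

  const kubernetesVersion = required("kubernetesVersion");
  // Expect a "major.minor" string such as "1.33".
  if (!/^\d+\.\d+$/.test(kubernetesVersion)) {
    throw new Error(`kubernetesVersion should look like "1.33", got "${kubernetesVersion}"`);
  }

  return {
    environment: required("environment"),
    region: required("region"),
    kubernetesVersion,
  };
}

// Example: the AWS dev stack values from above.
const awsDev = parseStackConfig({
  "trinity:environment": "dev",
  "trinity:region": "us-east-1",
  "trinity:kubernetesVersion": "1.33",
});
console.log(`${awsDev.environment} ${awsDev.region} ${awsDev.kubernetesVersion}`); // dev us-east-1 1.33
```

In the real stack the map would come from pulumi.Config rather than a literal. Keeping the version quoted in YAML matters: unquoted, a YAML parser reads a version like 1.30 as the number 1.3.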

The naming helper takes those pieces and produces predictable names:

export type Cloud = "aws" | "gcp" | "azure";

export function resourceName(
  cloud: Cloud,
  environment: string,
  component: string,
) {
  return `trinity-${environment}-${cloud}-${component}`;
}

That gives names like trinity-dev-aws-cluster, trinity-dev-gcp-nodepool, and trinity-dev-azure-rg. They are plain, easy to search for in cloud consoles, and easy to match back to the Pulumi code.
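Names that end up on Kubernetes objects also have to satisfy Kubernetes rules. Namespace names must be RFC 1123 labels and Service names RFC 1035 labels: lowercase letters, digits, and hyphens, at most 63 characters, ending with a letter or digit, and (for RFC 1035) starting with a letter. The outputs later in this post also show Pulumi appending a suffix such as -015f656a. A minimal guard, assuming a hypothetical checkedResourceName wrapper that is only applied to names headed for Kubernetes (cloud resources have their own, different limits):

```typescript
export type Cloud = "aws" | "gcp" | "azure";

export function resourceName(cloud: Cloud, environment: string, component: string) {
  return `trinity-${environment}-${cloud}-${component}`;
}

// RFC 1035 label: starts with a letter, then lowercase letters, digits, or "-",
// and ends with a letter or digit. Anything passing this also passes RFC 1123.
const RFC1035_LABEL = /^[a-z]([-a-z0-9]*[a-z0-9])?$/;

// Room for a Pulumi autoname suffix like "-015f656a" seen in the outputs.
const AUTONAME_SUFFIX_LENGTH = 9;

export function isRfc1035Label(name: string, maxLength = 63): boolean {
  return name.length <= maxLength && RFC1035_LABEL.test(name);
}

// Hypothetical guard: fails during preview instead of at the Kubernetes API.
export function checkedResourceName(cloud: Cloud, environment: string, component: string): string {
  const name = resourceName(cloud, environment, component);
  if (!isRfc1035Label(name, 63 - AUTONAME_SUFFIX_LENGTH)) {
    throw new Error(`"${name}" would not be a valid Kubernetes name once suffixed`);
  }
  return name;
}

console.log(checkedResourceName("aws", "dev", "cluster")); // trinity-dev-aws-cluster
```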

Tags and labels follow the same shape:

export function commonLabels(cloud: Cloud, environment: string) {
  return {
    app: "trinity",
    environment,
    cloud,
    "managed-by": "pulumi",
  };
}

This small helper pays off immediately. Every provider has its own tagging or labeling model, and a shared helper keeps the intent stable while the provider-specific code handles the local syntax.
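One example of that local syntax: AWS and Azure tags accept the keys and values in commonLabels as-is, while GCP resource labels only allow lowercase letters, digits, underscores, and hyphens, at most 63 characters, with keys starting with a lowercase letter. A sketch of a hypothetical toGcpLabels adapter that normalizes a map before it reaches GCP resources. It handles ASCII only; GCP also permits some international characters, which this sketch ignores.

```typescript
export type Cloud = "aws" | "gcp" | "azure";

export function commonLabels(cloud: Cloud, environment: string): Record<string, string> {
  return { app: "trinity", environment, cloud, "managed-by": "pulumi" };
}

// Hypothetical adapter: normalize a label map to GCP's label rules (ASCII subset).
// Allowed characters: lowercase letters, digits, "_" and "-"; max 63 characters.
export function toGcpLabels(labels: Record<string, string>): Record<string, string> {
  const normalize = (text: string): string =>
    text.toLowerCase().replace(/[^a-z0-9_-]/g, "_").slice(0, 63);

  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(labels)) {
    const gcpKey = normalize(key);
    // GCP label keys must start with a lowercase letter.
    if (!/^[a-z]/.test(gcpKey)) {
      throw new Error(`cannot map "${key}" to a GCP label key`);
    }
    result[gcpKey] = normalize(value);
  }
  return result;
}

// AWS and Azure take the map unchanged; GCP gets the normalized copy.
console.log(toGcpLabels(commonLabels("gcp", "dev")));
console.log(toGcpLabels({ "Team.Name": "Platform Ops" })); // { team_name: 'platform_ops' }
```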

AWS - first EKS cluster

The AWS stack stays small and makes the node path explicit.

EKS has enough moving parts to test the project shape. Even a small cluster needs networking, IAM, a control plane, worker nodes, and kubeconfig output. Pulumi's EKS package gives me a useful first pass, but the AWS stack still needs a few explicit decisions.

The first attempt used the default Pulumi EKS worker node group. That got as far as the control plane and exposed two AWS-specific edges.

EKS in us-east-1 rejected one of the default availability zones selected for the control plane. The failed zone was us-east-1e. The fix is to pass explicit default-VPC subnets filtered to the availability zones EKS supports in that region.
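The subnet fix amounts to a filter. A sketch, assuming the default-VPC subnets have already been looked up together with their availability zones; the SubnetInfo shape and the per-region blocklist are hypothetical, and us-east-1e is the only zone actually observed failing here. It also checks that the remaining subnets still span at least two availability zones, which EKS requires. In the stack itself, the same filtering can be pushed into the AWS subnet lookup with an availability-zone filter.

```typescript
// Hypothetical shape for default-VPC subnets after lookup.
interface SubnetInfo {
  id: string;
  availabilityZone: string;
}

// Zones to skip per region. us-east-1e is the zone rejected in this setup.
const UNSUPPORTED_EKS_ZONES: Record<string, string[]> = {
  "us-east-1": ["us-east-1e"],
};

export function eksSubnetIds(region: string, subnets: SubnetInfo[]): string[] {
  const blocked = UNSUPPORTED_EKS_ZONES[region] ?? [];
  const usable = subnets.filter((subnet) => !blocked.includes(subnet.availabilityZone));

  // EKS needs subnets in at least two availability zones.
  const zones = new Set(usable.map((subnet) => subnet.availabilityZone));
  if (zones.size < 2) {
    throw new Error(`need subnets in at least two supported zones, found ${zones.size}`);
  }
  return usable.map((subnet) => subnet.id);
}

// Illustrative subnet IDs; real ones come from the AWS subnet lookup.
const ids = eksSubnetIds("us-east-1", [
  { id: "subnet-a", availabilityZone: "us-east-1a" },
  { id: "subnet-b", availabilityZone: "us-east-1b" },
  { id: "subnet-e", availabilityZone: "us-east-1e" },
]);
console.log(ids); // [ 'subnet-a', 'subnet-b' ]
```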

The default self-managed Auto Scaling Group path failed to launch worker nodes, so the stack now creates an AWS managed node group directly. The node IAM role is explicit, the standard EKS worker policies are attached to it, and the role is registered with the cluster through instanceRoles, which Pulumi EKS requires before it creates a managed node group.
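The node role itself is mostly data. A sketch of its two pieces, assuming the three AWS managed policies commonly attached to EKS worker nodes: the trust policy that lets EC2 instances assume the role, and the policy ARNs. In the Pulumi stack, the JSON would be the role's assumeRolePolicy, each ARN would become one aws.iam.RolePolicyAttachment, and the role would go into instanceRoles and the managed node group's node role. The constant, function names, and the attachment naming scheme are hypothetical.

```typescript
// AWS managed policies typically attached to EKS worker node roles.
export const EKS_NODE_POLICY_ARNS: readonly string[] = [
  "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
  "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
  "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
];

// Trust policy that lets EC2 instances (the worker nodes) assume the node role.
export function nodeAssumeRolePolicy(): string {
  return JSON.stringify({
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Service: "ec2.amazonaws.com" },
        Action: "sts:AssumeRole",
      },
    ],
  });
}

// Hypothetical naming: one attachment per ARN, suffixed with the policy's short name.
export function attachmentNames(rolePrefix: string): string[] {
  return EKS_NODE_POLICY_ARNS.map((arn) => `${rolePrefix}-${arn.split("/").pop()}`);
}

console.log(attachmentNames("trinity-dev-aws-node"));
```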

t3.medium is the first node size.

A t3.small is tempting for cost. In practice, 2 GiB of memory disappears quickly once Kubernetes system pods, the AWS CNI, DNS, and basic platform agents are running. t3.medium keeps the cluster small while leaving enough headroom for the next few phases.
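The headroom argument gets sharper with the per-node limits EKS applies. A rough sizing sketch, assuming the AWS VPC CNI defaults without prefix delegation and the kubelet memory reservation used by the EKS-optimized AMIs: max pods is ENIs × (IPv4 addresses per ENI - 1) + 2, and reserved memory is 11 MiB per pod plus 255 MiB. Both t3.small and t3.medium have three ENIs, with four and six IPv4 addresses per ENI respectively. The remaining figure is before eviction thresholds, OS overhead, and system pods, so real allocatable memory is lower.

```typescript
// Rough EKS node sizing sketch (assumes VPC CNI defaults, no prefix delegation).
interface InstanceShape {
  type: string;
  memoryMiB: number; // nominal instance memory
  enis: number;
  ipv4PerEni: number;
}

// Each ENI keeps its primary IP for itself; +2 covers host-network pods
// such as aws-node and kube-proxy.
export function eksMaxPods(enis: number, ipv4PerEni: number): number {
  return enis * (ipv4PerEni - 1) + 2;
}

// kube-reserved memory as computed by the EKS-optimized AMIs, in MiB.
export function kubeReservedMemoryMiB(maxPods: number): number {
  return 11 * maxPods + 255;
}

const shapes: InstanceShape[] = [
  { type: "t3.small", memoryMiB: 2048, enis: 3, ipv4PerEni: 4 },
  { type: "t3.medium", memoryMiB: 4096, enis: 3, ipv4PerEni: 6 },
];

for (const shape of shapes) {
  const pods = eksMaxPods(shape.enis, shape.ipv4PerEni);
  const reserved = kubeReservedMemoryMiB(pods);
  // Remaining memory before eviction thresholds, OS overhead, and system pods.
  console.log(`${shape.type}: maxPods=${pods} reserved=${reserved}MiB remaining=${shape.memoryMiB - reserved}MiB`);
}
// t3.small: maxPods=11 reserved=376MiB remaining=1672MiB
// t3.medium: maxPods=17 reserved=442MiB remaining=3654MiB
```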

The stack creates an EKS control plane and an AWS managed node group, exports the kubeconfig, and deploys the shared hello service. That is enough for the first proof: run pulumi up, write the kubeconfig to disk, and ask Kubernetes what came back.

pulumi -C infra/pulumi/aws stack output kubeconfig --show-secrets > kubeconfig.aws.yaml
KUBECONFIG=./kubeconfig.aws.yaml kubectl get nodes

The first successful output was boring in the right way:

NAME                            STATUS   ROLES    AGE   VERSION
ip-172-31-22-77.ec2.internal    Ready    <none>   20m   v1.33.10-eks-bbe087e
ip-172-31-38-248.ec2.internal   Ready    <none>   20m   v1.33.10-eks-bbe087e

Then the Pulumi-managed Kubernetes resources showed up in the hello namespace:

KUBECONFIG=./kubeconfig.aws.yaml kubectl -n hello get deployment,service,pods
NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trinity-dev-aws-hello-015f656a   1/1     1            1           20m

NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP                                                               PORT(S)        AGE
service/trinity-dev-aws-hello-3150362b   LoadBalancer   10.100.21.46   a0121e62eb08b431383f417fa4dc14e9-1244336922.us-east-1.elb.amazonaws.com   80:30961/TCP   20m

NAME                                                  READY   STATUS    RESTARTS   AGE
pod/trinity-dev-aws-hello-015f656a-5f5fddb5f8-t86wd   1/1     Running   0          20m

The final check is outside Kubernetes, through the AWS load balancer:

curl -I "http://$(pulumi -C infra/pulumi/aws stack output helloServiceHostname)"
HTTP/1.1 200 OK
Server: nginx/1.27.5
Content-Type: text/html
Content-Length: 615

For Phase 1 that is the concrete signal I wanted from the AWS leg of the platform: Pulumi can create the cluster, Kubernetes accepts workloads, AWS can provision worker nodes, and AWS can put a public endpoint in front of a pod.

GCP - first GKE cluster

The Google Cloud version should prove the same foundation through GKE's own model. GKE has its own defaults, especially around node pools, cluster access, and release channels. For the first pass I want explicit control over the node pool, because the node pool is where sizing, labels, and later workload settings will live.

The first GCP stack has the same shape: read the shared config, create the cluster, create a small node pool, and export a kubeconfig.

There are a few deliberate choices here:

  • The explicit provider sets project. The node pool preview only started working after that value moved into the provider.
  • The default node pool is removed so the real node pool is visible in code. GKE still creates an initial node while building the cluster; that temporary node uses a 15 GB standard persistent disk, which is large enough for the GKE node image while staying within the regional SSD quota.
  • The regional cluster is limited to one worker zone. Without that, a regional node pool creates its node count in every zone it spans, three zones by default, so one node per zone would become three nodes.
  • The real node pool uses one e2-standard-2 node (2 vCPUs, 8 GB memory) with a 20 GB standard disk. That is small, but it gives the cluster enough CPU and memory for Kubernetes system workloads plus the early platform agents.
  • Deletion protection is disabled so the cluster can be torn down cleanly while iterating.
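The zone setting is easiest to see as arithmetic. In a regional GKE cluster, a node pool's node count applies per zone, so the total is the node count multiplied by the number of node locations, and the boot disk that counts against quota scales the same way. A sketch with illustrative us-central1 zones (the zone this stack actually pins is not shown here); the function names are hypothetical.

```typescript
// Regional GKE: nodeCount is per zone, so totals scale with node locations.
export function totalNodes(nodeCountPerZone: number, nodeLocations: string[]): number {
  return nodeCountPerZone * nodeLocations.length;
}

// Boot disk that counts against regional persistent disk quota, in GB.
export function totalDiskGb(nodeCountPerZone: number, nodeLocations: string[], diskSizeGb: number): number {
  return totalNodes(nodeCountPerZone, nodeLocations) * diskSizeGb;
}

// Illustrative zones; by default GKE spreads a regional node pool over three zones.
const threeZones = ["us-central1-a", "us-central1-b", "us-central1-c"];
const oneZone = ["us-central1-a"];

console.log(totalNodes(1, threeZones), totalDiskGb(1, threeZones, 20)); // 3 60
console.log(totalNodes(1, oneZone), totalDiskGb(1, oneZone, 20)); // 1 20
```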

The kubeconfig uses the gke-gcloud-auth-plugin, so local access goes through Google Cloud credentials and the auth plugin. That works for this phase. Later, CI and GitOps access need their own identity story.
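Because this kubeconfig is built in code, it helps to see its shape. A sketch of a hypothetical gkeKubeconfig helper that renders the exec-plugin form, where kubectl runs gke-gcloud-auth-plugin to obtain a token from the local Google Cloud credentials. In the stack, the endpoint and CA certificate (cluster.endpoint and cluster.masterAuth.clusterCaCertificate) are Pulumi outputs, so a helper like this would run inside an apply.

```typescript
// Hypothetical helper: render a kubeconfig that authenticates through
// gke-gcloud-auth-plugin instead of embedding a static token.
export function gkeKubeconfig(name: string, endpoint: string, caData: string): string {
  return [
    "apiVersion: v1",
    "kind: Config",
    "clusters:",
    `- name: ${name}`,
    "  cluster:",
    `    server: https://${endpoint}`,
    `    certificate-authority-data: ${caData}`,
    "contexts:",
    `- name: ${name}`,
    "  context:",
    `    cluster: ${name}`,
    `    user: ${name}`,
    `current-context: ${name}`,
    "users:",
    `- name: ${name}`,
    "  user:",
    "    exec:",
    "      apiVersion: client.authentication.k8s.io/v1beta1",
    "      command: gke-gcloud-auth-plugin",
    "      installHint: Run gcloud components install gke-gcloud-auth-plugin",
    "      provideClusterInfo: true",
    "",
  ].join("\n");
}

// 203.0.113.10 is a documentation address standing in for the real endpoint.
console.log(gkeKubeconfig("trinity-dev-gcp-cluster", "203.0.113.10", "BASE64_CA_DATA"));
```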

The stack then creates a Kubernetes provider from that kubeconfig and uses the shared hello component:

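import * as k8s from "@pulumi/kubernetes";
import { deployHelloApp } from "../components/hello";

// clusterName, kubeconfig, environment, and nodePool are defined earlier in gcp/index.ts.
const k8sProvider = new k8s.Provider(`${clusterName}-k8s-provider`, {
  kubeconfig,
});

// The last argument lists resources the hello app waits for: here, the node pool.
const hello = deployHelloApp("gcp", environment, k8sProvider, [nodePool]);

export const helloNamespace = hello.namespace;
export const helloServiceName = hello.serviceName;
export const helloServiceIp = hello.serviceIp;
export const helloServiceHostname = hello.serviceHostname;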
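The stacks export both helloServiceIp and helloServiceHostname because the clouds fill in different fields of the Service status: the AWS load balancer reports a DNS name, while GCP and Azure report an IP. A sketch of how a caller could turn either shape into one URL, assuming it receives the Service's status.loadBalancer block (the Pulumi Kubernetes Service resource exposes it through its status output). helloUrl is a hypothetical name.

```typescript
// Only the part of a Kubernetes Service status this sketch reads.
interface LoadBalancerIngress {
  ip?: string;
  hostname?: string;
}

interface ServiceStatus {
  loadBalancer?: { ingress?: LoadBalancerIngress[] };
}

// Prefer the hostname (AWS ELB) and fall back to the IP (GCP and Azure here).
export function helloUrl(status: ServiceStatus): string | undefined {
  const ingress = status.loadBalancer?.ingress?.[0];
  const address = ingress?.hostname ?? ingress?.ip;
  return address ? `http://${address}` : undefined;
}

console.log(helloUrl({ loadBalancer: { ingress: [{ ip: "34.66.142.73" }] } })); // http://34.66.142.73
console.log(helloUrl({})); // undefined (load balancer not provisioned yet)
```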

Validation mirrors AWS through Pulumi outputs and kubeconfig checks:

pulumi -C infra/pulumi/gcp stack output kubeconfig --show-secrets > kubeconfig.gcp.yaml
KUBECONFIG=./kubeconfig.gcp.yaml kubectl get nodes
KUBECONFIG=./kubeconfig.gcp.yaml kubectl -n hello get deployment,service,pods

The stack outputs showed the GCP shape:

helloNamespace      hello
helloServiceIp      34.66.142.73
helloServiceName    trinity-dev-gcp-hello-60c4e021
name                trinity-dev-gcp-cluster
nodePoolNameOutput  trinity-dev-gcp-nodepool

The node came up cleanly:

NAME                                                  STATUS   ROLES    AGE     VERSION
gke-trinity-dev-gcp--trinity-dev-gcp--c5feb9c8-mxjt   Ready    <none>   8m44s   v1.33.9-gke.1060000

The hello resources landed in Kubernetes:

NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trinity-dev-gcp-hello-a48e4a9a   1/1     1            1           8m52s

NAME                                     TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
service/trinity-dev-gcp-hello-60c4e021   LoadBalancer   34.118.228.59   34.66.142.73   80:30812/TCP   8m32s

NAME                                                  READY   STATUS    RESTARTS   AGE
pod/trinity-dev-gcp-hello-a48e4a9a-5884c8c544-62szg   1/1     Running   0          8m52s

And the public IP responded:

curl -I "http://$(pulumi -C infra/pulumi/gcp stack output helloServiceIp)"
HTTP/1.1 200 OK
Server: nginx/1.27.5
Content-Type: text/html
Content-Length: 615

The result is just as dull as the AWS result: a ready node and a public load balancer in front of a pod. The important part is that the GCP cluster is provisioned from the same repo, named the same way, labeled the same way, and validated with the same operational motion.

Azure - first AKS cluster

AKS completes the triangle and adds the third cloud's opinions. Azure wants a resource group boundary, AKS has its own managed identity model, and node pools are expressed through agent pool profiles. Again, the goal is to make those differences readable.

The first Azure stack creates a resource group and a small AKS cluster.

The resource group is part of the platform foundation. It gives the Azure leg a clear lifecycle boundary. When I destroy the Azure stack, I want to know which resources belong to this platform and which resources belong elsewhere.

Standard_B2s keeps the first cluster cheap enough to run while still leaving room for the same basic workload. The system-assigned identity is enough for the first cluster to exist. Later phases can tighten this up with workload identity, Key Vault access, and more explicit role assignments.
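The agent pool is also where Azure's naming rules part ways with the shared naming helper. AKS requires Linux node pool names to use only lowercase letters and digits, start with a letter, and be at most 12 characters, so a name like trinity-dev-azure-nodepool cannot be used; the node names later in this post (aks-systempool-...) show a pool named systempool. A sketch of a validator and the pool as plain data, using field names from the AKS agent pool profile API. The two-node count matches the nodes shown below; the System mode is assumed because AKS needs at least one system pool and this is the only pool. It is illustrative rather than the stack's exact arguments.

```typescript
// AKS Linux node pool names: lowercase letters and digits, starting with a letter, 1-12 chars.
export function isValidAksLinuxPoolName(name: string): boolean {
  return /^[a-z][a-z0-9]{0,11}$/.test(name);
}

// Illustrative agent pool profile using AKS API field names.
interface AgentPoolProfile {
  name: string;
  count: number;
  vmSize: string;
  mode: "System" | "User";
  osType: "Linux" | "Windows";
}

const systemPool: AgentPoolProfile = {
  name: "systempool",
  count: 2,
  vmSize: "Standard_B2s",
  mode: "System", // assumed: the only pool, so it has to be a system pool
  osType: "Linux",
};

console.log(isValidAksLinuxPoolName(systemPool.name)); // true
console.log(isValidAksLinuxPoolName("trinity-dev-azure-nodepool")); // false
```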

The Azure stack also hands the kubeconfig to the Kubernetes provider and deploys the shared hello resources:

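import * as k8s from "@pulumi/kubernetes";
import { deployHelloApp } from "../components/hello";

// clusterName, kubeconfig, environment, and cluster are defined earlier in azure/index.ts.
const k8sProvider = new k8s.Provider(`${clusterName}-k8s-provider`, {
  kubeconfig,
});

// The last argument lists resources the hello app waits for: here, the AKS cluster.
const hello = deployHelloApp("azure", environment, k8sProvider, [cluster]);

export const helloNamespace = hello.namespace;
export const helloServiceName = hello.serviceName;
export const helloServiceIp = hello.serviceIp;
export const helloServiceHostname = hello.serviceHostname;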

Validation is the same again:

pulumi -C infra/pulumi/azure stack output kubeconfig --show-secrets > kubeconfig.azure.yaml
KUBECONFIG=./kubeconfig.azure.yaml kubectl get nodes
KUBECONFIG=./kubeconfig.azure.yaml kubectl -n hello get deployment,service,pods

The stack outputs showed the Azure shape:

helloNamespace           hello
helloServiceIp           40.88.218.90
helloServiceName         trinity-dev-azure-hello-164d5ea9
name                     trinity-dev-azure-cluster
resourceGroupNameOutput  trinity-dev-azure-rg

The two AKS nodes came up ready:

NAME                                 STATUS   ROLES    AGE     VERSION
aks-systempool-19386533-vmss000000   Ready    <none>   6m39s   v1.33.10
aks-systempool-19386533-vmss000001   Ready    <none>   6m30s   v1.33.10

The hello namespace had the same shape as the other clouds: one deployment, one public load balancer service, and one running pod.

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trinity-dev-azure-hello-734d06b9   1/1     1            1           5m41s

NAME                                       TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)        AGE
service/trinity-dev-azure-hello-164d5ea9   LoadBalancer   10.0.72.24   40.88.218.90   80:32744/TCP   5m37s

NAME                                                   READY   STATUS    RESTARTS   AGE
pod/trinity-dev-azure-hello-734d06b9-6bc55c5b9-bgm28   1/1     Running   0          5m42s

And the Azure load balancer answered through the public IP:

curl -I "http://$(pulumi -C infra/pulumi/azure stack output helloServiceIp)"
HTTP/1.1 200 OK
Server: nginx/1.27.5
Content-Type: text/html
Content-Length: 615

At this point all three clouds answer the same basic question: can I create a managed Kubernetes cluster from code, authenticate to it, schedule a pod, and get traffic to it? Yes.

Exit

Phase 1 is now complete. It proves the substrate:

  • Pulumi can create one managed cluster in each cloud.
  • The three stacks share naming and labels while preserving provider-specific details.
  • Each cluster has working kubeconfig access.
  • Each cluster can schedule the same Pulumi-managed nginx workload.
  • Each cloud can expose that workload through its own load balancer.

It is the first checkpoint in the larger platform series: the three Kubernetes foundations exist and can run the same public workload.

In the series roadmap, the foundations phase covers:

  • create cloud accounts, projects, and subscriptions
  • establish the Pulumi project structure
  • provision base Kubernetes clusters
  • deploy a simple hello-world app to each cluster

The final validation pass before teardown showed ready worker nodes in each cluster. AWS and Azure had two workers; the GKE node pool for this checkpoint had one worker:

cloud  nodes  version
aws    2      v1.33.11-eks-7fcd7ec
gcp    1      v1.33.10-gke.1176000
azure  2      v1.33.11
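A table like this can be assembled by hand, but a small script makes the check repeatable. A sketch, assuming the output of kubectl get nodes -o json for each kubeconfig has already been parsed; only the fields the sketch reads are typed, and summarizeNodes and formatRow are hypothetical names. It counts nodes whose Ready condition is "True" and prints rows in the same column layout as the table above.

```typescript
// Only the fields this sketch reads from `kubectl get nodes -o json`.
interface NodeCondition {
  type: string;
  status: string;
}

interface NodeItem {
  status: {
    conditions: NodeCondition[];
    nodeInfo: { kubeletVersion: string };
  };
}

interface NodeList {
  items: NodeItem[];
}

export interface NodeSummary {
  ready: number;
  total: number;
  versions: string[];
}

// Counts Ready nodes and collects the distinct kubelet versions.
export function summarizeNodes(list: NodeList): NodeSummary {
  const ready = list.items.filter((node) =>
    node.status.conditions.some((c) => c.type === "Ready" && c.status === "True"),
  ).length;
  const versions = Array.from(new Set(list.items.map((n) => n.status.nodeInfo.kubeletVersion))).sort();
  return { ready, total: list.items.length, versions };
}

// Same column layout as the table above: 7-character cloud and nodes columns.
export function formatRow(cloud: string, summary: NodeSummary): string {
  return `${cloud.padEnd(7)}${String(summary.ready).padEnd(7)}${summary.versions.join(",")}`;
}

const gcpNodes: NodeList = {
  items: [
    {
      status: {
        conditions: [{ type: "Ready", status: "True" }],
        nodeInfo: { kubeletVersion: "v1.33.10-gke.1176000" },
      },
    },
  ],
};
console.log(formatRow("gcp", summarizeNodes(gcpNodes))); // gcp    1      v1.33.10-gke.1176000
```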

The next phase is GitOps. The Phase 1 application is already managed declaratively by Pulumi. Application and platform add-on delivery should move to a reconciler running in the clusters. That means choosing Argo CD or Flux, moving the hello workload out of Pulumi-managed Kubernetes resources, and giving each cloud an overlay or values file for the differences that should remain explicit.
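The overlay idea can be sketched as layered values: a shared base plus a small per-cloud file, merged so that later layers win. This mirrors how Helm combines multiple values files (nested maps merge key by key, while scalars and lists are replaced), but it is a simplified sketch: it does not implement Helm's rule that a null value deletes a key, and the values themselves are hypothetical.

```typescript
type Values = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Values {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Later layers win. Nested maps merge key by key; scalars and arrays are replaced.
// Simplified: Helm's "null deletes a key" rule is not implemented here.
export function mergeValues(...layers: Values[]): Values {
  const merged: Values = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      const current = merged[key];
      merged[key] = isPlainObject(current) && isPlainObject(value) ? mergeValues(current, value) : value;
    }
  }
  return merged;
}

// Hypothetical base values shared by every cloud.
const base: Values = {
  replicaCount: 1,
  image: { repository: "nginx", tag: "1.27-alpine" },
  labels: { app: "hello" },
};

// Hypothetical AWS overlay: only what differs for this cloud.
const awsOverlay: Values = { labels: { cloud: "aws" } };

const awsValues = mergeValues(base, awsOverlay);
console.log(JSON.stringify(awsValues));
// {"replicaCount":1,"image":{"repository":"nginx","tag":"1.27-alpine"},"labels":{"app":"hello","cloud":"aws"}}
```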

Source code

Reference implementation