Traffic routing across clouds

Traffic checkpoint

The next phase is to stop treating those three load balancers as separate user entry points.

The traffic layer is Azure Front Door using its default hostname:

browser
  -> https://<front-door-endpoint>.azurefd.net
  -> Azure Front Door
  -> HTTP LoadBalancer origins in AWS, GCP, and Azure

That gives the platform a single HTTPS public entry point while keeping the setup focused on the cloud providers themselves. Front Door terminates HTTPS at the edge, redirects HTTP to HTTPS, probes each origin through /readyz, and forwards traffic to the Pulumi-managed Kubernetes LoadBalancer services.

Ownership boundary

This phase changes the ownership boundary again. Argo CD still owns the Mandelbrot deployment and source ConfigMap, but the public service moves to Pulumi because it now needs cloud-specific static address bindings:

Argo CD:
  - Mandelbrot deployment
  - application source ConfigMap
  - cloud-specific app overlays

Pulumi cluster stacks:
  - Mandelbrot namespace
  - static cloud origin addresses
  - Mandelbrot LoadBalancer service

Pulumi traffic stack:
  - Azure Front Door
  - stage URL ConfigMap patches

The shared Pulumi component creates only the namespace and service:

Origin discovery

The Front Door origins come from Pulumi stack outputs. Each cluster stack exports its Mandelbrot origin address:

aws   -> mandelbrotOriginHosts
gcp   -> mandelbrotOriginHost
azure -> mandelbrotOriginHost

AWS is the one slight exception to the one-cloud-one-origin summary. The AWS service is backed by an internet-facing NLB with static Elastic IPs, so the AWS stack exports both EIPs. Front Door gets two AWS origins, aws and aws-2, while the application's own AWS stage URL keeps using the primary address. That keeps the cross-cluster render configuration simple while avoiding a single AWS origin address in the global entry point.
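
The mapping from stack outputs to Front Door origins and stage URLs can be sketched as two pure functions. This is a minimal illustration, not the traffic stack's actual code: the output names `mandelbrotOriginHosts` and `mandelbrotOriginHost` and the origin names `aws` and `aws-2` come from the text above, while the type names, function names, and the `http://` stage URL format are assumptions.

```typescript
// Hypothetical shape of the values read from the three cluster stacks.
interface ClusterOutputs {
  aws: { mandelbrotOriginHosts: string[] }; // both Elastic IPs
  gcp: { mandelbrotOriginHost: string };
  azure: { mandelbrotOriginHost: string };
}

interface Origin {
  name: string; // Front Door origin name, e.g. "aws" or "aws-2"
  cloud: "aws" | "gcp" | "azure";
  host: string;
}

// Every AWS address becomes its own origin; GCP and Azure contribute one each.
function buildOrigins(outputs: ClusterOutputs): Origin[] {
  const awsOrigins = outputs.aws.mandelbrotOriginHosts.map(
    (host, i): Origin => ({
      name: i === 0 ? "aws" : `aws-${i + 1}`,
      cloud: "aws",
      host,
    })
  );
  return [
    ...awsOrigins,
    { name: "gcp", cloud: "gcp", host: outputs.gcp.mandelbrotOriginHost },
    { name: "azure", cloud: "azure", host: outputs.azure.mandelbrotOriginHost },
  ];
}

// The application's own AWS stage URL keeps using only the primary address.
// The "http://<host>" format is an assumption for illustration.
function buildStageUrls(outputs: ClusterOutputs): Record<string, string> {
  const primaryAws = outputs.aws.mandelbrotOriginHosts[0];
  if (primaryAws === undefined) {
    throw new Error("AWS stack exported no origin hosts");
  }
  return {
    AWS_STAGE_URL: `http://${primaryAws}`,
    GCP_STAGE_URL: `http://${outputs.gcp.mandelbrotOriginHost}`,
    AZURE_STAGE_URL: `http://${outputs.azure.mandelbrotOriginHost}`,
  };
}
```

Both functions read the same outputs, so Front Door and the stage URL ConfigMap cannot drift apart.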

The separate Pulumi traffic stack reads those outputs through stack references. The cluster stacks still own EKS, GKE, AKS, Argo CD, the static origin addresses, and the Mandelbrot Kubernetes service. The traffic stack owns the global entry point and the dynamic mandelbrot-stage-urls ConfigMap in each cluster. Keeping those concerns separate matters because the traffic layer should be updatable without rebuilding the clusters or committing generated origin addresses back to Git.

The traffic stack starts by reading the three cluster stacks:

AWS exports two Elastic IPs for the Front Door origin group:

Front Door route

Front Door itself is a profile, endpoint, origin group, origins, and one route:
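
As a rough picture of those five pieces, the sketch below builds a plain-data plan rather than real Pulumi resources. The property names follow the Azure Front Door (Microsoft.Cdn) resource model, and the /readyz probe, HTTP origins, and HTTP-to-HTTPS redirect come from the description earlier in this post. The SKU, probe interval, port 80, and the `planFrontDoor` name are assumptions, not values taken from the actual stack.

```typescript
// Hypothetical input: one entry per Front Door origin.
interface OriginInput {
  name: string;
  host: string;
}

// Returns a plain object describing profile, endpoint, origin group,
// origins, and the single route. Nothing here talks to Azure.
function planFrontDoor(origins: OriginInput[]) {
  return {
    profile: { sku: { name: "Standard_AzureFrontDoor" } }, // assumed SKU
    endpoint: { enabledState: "Enabled" },
    originGroup: {
      healthProbeSettings: {
        probePath: "/readyz",
        probeProtocol: "Http",
        probeRequestType: "GET",
        probeIntervalInSeconds: 30, // placeholder interval
      },
    },
    origins: origins.map((origin) => ({
      name: origin.name,
      hostName: origin.host,
      httpPort: 80, // assumed port for the HTTP LoadBalancer services
      enabledState: "Enabled",
    })),
    route: {
      patternsToMatch: ["/*"],
      supportedProtocols: ["Http", "Https"],
      httpsRedirect: "Enabled", // HTTP requests are redirected to HTTPS
      forwardingProtocol: "HttpOnly", // origins only speak HTTP
      linkToDefaultDomain: "Enabled", // serve on the azurefd.net hostname
    },
  };
}
```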

Deployment propagation

The live deployment exposed a useful Azure-specific behavior. Immediately after the traffic stack finished, all three direct origins were healthy:

curl "http://<aws-origin>/readyz"
curl "http://<gcp-origin>/readyz"
curl "http://<azure-origin>/readyz"

But the Front Door hostname returned Azure's own "Page not found" response:

Oops! We weren't able to find your Azure Front Door Service configuration.

The important detail was the Front Door resource state. The profile, endpoint, origin group, origins, and route all had provisioningState: Succeeded, but the endpoint and route still had deploymentStatus: NotStarted. The control plane had accepted the configuration; the global data plane had not activated it yet.

That is normal enough to document. Azure's own Front Door FAQ[1] says configuration propagation can take up to about 20 minutes for a single create or update, and back-to-back operations can stretch that to roughly 40 minutes because updates are queued. The right troubleshooting order is to confirm the direct origins first, then check the Front Door endpoint state:

az afd endpoint show \
  --resource-group trinity-dev-azure-traffic-rg \
  --profile-name trinity-dev-azure-frontdoor \
  --endpoint-name trinity-dev-azure-mandelbrot \
  --query "{provisioningState:provisioningState,deploymentStatus:deploymentStatus,hostName:hostName}" \
  -o json

If the direct origins pass /readyz and Front Door is still NotStarted, wait before changing the infrastructure. Once the deployment propagated, the default azurefd.net hostname served the Mandelbrot UI as expected.
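
That decision order can be written down as a small function. This is a sketch of the reasoning only: the `provisioningState` and `deploymentStatus` fields match the `az afd endpoint show` query above, while the function name, the step labels, and the treatment of `InProgress` as another "still propagating" value are assumptions.

```typescript
// Fields taken from the endpoint query shown above.
interface EndpointState {
  provisioningState: string;
  deploymentStatus: string;
}

// Hypothetical labels for what to do next.
type NextStep =
  | "fix-origins"
  | "done"
  | "wait-for-propagation"
  | "inspect-front-door";

function nextStep(
  originReady: boolean[], // one /readyz result per direct origin
  endpoint: EndpointState,
  frontDoorServing: boolean // does the azurefd.net hostname serve the app?
): NextStep {
  // 1. Unhealthy origins are a cluster problem, not a Front Door problem.
  if (!originReady.every((ready) => ready)) return "fix-origins";
  // 2. Origins are healthy and the public hostname works.
  if (frontDoorServing) return "done";
  // 3. Control plane accepted the config, data plane has not activated it yet.
  const pending =
    endpoint.deploymentStatus === "NotStarted" ||
    endpoint.deploymentStatus === "InProgress";
  if (endpoint.provisioningState === "Succeeded" && pending) {
    return "wait-for-propagation";
  }
  // 4. Anything else deserves a closer look at the Front Door resources.
  return "inspect-front-door";
}
```

For the situation described above (all origins healthy, `Succeeded` and `NotStarted`, hostname returning the Azure error page), the function returns `"wait-for-propagation"`.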

And the current endpoint is:

https://trinity-dev-azure-mandelbrot-b6bxhudjc9azgeay.z02.azurefd.net

The health check through Front Door now returns a normal application response:

{"ok":true,"cloud":"aws","region":"us-east-1"}

And /api/meta shows the live workload identity behind the global entry point:

{"cloud":"aws","region":"us-east-1","pod":"mandelbrot-69d45cf95f-khkkh","route":["aws","gcp","azure"]}

Failover drill

The repeatable failover drill is controlled from the Front Door side. Disable one origin, or both AWS origins, poll /api/meta through the public endpoint until traffic stops landing on that cloud, then re-enable the origin. That tests the global entry point and rollback path without committing a broken Kubernetes desired state just to make an origin unhealthy.

The branch includes a small script for that drill:

npm run test:traffic-failover -- --cloud aws
npm run test:traffic-failover -- --cloud gcp
npm run test:traffic-failover -- --cloud azure
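
The core check in a drill like this is deciding when traffic has actually stopped landing on the disabled cloud. The sketch below is not the branch's script. It assumes each poll of /api/meta yields an object with a `cloud` field, as in the response shown earlier, and it treats a run of consecutive responses from other clouds as "drained". The function name and the default streak length are hypothetical.

```typescript
// Subset of the /api/meta response that the check needs.
interface MetaSample {
  cloud: string;
}

// Returns the index of the sample at which `requiredConsecutive` responses
// in a row came from clouds other than the disabled one, or null if the
// samples never reach that streak.
function drainedAt(
  samples: MetaSample[],
  disabledCloud: string,
  requiredConsecutive: number = 5
): number | null {
  let streak = 0;
  let index = 0;
  for (const sample of samples) {
    // Any hit on the disabled cloud resets the streak.
    streak = sample.cloud === disabledCloud ? 0 : streak + 1;
    if (streak >= requiredConsecutive) return index;
    index += 1;
  }
  return null;
}
```

Requiring a streak rather than a single response avoids declaring success on one lucky request while Front Door is still sending some traffic to the disabled origin.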

Stage URL distribution

That same traffic stack also makes the renderer a real cross-cluster demo. It writes AWS_STAGE_URL, GCP_STAGE_URL, and AZURE_STAGE_URL into mandelbrot-stage-urls in each cluster, using the same Pulumi stack outputs that feed Front Door. The app reads that ConfigMap through an optional mounted volume at request time, so the bootstrap fallback disappears once the traffic stack has applied.

The traffic stack applies the stage URL ConfigMap patch in each cluster:

The deployment mounts that ConfigMap as an optional volume:

env:
  - name: STAGE_ROUTE
    value: aws,gcp,azure
  - name: STAGE_URL_CONFIG_DIR
    value: /config/stage-urls
volumeMounts:
  - name: stage-urls
    mountPath: /config/stage-urls
    readOnly: true
volumes:
  - name: stage-urls
    configMap:
      name: mandelbrot-stage-urls
      optional: true
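
On the application side, a ConfigMap mounted as a volume appears as one file per key, so the lookup at request time can be sketched as below. This is an illustration under assumptions, not the app's real code: the key names and the mount path come from the text above, while the injected `readFile` function, the `fallback` map standing in for the bootstrap values, and the function name are hypothetical.

```typescript
// Injected reader keeps the sketch free of filesystem imports.
// It should return undefined when the file does not exist.
type ReadFile = (path: string) => string | undefined;

// ConfigMap keys written by the traffic stack, one file per key when mounted.
const STAGE_KEYS: Record<string, string> = {
  aws: "AWS_STAGE_URL",
  gcp: "GCP_STAGE_URL",
  azure: "AZURE_STAGE_URL",
};

function resolveStageUrls(
  route: string[], // e.g. STAGE_ROUTE split on commas
  configDir: string, // e.g. STAGE_URL_CONFIG_DIR
  readFile: ReadFile,
  fallback: Record<string, string> // hypothetical bootstrap values
): Record<string, string> {
  const urls: Record<string, string> = {};
  for (const stage of route) {
    const key = STAGE_KEYS[stage];
    const mounted =
      key === undefined ? undefined : readFile(`${configDir}/${key}`);
    const value = mounted === undefined ? "" : mounted.trim();
    // Prefer the mounted value; fall back while the ConfigMap is absent.
    const chosen = value !== "" ? value : fallback[stage];
    if (chosen !== undefined) urls[stage] = chosen;
  }
  return urls;
}
```

Because the volume is optional, the pod starts even when the ConfigMap does not exist yet, and the fallback branch covers that window.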

Cross-cluster rendering

The render path now fans out concurrently. The service splits one image into horizontal tiles, sends one stage request to each cloud at the same time, waits for the stage responses, then returns the ordered tiles to the browser. In strict mode, one failed stage fails the render. In degraded mode, successful tiles can still be shown with the failed stage called out in the UI.
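
The tile split and the strict versus degraded handling can be sketched as two small functions. This is a minimal illustration with hypothetical names, and it leaves out the concurrent stage requests themselves. Row ranges are treated as half-open (start inclusive, end exclusive), which is an assumption.

```typescript
interface Tile {
  stage: string;
  rowStart: number; // inclusive
  rowEnd: number; // exclusive
}

// Split `height` rows into one horizontal band per stage, in route order.
// Any remainder rows go to the first stages, one extra row each.
function splitRows(height: number, stages: string[]): Tile[] {
  const base = Math.floor(height / stages.length);
  const extra = height % stages.length;
  let rowStart = 0;
  return stages.map((stage, i) => {
    const rows = base + (i < extra ? 1 : 0);
    const tile: Tile = { stage, rowStart, rowEnd: rowStart + rows };
    rowStart = tile.rowEnd;
    return tile;
  });
}

interface StageResult {
  stage: string;
  ok: boolean;
}

// Strict mode: any failed stage fails the whole render.
// Degraded mode: keep successful tiles and report which stages failed.
function assemble(
  results: StageResult[],
  mode: "strict" | "degraded"
): { tiles: StageResult[]; failedStages: string[] } {
  const failedStages = results.filter((r) => !r.ok).map((r) => r.stage);
  if (mode === "strict" && failedStages.length > 0) {
    throw new Error(`render failed in: ${failedStages.join(", ")}`);
  }
  return { tiles: results.filter((r) => r.ok), failedStages };
}
```

With a height of 300 and the route aws, gcp, azure, `splitRows` yields the row ranges 0-100, 100-200, and 200-300.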

The proof is in the render response. A request through Front Door reports both the requested stage and the cloud that actually rendered it:

requested  rendered  region
aws        aws       us-east-1
gcp        gcp       us-central1
azure      azure     eastus

The browser view shows the same thing during a continuous render. In one live frame, the request landed on GCP, while the three stage cards showed each cloud rendering its assigned horizontal tile:

app cloud  app region
gcp        us-central1

stage  rendered by  region       rows     duration
AWS    AWS          us-east-1    0-100    25 ms
GCP    GCP          us-central1  100-200  41 ms
AZURE  AZURE        eastus       200-300  16 ms

That is the checkpoint I wanted from this phase. The browser enters through one HTTPS endpoint, but a single Mandelbrot image is split into tiles rendered by three independent Kubernetes clusters.

Exit

The traffic phase is complete. It proves a single public entry point, health-aware routing to cloud origins, dynamic cross-cluster stage configuration, and concurrent work across EKS, GKE, and AKS. It does not pretend to solve stateful active-active traffic. The Mandelbrot service is stateless, which is exactly why this checkpoint can stay clean.

Source code

Reference implementation

Notes

  1. Azure Front Door FAQ, deployment timing: Microsoft Learn