Production (AWS)

Status: AWS infrastructure is not yet provisioned. This documents the target Phase 2 architecture. Local development (Docker Compose + Tilt) is the active environment.

AWS Topology

Production runs entirely on AWS, with all Go microservices on Amazon EKS (Kubernetes) and all stateful infrastructure on AWS managed services.

| Component | AWS Service | Notes |
|---|---|---|
| Compute | Amazon EKS + Karpenter | Auto-scaling K8s; Karpenter for node provisioning including spot for non-PHI workloads |
| Message broker | Amazon MSK (Apache Kafka) | 3-broker cluster, min.insync.replicas=2 |
| Relational database | Aurora PostgreSQL Multi-AZ | One logical cluster, one database per service |
| Cache | ElastiCache (Valkey) | Session data, claims-status cache, Kafka idempotency keys |
| Search | Amazon OpenSearch Service | Provider directory, ICD-10/CPT lookup, log analytics |
| Object storage | Amazon S3 | EDI files, generated documents (EOBs, policy documents), model artefacts |
| Secrets | OpenBao (self-hosted on EKS) | Dynamic secrets, PKI, DB credential rotation (Vault-compatible OSS fork) |
| Service mesh | Istio + Envoy | mTLS between all services, traffic management, OTel sidecar injection |
| GitOps | Argo CD | Continuous deployment from Git; Helm chart updates trigger automatic rollouts |
| Container images | Wolfi (Chainguard) base images | Minimal, signed, CVE-free; SBOM generated on every build |
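
The cache row lists Kafka idempotency keys. As a rough illustration of how such keys can let a consumer skip redelivered messages, the sketch below uses an in-memory map as a stand-in for Valkey. `KeyStore`, `SetIfAbsent`, `HandleOnce`, `countProcessed`, the 24-hour TTL, and the `claim-*` keys are all hypothetical names and values chosen for the example, not the platform's actual code; `SetIfAbsent` is meant to mirror the semantics of a `SET key value NX EX ttl` command.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// KeyStore is a hypothetical in-memory stand-in for the Valkey cache.
type KeyStore struct {
	mu      sync.Mutex
	expires map[string]time.Time
}

func NewKeyStore() *KeyStore {
	return &KeyStore{expires: make(map[string]time.Time)}
}

// SetIfAbsent records key with the given TTL and reports true only if the
// key was absent or already expired (the semantics of SET ... NX EX).
func (s *KeyStore) SetIfAbsent(key string, ttl time.Duration, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if exp, ok := s.expires[key]; ok && now.Before(exp) {
		return false
	}
	s.expires[key] = now.Add(ttl)
	return true
}

// HandleOnce runs handler only the first time key is seen within the TTL
// window and reports whether the handler ran.
func HandleOnce(s *KeyStore, key string, ttl time.Duration, now time.Time, handler func()) bool {
	if !s.SetIfAbsent(key, ttl, now) {
		return false // duplicate delivery: skip
	}
	handler()
	return true
}

// countProcessed feeds a batch of message keys through HandleOnce and
// returns how many handlers actually ran.
func countProcessed(keys []string) int {
	store := NewKeyStore()
	now := time.Now()
	processed := 0
	for _, key := range keys {
		HandleOnce(store, key, 24*time.Hour, now, func() { processed++ })
	}
	return processed
}

func main() {
	// "claim-1" arrives twice, as can happen with at-least-once delivery.
	fmt.Println("processed:", countProcessed([]string{"claim-1", "claim-2", "claim-1"}))
}
```

This sketch claims the key before running the handler, so a handler that fails midway would not be retried; a real consumer would need to decide whether to release the key on failure.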

Kubernetes Namespace Structure

cluster/
├── services/     # All Go microservices (claims, eligibility, enrollment, billing, …)
├── infra/        # Kafka (MSK consumer groups), OpenBao, APISIX, etcd, Mirth Connect
├── monitoring/   # Prometheus, Grafana, Loki, Tempo, OTel Collectors, Grafana OnCall
└── security/     # Falco, OPA, Keycloak, Istio control plane

Each service in the services namespace has a dedicated ServiceAccount bound to a least-privilege IAM role (IRSA) that grants access only to the S3, SQS, and other AWS APIs it needs.

Live Subdomains

| Subdomain | Purpose |
|---|---|
| api.olly.theflywheel.in | APISIX API gateway, the public entry point for all clients |
| auth.olly.theflywheel.in | Keycloak identity provider (OIDC/SAML) |
| admin.olly.theflywheel.in | React Admin web console for internal operations |
| grafana.olly.theflywheel.in | Grafana dashboards |
| status.olly.theflywheel.in | Gatus status dashboard |
| temporal.olly.theflywheel.in | Temporal workflow UI |
| kafka.olly.theflywheel.in | Redpanda Console (Kafka topic browser) |
| bao.olly.theflywheel.in | OpenBao (Vault) UI |
| mirth.olly.theflywheel.in | Mirth Connect EDI admin |
| mailpit.olly.theflywheel.in | Mailpit email catcher (non-prod only) |
| api-docs.olly.theflywheel.in | Scalar API documentation |
| docs.olly.theflywheel.in | This documentation site |

TLS certificates are issued by Let's Encrypt and renewed automatically by cert-manager.

Observability Stack

All observability components run inside the monitoring namespace on EKS.

| Component | Role |
|---|---|
| Prometheus | Scrapes metrics from all services and infrastructure; evaluates SLO/SLA rules |
| Grafana | Dashboards for service metrics, Kafka lag, claim throughput, billing reconciliation |
| Grafana Loki | Log aggregation; all service stdout/stderr shipped via Promtail sidecars |
| Grafana Tempo | Distributed trace storage; traces ingested from OTel Collectors |
| OpenTelemetry Collectors | Receive traces and metrics from services via OTLP (gRPC :4317, HTTP :4318); forward to Tempo and Prometheus |
| Grafana OnCall | On-call scheduling, escalation policies, alert routing; triggered by Prometheus alerting rules |

Every Go service is instrumented with the OTel SDK at startup (via github.com/olly/middleware). Traces, metrics, and logs are correlated by trace_id and service.name attributes. The Grafana Explore view links logs in Loki directly to traces in Tempo.
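
To make the correlation concrete, here is a minimal sketch of a structured log line that carries both attributes, using only the standard library's `log/slog` (Go 1.21+). It is not the contents of `github.com/olly/middleware`; `newServiceLogger`, `logWithTrace`, `demoRecord`, the `claims` service name, and the sample trace ID are hypothetical. In a real service the trace ID would presumably be read from the active OTel span rather than passed in as a string.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

// newServiceLogger returns a JSON logger that stamps every record with
// the service.name attribute.
func newServiceLogger(w io.Writer, serviceName string) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With("service.name", serviceName)
}

// logWithTrace emits one info-level record carrying the trace ID, so a log
// backend can link the line to the matching trace.
func logWithTrace(l *slog.Logger, traceID, msg string) {
	l.Info(msg, "trace_id", traceID)
}

// demoRecord logs one line into a buffer and decodes it back into a map,
// which makes the emitted fields easy to inspect.
func demoRecord() map[string]any {
	var buf bytes.Buffer
	logger := newServiceLogger(&buf, "claims")
	logWithTrace(logger, "4bf92f3577b34da6a3ce929d0e0e4736", "claim adjudicated")

	var rec map[string]any
	if err := json.Unmarshal(buf.Bytes(), &rec); err != nil {
		panic(err)
	}
	return rec
}

func main() {
	rec := demoRecord()
	fmt.Println(rec["service.name"], rec["trace_id"], rec["msg"])
}
```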

Secrets Management

OpenBao (a Vault-compatible open-source fork) provides:

  • Dynamic database credentials: Each service receives a short-lived Postgres username/password from OpenBao at startup. Credentials rotate automatically; no long-lived passwords in environment variables.
  • PKI: Internal TLS certificates for mTLS between services.
  • KV secrets: External API keys (Twilio, SendGrid, etc.), Kafka SASL credentials.

Services authenticate to OpenBao using Kubernetes service account tokens (the Vault Kubernetes auth method). OpenBao unseals automatically using AWS KMS.
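
The login step can be sketched with plain `net/http`. The example assumes the Kubernetes auth method is mounted at its default path (`auth/kubernetes`), so the login endpoint is `/v1/auth/kubernetes/login`; the `claims` role, the fake JWT, and the helper names `login` and `demoLogin` are hypothetical. It runs against a local fake server so it works without a cluster. Inside a pod, the JWT would normally be read from the projected service account token file instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// loginResponse holds the fields of the login reply that this sketch uses.
type loginResponse struct {
	Auth struct {
		ClientToken   string `json:"client_token"`
		LeaseDuration int    `json:"lease_duration"`
	} `json:"auth"`
}

// login exchanges a Kubernetes service account JWT for a client token.
// The mount path "kubernetes" is an assumption (it is the default).
func login(client *http.Client, baseURL, role, jwt string) (string, error) {
	body, err := json.Marshal(map[string]string{"role": role, "jwt": jwt})
	if err != nil {
		return "", err
	}
	resp, err := client.Post(baseURL+"/v1/auth/kubernetes/login", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("login failed: %s", resp.Status)
	}
	var out loginResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Auth.ClientToken, nil
}

// demoLogin runs login against a local fake server that returns a canned
// response, standing in for a real OpenBao instance.
func demoLogin() (string, error) {
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/v1/auth/kubernetes/login" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"auth":{"client_token":"demo-token","lease_duration":3600}}`)
	}))
	defer fake.Close()
	return login(fake.Client(), fake.URL, "claims", "fake-jwt")
}

func main() {
	token, err := demoLogin()
	if err != nil {
		panic(err)
	}
	fmt.Println("token:", token)
}
```

The returned token has its own lease, so a long-running service would also need to renew it or log in again before it expires.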

HIPAA Compliance Considerations

  • All data at rest is encrypted with AWS KMS (AES-256).
  • All data in transit uses TLS 1.3; Istio enforces mTLS between all pods.
  • PHI access is logged to CloudTrail, retained for 6 years, and forwarded to OpenSearch.
  • Falco monitors Kubernetes runtime syscalls for threat detection.
  • OPA enforces fine-grained RBAC/ABAC policies for every API request.
  • AI inference (Phase 4) will run on dedicated GPU nodes in an isolated VPC with no external egress, ensuring PHI never leaves the network.
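
Within the mesh, Istio sidecars handle mTLS, so most services would not configure TLS themselves. For a Go component that does terminate TLS directly, the TLS 1.3 floor from the list above could be expressed as in this minimal sketch; `newTLSConfig` is a hypothetical helper, and `tls.VersionName` requires Go 1.21 or later.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newTLSConfig returns a configuration that refuses any protocol version
// older than TLS 1.3. Certificates are omitted; a real server would also
// set Certificates or GetCertificate.
func newTLSConfig() *tls.Config {
	return &tls.Config{MinVersion: tls.VersionTLS13}
}

func main() {
	cfg := newTLSConfig()
	fmt.Println("minimum version:", tls.VersionName(cfg.MinVersion))
}
```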

Infrastructure as Code

All AWS resources are provisioned with OpenTofu (an open-source fork of Terraform):

```bash
cd infra/terraform && tofu plan   # Preview changes
cd infra/terraform && tofu apply  # Apply
```

Kubernetes resources are managed with Helm charts in infra/helm/ and deployed via Argo CD GitOps. Updating a chart value in Git triggers an automatic Argo CD sync.

Olly Health Insurance Platform