Production (AWS)
Status: AWS infrastructure is not yet provisioned. This documents the target Phase 2 architecture. Local development (Docker Compose + Tilt) is the active environment.
AWS Topology
Production runs entirely on AWS, with all Go microservices on Amazon EKS (Kubernetes) and all stateful infrastructure on AWS managed services.
| Component | AWS Service | Notes |
|---|---|---|
| Compute | Amazon EKS + Karpenter | Autoscaling Kubernetes; Karpenter provisions nodes, including Spot capacity for non-PHI workloads |
| Message broker | Amazon MSK (Apache Kafka) | 3-broker cluster, min.insync.replicas=2 |
| Relational database | Aurora PostgreSQL Multi-AZ | One logical cluster, one database per service |
| Cache | ElastiCache (Valkey) | Session data, claims-status cache, Kafka idempotency keys |
| Search | Amazon OpenSearch Service | Provider directory, ICD-10/CPT lookup, log analytics |
| Object storage | Amazon S3 | EDI files, generated documents (EOBs, policy documents), model artefacts |
| Secrets | OpenBao (self-hosted on EKS) | Dynamic secrets, PKI, DB credential rotation (Vault-compatible OSS fork) |
| Service mesh | Istio + Envoy | mTLS between all services, traffic management, OTel sidecar injection |
| GitOps | Argo CD | Continuous deployment from Git; Helm chart updates trigger automatic rollouts |
| Container images | Wolfi (Chainguard) base images | Minimal, signed images with few or no known CVEs; SBOM generated on every build |
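The cache row above lists Kafka idempotency keys. As a minimal sketch of how a consumer might use such keys to skip redelivered messages, the Go example below keeps keys in an in-memory map that stands in for Valkey. The type `IdempotencyStore`, the function `IdempotencyKey`, the key layout, and the group and topic names are all hypothetical; the actual scheme is not specified in this document.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// IdempotencyStore is a hypothetical in-memory stand-in for the Valkey cache.
// Against Valkey, MarkIfNew would roughly map to one "SET key 1 NX EX <ttl>" call.
type IdempotencyStore struct {
	mu      sync.Mutex
	expires map[string]time.Time
}

func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{expires: make(map[string]time.Time)}
}

// MarkIfNew records key and reports true when the key is absent or its
// previous entry has expired. It reports false for a live duplicate.
func (s *IdempotencyStore) MarkIfNew(key string, ttl time.Duration) bool {
	now := time.Now()
	s.mu.Lock()
	defer s.mu.Unlock()
	if exp, ok := s.expires[key]; ok && now.Before(exp) {
		return false
	}
	s.expires[key] = now.Add(ttl)
	return true
}

// IdempotencyKey builds an assumed key layout from the message coordinates.
func IdempotencyKey(group, topic string, partition int32, offset int64) string {
	return fmt.Sprintf("idem:%s:%s:%d:%d", group, topic, partition, offset)
}

func main() {
	store := NewIdempotencyStore()
	// Hypothetical consumer group and topic names.
	key := IdempotencyKey("claims-svc", "claims.submitted", 0, 42)

	// Simulate the same message being delivered twice.
	for attempt := 1; attempt <= 2; attempt++ {
		if store.MarkIfNew(key, 24*time.Hour) {
			fmt.Println("processing", key)
		} else {
			fmt.Println("skipping duplicate", key)
		}
	}
}
```

The TTL bounds how long a key is remembered, so it would need to exceed the longest redelivery window the consumers are expected to see.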
Kubernetes Namespace Structure
cluster/
├── services/ # All Go microservices (claims, eligibility, enrollment, billing, …)
├── infra/ # Kafka (MSK consumer groups), OpenBao, APISIX, etcd, Mirth Connect
├── monitoring/ # Prometheus, Grafana, Loki, Tempo, OTel Collectors, Grafana OnCall
└── security/ # Falco, OPA, Keycloak, Istio control plane

Each workload in the services namespace has a dedicated ServiceAccount bound to a least-privilege IAM role (IRSA) that covers S3, SQS, and any other AWS APIs it needs.
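To make the IRSA binding concrete, the sketch below renders the kind of IAM trust policy that lets exactly one ServiceAccount assume a role through the cluster's OIDC provider. The account ID, OIDC issuer, and the `claims` ServiceAccount name are placeholders, and the Go type and function names are hypothetical; the project's real roles would be defined in its OpenTofu code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement and TrustPolicy model only the IAM fields this sketch needs.
type Statement struct {
	Effect    string                       `json:"Effect"`
	Principal map[string]string            `json:"Principal"`
	Action    string                       `json:"Action"`
	Condition map[string]map[string]string `json:"Condition"`
}

type TrustPolicy struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// IRSASubject returns the OIDC "sub" claim that identifies a ServiceAccount.
func IRSASubject(namespace, serviceAccount string) string {
	return fmt.Sprintf("system:serviceaccount:%s:%s", namespace, serviceAccount)
}

// IRSATrustPolicy builds a trust policy scoped to a single ServiceAccount.
// oidcIssuer is the cluster's OIDC issuer without the "https://" prefix.
func IRSATrustPolicy(accountID, oidcIssuer, namespace, serviceAccount string) TrustPolicy {
	return TrustPolicy{
		Version: "2012-10-17",
		Statement: []Statement{{
			Effect: "Allow",
			Principal: map[string]string{
				"Federated": fmt.Sprintf("arn:aws:iam::%s:oidc-provider/%s", accountID, oidcIssuer),
			},
			Action: "sts:AssumeRoleWithWebIdentity",
			Condition: map[string]map[string]string{
				"StringEquals": {
					oidcIssuer + ":sub": IRSASubject(namespace, serviceAccount),
					oidcIssuer + ":aud": "sts.amazonaws.com",
				},
			},
		}},
	}
}

func main() {
	// Placeholder account ID, issuer, and ServiceAccount name.
	p := IRSATrustPolicy("111122223333", "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE", "services", "claims")
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

On the Kubernetes side, the ServiceAccount would carry the `eks.amazonaws.com/role-arn` annotation pointing at the role that uses this trust policy.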
Live Subdomains
| Subdomain | Purpose |
|---|---|
| api.olly.theflywheel.in | APISIX API gateway; public entry point for all clients |
| auth.olly.theflywheel.in | Keycloak identity provider (OIDC/SAML) |
| admin.olly.theflywheel.in | React Admin web console for internal operations |
| grafana.olly.theflywheel.in | Grafana dashboards |
| status.olly.theflywheel.in | Gatus status dashboard |
| temporal.olly.theflywheel.in | Temporal workflow UI |
| kafka.olly.theflywheel.in | Redpanda Console (Kafka topic browser) |
| bao.olly.theflywheel.in | OpenBao (Vault) UI |
| mirth.olly.theflywheel.in | Mirth Connect EDI admin |
| mailpit.olly.theflywheel.in | Mailpit email catcher (non-prod only) |
| api-docs.olly.theflywheel.in | Scalar API documentation |
| docs.olly.theflywheel.in | This documentation site |
TLS certificates are issued by Let's Encrypt and renewed automatically by cert-manager.
Observability Stack
All observability components run inside the monitoring namespace on EKS.
| Component | Role |
|---|---|
| Prometheus | Scrapes metrics from all services and infrastructure; evaluates SLO/SLA rules |
| Grafana | Dashboards for service metrics, Kafka lag, claim throughput, billing reconciliation |
| Grafana Loki | Log aggregation; all service stdout/stderr is shipped by Promtail sidecars |
| Grafana Tempo | Distributed trace storage; traces ingested from OTel Collectors |
| OpenTelemetry Collectors | Receive traces and metrics from services via OTLP (gRPC :4317, HTTP :4318); forward to Tempo and Prometheus |
| Grafana OnCall | On-call scheduling, escalation policies, alert routing; triggered by Prometheus alerting rules |
Every Go service is instrumented with the OTel SDK at startup (via github.com/olly/middleware). Traces, metrics, and logs are correlated by trace_id and service.name attributes. The Grafana Explore view links logs in Loki directly to traces in Tempo.
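The correlation described above depends on every log line carrying the same trace_id as the active trace. The sketch below illustrates the idea with the standard library only: it pulls the trace ID out of a W3C `traceparent` header and attaches it, together with `service.name`, to a structured log record. This is a stand-in for what the OTel SDK does, not the API of github.com/olly/middleware, which is not shown in this document; the function names and the `claims` service name are hypothetical.

```go
package main

import (
	"log/slog"
	"os"
	"strings"
)

// isLowerHex reports whether s is exactly n lowercase hexadecimal characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// TraceIDFromTraceparent extracts the trace ID from a version-00 W3C
// traceparent header: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
// It rejects malformed headers and all-zero IDs.
func TraceIDFromTraceparent(header string) (string, bool) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", false
	}
	traceID, parentID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || !isLowerHex(parentID, 16) || !isLowerHex(flags, 2) {
		return "", false
	}
	if traceID == strings.Repeat("0", 32) || parentID == strings.Repeat("0", 16) {
		return "", false
	}
	return traceID, true
}

func main() {
	// Every record from this logger carries the service.name attribute.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)).With("service.name", "claims")

	header := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	if id, ok := TraceIDFromTraceparent(header); ok {
		// The trace_id attribute is what lets a log line be matched to its trace.
		logger.Info("claim adjudicated", "trace_id", id)
	} else {
		logger.Warn("request has no valid traceparent header")
	}
}
```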
Secrets Management
OpenBao (a Vault-compatible open-source fork) provides:
- Dynamic database credentials: Each service receives a short-lived Postgres username/password from OpenBao at startup. Credentials rotate automatically; no long-lived passwords in environment variables.
- PKI: Internal TLS certificates for mTLS between services.
- KV secrets: External API keys (Twilio, SendGrid, etc.), Kafka SASL credentials.
Services authenticate to OpenBao using Kubernetes service account tokens (the Vault Kubernetes auth method). OpenBao unseals automatically using AWS KMS.
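The startup flow above (log in with the pod's ServiceAccount token, then request database credentials) can be sketched with plain HTTP calls. The example assumes OpenBao keeps Vault's HTTP API: the `auth/kubernetes` and `database` mount paths, the `X-Vault-Token` header, and the response field names. The role names `claims` and `claims-rw`, the address, and the stub transport are hypothetical and exist only so the sketch runs without a real server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// DBCreds is one short-lived Postgres credential pair plus its lease length.
type DBCreds struct {
	Username, Password string
	Lease              time.Duration
}

// call sends one JSON request and decodes a 200 response into out.
func call(c *http.Client, method, url, token string, in, out any) error {
	var body io.Reader
	if in != nil {
		b, err := json.Marshal(in)
		if err != nil {
			return err
		}
		body = bytes.NewReader(b)
	}
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		return err
	}
	if token != "" {
		req.Header.Set("X-Vault-Token", token) // assumed: Vault-compatible header
	}
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s %s: status %d", method, url, resp.StatusCode)
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// FetchDBCreds logs in with the pod's ServiceAccount JWT, then asks the
// database secrets engine for credentials. Mount paths are assumed defaults.
func FetchDBCreds(c *http.Client, addr, authRole, dbRole, saJWT string) (DBCreds, error) {
	var login struct {
		Auth struct {
			ClientToken string `json:"client_token"`
		} `json:"auth"`
	}
	in := map[string]string{"role": authRole, "jwt": saJWT}
	if err := call(c, http.MethodPost, addr+"/v1/auth/kubernetes/login", "", in, &login); err != nil {
		return DBCreds{}, err
	}
	var out struct {
		LeaseDuration int64 `json:"lease_duration"`
		Data          struct {
			Username string `json:"username"`
			Password string `json:"password"`
		} `json:"data"`
	}
	url := addr + "/v1/database/creds/" + dbRole
	if err := call(c, http.MethodGet, url, login.Auth.ClientToken, nil, &out); err != nil {
		return DBCreds{}, err
	}
	lease := time.Duration(out.LeaseDuration) * time.Second
	return DBCreds{Username: out.Data.Username, Password: out.Data.Password, Lease: lease}, nil
}

// stubBao is a fake transport so the sketch runs without a real OpenBao server.
type stubBao struct{}

func (stubBao) RoundTrip(r *http.Request) (*http.Response, error) {
	status, body := http.StatusNotFound, `{"errors":["not found"]}`
	switch {
	case r.Method == http.MethodPost && r.URL.Path == "/v1/auth/kubernetes/login":
		status, body = http.StatusOK, `{"auth":{"client_token":"stub-token"}}`
	case r.URL.Path == "/v1/database/creds/claims-rw" && r.Header.Get("X-Vault-Token") == "stub-token":
		status, body = http.StatusOK, `{"lease_duration":3600,"data":{"username":"v-claims-example","password":"example-only"}}`
	}
	return &http.Response{StatusCode: status, Header: http.Header{}, Body: io.NopCloser(strings.NewReader(body))}, nil
}

func newStubClient() *http.Client { return &http.Client{Transport: stubBao{}} }

func main() {
	creds, err := FetchDBCreds(newStubClient(), "http://openbao.invalid", "claims", "claims-rw", "sa-jwt")
	if err != nil {
		panic(err)
	}
	fmt.Printf("user=%s lease=%s\n", creds.Username, creds.Lease)
}
```

In a pod, the JWT would typically be read from the projected ServiceAccount token file, and the service would request fresh credentials before the lease runs out.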
HIPAA Compliance Considerations
- All data at rest is encrypted with AWS KMS (AES-256).
- All data in transit uses TLS 1.3; Istio enforces mTLS between every pod.
- PHI access is logged to CloudTrail, retained for six years, and forwarded to OpenSearch.
- Falco monitors Kubernetes runtime syscalls for threat detection.
- OPA enforces fine-grained RBAC/ABAC policies for every API request.
- AI inference (Phase 4) will run on dedicated GPU nodes in an isolated VPC with no external egress, ensuring PHI never leaves the network.
Infrastructure as Code
All AWS resources are provisioned with OpenTofu (an open-source fork of Terraform):
cd infra/terraform && tofu plan # Preview changes
cd infra/terraform && tofu apply # Apply

Kubernetes resources are managed with Helm charts in infra/helm/ and deployed via Argo CD GitOps. Updating a chart value in Git triggers an automatic Argo CD sync.