# Foundation Infrastructure

RFC 0039: ADR-Compliant Foundation Infrastructure

## Overview

This directory contains Terraform modules and Kubernetes manifests for deploying the Alignment foundation infrastructure on AWS EKS.

## Architecture

```
                        Internet
                            |
                  +---------+----------+
                  |     Shared NLB     |
                  |     (~$16/mo)      |
                  +--------------------+
                  | :53  DNS (PowerDNS)|
                  | :25  SMTP          |
                  | :587 Submission    |
                  | :993 IMAPS         |
                  | :443 HTTPS         |
                  +--------+-----------+
                           |
      +--------------------+--------------------+
      |                    |                    |
+-----+------+       +-----+------+      +------+-----+
|    AZ-a    |       |    AZ-b    |      |    AZ-c    |
+------------+       +------------+      +------------+
|            |       |            |      |            |
| Karpenter  |       | Karpenter  |      | Karpenter  |
| Spot Nodes |       | Spot Nodes |      | Spot Nodes |
|            |       |            |      |            |
+------------+       +------------+      +------------+
|            |       |            |      |            |
| CockroachDB|       | CockroachDB|      | CockroachDB|
| (m6i.large)|       | (m6i.large)|      | (m6i.large)|
|            |       |            |      |            |
+------------+       +------------+      +------------+
```

## Cost Breakdown

| Component | Monthly Cost |
|-----------|--------------|
| EKS Control Plane | $73 |
| CockroachDB (3x m6i.large, 3yr) | $105 |
| NLB | $16 |
| EFS | $5 |
| S3 | $5 |
| Spot nodes (variable) | $0-50 |
| **Total** | **$204-254** |

## ADR Compliance

- **ADR 0003**: Self-hosted CockroachDB with FIPS 140-2
- **ADR 0004**: "Set It and Forget It" auto-scaling with Karpenter
- **ADR 0005**: Full-stack self-hosting (no SaaS dependencies)

## Prerequisites

1. AWS CLI configured with appropriate credentials
2. Terraform >= 1.6.0
3. kubectl
4. Helm 3.x

## Quick Start

### 1. Bootstrap Terraform Backend

First, create the S3 bucket and DynamoDB table for Terraform state:

```bash
cd terraform/environments/production
# Uncomment the backend.tf bootstrap code and run:
# terraform init && terraform apply
```

### 2. Deploy Foundation Infrastructure

```bash
cd terraform/environments/production
terraform init
terraform plan
terraform apply
```

### 3. Configure kubectl

```bash
aws eks update-kubeconfig --region us-east-1 --name alignment-production
```

### 4. Deploy Karpenter

```bash
# Set environment variables
export CLUSTER_NAME=$(terraform output -raw cluster_name)
export CLUSTER_ENDPOINT=$(terraform output -raw cluster_endpoint)
export KARPENTER_ROLE_ARN=$(terraform output -raw karpenter_role_arn)
export INTERRUPTION_QUEUE_NAME=$(terraform output -raw karpenter_interruption_queue_name)

# Install Karpenter
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  -f kubernetes/karpenter/helm-values.yaml \
  --set settings.clusterName=$CLUSTER_NAME \
  --set settings.clusterEndpoint=$CLUSTER_ENDPOINT \
  --set settings.interruptionQueue=$INTERRUPTION_QUEUE_NAME \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=$KARPENTER_ROLE_ARN

# Apply NodePool and EC2NodeClass
kubectl apply -f kubernetes/karpenter/nodepool.yaml
kubectl apply -f kubernetes/karpenter/ec2nodeclass.yaml
```

### 5. Deploy Storage Classes

```bash
export EFS_ID=$(terraform output -raw efs_id)
envsubst < kubernetes/storage/classes.yaml | kubectl apply -f -
```

## Directory Structure

```
infra/
├── terraform/
│   ├── main.tf              # Root module
│   ├── variables.tf         # Input variables
│   ├── outputs.tf           # Output values
│   ├── versions.tf          # Provider versions
│   ├── modules/
│   │   ├── vpc/             # VPC with multi-AZ subnets
│   │   ├── eks/             # EKS cluster with Fargate
│   │   ├── iam/             # IAM roles and IRSA
│   │   ├── storage/         # EFS and S3
│   │   ├── nlb/             # Shared NLB
│   │   └── cockroachdb/     # CockroachDB (future)
│   └── environments/
│       └── production/      # Production config
├── kubernetes/
│   ├── karpenter/           # Karpenter manifests
│   ├── cockroachdb/         # CockroachDB StatefulSet
│   ├── storage/             # Storage classes
│   ├── ingress/             # Ingress configuration
│   └── cert-manager/        # TLS certificates
└── README.md
```

## Modules

### VPC Module

Creates a VPC with:

- 3 availability zones
- Public subnets (for NLB, NAT Gateways)
- Private subnets (for EKS nodes, workloads)
- Database subnets (isolated, for CockroachDB)
- NAT Gateway per AZ for HA
- VPC endpoints for S3, ECR, STS, EC2

### EKS Module

Creates an EKS cluster with:

- Kubernetes 1.29
- Fargate profiles for Karpenter and kube-system
- OIDC provider for IRSA
- KMS encryption for secrets
- Cluster logging enabled

### IAM Module

Creates IAM roles for:

- Karpenter controller
- EBS CSI driver
- EFS CSI driver
- AWS Load Balancer Controller
- cert-manager
- External DNS

### Storage Module

Creates storage resources:

- EFS filesystem with encryption
- S3 bucket for backups (versioned, encrypted)
- S3 bucket for blob storage
- KMS key for encryption

### NLB Module

Creates a shared NLB with:

- HTTPS (443) for web traffic
- DNS (53 UDP/TCP) for PowerDNS
- SMTP (25), Submission (587), IMAPS (993) for email
- Cross-zone load balancing
- Target groups for each service

## Operations

### Scaling

Karpenter automatically scales nodes based on pending pods. No manual intervention required.

To adjust limits:

```bash
kubectl edit nodepool default
```

### Monitoring

Check Karpenter status:

```bash
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
```

Check node status:

```bash
kubectl get nodes -L karpenter.sh/capacity-type,node.kubernetes.io/instance-type
```

### Troubleshooting

View Karpenter events:

```bash
kubectl get events -n karpenter --sort-by=.lastTimestamp
```

Check pending pods:

```bash
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```

## Security

- All storage encrypted at rest (KMS)
- TLS required for all connections
- IMDSv2 required for all nodes
- VPC Flow Logs enabled
- Cluster audit logging enabled
- FIPS 140-2 mode for CockroachDB

## Disaster Recovery

### Backups

CockroachDB backups are stored in S3 with:

- Daily full backups
- 30-day retention in Standard
- 90-day transition to Glacier
- 365-day noncurrent version retention

### Recovery

To restore from backup:

```bash
# Restore CockroachDB from S3 backup
cockroach restore ... FROM 's3://alignment-production-backups/...'
```

## References

- [RFC 0039: Foundation Infrastructure](../../../.repos/alignment-mcp/docs/rfcs/0039-foundation-infrastructure.md)
- [ADR 0003: CockroachDB Self-Hosted FIPS](../../../.repos/alignment-mcp/docs/adrs/0003-cockroachdb-self-hosted-fips.md)
- [ADR 0004: Set It and Forget It](../../../.repos/alignment-mcp/docs/adrs/0004-set-it-and-forget-it-architecture.md)
- [ADR 0005: Full-Stack Self-Hosting](../../../.repos/alignment-mcp/docs/adrs/0005-full-stack-self-hosting.md)
- [Karpenter Documentation](https://karpenter.sh/)
- [EKS Best Practices](https://aws.github.io/aws-eks-best-practices/)
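
## Appendix: Checking the Cost Total

The monthly total in the Cost Breakdown table can be reproduced with a few lines of shell arithmetic. This is only a sketch for sanity-checking the table when a line item changes; the variable names are illustrative and not part of the repository, and the figures are copied from the table rather than queried from AWS.

```bash
#!/usr/bin/env bash
# Fixed monthly costs in USD, copied from the Cost Breakdown table.
# Variable names are hypothetical; nothing here reads from AWS.
eks_control_plane=73
cockroachdb=105   # 3x m6i.large, 3yr pricing as stated in the table
nlb=16
efs=5
s3=5

# Spot node spend is variable; the table gives a range of $0-50.
spot_min=0
spot_max=50

# Sum the fixed items, then add each end of the spot range.
fixed_total=$((eks_control_plane + cockroachdb + nlb + efs + s3))
total_min=$((fixed_total + spot_min))
total_max=$((fixed_total + spot_max))

echo "Fixed: \$${fixed_total}/mo"
echo "Total: \$${total_min}-${total_max}/mo"
```

With the figures above, the fixed items sum to $204, which gives the $204-254 range shown in the table once the spot range is added.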