Hearth is the infrastructure home for the letemcook ecosystem.

Ported from coherence-mcp/infra:
- Terraform modules (VPC, EKS, IAM, NLB, S3, storage)
- Kubernetes manifests (Forgejo, ingress, cert-manager, karpenter)
- Deployment scripts (phased rollout)

Status: Not deployed. EKS cluster needs to be provisioned.

Next steps:
1. Bootstrap terraform backend
2. Deploy phase 1 (foundation)
3. Deploy phase 2 (core services including Forgejo)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Foundation Infrastructure
RFC 0039: ADR-Compliant Foundation Infrastructure
Overview
This directory contains Terraform modules and Kubernetes manifests for deploying the Alignment foundation infrastructure on AWS EKS.
Architecture
                       Internet
                           |
                 +---------+----------+
                 |     Shared NLB     |
                 |     (~$16/mo)      |
                 +--------------------+
                 | :53  DNS (PowerDNS)|
                 | :25  SMTP          |
                 | :587 Submission    |
                 | :993 IMAPS         |
                 | :443 HTTPS         |
                 +---------+----------+
                           |
      +--------------------+--------------------+
      |                    |                    |
+-----+------+       +-----+------+      +------+-----+
|    AZ-a    |       |    AZ-b    |      |    AZ-c    |
+------------+       +------------+      +------------+
|            |       |            |      |            |
| Karpenter  |       | Karpenter  |      | Karpenter  |
| Spot Nodes |       | Spot Nodes |      | Spot Nodes |
|            |       |            |      |            |
+------------+       +------------+      +------------+
|            |       |            |      |            |
| CockroachDB|       | CockroachDB|      | CockroachDB|
| (m6i.large)|       | (m6i.large)|      | (m6i.large)|
|            |       |            |      |            |
+------------+       +------------+      +------------+
Cost Breakdown
| Component | Monthly Cost |
|---|---|
| EKS Control Plane | $73 |
| CockroachDB (3x m6i.large, 3yr) | $105 |
| NLB | $16 |
| EFS | $5 |
| S3 | $5 |
| Spot nodes (variable) | $0-50 |
| Total | $204-254 |
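The total can be re-derived from the line items. The following is a minimal sketch in shell arithmetic that uses only the figures from the table; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Fixed monthly line items from the cost table (USD).
eks=73
crdb=105
nlb=16
efs=5
s3=5
fixed=$(( eks + crdb + nlb + efs + s3 ))

# Spot nodes are the only variable item: $0 to $50 per month.
spot_min=0
spot_max=50
total_min=$(( fixed + spot_min ))
total_max=$(( fixed + spot_max ))

echo "fixed=\$${fixed} total=\$${total_min}-\$${total_max}"
# prints: fixed=$204 total=$204-$254
```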
ADR Compliance
- ADR 0003: Self-hosted CockroachDB with FIPS 140-2
- ADR 0004: "Set It and Forget It" auto-scaling with Karpenter
- ADR 0005: Full-stack self-hosting (no SaaS dependencies)
Prerequisites
- AWS CLI configured with appropriate credentials
- Terraform >= 1.6.0
- kubectl
- Helm 3.x
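The prerequisites can be checked before starting. This is a rough sketch, not part of the repository: `check_tools` and `version_ge` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils and recent macOS).

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Sorts the two versions and checks that B comes first (or is equal).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_tools NAME...: report every named CLI that is missing from PATH.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_tools aws terraform kubectl helm || echo "install the missing tools first" >&2

if command -v terraform >/dev/null 2>&1; then
  # The first line of "terraform version" looks like "Terraform v1.6.0".
  tf_version=$(terraform version | head -n1 | sed 's/^Terraform v//')
  version_ge "$tf_version" "1.6.0" || echo "Terraform $tf_version is older than 1.6.0" >&2
fi
```

The check only reports problems; it does not verify that the AWS credentials are valid.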
Quick Start
1. Bootstrap Terraform Backend
First, create the S3 bucket and DynamoDB table for Terraform state:
cd terraform/environments/production
# Uncomment the backend.tf bootstrap code and run:
# terraform init && terraform apply
2. Deploy Foundation Infrastructure
cd terraform/environments/production
terraform init
terraform plan
terraform apply
3. Configure kubectl
aws eks update-kubeconfig --region us-east-1 --name alignment-production
4. Deploy Karpenter
# Run from the directory that contains terraform/ and kubernetes/
TF_DIR=terraform/environments/production

# Set environment variables from the Terraform outputs
export CLUSTER_NAME=$(terraform -chdir="$TF_DIR" output -raw cluster_name)
export CLUSTER_ENDPOINT=$(terraform -chdir="$TF_DIR" output -raw cluster_endpoint)
export KARPENTER_ROLE_ARN=$(terraform -chdir="$TF_DIR" output -raw karpenter_role_arn)
export INTERRUPTION_QUEUE_NAME=$(terraform -chdir="$TF_DIR" output -raw karpenter_interruption_queue_name)

# Install Karpenter
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  -f kubernetes/karpenter/helm-values.yaml \
  --set settings.clusterName="$CLUSTER_NAME" \
  --set settings.clusterEndpoint="$CLUSTER_ENDPOINT" \
  --set settings.interruptionQueue="$INTERRUPTION_QUEUE_NAME" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="$KARPENTER_ROLE_ARN"

# Apply NodePool and EC2NodeClass
kubectl apply -f kubernetes/karpenter/nodepool.yaml
kubectl apply -f kubernetes/karpenter/ec2nodeclass.yaml
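If a `terraform output` call fails, the corresponding variable ends up empty and Helm receives an empty `--set` value. A small guard can catch this before the install. The sketch below is an assumption about how such a check might look; `require_vars` is a hypothetical helper, not something the repository provides.

```shell
#!/usr/bin/env bash
# require_vars NAME...: fail if any named variable is unset or empty.
require_vars() {
  local name value rc=0
  for name in "$@"; do
    # Indirect lookup: read the variable whose name is stored in $name.
    # The names come from the literal argument list, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is empty or unset" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Intended use, placed between the exports and the helm command:
# require_vars CLUSTER_NAME CLUSTER_ENDPOINT KARPENTER_ROLE_ARN INTERRUPTION_QUEUE_NAME || exit 1
```

The function reports every empty variable rather than stopping at the first, so a single run shows all missing outputs.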
5. Deploy Storage Classes
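# Run from the directory that contains terraform/ and kubernetes/
export EFS_ID=$(terraform -chdir=terraform/environments/production output -raw efs_id)
envsubst < kubernetes/storage/classes.yaml | kubectl apply -f -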
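By default, `envsubst` replaces every `$VAR` reference it finds, and unset variables become empty strings. Passing a SHELL-FORMAT argument restricts substitution to the listed variables. The sketch below illustrates this on a made-up manifest fragment; the file contents and the filesystem ID are placeholders, since the real `classes.yaml` is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for kubernetes/storage/classes.yaml.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
parameters:
  fileSystemId: ${EFS_ID}
  note: "keep $UNRELATED as-is"
EOF

export EFS_ID="fs-0123456789abcdef0"   # placeholder, not a real filesystem ID

# '${EFS_ID}' as SHELL-FORMAT limits substitution to that one variable,
# so any other "$..." text in the manifest is left untouched.
if command -v envsubst >/dev/null 2>&1; then
  rendered=$(envsubst '${EFS_ID}' < "$manifest")
  echo "$rendered"
fi
rm -f "$manifest"
```

The same restriction could be applied to the real command by adding `'${EFS_ID}'` after `envsubst`, assuming the manifest references no other environment variables that need substituting.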
Directory Structure
infra/
├── terraform/
│   ├── main.tf             # Root module
│   ├── variables.tf        # Input variables
│   ├── outputs.tf          # Output values
│   ├── versions.tf         # Provider versions
│   ├── modules/
│   │   ├── vpc/            # VPC with multi-AZ subnets
│   │   ├── eks/            # EKS cluster with Fargate
│   │   ├── iam/            # IAM roles and IRSA
│   │   ├── storage/        # EFS and S3
│   │   ├── nlb/            # Shared NLB
│   │   └── cockroachdb/    # CockroachDB (future)
│   └── environments/
│       └── production/     # Production config
├── kubernetes/
│   ├── karpenter/          # Karpenter manifests
│   ├── cockroachdb/        # CockroachDB StatefulSet
│   ├── storage/            # Storage classes
│   ├── ingress/            # Ingress configuration
│   └── cert-manager/       # TLS certificates
└── README.md
Modules
VPC Module
Creates a VPC with:
- 3 availability zones
- Public subnets (for NLB, NAT Gateways)
- Private subnets (for EKS nodes, workloads)
- Database subnets (isolated, for CockroachDB)
- NAT Gateway per AZ for HA
- VPC endpoints for S3, ECR, STS, EC2
EKS Module
Creates an EKS cluster with:
- Kubernetes 1.29
- Fargate profiles for Karpenter and kube-system
- OIDC provider for IRSA
- KMS encryption for secrets
- Cluster logging enabled
IAM Module
Creates IAM roles for:
- Karpenter controller
- EBS CSI driver
- EFS CSI driver
- AWS Load Balancer Controller
- cert-manager
- External DNS
Storage Module
Creates storage resources:
- EFS filesystem with encryption
- S3 bucket for backups (versioned, encrypted)
- S3 bucket for blob storage
- KMS key for encryption
NLB Module
Creates a shared NLB with:
- HTTPS (443) for web traffic
- DNS (53 UDP/TCP) for PowerDNS
- SMTP (25), Submission (587), IMAPS (993) for email
- Cross-zone load balancing
- Target groups for each service
Operations
Scaling
Karpenter automatically scales nodes based on pending pods. No manual intervention required.
To adjust limits:
kubectl edit nodepool default
Monitoring
Check Karpenter status:
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
Check node status:
kubectl get nodes -L karpenter.sh/capacity-type,node.kubernetes.io/instance-type
Troubleshooting
View Karpenter events:
kubectl get events -n karpenter --sort-by=.lastTimestamp
Check pending pods:
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
Security
- All storage encrypted at rest (KMS)
- TLS required for all connections
- IMDSv2 required for all nodes
- VPC Flow Logs enabled
- Cluster audit logging enabled
- FIPS 140-2 mode for CockroachDB
Disaster Recovery
Backups
CockroachDB backups are stored in S3 with:
- Daily full backups
- 30-day retention in Standard
- 90-day transition to Glacier
- 365-day noncurrent version retention
Recovery
To restore from backup:
# Restore CockroachDB from S3 backup (RESTORE is a SQL statement, so run it through cockroach sql)
cockroach sql ... --execute="RESTORE ... FROM 's3://alignment-production-backups/...'"