Hearth is the infrastructure home for the letemcook ecosystem.

Ported from coherence-mcp/infra:

- Terraform modules (VPC, EKS, IAM, NLB, S3, storage)
- Kubernetes manifests (Forgejo, ingress, cert-manager, karpenter)
- Deployment scripts (phased rollout)

Status: Not deployed. EKS cluster needs to be provisioned.

Next steps:

1. Bootstrap terraform backend
2. Deploy phase 1 (foundation)
3. Deploy phase 2 (core services including Forgejo)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# Foundation Infrastructure

RFC 0039: ADR-Compliant Foundation Infrastructure

## Overview

This directory contains Terraform modules and Kubernetes manifests for deploying
the Alignment foundation infrastructure on AWS EKS.

## Architecture

```
                       Internet
                           |
                 +---------+----------+
                 |     Shared NLB     |
                 |     (~$16/mo)      |
                 +--------------------+
                 | :53  DNS (PowerDNS)|
                 | :25  SMTP          |
                 | :587 Submission    |
                 | :993 IMAPS         |
                 | :443 HTTPS         |
                 +---------+----------+
                           |
      +--------------------+--------------------+
      |                    |                    |
+-----+------+       +-----+------+      +------+-----+
|    AZ-a    |       |    AZ-b    |      |    AZ-c    |
+------------+       +------------+      +------------+
|            |       |            |      |            |
| Karpenter  |       | Karpenter  |      | Karpenter  |
| Spot Nodes |       | Spot Nodes |      | Spot Nodes |
|            |       |            |      |            |
+------------+       +------------+      +------------+
|            |       |            |      |            |
| CockroachDB|       | CockroachDB|      | CockroachDB|
| (m6i.large)|       | (m6i.large)|      | (m6i.large)|
|            |       |            |      |            |
+------------+       +------------+      +------------+
```

## Cost Breakdown

| Component | Monthly Cost |
|-----------|--------------|
| EKS Control Plane | $73 |
| CockroachDB (3x m6i.large, 3yr) | $105 |
| NLB | $16 |
| EFS | $5 |
| S3 | $5 |
| Spot nodes (variable) | $0-50 |
| **Total** | **$204-254** |

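As a quick sanity check on the total row, the arithmetic can be reproduced in a few lines of shell. This is only a sketch: the figures are copied from the cost table, and the variable names are illustrative.

```bash
# Fixed monthly costs in USD, in table order: EKS, CockroachDB, NLB, EFS, S3.
fixed_total=0
for cost in 73 105 16 5 5; do
  fixed_total=$((fixed_total + cost))
done

# Spot nodes are the only variable line item ($0-50).
spot_min=0
spot_max=50
total_min=$((fixed_total + spot_min))
total_max=$((fixed_total + spot_max))

echo "Fixed: \$${fixed_total}/mo, total: \$${total_min}-${total_max}/mo"
# prints: Fixed: $204/mo, total: $204-254/mo
```
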
## ADR Compliance

- **ADR 0003**: Self-hosted CockroachDB with FIPS 140-2
- **ADR 0004**: "Set It and Forget It" auto-scaling with Karpenter
- **ADR 0005**: Full-stack self-hosting (no SaaS dependencies)

## Prerequisites

1. AWS CLI configured with appropriate credentials
2. Terraform >= 1.6.0
3. kubectl
4. Helm 3.x

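The prerequisites can be checked before starting with a short script along these lines. It is a sketch rather than part of the repository: the function names are made up for illustration, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```bash
# Return 0 when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available on the host.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print every required CLI that is not on PATH; always returns 0.
missing_tools() {
  for tool in aws terraform kubectl helm; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
  return 0
}

missing_tools

# Compare the installed Terraform version against the documented minimum.
if command -v terraform >/dev/null 2>&1; then
  tf_version=$(terraform version | head -n 1 | sed 's/^Terraform v//')
  version_ge "$tf_version" "1.6.0" || echo "Terraform $tf_version is older than 1.6.0"
fi
```

The script only reports problems; it does not check that the AWS credentials are valid or that Helm is a 3.x release.
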
## Quick Start

### 1. Bootstrap Terraform Backend

First, create the S3 bucket and DynamoDB table for Terraform state:

```bash
cd terraform/environments/production
# Uncomment the backend.tf bootstrap code and run:
# terraform init && terraform apply
```

### 2. Deploy Foundation Infrastructure

```bash
cd terraform/environments/production
terraform init
terraform plan
terraform apply
```

### 3. Configure kubectl

```bash
aws eks update-kubeconfig --region us-east-1 --name alignment-production
```

### 4. Deploy Karpenter

```bash
# Run from the infra/ root. The Terraform outputs are read from the production
# environment applied in step 2; the kubernetes/ paths are relative to infra/.
TF_DIR=terraform/environments/production

# Set environment variables
export CLUSTER_NAME=$(terraform -chdir="$TF_DIR" output -raw cluster_name)
export CLUSTER_ENDPOINT=$(terraform -chdir="$TF_DIR" output -raw cluster_endpoint)
export KARPENTER_ROLE_ARN=$(terraform -chdir="$TF_DIR" output -raw karpenter_role_arn)
export INTERRUPTION_QUEUE_NAME=$(terraform -chdir="$TF_DIR" output -raw karpenter_interruption_queue_name)

# Install Karpenter (no chart version is pinned here; consider adding --version)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  -f kubernetes/karpenter/helm-values.yaml \
  --set settings.clusterName="$CLUSTER_NAME" \
  --set settings.clusterEndpoint="$CLUSTER_ENDPOINT" \
  --set settings.interruptionQueue="$INTERRUPTION_QUEUE_NAME" \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="$KARPENTER_ROLE_ARN"

# Apply NodePool and EC2NodeClass
kubectl apply -f kubernetes/karpenter/nodepool.yaml
kubectl apply -f kubernetes/karpenter/ec2nodeclass.yaml
```

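### 5. Deploy Storage Classes

```bash
# Run from the infra/ root; the EFS ID comes from the production Terraform outputs.
export EFS_ID=$(terraform -chdir=terraform/environments/production output -raw efs_id)
# Substitute only EFS_ID so any other $-expressions in the manifest are left intact.
envsubst '${EFS_ID}' < kubernetes/storage/classes.yaml | kubectl apply -f -
```
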
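If `terraform output` fails, `EFS_ID` ends up empty and `envsubst` will happily render a storage class with a blank file system ID. A small guard, sketched below, can catch that before anything is applied. The helper name `is_efs_id` and the sample ID are hypothetical, and the pattern (`fs-` followed by 8 to 40 hex digits) is an assumption about the ID format.

```bash
# Return 0 when $1 looks like an EFS file system ID (assumed format:
# "fs-" followed by 8 to 40 lowercase hex digits).
is_efs_id() {
  printf '%s\n' "$1" | grep -Eq '^fs-[0-9a-f]{8,40}$'
}

# Hypothetical ID for illustration; in practice pass "$EFS_ID" from step 5.
example_id="fs-0123456789abcdef0"

if is_efs_id "$example_id"; then
  echo "EFS ID looks valid: $example_id"
else
  echo "EFS ID is empty or malformed; not applying storage classes" >&2
fi
```
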
## Directory Structure

```
infra/
├── terraform/
│   ├── main.tf              # Root module
│   ├── variables.tf         # Input variables
│   ├── outputs.tf           # Output values
│   ├── versions.tf          # Provider versions
│   ├── modules/
│   │   ├── vpc/             # VPC with multi-AZ subnets
│   │   ├── eks/             # EKS cluster with Fargate
│   │   ├── iam/             # IAM roles and IRSA
│   │   ├── storage/         # EFS and S3
│   │   ├── nlb/             # Shared NLB
│   │   └── cockroachdb/     # CockroachDB (future)
│   └── environments/
│       └── production/      # Production config
├── kubernetes/
│   ├── karpenter/           # Karpenter manifests
│   ├── cockroachdb/         # CockroachDB StatefulSet
│   ├── storage/             # Storage classes
│   ├── ingress/             # Ingress configuration
│   └── cert-manager/        # TLS certificates
└── README.md
```

## Modules

### VPC Module

Creates a VPC with:
- 3 availability zones
- Public subnets (for NLB, NAT Gateways)
- Private subnets (for EKS nodes, workloads)
- Database subnets (isolated, for CockroachDB)
- NAT Gateway per AZ for HA
- VPC endpoints for S3, ECR, STS, EC2

### EKS Module

Creates an EKS cluster with:
- Kubernetes 1.29
- Fargate profiles for Karpenter and kube-system
- OIDC provider for IRSA
- KMS encryption for secrets
- Cluster logging enabled

### IAM Module

Creates IAM roles for:
- Karpenter controller
- EBS CSI driver
- EFS CSI driver
- AWS Load Balancer Controller
- cert-manager
- External DNS

### Storage Module

Creates storage resources:
- EFS filesystem with encryption
- S3 bucket for backups (versioned, encrypted)
- S3 bucket for blob storage
- KMS key for encryption

### NLB Module

Creates a shared NLB with:
- HTTPS (443) for web traffic
- DNS (53 UDP/TCP) for PowerDNS
- SMTP (25), Submission (587), IMAPS (993) for email
- Cross-zone load balancing
- Target groups for each service

## Operations

### Scaling

Karpenter automatically scales nodes based on pending pods. No manual intervention required.

To adjust limits:
```bash
kubectl edit nodepool default
```

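For scripted changes, the same limits can be set without opening an editor by sending a JSON merge patch. The sketch below only builds and prints the patch; the helper name and the limit values (200 vCPU, 800Gi) are hypothetical, and it assumes the NodePool keeps its limits under `spec.limits` as in current Karpenter releases.

```bash
# Build a JSON merge patch that sets the NodePool CPU and memory limits.
nodepool_limits_patch() {
  printf '{"spec":{"limits":{"cpu":"%s","memory":"%s"}}}' "$1" "$2"
}

# Hypothetical limits across all Karpenter-managed nodes.
patch=$(nodepool_limits_patch 200 800Gi)
echo "$patch"

# Apply only when kubectl is pointed at the cluster:
# kubectl patch nodepool default --type merge -p "$patch"
```
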
### Monitoring

Check Karpenter status:
```bash
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
```

Check node status:
```bash
kubectl get nodes -L karpenter.sh/capacity-type,node.kubernetes.io/instance-type
```

### Troubleshooting

View Karpenter events:
```bash
kubectl get events -n karpenter --sort-by=.lastTimestamp
```

Check pending pods:
```bash
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```

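When many pods are pending, a per-namespace count is often easier to read than the raw list. The following sketch summarizes `kubectl get pods` output piped in on stdin; the function name and the sample rows are hypothetical, and it assumes the default column layout where the namespace is the first field.

```bash
# Count pods per namespace from `kubectl get pods --all-namespaces --no-headers`
# output on stdin (namespace is assumed to be the first column).
pending_by_namespace() {
  awk '{ count[$1]++ } END { for (ns in count) print ns, count[ns] }' | sort
}

# Hypothetical sample rows, used here instead of a live cluster.
sample='default      web-7d9f   0/1   Pending   0   3m
default      api-5c2b   0/1   Pending   0   2m
forgejo      forgejo-0  0/1   Pending   0   5m'

summary=$(printf '%s\n' "$sample" | pending_by_namespace)
echo "$summary"

# Against a live cluster:
# kubectl get pods --all-namespaces --field-selector=status.phase=Pending --no-headers | pending_by_namespace
```
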
## Security

- All storage encrypted at rest (KMS)
- TLS required for all connections
- IMDSv2 required for all nodes
- VPC Flow Logs enabled
- Cluster audit logging enabled
- FIPS 140-2 mode for CockroachDB

## Disaster Recovery

### Backups

CockroachDB backups are stored in S3 with:
- Daily full backups
- 30-day retention in Standard
- 90-day transition to Glacier
- 365-day noncurrent version retention

### Recovery

To restore from backup:
```bash
# Restore CockroachDB from S3 backup.
# RESTORE is a SQL statement, so it is run through the cockroach SQL client;
# add the connection flags your cluster needs (for example --host, --certs-dir).
cockroach sql --execute="RESTORE ... FROM 's3://alignment-production-backups/...'"
```

## References

- [RFC 0039: Foundation Infrastructure](../../../.repos/alignment-mcp/docs/rfcs/0039-foundation-infrastructure.md)
- [ADR 0003: CockroachDB Self-Hosted FIPS](../../../.repos/alignment-mcp/docs/adrs/0003-cockroachdb-self-hosted-fips.md)
- [ADR 0004: Set It and Forget It](../../../.repos/alignment-mcp/docs/adrs/0004-set-it-and-forget-it-architecture.md)
- [ADR 0005: Full-Stack Self-Hosting](../../../.repos/alignment-mcp/docs/adrs/0005-full-stack-self-hosting.md)
- [Karpenter Documentation](https://karpenter.sh/)
- [EKS Best Practices](https://aws.github.io/aws-eks-best-practices/)