From f23ea198f0d560d198b4f919753d7584f28e242a Mon Sep 17 00:00:00 2001
From: Eric Garcia
Date: Sat, 24 Jan 2026 09:03:51 -0500
Subject: [PATCH] docs: Update documentation for minimal k3s architecture

Reflect current state:
- k3s on single EC2 spot instance (~$7.50/month)
- Forgejo, PowerDNS, Traefik running
- Remove outdated EKS/CockroachDB references

Co-Authored-By: Claude Opus 4.5
---
 CLAUDE.md                      |  71 +++---
 README.md                      |  51 +++--
 docs/architecture.md           | 397 ++++++++++++---------------------
 terraform/minimal/user-data.sh |   7 +-
 4 files changed, 221 insertions(+), 305 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index dc94dba..8c16aa5 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -4,28 +4,29 @@ The warm center where infrastructure becomes real.
 
 ## What This Is
 
-Hearth is the infrastructure repository for the letemcook ecosystem. It contains:
+Hearth is the infrastructure repository for the letemcook ecosystem. It runs a minimal k3s setup on a single EC2 spot instance (~$7.50/month).
 
-- **Terraform modules** for AWS EKS, VPC, IAM, storage
-- **Kubernetes manifests** for core services (Forgejo, cert-manager, ingress)
-- **Deployment scripts** for phased rollout
+Services:
+- **Forgejo** - Self-hosted Git
+- **PowerDNS** - Authoritative DNS
+- **Traefik** - Ingress with Let's Encrypt
 
 ## Quick Start
 
 ```bash
 # 1. Configure AWS
-aws sso login --profile muffinlabs
+aws sso login --profile hearth
 
-# 2. Bootstrap Terraform backend
-cd terraform/environments/production
+# 2. Deploy infrastructure
+cd terraform/minimal
 terraform init
-terraform apply -target=module.bootstrap
+terraform apply
 
-# 3. Deploy foundation (EKS, VPC, storage)
-./scripts/deploy-phase1-foundation.sh
+# 3. Deploy PowerDNS (after instance is running)
+scp -P 2222 scripts/deploy-powerdns.sh ec2-user@:
+ssh -p 2222 ec2-user@ 'sudo bash deploy-powerdns.sh '
 
-# 4. Deploy core services (Forgejo)
-./scripts/deploy-phase2-core-services.sh
+# 4. Update GoDaddy glue records for each domain
 ```
 
 ## Structure
@@ -33,26 +34,28 @@ terraform apply -target=module.bootstrap
 ```
 hearth/
 ├── terraform/
-│   ├── modules/           # Reusable infrastructure modules
-│   │   ├── vpc/           # VPC with multi-AZ subnets
-│   │   ├── eks/           # EKS cluster
-│   │   ├── iam/           # IAM roles and IRSA
-│   │   ├── nlb/           # Network Load Balancer
-│   │   └── storage/       # EFS, S3
-│   ├── main.tf            # Root module
-│   ├── variables.tf       # Input variables
-│   └── outputs.tf         # Output values
-├── kubernetes/
-│   ├── forgejo/           # Git hosting
-│   ├── ingress/           # ALB ingress
-│   ├── cert-manager/      # TLS certificates
-│   ├── karpenter/         # Auto-scaling
-│   └── storage/           # Storage classes
+│   └── minimal/           # Single EC2 + k3s
+│       ├── main.tf        # VPC, EC2, security groups
+│       ├── variables.tf   # Input variables
+│       └── user-data.sh   # k3s + Forgejo bootstrap
 ├── scripts/
-│   ├── deploy-phase*.sh   # Phased deployment
-│   └── validate-*.sh      # Validation scripts
+│   └── deploy-powerdns.sh # PowerDNS deployment
 └── docs/
-    └── architecture.md    # Infrastructure overview
+    ├── architecture.md    # Infrastructure overview
+    └── rfcs/              # Design decisions
+```
+
+## Access
+
+```bash
+# Admin SSH
+ssh -p 2222 ec2-user@3.218.167.115
+
+# kubectl (on server)
+kubectl get pods -A
+
+# Forgejo
+https://git.beyondtheuniverse.superviber.com
 ```
 
 ## Principles
@@ -62,17 +65,17 @@ From Blue's ADRs:
 - **Single Source (0005)**: Infrastructure as code, one truth
 - **Evidence (0004)**: Terraform plan before apply
 - **No Dead Code (0010)**: Delete unused resources
-- **Never Give Up (0000)**: Deploy, fail, learn, redeploy
+- **Freedom Through Constraint (0011)**: Minimal viable infrastructure
 
 ## AWS Profile
 
-Use `muffinlabs` profile for all AWS operations:
+Use `hearth` profile for all AWS operations:
 
 ```bash
-export AWS_PROFILE=muffinlabs
+export AWS_PROFILE=hearth
 ```
 
 ## Related Repos
 
 - **blue** - Philosophy and CLI tooling
-- **coherence-mcp** - MCP server (source of these manifests)
+- **coherence-mcp** - MCP server (original source)
diff --git a/README.md b/README.md
index a727031..7e80fad 100644
--- a/README.md
+++ b/README.md
@@ -1,37 +1,54 @@
 # Hearth
 
-Infrastructure for the letemcook ecosystem.
+Infrastructure for the letemcook ecosystem. You are home.
 
 ## Overview
 
-Hearth deploys and manages:
+Hearth runs on a single EC2 spot instance with k3s, hosting:
 
-- **EKS Cluster** - Kubernetes on AWS with Karpenter auto-scaling
-- **Forgejo** - Self-hosted Git (git.beyondtheuniverse.superviber.com)
-- **Core Services** - Ingress, TLS, storage
+- **Forgejo** - Self-hosted Git at git.beyondtheuniverse.superviber.com
+- **PowerDNS** - Authoritative DNS for managed domains
+- **Traefik** - Ingress with Let's Encrypt TLS
 
 ## Status
 
 | Component | Status |
 |-----------|--------|
-| Terraform modules | Ported from coherence-mcp |
-| EKS cluster | Not deployed |
-| Forgejo | Not deployed |
+| k3s cluster | Running |
+| Forgejo | Running |
+| PowerDNS | Running |
+| TLS | Pending (rate limited until Jan 25) |
+
+## Managed Domains
+
+DNS served by PowerDNS for:
+- superviber.com
+- muffinlabs.ai
+- letemcook.com
+- appbasecamp.com
+- thanksforborrowing.com
+- alignment.coop
+
+## Cost
+
+| Component | Monthly |
+|-----------|---------|
+| EC2 t4g.small spot | ~$5 |
+| EBS gp3 20GB | ~$2 |
+| Elastic IP | ~$0.50 |
+| **Total** | **~$7.50** |
 
 ## Getting Started
 
 See [CLAUDE.md](CLAUDE.md) for setup instructions.
 
-## Cost Estimate
+## Architecture
 
-| Component | Monthly |
-|-----------|---------|
-| EKS Control Plane | $73 |
-| Spot nodes (variable) | $0-50 |
-| NLB | $16 |
-| EFS | $5 |
-| S3 | $5 |
-| **Total** | **~$100-150** |
+See [docs/architecture.md](docs/architecture.md) for details.
+
+## RFCs
+
+- [RFC 0003: PowerDNS Self-Hosted DNS](docs/rfcs/0003-powerdns-self-hosted.md)
 
 ## License
 
diff --git a/docs/architecture.md b/docs/architecture.md
index 94aba6e..63b802e 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,269 +1,164 @@
-# Foundation Infrastructure
+# Hearth Architecture
 
-RFC 0039: ADR-Compliant Foundation Infrastructure
+Minimal infrastructure for ~1 user at ~$7.50/month.
 
 ## Overview
 
-This directory contains Terraform modules and Kubernetes manifests for deploying
-the Alignment foundation infrastructure on AWS EKS.
+```
+                         Internet
+                            |
+               +------------+------------+
+               |       Elastic IP        |
+               |     3.218.167.115       |
+               +------------+------------+
+                            |
+        +-------------------+-------------------+
+        |                   |                   |
+     :22 SSH             :53 DNS           :443 HTTPS
+      (Git)             (PowerDNS)          (Traefik)
+        |                   |                   |
+        +-------------------+-------------------+
+                            |
+               +------------+------------+
+               |   EC2 t4g.small (ARM)   |
+               |   Amazon Linux 2023     |
+               |   20GB gp3 EBS          |
+               +------------+------------+
+                            |
+               +------------+------------+
+               |          k3s            |
+               +-------------------------+
+               |                         |
+         +------+------+             +------+------+
+         |   traefik   |             |     dns     |
+         |  namespace  |             |  namespace  |
+         +-------------+             +-------------+
+         |   Traefik   |             |  PowerDNS   |
+         |  (ingress)  |             | (auth DNS)  |
+         +-------------+             +-------------+
+               |
+         +------+------+
+         |   forgejo   |
+         |  namespace  |
+         +-------------+
+         |   Forgejo   |
+         | (git host)  |
+         +-------------+
+```
 
-## Architecture
+## Components
+
+### EC2 Instance
+
+- **Type**: t4g.small (2 vCPU, 2GB RAM, ARM64)
+- **Pricing**: Spot instance (~$0.007/hr)
+- **Storage**: 20GB gp3 EBS (encrypted)
+- **OS**: Amazon Linux 2023
+
+### k3s
+
+Lightweight Kubernetes distribution. Single-node cluster with:
+- Built-in containerd
+- Local storage
+- No Traefik (disabled, using our own)
+
+### Traefik
+
+Ingress controller with:
+- HTTP → HTTPS redirect
+- Let's Encrypt ACME (HTTP-01 challenge)
+- TCP routing for Git SSH
+
+### PowerDNS
+
+Authoritative DNS server for managed domains:
+- superviber.com
+- muffinlabs.ai
+- letemcook.com
+- appbasecamp.com
+- thanksforborrowing.com
+- alignment.coop
+
+Uses SQLite backend, data persisted to /data/powerdns.
+
+### Forgejo
+
+Self-hosted Git forge (Gitea fork):
+- Web UI at git.beyondtheuniverse.superviber.com
+- Git SSH on port 22
+- SQLite database
+- Data persisted to /data/forgejo
+
+## Storage
+
+All persistent data on host filesystem:
 
 ```
-                          Internet
-                              |
-                    +---------+----------+
-                    |    Shared NLB      |
-                    |     (~$16/mo)      |
-                    +--------------------+
-                    |  :53 DNS (PowerDNS)|
-                    |  :25 SMTP          |
-                    |  :587 Submission   |
-                    |  :993 IMAPS        |
-                    |  :443 HTTPS        |
-                    +--------+-----------+
-                             |
-        +--------------------+--------------------+
-        |                    |                    |
-  +-----+------+       +-----+------+      +------+-----+
-  |    AZ-a    |       |    AZ-b    |      |    AZ-c    |
-  +------------+       +------------+      +------------+
-  |            |       |            |      |            |
-  | Karpenter  |       | Karpenter  |      | Karpenter  |
-  | Spot Nodes |       | Spot Nodes |      | Spot Nodes |
-  |            |       |            |      |            |
-  +------------+       +------------+      +------------+
-  |            |       |            |      |            |
-  | CockroachDB|       | CockroachDB|      | CockroachDB|
-  | (m6i.large)|       | (m6i.large)|      | (m6i.large)|
-  |            |       |            |      |            |
-  +------------+       +------------+      +------------+
+/data/
+├── forgejo/           # Forgejo repos and database
+│   └── gitea/
+│       ├── gitea.db
+│       └── conf/app.ini
+└── powerdns/          # PowerDNS database
+    └── pdns.sqlite3
+```
+
+## Networking
+
+### Security Group
+
+| Port | Protocol | Source | Purpose |
+|------|----------|--------|---------|
+| 22 | TCP | 0.0.0.0/0 | Git SSH |
+| 53 | UDP/TCP | 0.0.0.0/0 | DNS |
+| 80 | TCP | 0.0.0.0/0 | HTTP (redirect) |
+| 443 | TCP | 0.0.0.0/0 | HTTPS |
+| 2222 | TCP | Admin IPs | Admin SSH |
+| 6443 | TCP | Admin IPs | Kubernetes API |
+
+### DNS Flow
+
+```
+User query → GoDaddy NS lookup → ns1/ns2.superviber.com
+                                        ↓
+                              Glue record: 3.218.167.115
+                                        ↓
+                              PowerDNS (port 53)
+                                        ↓
+                              Zone lookup → Response
 ```
 
 ## Cost Breakdown
 
-| Component | Monthly Cost |
-|-----------|--------------|
-| EKS Control Plane | $73 |
-| CockroachDB (3x m6i.large, 3yr) | $105 |
-| NLB | $16 |
-| EFS | $5 |
-| S3 | $5 |
-| Spot nodes (variable) | $0-50 |
-| **Total** | **$204-254** |
+| Component | Monthly |
+|-----------|---------|
+| EC2 t4g.small spot | ~$5.00 |
+| EBS gp3 20GB | ~$1.60 |
+| Elastic IP | ~$0.50 |
+| S3 backups | ~$0.50 |
+| **Total** | **~$7.50** |
 
-## ADR Compliance
+## Backups
 
-- **ADR 0003**: Self-hosted CockroachDB with FIPS 140-2
-- **ADR 0004**: "Set It and Forget It" auto-scaling with Karpenter
-- **ADR 0005**: Full-stack self-hosting (no SaaS dependencies)
+Daily cron job at 3 AM:
+1. SQLite backup of Forgejo database
+2. k3s state backup
+3. Upload to S3 (hearth-backups bucket)
+4. 60-day retention with lifecycle policy
 
-## Prerequisites
+## Limitations
 
-1. AWS CLI configured with appropriate credentials
-2. Terraform >= 1.6.0
-3. kubectl
-4. Helm 3.x
+This is personal infrastructure, not production-grade:
 
-## Quick Start
+- **No HA**: Single point of failure
+- **Spot interruption**: Instance may be reclaimed (data persists on EBS)
+- **No monitoring**: Basic healthchecks only
+- **Single region**: us-east-1 only
 
-### 1. Bootstrap Terraform Backend
+## Future Work
 
-First, create the S3 bucket and DynamoDB table for Terraform state:
-
-```bash
-cd terraform/environments/production
-# Uncomment the backend.tf bootstrap code and run:
-# terraform init && terraform apply
-```
-
-### 2. Deploy Foundation Infrastructure
-
-```bash
-cd terraform/environments/production
-terraform init
-terraform plan
-terraform apply
-```
-
-### 3. Configure kubectl
-
-```bash
-aws eks update-kubeconfig --region us-east-1 --name alignment-production
-```
-
-### 4. Deploy Karpenter
-
-```bash
-# Set environment variables
-export CLUSTER_NAME=$(terraform output -raw cluster_name)
-export CLUSTER_ENDPOINT=$(terraform output -raw cluster_endpoint)
-export KARPENTER_ROLE_ARN=$(terraform output -raw karpenter_role_arn)
-export INTERRUPTION_QUEUE_NAME=$(terraform output -raw karpenter_interruption_queue_name)
-
-# Install Karpenter
-helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
-  --namespace karpenter --create-namespace \
-  -f kubernetes/karpenter/helm-values.yaml \
-  --set settings.clusterName=$CLUSTER_NAME \
-  --set settings.clusterEndpoint=$CLUSTER_ENDPOINT \
-  --set settings.interruptionQueue=$INTERRUPTION_QUEUE_NAME \
-  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=$KARPENTER_ROLE_ARN
-
-# Apply NodePool and EC2NodeClass
-kubectl apply -f kubernetes/karpenter/nodepool.yaml
-kubectl apply -f kubernetes/karpenter/ec2nodeclass.yaml
-```
-
-### 5. Deploy Storage Classes
-
-```bash
-export EFS_ID=$(terraform output -raw efs_id)
-envsubst < kubernetes/storage/classes.yaml | kubectl apply -f -
-```
-
-## Directory Structure
-
-```
-infra/
-├── terraform/
-│   ├── main.tf              # Root module
-│   ├── variables.tf         # Input variables
-│   ├── outputs.tf           # Output values
-│   ├── versions.tf          # Provider versions
-│   ├── modules/
-│   │   ├── vpc/             # VPC with multi-AZ subnets
-│   │   ├── eks/             # EKS cluster with Fargate
-│   │   ├── iam/             # IAM roles and IRSA
-│   │   ├── storage/         # EFS and S3
-│   │   ├── nlb/             # Shared NLB
-│   │   └── cockroachdb/     # CockroachDB (future)
-│   └── environments/
-│       └── production/      # Production config
-├── kubernetes/
-│   ├── karpenter/           # Karpenter manifests
-│   ├── cockroachdb/         # CockroachDB StatefulSet
-│   ├── storage/             # Storage classes
-│   ├── ingress/             # Ingress configuration
-│   └── cert-manager/        # TLS certificates
-└── README.md
-```
-
-## Modules
-
-### VPC Module
-
-Creates a VPC with:
-- 3 availability zones
-- Public subnets (for NLB, NAT Gateways)
-- Private subnets (for EKS nodes, workloads)
-- Database subnets (isolated, for CockroachDB)
-- NAT Gateway per AZ for HA
-- VPC endpoints for S3, ECR, STS, EC2
-
-### EKS Module
-
-Creates an EKS cluster with:
-- Kubernetes 1.29
-- Fargate profiles for Karpenter and kube-system
-- OIDC provider for IRSA
-- KMS encryption for secrets
-- Cluster logging enabled
-
-### IAM Module
-
-Creates IAM roles for:
-- Karpenter controller
-- EBS CSI driver
-- EFS CSI driver
-- AWS Load Balancer Controller
-- cert-manager
-- External DNS
-
-### Storage Module
-
-Creates storage resources:
-- EFS filesystem with encryption
-- S3 bucket for backups (versioned, encrypted)
-- S3 bucket for blob storage
-- KMS key for encryption
-
-### NLB Module
-
-Creates a shared NLB with:
-- HTTPS (443) for web traffic
-- DNS (53 UDP/TCP) for PowerDNS
-- SMTP (25), Submission (587), IMAPS (993) for email
-- Cross-zone load balancing
-- Target groups for each service
-
-## Operations
-
-### Scaling
-
-Karpenter automatically scales nodes based on pending pods. No manual intervention required.
-
-To adjust limits:
-```bash
-kubectl edit nodepool default
-```
-
-### Monitoring
-
-Check Karpenter status:
-```bash
-kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
-```
-
-Check node status:
-```bash
-kubectl get nodes -L karpenter.sh/capacity-type,node.kubernetes.io/instance-type
-```
-
-### Troubleshooting
-
-View Karpenter events:
-```bash
-kubectl get events -n karpenter --sort-by=.lastTimestamp
-```
-
-Check pending pods:
-```bash
-kubectl get pods --all-namespaces --field-selector=status.phase=Pending
-```
-
-## Security
-
-- All storage encrypted at rest (KMS)
-- TLS required for all connections
-- IMDSv2 required for all nodes
-- VPC Flow Logs enabled
-- Cluster audit logging enabled
-- FIPS 140-2 mode for CockroachDB
-
-## Disaster Recovery
-
-### Backups
-
-CockroachDB backups are stored in S3 with:
-- Daily full backups
-- 30-day retention in Standard
-- 90-day transition to Glacier
-- 365-day noncurrent version retention
-
-### Recovery
-
-To restore from backup:
-```bash
-# Restore CockroachDB from S3 backup
-cockroach restore ... FROM 's3://alignment-production-backups/...'
-```
-
-## References
-
-- [RFC 0039: Foundation Infrastructure](../../../.repos/alignment-mcp/docs/rfcs/0039-foundation-infrastructure.md)
-- [ADR 0003: CockroachDB Self-Hosted FIPS](../../../.repos/alignment-mcp/docs/adrs/0003-cockroachdb-self-hosted-fips.md)
-- [ADR 0004: Set It and Forget It](../../../.repos/alignment-mcp/docs/adrs/0004-set-it-and-forget-it-architecture.md)
-- [ADR 0005: Full-Stack Self-Hosting](../../../.repos/alignment-mcp/docs/adrs/0005-full-stack-self-hosting.md)
-- [Karpenter Documentation](https://karpenter.sh/)
-- [EKS Best Practices](https://aws.github.io/aws-eks-best-practices/)
+See [RFC 0003](rfcs/0003-powerdns-self-hosted.md) for planned improvements:
+- HA DNS with separate instance
+- DNSSEC
+- DNS-over-HTTPS
+- PowerDNS-Admin UI
diff --git a/terraform/minimal/user-data.sh b/terraform/minimal/user-data.sh
index 242aa30..d64ee9c 100644
--- a/terraform/minimal/user-data.sh
+++ b/terraform/minimal/user-data.sh
@@ -32,10 +32,11 @@ systemctl enable --now docker
 sed -i "s/#Port 22/Port $SSH_PORT/" /etc/ssh/sshd_config
 systemctl restart sshd
 
-# Add admin SSH key
-if [ -n "${ssh_public_key}" ]; then
+# Add admin SSH key (passed from terraform)
+SSH_KEY="${ssh_public_key}"
+if [ -n "$SSH_KEY" ]; then
   mkdir -p /home/ec2-user/.ssh
-  echo "${ssh_public_key}" >> /home/ec2-user/.ssh/authorized_keys
+  echo "$SSH_KEY" >> /home/ec2-user/.ssh/authorized_keys
   chown -R ec2-user:ec2-user /home/ec2-user/.ssh
   chmod 700 /home/ec2-user/.ssh
   chmod 600 /home/ec2-user/.ssh/authorized_keys