hearth/kubernetes/karpenter/nodepool.yaml
Eric Garcia e78000831e Initial commit: Port infrastructure from coherence-mcp
Hearth is the infrastructure home for the letemcook ecosystem.

Ported from coherence-mcp/infra:
- Terraform modules (VPC, EKS, IAM, NLB, S3, storage)
- Kubernetes manifests (Forgejo, ingress, cert-manager, karpenter)
- Deployment scripts (phased rollout)

Status: Not deployed. EKS cluster needs to be provisioned.

Next steps:
1. Bootstrap terraform backend
2. Deploy phase 1 (foundation)
3. Deploy phase 2 (core services including Forgejo)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 06:06:13 -05:00

107 lines
2.9 KiB
YAML

# Karpenter NodePool Configuration
# RFC 0039: ADR-Compliant Foundation Infrastructure
# ADR 0004: "Set It and Forget It" Auto-Scaling Architecture
#
# The default NodePool scales automatically from 0 to 100 vCPUs,
# preferring spot capacity for cost optimization, with on-demand allowed as a fallback.
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        node-type: general
    spec:
      requirements:
        # Allow spot and on-demand; Karpenter favors spot when both are listed
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Instance types suited to general workloads
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
            - m5.large
            - m5.xlarge
            - m5.2xlarge
            - c6i.large
            - c6i.xlarge
            - c6i.2xlarge
        # AMD64 required for FIPS compliance
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        # Multi-AZ for high availability
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-east-1a
            - us-east-1b
            - us-east-1c
      nodeClassRef:
        name: default
  # Limit total compute for this pool (ADR 0004: auto-scale to this limit)
  limits:
    cpu: 100
    memory: 200Gi
  # Remove nodes 30s after they become empty (ADR 0004: near-zero cost when idle)
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    budgets:
      - nodes: "10%"
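# Illustrative sketch, not part of this manifest: one way a workload might
# target the default pool. The nodeSelector keys come from the label and the
# capacity-type requirement above; the Deployment name, image, and resource
# requests are hypothetical. Left commented out so that applying this file
# still creates only the NodePools.
#
# apiVersion: apps/v1
# kind: Deployment
# metadata:
#   name: example-app                            # hypothetical name
# spec:
#   replicas: 1
#   selector:
#     matchLabels:
#       app: example-app
#   template:
#     metadata:
#       labels:
#         app: example-app
#     spec:
#       nodeSelector:
#         node-type: general                     # label applied by this NodePool
#         karpenter.sh/capacity-type: on-demand  # optional: opt this workload out of spot
#       containers:
#         - name: app
#           image: registry.example.com/app:latest   # placeholder image
#           resources:
#             requests:
#               cpu: 500m
#               memory: 512Mi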
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: cockroachdb
spec:
  template:
    metadata:
      labels:
        node-type: database
        workload: cockroachdb
    spec:
      requirements:
        # On-demand for database stability
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        # Database-optimized instances
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
        # AMD64 required for FIPS compliance
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        # Spread across AZs
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-east-1a
            - us-east-1b
            - us-east-1c
      # Keep non-database pods off these nodes
      taints:
        - key: workload
          value: cockroachdb
          effect: NoSchedule
      nodeClassRef:
        name: default
  # Pool-wide caps. For reference: m6i.large is 2 vCPU / 8 GiB,
  # m6i.xlarge is 4 vCPU / 16 GiB, and m6i.2xlarge is 8 vCPU / 32 GiB,
  # so 3 x m6i.xlarge (12 vCPU, 48 GiB) or 2 x m6i.2xlarge (16 vCPU, 64 GiB) fit.
  limits:
    cpu: 18
    memory: 72Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 24h  # Don't reclaim empty database nodes quickly
    budgets:
      - nodes: "0"  # Block automatic (voluntary) disruption of database nodes
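# Illustrative sketch, not part of this manifest: because this pool is tainted
# workload=cockroachdb:NoSchedule, only pods that tolerate the taint can be
# scheduled onto it. A CockroachDB pod template would likely need a fragment
# along these lines (the surrounding StatefulSet is assumed and not shown):
#
# spec:
#   nodeSelector:
#     workload: cockroachdb        # label applied by this NodePool
#   tolerations:
#     - key: workload
#       operator: Equal
#       value: cockroachdb
#       effect: NoSchedule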