Kubernetes Infra Proposal
Owners: Mahdi Darabi
Created time: May 20, 2024 9:56 AM
Infrastructure Design
Kubernetes Cluster
Use Rancher to install and maintain the Kubernetes cluster. (Rancher simplifies Kubernetes management by providing a user-friendly interface for cluster setup and maintenance.)
Configure 3 master (control plane) nodes and 5 worker nodes. (Three master nodes provide high availability and fault tolerance; five worker nodes support workload distribution and scalability.)
Implement a cluster of load balancers to balance traffic between worker nodes. (This improves performance by distributing traffic evenly and preventing overload on individual nodes.)
Isolate the Kubernetes cluster from the internet, and give the load balancers separate networks for internal and external access.
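The choice of three master nodes can be checked with a little arithmetic. The sketch below assumes the control plane datastore is etcd, which accepts writes only while a majority of its members is reachable; the function names are illustrative and not part of any tool.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Number of members that can be lost while a majority remains."""
    return members - quorum(members)


for n in (1, 3, 5):
    print(
        f"{n} control plane nodes: quorum={quorum(n)}, "
        f"tolerated failures={failures_tolerated(n)}"
    )
```

Under this assumption, three nodes survive the loss of one. A fourth node would raise the quorum without raising the number of tolerated failures, which is why odd member counts are the usual choice.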
Nexus Deployment
Image and Chart Repository: Nexus centralizes image and chart storage, improving deployment speed and reliability.
apt Repository: Nexus also hosts the apt package repository.
Secure Image Retrieval: By downloading images and packages from Nexus, we ensure consistency, security, and control over dependencies.
ArgoCD Implementation
Continuous Deployment: ArgoCD automates application deployment, reducing manual errors and enabling rapid releases.
Implement a GitOps flow using a self-hosted GitLab instance, which manages the source code and runs the pipelines that build Docker images and push them to Nexus.
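As a rough illustration of the GitOps flow, the sketch below builds an ArgoCD `Application` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The application name, repository URL, path, and namespaces are hypothetical placeholders, and automated sync with pruning and self-healing is an assumed policy rather than a decision recorded in this proposal.

```python
import json


def argocd_application(name, repo_url, path, namespace, revision="main"):
    """Return an ArgoCD Application manifest that tracks a Git path."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # Assumes ArgoCD is installed in the "argocd" namespace.
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,
                "path": path,
            },
            "destination": {
                # In-cluster API server address used by ArgoCD.
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # Assumed policy: sync automatically, remove deleted resources,
            # and revert manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


manifest = argocd_application(
    name="sample-app",  # hypothetical
    repo_url="https://gitlab.example.internal/platform/deployments.git",  # hypothetical
    path="apps/sample-app",  # hypothetical
    namespace="sample",  # hypothetical
)
print(json.dumps(manifest, indent=2))
```

With a manifest like this, a merge to the tracked branch in GitLab is what triggers a deployment; nobody applies changes to the cluster by hand.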
Longhorn Setup
Persistent Storage: Longhorn provides reliable storage for applications, ensuring data persistence and availability.
Backup Strategy: Implementing backups in Longhorn safeguards against data loss and supports disaster recovery efforts.
Security Measures
Deploy an HTTP proxy server at the edge layer with dual networks to route egress traffic to external services like Arvan Cloud. (The HTTP proxy server controls outbound traffic, ensuring secure communication with external services and preventing unauthorized access.)
Edge Server Hardening: Securing edge servers mitigates external threats and protects sensitive data and services.
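To make the egress path concrete, the following sketch shows how a client inside the isolated network might send outbound requests through the HTTP proxy using only the Python standard library. The host name `jar-proxy1` is taken from the host list in this proposal; the port `3128` and the target URL are assumptions for illustration.

```python
import urllib.request

# Hypothetical proxy address: host name from the host list, port assumed.
PROXY_URL = "http://jar-proxy1:3128"


def build_egress_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via the proxy."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(proxy)


opener = build_egress_opener()

# Example call, left commented out because it needs network access to the proxy:
# with opener.open("https://example.com", timeout=10) as response:
#     print(response.status)
```

Because the cluster has no direct route to the internet, a client that bypasses the opener should simply fail to connect, which is the behavior the isolation is meant to enforce.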
Additional Components
Kafka Cluster: Kafka facilitates real-time data processing, supporting microservices architecture and enabling scalable, fault-tolerant messaging.
Redis Cluster: Redis enhances application performance by caching frequently accessed data in memory, reducing latency and improving responsiveness.
Rationale for Decisions
Security: Isolating the Kubernetes cluster and hardening the edge servers enhance security by limiting external access.
Reliability: Using Nexus for image storage and Longhorn for backups ensures data reliability and disaster recovery.
Efficiency: Implementing GitOps with ArgoCD streamlines application deployment and management processes.
Infra Diagram
Hosts
Needed Resources
Title | CPU | RAM (GB) | Storage (GB) | Count | Total CPU | Total RAM (GB) | Total Storage (GB) |
|---|---|---|---|---|---|---|---|
Rancher Server | 8 | 16 | 200 | 1 | 8 | 16 | 200 |
HTTP Proxy Server | 4 | 8 | 50 | 1 | 4 | 8 | 50 |
HAProxy/Nginx | 4 | 8 | 50 | 3 | 12 | 24 | 150 |
Nexus | 12 | 32 | 2000 | 1 | 12 | 32 | 2000 |
K8S Control Plane Nodes | 8 | 16 | 300 | 3 | 24 | 48 | 900 |
K8S Worker Nodes | 48 | 128 | 1000 | 5 | 240 | 640 | 5000 |
GitLab Runner | 8 | 16 | 100 | 1 | 8 | 16 | 100 |
TURN Server | 8 | 16 | 50 | 1 | 8 | 16 | 50 |
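Each total in the table above is the per-host figure multiplied by the count. A minimal sketch of that check, using two rows copied from the table (the `Row` type and `totals` function are illustrative names):

```python
from typing import NamedTuple, Tuple


class Row(NamedTuple):
    title: str
    cpu: int
    ram: int
    storage: int
    count: int


def totals(row: Row) -> Tuple[int, int, int]:
    """Return (total CPU, total RAM, total storage) for one table row."""
    return (row.cpu * row.count, row.ram * row.count, row.storage * row.count)


rows = [
    Row("K8S Control Plane Nodes", 8, 16, 300, 3),
    Row("K8S Worker Nodes", 48, 128, 1000, 5),
]

for row in rows:
    print(row.title, totals(row))
```

Running the same function over every row is a quick way to catch a total that drifted after a per-host value was edited.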
Bare-Metal Servers
Name | CPU | RAM (GB) | Storage (GB) | Free CPU | Free RAM (GB) | Free Storage (GB) |
|---|---|---|---|---|---|---|
G10 | 80 | 128 | 2000 | 56 | 88 | 1000 |
G9-1 | 80 | 380 | 6500 | 80 | 380 | 6500 |
G9-2 | 80 | 380 | 6500 | 80 | 380 | 6500 |
Hosts Configurations
Idx | Name | HostName | CPU | RAM (GB) | Storage (GB) | Machine | Description |
|---|---|---|---|---|---|---|---|
1 | AI | ai1 | 24 | 48 | 1000 | G10 | Public IP |
2 | Control Plane 1 | jar-cp1 | 8 | 16 | 300 | G10 | Local IP |
3 | Control Plane 2 | jar-cp2 | 8 | 16 | 300 | G9-1 | Local IP |
4 | Control Plane 3 | jar-cp3 | 8 | 16 | 300 | G9-2 | Local IP |
5 | Worker 1 | jar-wrk1 | 48 | 128 | 1000 | G10 | Local IP |
6 | Worker 2 | jar-wrk2 | 48 | 128 | 1000 | G9-1 | Local IP |
7 | Worker 3 | jar-wrk3 | 48 | 128 | 1000 | G9-2 | Local IP |
8 | Worker 4 | jar-wrk4 | 48 | 128 | 1000 | - | Local IP |
9 | Worker 5 | jar-wrk5 | 48 | 128 | 1000 | - | Local IP |
10 | BootStrap Server 1 | jar-bootstrap1 | 8 | 16 | 200 | G9-1 | Local IP - Rancher Server |
11 | Proxy Server 1 | jar-proxy1 | 4 | 8 | 50 | G9-2 | Public IP (Tunnel?) - HTTP Proxy |
12 | Edge Server 1 | jar-edge1 | 4 | 8 | 50 | G9-1 | Public IP (Clean IP, Range of 5) - High Speed Network Port |
13 | Edge Server 2 | jar-edge2 | 4 | 8 | 50 | G9-2 | Public IP (Clean IP, Range of 5) - High Speed Network Port |
14 | Edge Server 3 | jar-edge3 | 4 | 8 | 50 | - | Public IP (Clean IP, Range of 5) - High Speed Network Port |
15 | Registry Server 1 | jar-reg1 | 12 | 16 | 2000 | G9-1 | Public IP - Tunnel - Nexus |
16 | GitLab Runner | jar-run1 | 8 | 16 | 100 | G9-2 | Public IP - GitLab Runner |
17 | TURN Server | jar-turn1 | 8 | 16 | 50 | G9-2 | Public IP - TURN Server |
SUM | | | 340 | 832 | 9450 | | |
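The placement in the host table can be cross-checked against the bare-metal capacities with a short script. The sketch below copies the figures from the two tables and compares the per-machine sums with each server's total (not free) capacity. It assumes `ai1` counts against G10's total, and it treats any sum above the total as overcommitted, which may be stricter than the virtualization layer actually requires.

```python
from collections import defaultdict

# (total CPU, total RAM, total storage) per bare-metal server.
CAPACITY = {
    "G10": (80, 128, 2000),
    "G9-1": (80, 380, 6500),
    "G9-2": (80, 380, 6500),
}

# (hostname, CPU, RAM, storage, machine); "-" means not yet assigned.
HOSTS = [
    ("ai1", 24, 48, 1000, "G10"),
    ("jar-cp1", 8, 16, 300, "G10"),
    ("jar-cp2", 8, 16, 300, "G9-1"),
    ("jar-cp3", 8, 16, 300, "G9-2"),
    ("jar-wrk1", 48, 128, 1000, "G10"),
    ("jar-wrk2", 48, 128, 1000, "G9-1"),
    ("jar-wrk3", 48, 128, 1000, "G9-2"),
    ("jar-wrk4", 48, 128, 1000, "-"),
    ("jar-wrk5", 48, 128, 1000, "-"),
    ("jar-bootstrap1", 8, 16, 200, "G9-1"),
    ("jar-proxy1", 4, 8, 50, "G9-2"),
    ("jar-edge1", 4, 8, 50, "G9-1"),
    ("jar-edge2", 4, 8, 50, "G9-2"),
    ("jar-edge3", 4, 8, 50, "-"),
    ("jar-reg1", 12, 16, 2000, "G9-1"),
    ("jar-run1", 8, 16, 100, "G9-2"),
    ("jar-turn1", 8, 16, 50, "G9-2"),
]


def allocation(hosts):
    """Sum requested (CPU, RAM, storage) per machine."""
    used = defaultdict(lambda: [0, 0, 0])
    for _name, cpu, ram, storage, machine in hosts:
        for i, value in enumerate((cpu, ram, storage)):
            used[machine][i] += value
    return {machine: tuple(values) for machine, values in used.items()}


def overcommitted(used, capacity):
    """List, per machine, the resources whose requested sum exceeds capacity."""
    labels = ("CPU", "RAM", "Storage")
    return {
        machine: [
            label
            for label, u, c in zip(labels, used.get(machine, (0, 0, 0)), cap)
            if u > c
        ]
        for machine, cap in capacity.items()
    }


used = allocation(HOSTS)
for machine, (cpu, ram, storage) in sorted(used.items()):
    print(f"{machine}: CPU={cpu} RAM={ram} Storage={storage}")
print(overcommitted(used, CAPACITY))
```

With these figures, every assigned machine sums to exactly 80 CPU, and G10 exceeds its RAM and storage totals (192 and 2300 requested). Whether that is acceptable depends on the overcommit policy, which this proposal does not state. Worker 4, Worker 5, and Edge Server 3 still need a machine.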