Kubernetes Infra Proposal

Owners: Mahdi Darabi

Created time: May 20, 2024 9:56 AM

Infrastructure Design

Kubernetes Cluster

  • Use Rancher to install and maintain the Kubernetes cluster. Rancher simplifies Kubernetes management by providing a user-friendly interface for cluster setup and maintenance.

  • Configure 3 master nodes and 5 worker nodes. Three master nodes provide high availability and fault tolerance, while five worker nodes support workload distribution and scalability.

  • Implement a cluster of load balancers to balance traffic between worker nodes. This improves performance by distributing traffic evenly and preventing overload on individual nodes.

  • Isolate the Kubernetes cluster from the internet, and give the load balancers separate networks for internal and external access.
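
The fault-tolerance claim for 3 master nodes can be made concrete with the usual majority-quorum arithmetic. The following is a minimal sketch, assuming each master node also runs one etcd member (an assumption, since the proposal does not say where etcd runs); the function names are illustrative only.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} members: quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

Under that assumption, 3 master nodes keep a majority when one node is lost. A fourth node would raise the quorum without raising the number of tolerated failures, which is why odd counts are the common choice.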

Nexus Deployment

  • Image and Chart Repository: Nexus centralizes image and chart storage, improving deployment speed and reliability.

  • apt Repository: Nexus also hosts an apt repository, so that OS packages are installed from an internal source.

  • Secure Image Retrieval: By downloading images and packages from Nexus, we ensure consistency, security, and control over dependencies.
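
To illustrate what pulling every image from Nexus could look like in practice, here is a minimal sketch that rewrites an image reference so it points at the Nexus registry. The registry address `jar-reg1:5000` is hypothetical (the hostname comes from the hosts table, the port is an assumption), and the sketch assumes a single Nexus repository fronts all upstream registries.

```python
def via_nexus(image: str, nexus: str = "jar-reg1:5000") -> str:
    """Rewrite an image reference so that it is pulled through Nexus.

    `nexus` is a hypothetical registry address, not a value from the proposal.
    """
    first, _, rest = image.partition("/")
    # The first path component is a registry host only if it looks like one.
    has_registry = bool(rest) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        registry, path = first, rest
    else:
        registry, path = "docker.io", image
    # Docker Hub official images live under the implicit "library/" namespace.
    if registry == "docker.io" and "/" not in path:
        path = "library/" + path
    return f"{nexus}/{path}"


if __name__ == "__main__":
    for ref in ("nginx:1.25", "bitnami/kafka:3.7", "quay.io/argoproj/argocd:v2.11.0"):
        print(ref, "->", via_nexus(ref))
```

A rewrite like this could be applied in CI or in Helm values, so that no workload references an external registry directly.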

ArgoCD Implementation

  • Continuous Deployment: ArgoCD automates application deployment, reducing manual errors and enabling rapid releases.

  • Implement a GitOps flow using a self-hosted GitLab instance to manage source code and run the pipelines that build Docker images and push them to Nexus.
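
As a rough illustration of the GitOps flow, the sketch below builds an Argo CD `Application` manifest that tells ArgoCD which Git path to keep in sync with which namespace. The repository URL, application name, path, and namespace are all hypothetical placeholders; the manifest is emitted as JSON, which `kubectl apply -f` also accepts.

```python
import json


def argocd_application(name: str, repo_url: str, path: str,
                       namespace: str, revision: str = "main") -> dict:
    """Build an Argo CD Application manifest as a plain dictionary."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            # Deploy into the same cluster that ArgoCD itself runs in.
            "destination": {"server": "https://kubernetes.default.svc",
                            "namespace": namespace},
            # Automated sync: remove resources deleted from Git, revert manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    app = argocd_application(
        name="demo-app",  # hypothetical application name
        repo_url="https://gitlab.example.internal/platform/deployments.git",  # hypothetical
        path="apps/demo-app",
        namespace="demo",
    )
    print(json.dumps(app, indent=2))
```

Whether automated pruning and self-healing are wanted is a policy decision; they are enabled here only to show where the setting lives.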

Longhorn Setup

  • Persistent Storage: Longhorn provides reliable storage for applications, ensuring data persistence and availability.

  • Backup Strategy: Implementing backups in Longhorn safeguards against data loss and supports disaster recovery efforts.

Security Measures

  • Deploy an HTTP proxy server at the edge layer with dual networks to route egress traffic to external services such as Arvan Cloud. The proxy controls outbound traffic, which secures communication with external services and prevents unauthorized access.

  • Edge Server Hardening: Securing edge servers mitigates external threats and protects sensitive data and services.
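
One common way to send workload egress through the HTTP proxy is to set the conventional proxy environment variables on each workload. The sketch below builds that set of variables. The port `3128` and the `NO_PROXY` entries are assumptions; `jar-proxy1` is the proxy hostname listed in the hosts table.

```python
DEFAULT_NO_PROXY = ("localhost", "127.0.0.1", ".svc", ".cluster.local")


def proxy_env(proxy_host: str = "jar-proxy1", port: int = 3128,
              no_proxy: tuple = DEFAULT_NO_PROXY) -> dict:
    """Return proxy environment variables for a workload.

    The port and the bypass list are illustrative assumptions.
    """
    url = f"http://{proxy_host}:{port}"
    bypass = ",".join(no_proxy)
    env = {"HTTP_PROXY": url, "HTTPS_PROXY": url, "NO_PROXY": bypass}
    # Some tools only read the lowercase names, so emit both spellings.
    env.update({key.lower(): value for key, value in list(env.items())})
    return env


if __name__ == "__main__":
    for key, value in sorted(proxy_env().items()):
        print(f"{key}={value}")
```

In-cluster traffic is listed under `NO_PROXY` so that only requests to external services leave through the proxy.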

Additional Components

  • Kafka Cluster: Kafka facilitates real-time data processing, supporting microservices architecture and enabling scalable, fault-tolerant messaging.

  • Redis Cluster: Redis enhances application performance by caching frequently accessed data in memory, reducing latency and improving responsiveness.

Rationale for Decisions

  • Security: Isolating the Kubernetes cluster and hardening edge servers enhance security by limiting external access.

  • Reliability: Using Nexus for image storage and Longhorn for backups improves data reliability and supports disaster recovery.

  • Efficiency: Implementing GitOps with ArgoCD streamlines application deployment and management processes.

Infra Diagram

[Diagram not preserved. Labels recovered from the image:
  Networks: EXT1, EXT2, INT
  External: INTERNET, Clients, CDN Servers, GITLAB, Arvan, Kavenegar, FireBase, BackupS3
  Hosts: RancherServer (Rancher), LoadBalancer Cluster, TURN Server, HTTP Proxy Server, Nexus Registry, RUNNER
  K8s Cluster: ControlPlane (M1, M2, M3), Workers (W1, W2, W3, W4, W5)
  Flows: Build Images, Helm Charts]

Hosts

Needed Resources

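| Title | CPU | RAM (GB) | Storage (GB) | Count | Total CPU | Total RAM (GB) | Total Storage (GB) |
|---|---|---|---|---|---|---|---|
| Rancher Server | 8 | 16 | 200 | 1 | 8 | 16 | 200 |
| HTTP Proxy Server | 4 | 8 | 50 | 1 | 4 | 8 | 50 |
| HAProxy/Nginx | 4 | 8 | 50 | 3 | 12 | 24 | 150 |
| Nexus | 12 | 32 | 2000 | 1 | 12 | 32 | 2000 |
| K8S Control Plane Nodes | 8 | 16 | 300 | 3 | 24 | 48 | 900 |
| K8S Worker Nodes | 48 | 128 | 1000 | 5 | 240 | 640 | 5000 |
| GitLab Runner | 8 | 16 | 100 | 1 | 8 | 16 | 100 |
| TURN Server | 8 | 16 | 50 | 1 | 8 | 16 | 50 |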
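
Each total in the Needed Resources table is the per-host figure multiplied by the host count. A minimal sketch of that arithmetic follows, with the per-host figures and counts copied from the table; the function names are illustrative.

```python
# (title, cpu, ram, storage, count) per role, copied from the Needed Resources table.
ROLES = [
    ("Rancher Server", 8, 16, 200, 1),
    ("HTTP Proxy Server", 4, 8, 50, 1),
    ("HAproxy/Nginx", 4, 8, 50, 3),
    ("Nexus", 12, 32, 2000, 1),
    ("K8S Control Plane Nodes", 8, 16, 300, 3),
    ("K8S Worker Nodes", 48, 128, 1000, 5),
    ("Gitlab Runner", 8, 16, 100, 1),
    ("TURN Server", 8, 16, 50, 1),
]


def role_totals(roles):
    """Map each role to its (total cpu, total ram, total storage)."""
    return {
        title: (cpu * count, ram * count, storage * count)
        for title, cpu, ram, storage, count in roles
    }


def grand_total(roles):
    """Sum the per-role totals across all roles."""
    totals = list(role_totals(roles).values())
    return tuple(sum(t[i] for t in totals) for i in range(3))


if __name__ == "__main__":
    for title, total in role_totals(ROLES).items():
        print(f"{title}: {total}")
    print("All roles:", grand_total(ROLES))
```

Recomputing the totals this way makes it easy to re-check the table whenever a per-host figure or a count changes.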

BareMetals

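| Name | CPU | RAM (GB) | Storage (GB) | Free CPU | Free RAM (GB) | Free Storage (GB) |
|---|---|---|---|---|---|---|
| G10 | 80 | 128 | 2000 | 56 | 88 | 1000 |
| G9-1 | 80 | 380 | 6500 | 80 | 380 | 6500 |
| G9-2 | 80 | 380 | 6500 | 80 | 380 | 6500 |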

Hosts Configurations

| Idx | Name | HostName | CPU | RAM (GB) | Storage (GB) | Machine | Desc |
|---|---|---|---|---|---|---|---|
| 1 | AI | ai1 | 24 | 48 | 1000 | G10 | Public IP |
| 2 | Control Plane 1 | jar-cp1 | 8 | 16 | 300 | G10 | Local IP |
| 3 | Control Plane 2 | jar-cp2 | 8 | 16 | 300 | G9-1 | Local IP |
| 4 | Control Plane 3 | jar-cp3 | 8 | 16 | 300 | G9-2 | Local IP |
| 5 | Worker 1 | jar-wrk1 | 48 | 128 | 1000 | G10 | Local IP |
| 6 | Worker 2 | jar-wrk2 | 48 | 128 | 1000 | G9-1 | Local IP |
| 7 | Worker 3 | jar-wrk3 | 48 | 128 | 1000 | G9-2 | Local IP |
| 8 | Worker 4 | jar-wrk4 | 48 | 128 | 1000 | - | Local IP |
| 9 | Worker 5 | jar-wrk5 | 48 | 128 | 1000 | - | Local IP |
| 10 | BootStrap Server 1 | jar-bootstrap1 | 8 | 16 | 200 | G9-1 | Local IP - Rancher Server |
| 11 | Proxy Server 1 | jar-proxy1 | 4 | 8 | 50 | G9-2 | Public IP (Tunnel?) - HTTP Proxy |
| 12 | Edge Server 1 | jar-edge1 | 4 | 8 | 50 | G9-1 | Public IP (Clean IP, Range of 5) - High Speed Network Port |
| 13 | Edge Server 2 | jar-edge2 | 4 | 8 | 50 | G9-2 | Public IP (Clean IP, Range of 5) - High Speed Network Port |
| 14 | Edge Server 3 | jar-edge3 | 4 | 8 | 50 | - | Public IP (Clean IP, Range of 5) - High Speed Network Port |
| 15 | Registry Server 1 | jar-reg1 | 12 | 16 | 2000 | G9-1 | Public IP - Tunnel - Nexus |
| 16 | GitLab Runner | jar-run1 | 8 | 16 | 100 | G9-2 | Public IP - Gitlab Runner |
| 17 | TURN Server | jar-turn1 | 8 | 16 | 50 | G9-2 | Public IP - TURN Server |
| SUM | | | 340 | 832 | 9450 | | |
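
The SUM row and the placement of hosts on bare-metal machines can be cross-checked with a short script. The sketch below copies the figures from the Hosts Configurations and BareMetals tables; the helper names are illustrative, and hosts with machine "-" are treated as not yet placed.

```python
from collections import defaultdict

# (hostname, cpu, ram, storage, machine) from the Hosts Configurations table.
HOSTS = [
    ("ai1", 24, 48, 1000, "G10"),
    ("jar-cp1", 8, 16, 300, "G10"),
    ("jar-cp2", 8, 16, 300, "G9-1"),
    ("jar-cp3", 8, 16, 300, "G9-2"),
    ("jar-wrk1", 48, 128, 1000, "G10"),
    ("jar-wrk2", 48, 128, 1000, "G9-1"),
    ("jar-wrk3", 48, 128, 1000, "G9-2"),
    ("jar-wrk4", 48, 128, 1000, "-"),
    ("jar-wrk5", 48, 128, 1000, "-"),
    ("jar-bootstrap1", 8, 16, 200, "G9-1"),
    ("jar-proxy1", 4, 8, 50, "G9-2"),
    ("jar-edge1", 4, 8, 50, "G9-1"),
    ("jar-edge2", 4, 8, 50, "G9-2"),
    ("jar-edge3", 4, 8, 50, "-"),
    ("jar-reg1", 12, 16, 2000, "G9-1"),
    ("jar-run1", 8, 16, 100, "G9-2"),
    ("jar-turn1", 8, 16, 50, "G9-2"),
]

# Total (cpu, ram, storage) per machine from the BareMetals table.
CAPACITY = {"G10": (80, 128, 2000), "G9-1": (80, 380, 6500), "G9-2": (80, 380, 6500)}
RESOURCES = ("cpu", "ram", "storage")


def grand_total(hosts):
    """Sum CPU, RAM, and storage over every host (the SUM row)."""
    return tuple(sum(host[i] for host in hosts) for i in (1, 2, 3))


def usage_by_machine(hosts):
    """Sum the resources of the hosts assigned to each machine."""
    usage = defaultdict(lambda: [0, 0, 0])
    for _name, cpu, ram, storage, machine in hosts:
        for i, amount in enumerate((cpu, ram, storage)):
            usage[machine][i] += amount
    return {machine: tuple(values) for machine, values in usage.items()}


def overcommitted(hosts, capacity):
    """List, per machine, the resources whose assigned total exceeds capacity."""
    result = {}
    for machine, used in usage_by_machine(hosts).items():
        if machine not in capacity:
            continue  # unplaced hosts ("-") have no capacity to compare against
        over = [r for r, u, c in zip(RESOURCES, used, capacity[machine]) if u > c]
        if over:
            result[machine] = over
    return result


if __name__ == "__main__":
    print("SUM:", grand_total(HOSTS))
    print("Usage:", usage_by_machine(HOSTS))
    print("Over capacity:", overcommitted(HOSTS, CAPACITY))
```

With the figures as listed, the grand total matches the SUM row (340, 832, 9450). The check also reports G10 above its listed RAM and storage (192 of 128 and 2300 of 2000), so the placement of hosts on G10 may be worth reviewing.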

Last modified: 21 May 2024