Achieve 99%+ Uptime in a Distributed Environment

Distributed Software Development (DSD) has evolved, resulting in higher application uptime and better handling of user requests. Organizations now tend to concentrate development effort in more attractive geographical zones or to deploy directly on-premises. The main advantage of this lies in the greater availability of human resources in decentralized zones, at lower cost and with better accessibility.

This architecture design can be deployed in multiple forms:

  • Enterprise applications (like employee management or an internal Service Ops portal)
  • Internet-facing applications (like a website or a SaaS platform)
  • Cloud-based applications (using AWS EC2, etc.)

and more.

Depending on the application's requirements, different forms of deployment are considered. All of these deployment models require the servers to be available and to scale on demand, in real time.

For this post, we'll focus on achieving maximum application uptime while keeping data security and integrity in mind.

Consider the architecture defined below:

Distributed architecture for enterprise-level uptime (99.9%+)


· LB1, LB2: load balancer servers (e.g., Nginx), 8 GB RAM / 4 cores each, for 200,000 concurrent requests

· Application servers: deployed on the intranet; configuration as per application requirements.

· In-memory DB: deployed on the handler VPN (e.g., Redis, or MongoDB with its in-memory storage engine), 4 GB RAM / 4 cores

· Apache Kafka cluster: a cluster of 2–3 Kafka brokers handling all DB-related operations (optional)

· DB server: hosted on the on-premise DB VPN (e.g., ClickHouse, Cassandra, or Postgres), 16 GB RAM / 16 cores
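As a minimal sketch of the load-balancing tier above, here is roughly what LB1/LB2 could look like in Nginx. The upstream name, intranet addresses, and ports are placeholders of my own, not part of the original design:

```nginx
# Hypothetical Nginx config for LB1/LB2: distribute traffic across
# the two intranet application servers (A1, A2).
upstream app_servers {
    least_conn;               # send each request to the least-busy server
    server 10.0.1.11:8080;    # A1 (placeholder intranet address)
    server 10.0.1.12:8080;    # A2 (placeholder intranet address)
}

server {
    listen 80;

    location / {
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

Both load balancers would carry the same configuration, so either one can serve traffic on its own.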

As per the above architecture, we require multiple servers (roughly 10 or more). Don't be disheartened by the number of servers; this kind of architecture is advised for applications handling 150K+ parallel requests at a 10 KB page size, or 50K+ parallel requests at a 100 KB page size (considering the application server is performing multiple operations in real time to gather and shape data).

If a website is intended to handle 40K to 60K concurrent requests, the LB servers' configuration can be decreased to 4 GB RAM with 2 cores, and the in-memory DB server won't require 4 cores either; 2 cores will do (considering a 10 KB page size).
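To see why page size matters as much as request count, here is a back-of-envelope calculation (my own arithmetic, not from the original post) of the total response payload one wave of concurrent requests pushes through the LB tier at the two load points quoted above:

```python
# Back-of-envelope payload estimate for the two load profiles above.
# Assumes every request in the wave is in flight at the same time.

def aggregate_payload_mb(concurrent_requests: int, page_size_kb: int) -> float:
    """Total response payload in MB for one wave of concurrent requests."""
    return concurrent_requests * page_size_kb / 1024

small_pages = aggregate_payload_mb(150_000, 10)   # 150K requests x 10 KB
large_pages = aggregate_payload_mb(50_000, 100)   # 50K requests x 100 KB

print(f"150K x 10 KB  -> {small_pages:.0f} MB per wave")
print(f"50K  x 100 KB -> {large_pages:.0f} MB per wave")
```

The 100 KB profile moves several times more data despite one-third the request count, which is why the sizing advice treats the two cases separately.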

Let's talk about security, and why VPNs are implemented.

A lot of sensitive data travels over these networks, and a man-in-the-middle attack, remote code execution (RCE), or SQL injection on the servers can cause serious damage in the form of data leaks, loss of data integrity, etc. The most effective way to secure the communication channels between the applications and the DBs is to route them through a VPN. It provides a secure tunnel through which data can flow, accessible only to the nodes that have rights in that VPN group. Since data is constantly moving to and fro between the servers, a VPN ensures the most safety.
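As a sketch of what "routing through a VPN" can look like in practice, here is a minimal WireGuard configuration for an application server joining the DB VPN group. The keys, addresses, and port are placeholders, and WireGuard itself is my example choice; the original architecture doesn't specify a VPN technology:

```ini
# Hypothetical WireGuard config on an application server in the DB VPN.
# Only peers listed here can exchange traffic with this node.
[Interface]
PrivateKey = <app-server-private-key>    # placeholder
Address    = 10.8.0.2/24                 # app server's VPN address

[Peer]
PublicKey  = <db-server-public-key>      # placeholder
Endpoint   = db.example.internal:51820   # placeholder DB endpoint
AllowedIPs = 10.8.0.1/32                 # route only DB-server traffic here
```

The `AllowedIPs` line is what enforces the group membership: traffic is accepted from, and routed to, only the addresses assigned to authorized peers.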

How does the above architecture achieve maximum uptime?

Every server that is crucial for the application to run smoothly has a copy. For example, the application servers (A1 and A2) are replicas of each other with the same configuration and data. If one of them malfunctions, the load balancers (L1 and L2) redirect its load to the other server, and while one application server handles all the user requests, the failed one can be restored.
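In Nginx terms, that failover behaviour can be sketched with passive health checks: if A1 stops answering, requests flow to A2 until A1 recovers. The addresses and thresholds below are illustrative, not from the original architecture:

```nginx
upstream app_servers {
    # Mark a server as unavailable after 3 failed attempts within 30s,
    # then quietly retry it once the timeout expires.
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;  # A1
    server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;  # A2
}
```

With identical configs on both load balancers, a single server failure anywhere in the tier leaves a working path from users to the application.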

I’ll be uploading more articles like this, providing top-level architecture designs for different deployment types (decentralized, centralized, etc.) and information on how to configure different technologies for best-in-class performance and scalability.

Stay tuned..

Do comment below with your suggestions or advice on how I could improve the architecture, or what other designs I should consider. I am also thinking about architectures for edge computing, so stay tuned…

• Software Engineer with years of experience in software design and development. Worked on multiple technologies like Nginx, Spring, AWS, Java, and Python.