Software Architecture Concepts: [Part 4] Scalability Basics

To understand how the concept of the cloud (virtually unlimited computing resources) works, we first need to understand scalability. In other words: how do computing resources scale?

Let's consider the basic web application architecture described earlier. In this architecture, customers use a front-end client application to send requests to the backend software; the backend processes these requests and returns output for the front-end application to display.


What is scalability?

The scalability of an application or system can be measured by the load (the number of requests) it can handle effectively at the same time. The point at which it can no longer handle additional requests effectively is the limit of its scalability.

Any kind of processing consumes computing resources in one way or another. One application might perform heavy calculations that use the CPU. Another might need to do graphical processing on the GPU (Graphics Processing Unit). Another might be I/O-intensive, for example transferring large chunks of data between the client and the backend. In the end, because of the physical properties of a backend machine (available RAM, number of CPU cores, available GPU power), the backend server can only handle a certain number of requests simultaneously before reaching its limits.
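
To make the idea of a hard limit concrete, here is a minimal sketch that estimates how many requests a single machine can serve at once, assuming each request needs a fixed share of each resource it uses. The machine spec, the per-request figures, and the function name `max_concurrent_requests` are hypothetical; real workloads are rarely this uniform. What it illustrates is that the scarcest resource (the bottleneck) sets the limit.

```python
# Hypothetical machine spec, for illustration only (not a real server model).
MACHINE = {"cpu_cores": 8, "ram_gb": 32, "gpu_units": 1}


def max_concurrent_requests(machine: dict[str, float], per_request: dict[str, float]) -> int:
    """Estimate how many requests can run at once before a resource runs out.

    Assumes every request needs a fixed amount of each resource it uses.
    The answer is set by the bottleneck: the resource that runs out first.
    """
    limits = [
        int(machine.get(resource, 0) // needed)
        for resource, needed in per_request.items()
        if needed > 0  # a resource the request does not use cannot limit it
    ]
    if not limits:
        raise ValueError("per_request must use at least one resource")
    return min(limits)


# Hypothetical request profiles for three kinds of workload.
cpu_heavy = {"cpu_cores": 0.5, "ram_gb": 0.25}
memory_heavy = {"cpu_cores": 0.25, "ram_gb": 4}
gpu_heavy = {"cpu_cores": 0.5, "gpu_units": 0.25}

print(max_concurrent_requests(MACHINE, cpu_heavy))     # 16 (CPU is the bottleneck)
print(max_concurrent_requests(MACHINE, memory_heavy))  # 8 (RAM is the bottleneck)
print(max_concurrent_requests(MACHINE, gpu_heavy))     # 4 (GPU is the bottleneck)
```

Adding more of the bottleneck resource raises the limit; adding more of any other resource does not.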

When we want to scale the system past its current limits, there are two ways to do so: Vertical Scaling and Horizontal Scaling. Both methods involve adding computing resources to your infrastructure, but they differ distinctly in implementation and performance.

Difference between Vertical Scaling and Horizontal Scaling?

Vertical scaling means scaling by adding more power (e.g. CPU, RAM) to an existing machine, also described as “scaling up”. With more computing resources, the same machine can serve more customer requests. Horizontal scaling, on the other hand, means scaling by adding more machines that can serve customer requests, also described as “scaling out”.

One fundamental difference between the two is that horizontal scaling requires changing the system so that the load (in this case, customer requests) can be executed in parallel across multiple machines. In many respects, vertical scaling is easier because the system doesn’t really need to change: you’re just running the same code on higher-spec machines.
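
Once the system has been changed so that any machine can serve any request, something in front of the machines has to spread requests across them. Below is a minimal sketch of round-robin distribution, one common strategy among several. The class name `RoundRobinBalancer` and the server names are hypothetical, and the sketch assumes the backends are stateless (any of them can answer any request).

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next backend in a fixed rotation."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._rotation = cycle(backends)

    def route(self) -> str:
        # The request content is not inspected: this only works if every
        # backend runs the same code and can serve any request (stateless).
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.route() for _ in range(6)]
print(assignments)
# ['server-1', 'server-2', 'server-3', 'server-1', 'server-2', 'server-3']
```

Vertical scaling needs none of this: a single, bigger machine receives every request directly.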


Scaling Out vs Scaling Up

In choosing between the two, there are various factors to consider. These include:

Hardware Limitations

If your system is designed solely for scaling up, you are locked into the prices set by the hardware you use. You may also be limited by the technology available on the market: at a certain point, no machine on the market will be able to satisfy your vertical scaling needs.

With horizontal scaling, on the other hand, it is easy to add computing power by adding machines of the same specification, or even of a lower specification, to your system.
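
When machines have different specifications, a simple way to share the load fairly is weighted round-robin: each machine receives requests in proportion to a weight that reflects its capacity. This is a minimal sketch; the machine names and weights are hypothetical, and production load balancers typically interleave weighted picks more smoothly than this.

```python
from collections import Counter
from collections.abc import Iterator
from itertools import cycle, islice


def weighted_rotation(weights: dict[str, int]) -> Iterator[str]:
    """Yield backend names endlessly, each appearing `weight` times per cycle.

    A machine with weight 2 gets twice as many requests as one with weight 1.
    """
    if not weights or any(w < 1 for w in weights.values()):
        raise ValueError("weights must be positive integers")
    slots = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(slots)


# Hypothetical fleet: one larger machine and two smaller, cheaper ones.
fleet = {"large-1": 2, "small-1": 1, "small-2": 1}
first_eight = list(islice(weighted_rotation(fleet), 8))
print(Counter(first_eight))  # large-1 gets 4 requests, small-1 and small-2 get 2 each
```

Adding capacity is then a matter of adding another entry to `fleet`, whatever that machine's size.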

Resilience

If your system relies on a single powerful machine, that machine is inherently a single point of failure. Horizontal scaling offers built-in redundancy: if one machine goes down, you lose only the computing power that machine provided, and the system keeps running on the remaining machines.
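
The redundancy argument can be sketched with a round-robin balancer that skips machines marked as down. How a machine gets marked down (for example, by periodic health checks) is assumed here and not shown; the class and method names are hypothetical. The sketch shows that losing one of three machines removes a third of the capacity rather than taking the whole system offline.

```python
class FailoverBalancer:
    """Round-robin over backends, skipping any that are marked as down."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._healthy = set(backends)
        self._next_index = 0

    def mark_down(self, name: str) -> None:
        # In a real system a health check would call this; here it is manual.
        self._healthy.discard(name)

    def mark_up(self, name: str) -> None:
        if name in self._backends:
            self._healthy.add(name)

    def capacity_fraction(self) -> float:
        """Share of the original computing power still available."""
        return len(self._healthy) / len(self._backends)

    def route(self) -> str:
        # Scan at most one full rotation for the next healthy backend.
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next_index]
            self._next_index = (self._next_index + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("all backends are down")


balancer = FailoverBalancer(["server-1", "server-2", "server-3"])
balancer.mark_down("server-2")
print([balancer.route() for _ in range(4)])  # server-2 is skipped
print(balancer.capacity_fraction())          # two thirds of the capacity remains
```

With a single machine, the same failure would leave a capacity fraction of zero.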

Geographical Distribution

Imagine your system needs to serve customers in multiple geographical locations. With horizontal scaling, you can add machines in multiple regions so that, for example, requests from customers in region A are served by machines in region A. This cuts down network latency, because customer requests and server responses don’t need to travel long distances.
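
A minimal sketch of region-aware routing: each request goes to the available server region with the lowest expected latency for the customer's region. The region names and latency figures below are illustrative placeholders, not measurements, and `pick_region` is a hypothetical helper. Because the lookup falls back to the next-closest region, the same idea also covers a region that is temporarily unavailable.

```python
# Illustrative round-trip latencies in milliseconds (placeholders, not measured data):
# LATENCY_MS[customer_region][server_region]
LATENCY_MS = {
    "region-a": {"region-a": 10, "region-b": 90, "region-c": 160},
    "region-b": {"region-a": 90, "region-b": 15, "region-c": 150},
    "region-c": {"region-a": 160, "region-b": 150, "region-c": 20},
}


def pick_region(customer_region: str, available: set[str]) -> str:
    """Return the available server region closest (lowest latency) to the customer."""
    latencies = LATENCY_MS[customer_region]
    candidates = [region for region in available if region in latencies]
    if not candidates:
        raise ValueError("no server region available for this customer")
    return min(candidates, key=latencies.__getitem__)


all_regions = {"region-a", "region-b", "region-c"}
print(pick_region("region-a", all_regions))                 # region-a
print(pick_region("region-c", all_regions - {"region-c"}))  # region-b (150 ms beats 160 ms)
```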

Performance

Scaling out lets you combine multiple machines into a single virtual system with their combined computing power, so you are not limited to the capacity of a single unit.

Cost

It is no surprise that top-of-the-line machines entering the market become disproportionately more expensive as their specs increase. If you do the math, you will often find that it is cheaper to get the same computing power from multiple machines than from a single top-of-the-line machine.
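
The “do the math” argument can be sketched with a toy pricing model in which a machine's price grows as its core count raised to some exponent. The model, the base price, and the exponent values are hypothetical assumptions, not real market prices. The sketch only shows that when price grows faster than linearly (exponent above 1), several smaller machines with the same total cores cost less than one large machine, while with linear pricing the two options cost the same.

```python
def machine_price(cores: int, exponent: float, base_price: float = 10.0) -> float:
    """Hypothetical price of one machine: base_price * cores ** exponent."""
    return base_price * cores ** exponent


def compare_costs(total_cores: int, machines: int, exponent: float) -> tuple[float, float]:
    """Return (price of one big machine, price of `machines` equal smaller ones)."""
    if total_cores % machines != 0:
        raise ValueError("total_cores must split evenly across machines")
    one_big = machine_price(total_cores, exponent)
    many_small = machines * machine_price(total_cores // machines, exponent)
    return one_big, many_small


for exponent in (1.0, 1.5):
    big, small = compare_costs(total_cores=64, machines=8, exponent=exponent)
    print(f"exponent={exponent}: one 64-core machine {big:.0f}, eight 8-core machines {small:.0f}")
```

The comparison ignores the extra costs that horizontal scaling brings, such as distributing the load and redesigning the system, so treat it as an illustration of the shape of the argument rather than a cost estimate.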

Looking Ahead

It’s likely that the industry will increasingly migrate towards a horizontally distributed approach to scaling. This is driven by the demand for greater reliability through redundancy and by the need for better utilisation through resource sharing. Horizontal scaling comes at a cost, however: if you cannot redesign your system to distribute its load, you are left with vertical scaling as your only option. Where possible, combining the two approaches lets you benefit from both paradigms.