What is all this scale-up/out/sideways/down business anyway?
A perennial problem in the IT world is how to handle the ebb and flow of user demand for IT resources. What is adequate to handle the demand at 3am on Sunday is not going to be sufficient to cope with the demand at noon on Black Friday or Cyber Monday. One way or another, IT resources must be sufficiently elastic to handle this range of user demand – they must be able to scale. If a company’s IT resources are not able to sufficiently scale to keep up with demand at any given time, a service outage will most likely result, with major consequential impact to the business. As a result, the matter of IT resource scaling is of great concern for IT departments.
How are IT processing resources scaled? The most obvious answer to this question is to simply add more hardware to an existing system – more processors, more memory, more disk storage, more networking ports, etc. Alternatively, simply replace a system with one that is bigger and more powerful. This approach is known as scaling-UP, or vertical scaling.
The biggest problem with this approach is that of diminishing returns. Most scale-up systems use a hardware architecture known as symmetric multiprocessing (SMP). Simply put, in an SMP architecture, multiple processors share a single block of physical RAM. As more processors are added, contention for this shared memory and other shared resources becomes a significant bottleneck so that less and less actual performance benefit is realized for each processor added. Each additional processor yields less than 1x the power of that processor; the more processors, the more contention and the less incremental benefit. As more and more resources are added, eventually the system is simply unable to scale any further to meet user demand. The same restriction applies when a system is replaced; eventually there will not be a single SMP system powerful enough to meet peak capacity demands.
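The diminishing-returns effect described above can be made concrete with a simple contention model. The sketch below uses an Amdahl's-law-style formula, which is an assumption chosen for illustration rather than anything the article specifies. The contention fraction `sigma` of 0.05 is a made-up value, and the function names are hypothetical.

```python
def smp_capacity(n_cpus: int, sigma: float = 0.05) -> float:
    """Relative capacity of n_cpus processors (1.0 = one processor).

    sigma is the assumed fraction of work that serializes on shared
    resources such as memory. It is an illustrative value, not a measurement.
    """
    return n_cpus / (1 + sigma * (n_cpus - 1))


def marginal_gain(n_cpus: int, sigma: float = 0.05) -> float:
    """Capacity added by the n-th processor under the same assumed model."""
    return smp_capacity(n_cpus, sigma) - smp_capacity(n_cpus - 1, sigma)


if __name__ == "__main__":
    # Print total capacity and the contribution of the last processor added.
    for n in (2, 4, 8, 16, 32, 64):
        print(f"{n:3d} CPUs: capacity {smp_capacity(n):6.2f}, "
              f"last CPU added {marginal_gain(n):5.2f}")
```

Under this model, each added processor contributes less than the one before it, and total capacity can never exceed 1/sigma no matter how many processors are installed.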
Besides the scalability limits, there are other issues with the scale-up approach:
If vertical scaling has such significant issues, what are the alternatives? First, use a server with a different hardware architecture, a massively parallel processor (MPP). In an MPP architecture, each processor acts like a separate system, with its own memory, disks, and other hardware resources (“shared nothing”). Workload is distributed by the operating system and other software components across the processors, which communicate with each other via a high-speed message bus. Unlike an SMP, an MPP has no contention for shared resources (RAM, etc.); therefore, each added processor contributes nearly 100% of its capacity, and MPP capacity scales close to linearly. The best-known and most successful MPP in the industry is the HPE NonStop server, which can scale linearly from 2 to 16 CPUs per system. However, what happens when a single system is not sufficient to handle the workload, or better availability is required?
Enter scale-OUT, or horizontal scaling. With scale-out, additional compute resources are provided by simply adding more servers, with the workload distributed between them. A scale-out architecture has the same characteristics and benefits as an MPP. In fact, an HPE NonStop server can be considered a scale-out system in a box. However, a scale-out architecture is unconstrained in terms of how many additional servers can be added, or the type of processors employed within each server (SMP or MPP). For example, a single HPE NonStop server can first scale out by adding more CPUs, then by adding more NonStop servers to the network (up to a total of 255 servers). A scale-out architecture can meet much higher levels of user demand than a scale-up architecture, because there is essentially no limit to the number of servers that can be incorporated.
Besides unlimited scalability, there are other benefits of the scale-out architecture over the scale-up architecture:
Applications can be designed for scale-out, but there’s an elephant in the room here. It’s fine to say that applications should be stateless and not use shared memory, but at some point, they have to access shared data. Scalability (and availability) gain little if workload can be distributed across multiple application server instances but all data must then be accessed from a single database residing on a single system, or if the data is partitioned (separated) across multiple systems. To maximize scalability (and availability) in a scale-out architecture, shared data must be available locally to all systems participating in the application, and each copy of the data must be kept consistent with all the others as the data is updated (regardless of the system on which application updates are executed). Enter real-time data replication.
With transactional real-time data replication implemented between all systems participating in the application, a copy of the database can be placed on each system, and the copies are kept consistent as data is changed on any system. This distribution optimizes scalability by a) allowing user requests to be routed to any system based on load (the so-called “route anywhere” model), and b) scaling the database along with the application (i.e., removing the database as a source of contention and hence a bottleneck). If any system fails, other systems have up-to-date copies of the database on which processing can continue, thereby maximizing application availability. This characteristic applies not only to unplanned outages, but also to planned system maintenance, which can be performed serially across systems so that no application outages ever need to occur. This characteristic even applies to system and software upgrades, allowing for zero downtime migrations (ZDM).
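The “route anywhere” idea can be sketched in a few lines. The example below is a toy model under stated assumptions, not a description of any real replication engine: the `Node` and `Cluster` names are hypothetical, replication is done synchronously in-process, and data collisions and resynchronization of a recovered node are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)  # this node's local database copy
    load: int = 0                             # requests handled so far
    up: bool = True


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def route(self) -> Node:
        """Pick the least-loaded node that is currently up."""
        candidates = [n for n in self.nodes if n.up]
        if not candidates:
            raise RuntimeError("no nodes available")
        return min(candidates, key=lambda n: n.load)

    def write(self, key, value) -> str:
        """Apply the update on the routed node, then replicate it to peers."""
        node = self.route()
        node.load += 1
        node.data[key] = value
        for peer in self.nodes:
            if peer is not node and peer.up:
                peer.data[key] = value  # stand-in for real-time replication
        return node.name

    def read(self, key):
        """Serve the read from the routed node's local copy."""
        node = self.route()
        node.load += 1
        return node.name, node.data.get(key)
```

Because every node holds a current copy, a request can land on any node, and marking one node as down (`up = False`) simply removes it from routing while the others continue to serve the same data.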
The highest levels of scalability (capacity utilization) and availability are obtained by using an active/active application architecture as described above, where user requests are distributed and executed on any system. The scale-out principle also may be applied to active/passive and sizzling-hot-takeover (SZT) configurations. In these configurations, all update transactions are executed on a single active system, but scalability can still be achieved by replicating data from the active system to multiple passive systems, which are then used for read-only or query-type applications. A good example of such an architecture is a so-called “look-to-book” application. Multiple read-only nodes are used to look up information (e.g., airline/hotel seat/room availability, or stock prices), while the active system is only used when an actual transaction is executed (e.g., an airline/hotel reservation, or a stock trade). This arrangement offloads the active system and scales out the workload across multiple systems without requiring the application to run fully active/active.
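A look-to-book split can be sketched as follows. This is a minimal illustration under assumptions: the `LookToBook` class, its method names, and the round-robin choice of read-only node are all hypothetical, and replication is again modeled as a synchronous in-process copy rather than a real replication engine.

```python
import itertools


class LookToBook:
    def __init__(self, replica_names):
        self.active = {}  # the single copy on which updates are executed
        self.replicas = {name: {} for name in replica_names}  # read-only copies
        self._next = itertools.cycle(replica_names)  # round-robin read routing

    def _replicate(self, item):
        """Push the active system's value for item to every read-only node."""
        for copy in self.replicas.values():
            copy[item] = self.active[item]

    def load_inventory(self, item, qty):
        """Set available quantity on the active system and replicate it."""
        self.active[item] = qty
        self._replicate(item)

    def book(self, item, qty=1) -> bool:
        """Execute an update transaction on the active system only."""
        available = self.active.get(item, 0)
        if available < qty:
            return False
        self.active[item] = available - qty
        self._replicate(item)
        return True

    def look(self, item):
        """Answer a query from the next read-only node, not the active one."""
        name = next(self._next)
        return name, self.replicas[name].get(item, 0)
```

In this sketch, queries never touch the active copy, so adding read-only nodes increases query capacity without adding load to the system that executes bookings.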
To summarize, keeping up with user demand is a significant challenge for IT departments. The traditional scale-up approach suffers from significant limitations and cost issues that prevent it from satisfying the ever-increasing workloads of a 24×7 online society. The use of MPP and scale-out architectures is the solution, since they can readily and non-disruptively apply additional compute resources to meet any demand, and at a much lower cost. The use of a data replication engine to share and maintain consistent data between multiple systems enables scale-out application and workload distributions across multiple compute nodes, which provides the necessary scalability and availability to meet the highest levels of user demand now and into the future.
For more information on this topic, please read the full article, Scale-up is Dead, Long Live Scale-out! as published in the March/April issue of The Connection.