
Horizontally scalable web applications

A guide to maximising profits and minimising costs.

Executive summary

Applications need to scale up to remain responsive during times of heavy load and scale back down to reduce TCO when costly resources are no longer needed. Our aim is to provide a timely and performant user experience without needing to invest in a permanent platform sized for the biggest potential future traffic spike. In this white paper, we will examine some strategies for this and discuss the overhead required for each one. We will also make some recommendations about the kinds of scenarios in which each strategy is a useful platform choice.

For the purposes of this document, when we discuss scaling, we usually mean horizontal scaling. Vertical scaling, the adding of RAM, CPUs or hard disk space, is briefly discussed but is not central to this paper, since every system can benefit somewhat from having bigger servers to run on.

There are two main challenges to scaling applications horizontally. The first is designing an application that can be scaled as and when the need arises. This requires some architectural planning of the system, using techniques such as:

  • Breaking applications into a series of loosely-coupled subsystems tied together via APIs
  • Horizontally scaling subsystems individually, as opposed to vertically scaling the entire application
  • Scaling the database by using techniques like replication, sharding or federation

Each of these strategies has its benefits and its costs. None of the techniques discussed in this paper is exclusive to the others. Well-built scalable systems are designed from the beginning to incorporate some or all of these techniques in one form or another.

The second challenge in scaling an application is arriving at a system which can add and remove resources automatically. Not all systems will have these features, and not all will need them. The available solutions in this area are discussed later in this paper.

What is scalability?

Scalability is the ability of any system to gracefully resize its resource usage to match current needs. In growth times, this means that systems need to be able to identify increases in traffic and add additional resources as needed to meet these additional demands. Conversely, when needs lessen, a truly scalable system will be able to detect that as well and reduce the resources being consumed. For our purposes in this document, when we speak of resources we mean hardware. In web-based applications, however, other resources have to be taken into consideration as well. When considering scalability, everything from the physical network to third-party APIs has to be able to scale up or back based on needs.

In an ideal environment, the application would be able to monitor itself and automatically request additional resources or release unneeded ones. However, these ideal situations only exist in the largest of applications, and for the most part, scaling a system up or down is a manual process. Most applications can be designed in such a way that they will help in the monitoring and notify someone when additional resources are needed. For the most part, however, careful monitoring of resource usage using existing tools will let system administrators know when to scale up or back.

Scalability is not high performance

Theo Schlossnagle, CEO of OmniTI and recognised expert on the topic of Scalable Internet Architectures, once said:

If you use the word ‘scalable’ when you should have used ‘high performance’, I’ll beat you with a stick.

Theo Schlossnagle, CEO of OmniTI

While Mr. Schlossnagle said this in the context of an advertisement for a job, he brings home a point that should resonate with anyone designing an application: scalable does not mean high performance. It is assumed that IT managers are concerned about scalability because they are concerned about the performance of their application. Making a system scalable simply means the designers want to solve any performance problems by allowing additional resources to be added to the system. High performance is making the system work better with the resources available.

To improve a system’s performance, a developer will traditionally profile the code to locate areas of the code that are performing sub-optimally. They will then re-factor those areas, re-working the code until it performs at a satisfactory speed. A system can be highly performant and still not be scalable, and while scaling a system will increase its performance, it will not make it a highly performant system. It is important to understand and appreciate the difference between the two terms before going forward.

What goes up, must come down

As previously stated, for a system to be truly scalable, it has to scale in both directions. It is not enough for a system to simply be able to request additional bandwidth or spin up additional servers, virtual or physical. Scalable systems also have to be able to release the additional resources when they are no longer needed. Again, the process does not have to be automatic, but it does need to be simple enough that nobody has to weigh what the idle resources are costing against the effort of releasing them.

While the ROI of being able to quickly scale a system up is obvious, the ROI of scaling back (fewer resources to pay for) sometimes gets lost in the shuffle. If a web application’s traffic is seasonal, then at the start of its peak season the administrator will want to be able to easily bring additional resources online. However, these resources will add to the operational cost of the application. Truly scalable systems make removing these resources just as easy as adding them, so that once the system is no longer in its peak season, it is no longer paying for what is now unused capacity.

Scalable by design

Truly scalable systems do not happen by accident: they are designed to be scalable. One only has to look at the recent history of Twitter to see that systems that are not designed for scalability will pay the price. The original developers of Twitter had no idea that their little project would become the next big thing, so they made design choices based on expediency instead of scalability. While this served them well in the early days while they were trying to get a working system, the design decisions they made soon began to hobble the system as traffic began to grow. It wasn’t their reliance on any one piece of technology that caused problems, but the fact that they did not think through the process of adding additional resources and make sure that their design decisions did not make that impossible.

Scalability can be as simple as making sure your application will work behind a load balancer, or as complex as designing a database sharding scheme. Regardless of your approach, scalability is a design principle, not a feature to be bolted onto an existing application.

Loose coupling

One of the primary principles in designing scalable systems is the loose coupling of systems. Systems that are tightly coupled are not scalable. When building PHP systems, loose coupling means asking questions like these:

  • How will your application access its database?
  • Is the database server name hardcoded into the application or a property in your configuration file?
  • Can your system easily switch to reading from one database and writing to another?
  • Are there resources or files on the local file system that have to be shared between all instances of the application running?

These decisions, and others like them, have to be contemplated when designing a scalable system. Access to resources external to the application, and how these resources are addressed, is one of the main points that a scalable system will address. Well-designed scalable systems will not tightly couple the application to a specific resource. Instead, they will allow for accessing these resources through a proxy or accessing a pool of resources instead of a single resource. This indirect access to external resources is what is meant when we use the term loose coupling.
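
As a minimal sketch of this indirection, the Python snippet below reads database endpoints from configuration rather than hardcoding them, and exposes a pool of read hosts alongside a single write host. The environment variable names (DB_WRITE_HOST, DB_READ_HOSTS) and the DatabaseConfig class are hypothetical names chosen for illustration; the same idea carries over to PHP or any other language.

```python
import os
import random
from dataclasses import dataclass, field


@dataclass
class DatabaseConfig:
    """Database endpoints supplied by configuration, never hardcoded."""
    write_host: str
    read_hosts: list = field(default_factory=list)

    @classmethod
    def from_env(cls, env=None):
        # Hypothetical variable names; any configuration source would do.
        env = os.environ if env is None else env
        write_host = env.get("DB_WRITE_HOST", "localhost")
        raw = env.get("DB_READ_HOSTS", "")
        read_hosts = [h.strip() for h in raw.split(",") if h.strip()]
        return cls(write_host=write_host, read_hosts=read_hosts)

    def host_for_read(self):
        # Fall back to the write host when no read pool is configured.
        if self.read_hosts:
            return random.choice(self.read_hosts)
        return self.write_host

    def host_for_write(self):
        return self.write_host
```

Because the application only ever asks for "a host to read from" or "the host to write to", servers can be added to or removed from the pool by changing configuration alone.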

Create APIs, assemble applications

A closely related concept to loose coupling is that of using Service Application Programming Interfaces, or Service APIs. By abstracting the common tasks used by multiple systems and moving those tasks into their own APIs, system architects can reduce the complexity of each individual application while realising a greater ROI on the common APIs. These APIs can be internal, created by your development teams, or external, rented from third-party developers. Either way, application development becomes a matter of assembling the APIs you need for the core functionality of any application and then writing the parts of the application that are unique.

This loose coupling of systems via APIs makes your application more scalable because systems can off-load part of your processing to other systems that are independently scalable. 

Scaling your application

Now that we understand what scalability is, and more importantly what it is not, let’s explore the how and the why of building scalable applications.  


Before we can discuss scaling a system we need to discuss the ways a system can scale. There are two different ways to scale web-based applications, vertically and horizontally.

Vertical scaling is adding resources to a single node in the system. Typically this means adding CPUs, RAM, hard drives, network interfaces and other pieces to an existing server to make it operate faster. Vertical scaling has its place but for the most part, when discussing scaling a PHP application, we discuss horizontal scaling.

Horizontal scaling is adding additional resources, typically servers, to an existing infrastructure and spreading the load of one or more of the subsystems out across them. This practice of horizontal scaling is also known as scaling “wide”. Because of its ‘shared nothing’ design philosophy, PHP excels at horizontal scaling. With the help of a good load balancer, commodity hardware can be quickly deployed to help offload the processing of the front web servers. Behind that, with proper forethought, databases can be replicated to ease the burden of a single RDBMS server having to manage the data for the entire application. Finally, as subsystems are extracted from the main application and put behind APIs, the processing of those requests can be hidden behind a load balancer of its own and multiple servers used to satisfy the requests.
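
To make the role of the load balancer concrete, here is a small Python sketch of round-robin distribution across a group of web servers that can grow and shrink. It is an illustration only: the RoundRobinBalancer class and the server names are hypothetical, and production load balancers add health checks, weighting and much more.

```python
class RoundRobinBalancer:
    """Hands out backend servers in rotation; the pool can grow or shrink."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._next = 0

    def add_server(self, server):
        # Scaling out: the new server starts receiving requests immediately.
        self._servers.append(server)

    def remove_server(self, server):
        # Scaling back: the server stops receiving requests.
        self._servers.remove(server)

    def next_server(self):
        if not self._servers:
            raise RuntimeError("no servers available")
        server = self._servers[self._next % len(self._servers)]
        self._next += 1
        return server
```

This only works if any server can handle any request, which is exactly what a 'shared nothing' design provides.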

Breaking it apart - Thinking in pieces

When building scalable systems, developers must think through scaling strategies while working on the functional design and have these strategies formulated before starting the technical design phase. When deconstructing an application, pieces begin to fall into three large buckets: static content, dynamic content, and common functionality that can be served via APIs.


In every web-based application there are pieces that are dynamic and pieces that are static. Examples of the static pieces of any application are:

  • Images
  • Video
  • JavaScript
  • CSS

Anything that is not generated on-the-fly is considered static. Static files are easier to serve than dynamic files because no decision logic needs to be executed before serving them. It is common to use separate, optimised servers to deliver these elements of an application. The static content can be served quickly from a basic web server with no additional logic needed. If the static content is slowing down the overall delivery performance of an application, it is possible to extend this element of the system only, without needing to invest across the entire application platform. This creates a flexible, cost-effective way of delivering these types of content. 


Once an application’s static content has been moved off to its own platform, administrators and architects can begin looking at the platform that will serve the dynamic content. It is important to adhere as closely as possible to PHP’s mantra of ‘shared nothing’ when designing your application. Even in the best cases though, some state information must be preserved on the server and not pushed to the client, in particular the session information.

By default, session information is stored on the local hard disk of the web server. This works well for small applications that will be executing on a single server, or in situations where your load balancer supports “sticky sessions”: the ability to recognise a returning visitor and direct them to the same server that handled their last request. This means that it is safe to store the sessions locally. With sticky sessions, once a session has been assigned to a server, the user will continually be directed back to it, regardless of the load that server is under. Obviously, this is not the ideal solution but it is the easiest to implement and works well for many sites, even those under consistent high loads.
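
One way to picture sticky sessions is as a deterministic mapping from session ID to server. The Python sketch below hashes the session ID to pick a server; this is an assumption made for illustration, since real load balancers may instead use a cookie or the client address, and the sticky_server function is a hypothetical name.

```python
import hashlib


def sticky_server(session_id, servers):
    """Map a session ID to the same server on every request."""
    if not servers:
        raise ValueError("servers must not be empty")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Note that the mapping depends on the size of the server list, so adding or removing a server sends some users to a machine that does not hold their session. That weakness is one motivation for the session clustering options discussed next.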

If the application really needs to be balanced across a server group then a session clustering solution is the only answer. There are several session clustering options to choose from. The correct one will be the one that fits best into the application’s design.

Database Session Clustering

Given PHP’s tight integration with MySQL, by far the easiest option for session clustering is to store your sessions in your database. When PHP serves a page, instead of fetching the session data from the hard disk, it fetches it from the database. This solves the issue of distributing the session data, since your entire web server group will already have access to the database servers. However, there is a downside to offloading session storage to a database: doing so may create more work for the database server than it can handle. Database session clustering is a tried and true method, but the additional database load may prevent it from being the best solution for your application, depending on the application and existing architecture.

One way to overcome the load that sessions will put on your database is to set up a Session Cluster database separate from your main database. This way, the process can be broken into even smaller pieces. The session cluster database can be hosted on the primary database server or broken out into its own server. It can be scaled independently and, much like the alternative web servers we mentioned earlier, the database platform used for the sessions can differ from the one used for the main application data store (more on a couple of alternative database platforms later in this paper). This will allow a fast, sessions-only database to be implemented and used by your application, enabling true load balancing and therefore efficient use of hardware resources and better ROI.  
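
The following Python sketch shows the general shape of a database-backed session store. It uses SQLite purely as a stand-in so that the example is self-contained; the DatabaseSessionStore class, the table name and the column names are hypothetical, and a PHP application would typically achieve the same thing with a custom session handler.

```python
import json
import sqlite3
import time


class DatabaseSessionStore:
    """Stores session data in a shared database table instead of local disk."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "id TEXT PRIMARY KEY, data TEXT NOT NULL, updated_at REAL NOT NULL)"
        )

    def save(self, session_id, data):
        # INSERT OR REPLACE is SQLite syntax; other databases spell this differently.
        self._db.execute(
            "INSERT OR REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)",
            (session_id, json.dumps(data), time.time()),
        )
        self._db.commit()

    def load(self, session_id):
        row = self._db.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        # An unknown session simply starts out empty.
        return json.loads(row[0]) if row else {}

    def destroy(self, session_id):
        self._db.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
        self._db.commit()
```

Every page request now costs at least one read and usually one write against this table, which is the extra database load described above.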

Memcached Session Clustering

An alternative to storing sessions in your database is to install memcached and its PHP extension. Making memcached accessible from each of the application’s front web servers will give the application a distributed storage mechanism that, among its other uses, will allow it to store sessions in a fast, convenient and distributed way. The side benefit of using memcached for session clustering is that applications now have access to memcached itself. This will have a positive effect on the overall system performance as the application is able to store semi-persistent data in distributed memory instead of in the database.

However, memcached does not come without a price. First, memcached stores the data in memory only, which might not be suitable for all applications of sessions. Second, adding memcached to the technology mix is adding another moving part that will need to be configured, properly tweaked and maintained. IT managers will need to make sure they have the resources and expertise available on both the development and support teams.
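
The consequence of memory-only storage can be sketched with a small stand-in written in Python. This is not the memcached API; InMemorySessionCache is a hypothetical class that mimics the relevant behaviour: entries expire, and everything disappears if the cache is restarted, so the application must cope with a session that is suddenly missing.

```python
import time


class InMemorySessionCache:
    """A dict-backed stand-in for a memory-only cache with expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def set(self, key, value, ttl_seconds):
        # Remember the value together with the moment it stops being valid.
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]
            return default
        return value

    def flush(self):
        """Simulates a cache restart: everything held in memory is gone."""
        self._items.clear()
```

If losing a session only means that a user has to log in again, this trade-off is often acceptable; if the session holds something valuable, such as an unpaid shopping basket, it may not be.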

These solutions will help system architects break one of the most contentious pieces of any given application – the session storage – off into its own piece which can be scaled independent of the rest of the application. This does, however, leave the bulk of your application still intact and only scalable as a single unit. 

Session Clustering with Zend Server

In June 2010, Zend released Zend Server Cluster Manager, which brings the now-deprecated Zend Platform’s session clustering capabilities to Zend Server. The advantage this gives over database or memcached session clustering is support. If vendor support is important to your organisation, Zend Server may be your best solution. The downside, as with memcached, is that Zend Server will add one more moving part to be managed within your environment. Zend Server has a heavier footprint on your servers than memcached would have, which means fewer resources available to actually dedicate to the application.


Identifying common functionality across an entire range of a corporation’s applications and breaking it out into a separate modular application, accessed via an API (application programming interface), means that the servers that serve the main application have to do less work. The common code is pulled into a central location and then consumed by each application that needs to use it. This means much less duplication of development effort and a reduced maintenance burden throughout the application lifetime. Of course our applications must always live and grow alongside our changing business needs, and using APIs for modular design makes this easier with a central point for code rather than duplication in multiple systems or locations.

A common example is authentication. If a corporation has multiple applications, each requiring users to authenticate before using it, it makes sense to break the authentication code out into its own application. For development purposes and even when your load is light in production, these APIs can be served by the same server group as your main application. However, as your traffic grows, your authentication API can be moved off to another server or server group that is firewalled from the main internet and only accessible via the internal network.

Instead of adding additional resources to your web server group, which will suffer from diminishing returns as you add more servers to the group, you have moved this processing to its own server or group which gives it its own resources and frees up capacity on your main web server group. This will lead to better scalability options in the future, as you now have two points of scalability instead of just one. This design approach, allowing targeted scaling, is a key tool for getting the best performance out of the system with minimum ongoing costs.

Authentication is only one of the common parts that can be broken out and addressed via an API. As you begin to look at your application, you will undoubtedly be able to identify others. 

Scaling your database

Up to this point, we have discussed at length how to break your application into pieces and scale those pieces. Almost all applications, however, need some kind of persistent data storage system. If you think of your database as one of the building blocks of your application, then you can see how it can easily be separated off into its own subsystem. Breaking it out and then coupling it back to the main application via an API gives you one more point of scalability. Once you look at your database as a separate block, there are several techniques that can be employed to scale your database.

Database Replication

The first technique, database replication, is by far the most common. Most modern RDBMSs have a replication scheme built into them, ready for you to deploy. For example, MySQL comes with functionality built in for Master/Slave replication. Before you can implement replication in your application though, you have to have designed your application in such a way that you can read from a slave but write to the master. Breaking your database access into a database access layer that acts as the proxy to your actual database will allow you to make the change in a single place and have it apply to the entire application.

Scaling by replication is as simple as adding additional slaves and allowing each request to decide which slave is available to process it. However, at a certain point, the number of slaves in the system begins to degrade the overall system performance. Simply put, database replication will only take you so far in scaling your application but it is a very useful tool in our box of scalability tactics.
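
A database access layer of this kind can be sketched in a few lines of Python. The ReplicatedDatabase class is hypothetical, connections are represented by opaque objects, and the rule "statements beginning with SELECT are reads" is a deliberate simplification made for illustration.

```python
import itertools


class ReplicatedDatabase:
    """Routes reads to the slaves in rotation and everything else to the master."""

    def __init__(self, master, slaves):
        self._master = master
        # With no slaves configured, every statement goes to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def connection_for(self, sql):
        # Simplification: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self._master
```

Because the routing decision lives in one place, adding a slave is a configuration change rather than a code change. One caveat to keep in mind: with asynchronous replication a slave can briefly lag behind the master, so a read issued immediately after a write may need to go to the master.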

Database Federation

A federated database system (FDBS) is, simply put, a virtual database made up of two or more databases. These databases are usually hosted on different servers and could possibly be from different vendors. The advantage of a federated database system is that the application sees distributed and diverse database resources as a single database and can select from, insert into and delete from the federated database just as it would with any one of the single databases.

The downside to federated databases is the complexity of the system and the additional knowledge necessary to keep things running. The complexity of not only managing an FDBS but of also managing the individual RDBMS that make up the FDBS makes federation an expensive choice and only viable in organisations that have the capacity and skills to manage them.  

Database Sharding

Whereas FDBS presents a unified ‘virtual’ interface to data that is distributed across various databases, servers and in many cases geographic locations, sharding breaks databases up into manageable pieces based on a predetermined scheme. Database sharding is also known as ‘horizontal partitioning’. Each ‘shard’ in the database is of the exact same structure but contains only a predefined subset of the entire database.

For example, if your application maintains user login records, a sharding scheme may be that all users whose username starts with A-M go in the shard hosted on server A. All users whose username begins with N-Z go on the shard hosted on server B. This way, when the user types in their username, it is easy to determine which shard to request their information from. Another scheme may base the partitioning on the continent of origin: all users who live in North America are stored on one shard and all users who live in Europe are stored on another. Under this scheme, the databases themselves could be geographically distributed, thus reducing latency and increasing application performance.
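
The username scheme described above can be expressed as a very small lookup function. The Python sketch below is illustrative only: the server names are hypothetical, and sending usernames that start with a digit or symbol to the second shard is an assumption, since the scheme as described only covers letters.

```python
# Hypothetical shard locations for the two halves of the alphabet.
SHARDS = {"A-M": "server-a", "N-Z": "server-b"}


def shard_for_username(username):
    """Pick the shard that holds the record for the given username."""
    name = username.strip()
    if not name:
        raise ValueError("username must not be empty")
    first = name[0].upper()
    if "A" <= first <= "M":
        return SHARDS["A-M"]
    # Assumption: N-Z and any non-letter initial share the second shard.
    return SHARDS["N-Z"]
```

The application calls this function before every user lookup, so the choice of shard stays invisible to the rest of the code.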

Sharding, like federation, is a complex system. The schemes we’ve described are relatively simple; in real world applications, they are much more complex. They require resources and expertise to set up, configure and maintain. They are, however, the right solution for many large-scale distributed applications because the sharding algorithm can be constantly tweaked as more and more shard servers are brought online to meet demand. Sharding does not, however, scale down easily. Once your database has been broken into pieces, combining those pieces may be a difficult and time-consuming task. So while sharding will improve your application’s overall performance if implemented properly, it can be less ideal when we consider the ‘scaling back’ test.

Alternative Data Storage Engines

The final topic to discuss when discussing scaling techniques for databases is alternative data storage engines. New concepts in data storage are now cropping up in web based applications. 

API-Only Data Access

The concept of API-only databases, where the application does not have direct access to the database but only to a REST interface into the database pool, is gaining in popularity. API-only databases allow you to abstract the processing of the database one step further so that the database sits behind an API server group. The API server group handles the actual interaction with the client, authentication and other non-database specific pieces, isolating the databases to only work on the task of serving up data.

API-only databases make replication simple to implement because your application does not need to know whether it is talking to a slave or a master; it only needs to submit a well-formed request and process the result set. This is a really good example of a situation where modularity can help us to design systems which can co-exist on a single machine for development, and scale out as much as needed later on.

CouchDB and Amazon SimpleDB

Another new concept in data storage is that offered by CouchDB or Amazon’s SimpleDB. These document-based databases offer REST interfaces for managing data and databases. CouchDB is an open source service that you can install on your own network and host yourself. It runs over HTTP and therefore is easy to access from any language that can access a web page, including JavaScript.
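
To show what "runs over HTTP" means in practice, the Python sketch below builds the requests that store and fetch a document using only the standard library. It assumes a CouchDB instance on localhost at CouchDB's default port 5984; the database name and document ID are hypothetical, the helper function names are our own, and IDs are assumed to be URL-safe. The requests are constructed but not sent.

```python
import json
import urllib.request

# Assumed local install listening on CouchDB's default port.
COUCHDB_URL = "http://localhost:5984"


def build_put_document(database, doc_id, document, base_url=COUCHDB_URL):
    """Build (but do not send) the HTTP request that stores a document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{database}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_document(database, doc_id, base_url=COUCHDB_URL):
    """Build (but do not send) the HTTP request that fetches a document."""
    return urllib.request.Request(
        url=f"{base_url}/{database}/{doc_id}", method="GET"
    )
```

Passing either request to urllib.request.urlopen would send it; a real server will usually also require credentials, which are omitted here.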

SimpleDB is a hosted service run by Amazon. It provides a similar API and feature set to CouchDB with the addition of Amazon’s network management. Because it is hosted on a large and constantly scaling network, it is a building block whose scalability you don’t have to manage yourself.

Both are good tools and the tool of choice depends on your exact application and corporate culture. They do differ from the traditional database, so they are less useful for tabular data, but they are optimised and powerful when dealing with applications which need to tag, store and retrieve documents. Search functionality is a good application for these tools.

Scaling to the cloud

Much has been written in the past few quarters about the potential for cloud computing. One thing that excites many IT managers and budget stakeholders is the capability to scale applications into the cloud. The US National Institute of Standards and Technology, Information Technology Laboratory, in its draft definition of cloud computing, defines three distinct service models that are all described as cloud computing. In our discussion of APIs we touched on the first one when we discussed third party APIs, defined as Cloud Software as a Service (SaaS). The other two, Cloud Platform as a Service (PaaS) and Cloud Infrastructure as a Service (IaaS), both affect scalable applications in different ways.

Platform as a service

PaaS provides a place to deploy existing applications in the cloud. The vendor provides the platform, language and other tools necessary to deploy the consumer-created application. Google App Engine is the best example of this. If your application is written in Python, chances are good that you can make it run on Google’s App Engine. They provide the hosting and computing infrastructure, you provide the code.

PaaS is an excellent choice if your chosen vendor supports your development language and you have broken your application into multiple, loosely coupled parts that can be integrated via APIs. Your busiest and least risky APIs can be pushed to a PaaS vendor and the computing power offloaded from your system onto a flexible platform that can expand and contract along with the demand placed upon it, with costs relative to the service delivered. This avoids either scrambling to add hardware when applications outgrow their platforms, or investing in hardware for an expected traffic level which then does not materialise.

Infrastructure as a service

IaaS is best exemplified by Amazon’s Elastic Compute Cloud (EC2). EC2, along with competitors like Rackspace’s Slicehost, provides programmatically controlled virtual servers. Using EC2 as the primary example, you can create an EC2 image of your technology stack with your application installed and ready to run. Then, as your needs change, your application or your system administrators can respond by ‘spinning up’ new instances. When they are no longer needed, you simply spin them back down.

This on-demand capacity is now being successfully deployed by companies to handle overflow traffic, the “Digg effect”, being “slashdotted” or any time that their needs temporarily exceed their current capacity. Doing so, however, means that they have planned ahead and have built their applications to scale.

The danger is to think that IaaS is a magic cure for scalability issues. As currently implemented, IaaS instances can take two to five minutes to spin up, and longer if your system takes time to recognise a new instance and put it into the load balancer. IaaS alone won’t keep your site up during an unexpected Slashdot-induced storm. However, if you know it is coming and you have an image already prepared, you can pre-launch and integrate one or more instances as needed to keep your application running strong.
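
Deciding how many instances to pre-launch is simple arithmetic once you have an estimate of the expected traffic. The Python sketch below is a rough planning aid under stated assumptions: the per-instance capacity figure must come from your own load testing, the 20% headroom default is arbitrary, and the function names are hypothetical.

```python
import math


def instances_needed(expected_rps, rps_per_instance, headroom_percent=20):
    """Instances required to serve the expected requests per second."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    required = expected_rps * (100 + headroom_percent) / (100 * rps_per_instance)
    # Always keep at least one instance running.
    return max(1, math.ceil(required))


def scaling_action(current_instances, expected_rps, rps_per_instance,
                   headroom_percent=20):
    """Positive result: instances to launch. Negative: instances to release."""
    target = instances_needed(expected_rps, rps_per_instance, headroom_percent)
    return target - current_instances
```

For example, with two instances running, an expected 1,000 requests per second and a measured capacity of 200 requests per second per instance, scaling_action(2, 1000, 200) suggests launching four more. Because of the spin-up delay described above, that launch has to happen before the traffic arrives.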

Degrade gracefully

The danger in pushing critical pieces of your infrastructure to the cloud is that you have to take into account the fact that at some point in your application’s lifecycle, the network will fail and the service will become unavailable, just as can happen between and even within datacentres. Your application will need to be able to deal with this failure. In some cases, when the missing API is not mission critical, your application will be able to continue to function at a reduced capacity; other times, when the missing API is critical to the application, you will have to notify the user that the application is temporarily unavailable.

Unlike when a component inside your network fails, when a hosted API, third party API or IaaS becomes unavailable, there is little your support staff can do. They are at the mercy of the vendor to restore things. Selecting the right vendor is critical when deciding to push to the web. The other critical piece is how your application handles the problem. Being able to continue to operate at reduced capacity unless mission critical parts are failing is known as ‘degrading gracefully’. 
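
In code, degrading gracefully often comes down to deciding, for each external call, whether the application can carry on without it. The Python sketch below illustrates one way to express that decision; the call_with_fallback helper and the ServiceUnavailable exception are hypothetical names, and a real application would normally add timeouts and logging.

```python
class ServiceUnavailable(Exception):
    """Raised when a mission-critical dependency cannot be reached."""


def call_with_fallback(operation, fallback=None, critical=False):
    """Run an external call; degrade to a fallback unless it is critical."""
    try:
        return operation()
    except Exception as error:
        if critical:
            # Nothing sensible can be shown without this service.
            raise ServiceUnavailable("a required service is unavailable") from error
        # Non-critical feature: carry on with reduced functionality.
        return fallback
```

A recommendations panel might be wrapped with fallback=[] so that the page renders without it, while a payment call would be marked critical=True so that the user is told the application is temporarily unavailable.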

Additional Reading

The topic of scalability is a broad one that crosses several disciplines. The purpose of this document has been to give an overview of the major issues that managers will have to face when beginning a project that has scalability as a requirement. This document purposefully did not delve into technical details on any single topic.

So as not to leave anyone short of technical resources, here are a few that we at Inviqa would like to point out for your further reading. These resources present technical details on the topics that we have discussed.

Scalable Internet Architectures

Theo Schlossnagle – 067232699X  

Scalable Internet Architectures (slides from the presentation) 

Theo Schlossnagle – 

Building Scalable Web Sites: Building, scaling, and optimising the next generation of web applications

Cal Henderson – 0596102356  

Building Scalable Web Sites: Tidbits from the sites that made it work (PDF) 

Gabe Rudy –








Scalability is an important software design philosophy. It is not a feature to be bolted onto an existing application and it is not a checkbox at the end of your project. If you build your application but forget to consider scalability, you will have failed to learn the lessons of history and will be doomed to repeat them, usually with the same expensive price tags. Scalability, when designed into an application from the beginning, does not cost any more to develop. When it’s not designed in, the price to add it on can easily overshadow the original price of the application. 



Image: © Images Money via Flickr under Creative Commons Attribution 2.0 Generic