Architecting for High Availability and Performance

One of the most commonly asked questions is how to build a web application that can process millions of requests a day while maintaining performance and stability. In this post we’re going to cover what the infrastructure looks like and the software used to achieve that.

This approach works with most websites and content management systems that run on a LAMP/LEMP stack, including WordPress, Craft, Drupal, and Joomla.

 

First we’re going to outline the basic requirements of what we want to achieve. The web application will:

  • Serve tens of millions of requests a day
  • Survive multiple server failures
  • Scale easily
  • Be super awesome

 

Designing for failures and cost

The best way to prepare for failures is to build at least two of everything. If one server dies, the second can continue serving customers. Running two of everything does add up, so this design only makes sense if you’re willing to splash out a little.
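To put a rough number on why a second server helps, here is a minimal sketch. It assumes that each server is up some fixed fraction of the time, that failures are independent, and that a tier keeps working as long as at least one of its servers is up. The 99% figure is purely illustrative and is not a published figure for any provider.

```python
def tier_availability(server_availability: float, servers: int) -> float:
    """Probability that at least one of `servers` identical, independent servers is up."""
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if servers < 1:
        raise ValueError("need at least one server")
    # The tier is down only when every server is down at the same time.
    return 1.0 - (1.0 - server_availability) ** servers


# Hypothetical per-server availability, chosen only for illustration.
single = tier_availability(0.99, 1)
pair = tier_availability(0.99, 2)
print(f"one server:  {single:.4%}")
print(f"two servers: {pair:.4%}")
```

Under these assumptions, a pair turns a 1-in-100 chance of the tier being down into roughly 1-in-10,000. Real failures are rarely fully independent, so treat this as an upper bound on the benefit.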

We’ll use DigitalOcean’s smallest $5 droplets to estimate pricing. This keeps the monthly costs to a minimum until you need to scale up.

  • Two load balancers @ $5 each = $10
  • Two web servers @ $5 each = $10
  • Two database cache servers @ $5 each = $10
  • Two database servers @ $5 each = $10
  • Total monthly cost: $40 excluding tax

These droplets give you 512MB memory, 1 processor, 20GB SSD disk, and 1TB transfer. They aren’t the most powerful servers, but they will cover the bare minimum for the time being. You can quickly and easily scale up with a few clicks when you need to.
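The cost arithmetic above is easy to redo when you scale up. The sketch below assumes every droplet is on the same plan; the $10 figure in the second call is a hypothetical larger plan used only to show how the total moves.

```python
# Server counts per tier, taken from the pricing list above.
TIERS = {
    "load balancers": 2,
    "web servers": 2,
    "database cache servers": 2,
    "database servers": 2,
}


def monthly_cost(price_per_droplet, tiers=TIERS):
    """Total monthly cost (excluding tax) when every droplet costs the same."""
    return price_per_droplet * sum(tiers.values())


print(monthly_cost(5))   # the $5 droplets used in this post
print(monthly_cost(10))  # hypothetical: every droplet moved to a $10 plan
```

With eight droplets in total, each extra dollar per droplet adds $8 to the monthly bill.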

 

CloudFlare CDN + WAF

I recommend using CloudFlare as a CDN on all of your websites. Not only does it improve the delivery of content and take load off your servers, it also protects your application from a variety of attacks including XSS, SQL injection, and DDoS.

It’s also dirt cheap and doesn’t charge for bandwidth, which is always a bonus.

 

HAProxy

We’ll be using HAProxy for our load balancers. HAProxy is a fast and reliable TCP/HTTP proxy that balances traffic across web servers. It’s used by a number of high-profile websites including Reddit, Tumblr, Twitter, and BitBucket.
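To make the idea of balancing traffic across web servers concrete, here is a toy round-robin balancer that skips servers marked as down. It is only an illustration of the concept, not how HAProxy is implemented or configured, and the backend names `web1` and `web2` are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        """Return the next healthy backend, or raise if none are left."""
        # One full lap around the ring is enough to find a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


# Hypothetical backend names for the two web servers.
lb = RoundRobinBalancer(["web1", "web2"])
print([lb.next_backend() for _ in range(4)])  # alternates between web1 and web2
lb.mark_down("web1")
print([lb.next_backend() for _ in range(3)])  # only web2 is returned
```

In a real deployment the load balancer decides which servers are healthy by running periodic health checks rather than relying on manual `mark_down` calls.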

 

NGINX + PHP7.0-FPM

Nginx is a great HTTP server when you’re looking for speed and efficiency. It serves static files quickly and consumes very little memory per concurrent connection. Because Nginx is event-based, it doesn’t need to fire up a new process or thread for each request, so its memory usage stays very low.

If you’re hosting a PHP-heavy application such as WordPress, PHP is where you’ll bottleneck. PHP will take up roughly 50% of your response time, which means that for every 400ms request you’ll spend roughly 200ms waiting for PHP.
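The arithmetic is worth spelling out, because it also shows how much a faster PHP runtime can help. The sketch below assumes the 50% share quoted above and that only the PHP portion of a request gets faster. The 2x speedup is a hypothetical input, not a benchmark result.

```python
def php_time_ms(total_ms, php_share=0.5):
    """Time spent in PHP for a request, given PHP's share of the response time."""
    return total_ms * php_share


def response_time_ms(total_ms, php_speedup, php_share=0.5):
    """Response time if only the PHP portion becomes `php_speedup` times faster."""
    if php_speedup <= 0:
        raise ValueError("php_speedup must be positive")
    php = php_time_ms(total_ms, php_share)
    # The non-PHP part (network, database, web server) is left unchanged.
    return (total_ms - php) + php / php_speedup


print(php_time_ms(400))            # 200.0
print(response_time_ms(400, 2.0))  # 300.0 with a hypothetical 2x PHP speedup
```

Because only half of the request is PHP, doubling PHP’s speed cuts the total response time by a quarter rather than by half.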

PHP7, released in December 2015, is the latest version of PHP. For detailed performance benchmarks, have a look at Rasmus Lerdorf’s presentation at PHP Australia. Here are his WordPress benchmarks from that presentation:

[Image: WordPress41-php7 benchmark chart]

 

Memcached or Redis

There is a general rule of thumb: disks are slow, memory is fast. This is why you should always put a caching system such as Memcached or Redis in front of dynamic, database-driven websites such as WordPress: it speeds them up by keeping data and objects in memory (RAM).
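One common way to apply this is the cache-aside pattern: look in memory first and only query the database on a miss. The sketch below is a minimal illustration that uses a plain dictionary as a stand-in for Memcached or Redis. The `CacheAside` class, the `fetch_post` function, and the 60-second expiry are all hypothetical names and values chosen for the example.

```python
import time


class CacheAside:
    """Toy cache-aside wrapper: check memory first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # stand-in for Memcached or Redis
        self.db_hits = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, expires_at = entry
            if self._clock() < expires_at:
                return value  # cache hit: no database work
        # Cache miss or expired entry: query the database and remember the result.
        value = self._load_from_db(key)
        self.db_hits += 1
        self._store[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row changes."""
        self._store.pop(key, None)


def fetch_post(post_id):
    # Hypothetical stand-in for a slow SQL query.
    return {"id": post_id, "title": f"Post {post_id}"}


cache = CacheAside(fetch_post)
cache.get(1)
cache.get(1)
print(cache.db_hits)  # 1: the second call is served from memory
```

The expiry time is a trade-off: a longer value saves more database work, while a shorter one (or an explicit `invalidate` on writes) keeps visitors from seeing stale content.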

Memcached is used on many large websites including Facebook, Reddit, Twitter, Tumblr, WordPress.com, Wikipedia, and YouTube.

Redis is another popular caching system similar to Memcached. It has several advantages over Memcached, such as built-in master-slave replication and sharding.

 

MariaDB Galera Cluster

If you’re running an application that relies heavily on its database, such as WordPress, it’s essential that your database servers are always available.

As an alternative to stock MySQL, I recommend using MariaDB. MariaDB is an easy-to-use drop-in replacement for MySQL that’ll work with all of your MySQL applications out of the box. MariaDB is often preferred over MySQL because of its greater performance, more advanced features, and backwards compatibility.

MariaDB Galera Cluster is a synchronous multi-master cluster for MariaDB. The obvious advantage of a multi-master architecture is reliability. Rather than having one read/write master and a read-only slave, you maintain two masters that integrate seamlessly. If one master goes down, the other keeps accepting both reads and writes, whereas in a master plus read-only slave setup the single master is a single point of failure.