How to Set Up an Apache Load Balancer


Apache HTTP Server is a feature-rich, fast, secure, and extensible web server. It has been around since 1995 and remains one of the most widely deployed web servers on Earth.
Did you know that Apache also makes an excellent load balancer? It does, and I’m going to show you how to build one. You might be amazed at how easy it is.


Step 1. Install Apache HTTP server.

We mostly use Red Hat Linux variants (Red Hat Enterprise Linux, Oracle Linux, CentOS, AlmaLinux, Rocky Linux, etc.), so we’ll focus on Red Hat-based distributions in this article. We like other distros too, and these instructions adapt to them quite easily: Apache directives and configuration parameters are the same across Linux and Unix distributions, though package names, service names, and file paths differ.

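Installing Apache is as simple as running yum install httpd mod_ssl. On Ubuntu or other Debian variants, Apache can be installed with apt install apache2; there the service is named apache2 rather than httpd, and the proxy modules are enabled with a2enmod proxy proxy_http proxy_balancer lbmethod_byrequests.
Set Apache to start at boot: systemctl enable httpd.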
 

Step 2. Make a virtual hosts file for your load balancer.

<VirtualHost *:80>
        ServerName domain.ca
        ServerAlias www.domain.ca
        <Directory />
                AllowOverride none
                Require all denied
        </Directory>
        SSLProxyEngine On
        SSLProxyVerify None
        SSLProxyCheckPeerCN Off
        SSLProxyCheckPeerName Off
        SSLProxyCheckPeerExpire Off
        ProxyPreserveHost On
        # IMPORTANT: Never remove ProxyRequests, and never set it to 'on'!
        ProxyRequests off
        ProxyVia Off
        # Load balancer configuration
        ProxyHCExpr ok234 {%{REQUEST_STATUS} =~ /^[234]/}
        ProxyHCExpr gdown {%{REQUEST_STATUS} =~ /^[5]/}
        <Proxy balancer://dynamic>
                BalancerMember http://10.0.1.2:8080
                BalancerMember http://10.0.1.3:8080
                ProxySet lbmethod=byrequests
        </Proxy>
        ProxyPass        / balancer://dynamic/
        ProxyPassReverse / balancer://dynamic/
        LogLevel warn
        CustomLog /var/log/httpd/balancer-access.log combined
        ErrorLog /var/log/httpd/balancer-error.log
</VirtualHost>

Above is a simple example of an Apache load balancer virtual host. In production, more directives are likely to be present. Nevertheless, that’s all one really needs in order to have an Apache load balancer up and running! Pretty simple and quick, eh?

Step 3. Start Apache.

In Red Hat land, check the configuration syntax and then start the service like so: apachectl configtest && systemctl start httpd && systemctl status httpd
Now test your web site in a browser and watch the access logs on the web servers in the balance pool. If the site loads as expected and the logs show your requests reaching all the machines without errors, everything is all set!
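
If you would rather confirm the distribution from the load balancer itself, a custom log format can help. The following is a minimal sketch, not part of the configuration above: the format nickname "balancer" is my own choice, and it relies on the BALANCER_WORKER_NAME environment variable that mod_proxy_balancer sets for each proxied request.

```apache
# Sketch only: the nickname "balancer" is an assumption, pick any name.
# %{BALANCER_WORKER_NAME}e logs the back-end worker chosen for the request.
LogFormat "%h %l %u %t \"%r\" %>s %b worker=%{BALANCER_WORKER_NAME}e" balancer
# Use this in place of the "combined" CustomLog line in the virtual host.
CustomLog /var/log/httpd/balancer-access.log balancer
```

With something like this in place, each access log line on the load balancer should end with the back-end URL that served it, so an uneven split is easy to spot.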


Configuration Explanation

Let’s break down those configuration parameters outlined above. We’ll stick to those related to the load balancer to keep this post focused on the subject at hand.

SSLProxyEngine On enables TLS for connections from Apache to the back-end servers, which only matters when the BalancerMember URLs use https://. SSLProxyVerify, SSLProxyCheckPeerCN, SSLProxyCheckPeerName, and SSLProxyCheckPeerExpire control whether Apache checks the validity of the TLS (SSL) certificates presented by the servers it forwards traffic to. Set to None and Off as above, they disable those checks. Disabling them is generally not required if you plan on using certificates from a certificate authority, and on keeping them up to date. It is required if you plan on using self-signed TLS certificates on the back-end servers.
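
For comparison, here is a rough sketch of what the pool might look like with HTTPS back ends and certificate checks switched on. The port 8443 and the CA bundle path are assumptions of mine, not values from the setup above, and the names in the back-end certificates have to match what Apache expects, so test this before relying on it.

```apache
# Sketch only: port 8443 and the CA bundle path are assumptions.
SSLProxyEngine On
# Require a valid certificate chain from every back-end server.
SSLProxyVerify require
SSLProxyCACertificateFile /etc/pki/tls/certs/internal-ca.pem
# Reject certificates whose name does not match or that have expired.
SSLProxyCheckPeerName On
SSLProxyCheckPeerExpire On

<Proxy balancer://dynamic>
        BalancerMember https://10.0.1.2:8443
        BalancerMember https://10.0.1.3:8443
        ProxySet lbmethod=byrequests
</Proxy>
```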

ProxyPreserveHost “When enabled, this option will pass the Host: line from the incoming request to the proxied host, instead of the hostname specified in the ProxyPass line.” (Apache documentation). The directive ensures that applications see the host header as it was when the request was made by the client.

ProxyRequests Off instructs Apache not to act as a forward proxy. A load balancer is a reverse proxy (in cases where it is a proxy at all). Setting this directive to ‘off’ is critical: it ensures that no one can use your load balancer as an open proxy to hide behind while they do horrible things. Make sure it is set to off!

ProxyVia Off tells Apache to pass requests through without modifying the ‘Via’ header whatsoever, which is generally what we want in the case of a load balancer. Other options include ‘On’, ‘Full’, and ‘Block’.

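ProxyHCExpr ok234 {%{REQUEST_STATUS} =~ /^[234]/} is used for health checks. It defines a named expression, ok234, that matches HTTP status codes in the 2xx, 3xx, and 4xx ranges, which are the responses from the real / back-end servers considered valid for a “healthy” node.

ProxyHCExpr gdown {%{REQUEST_STATUS} =~ /^[5]/} is also used for health checks. It defines a named expression, gdown, that matches 5xx status codes, the responses that indicate a failed or “down” node.

Note that ProxyHCExpr only defines these expressions. They take effect once a BalancerMember refers to one with the hcexpr parameter and enables checking with hcmethod; the example above defines them without attaching them to any member.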
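
To make Apache actually probe the back-end servers, a health check method and an expression can be attached to each member. This is a minimal sketch under a few assumptions: /healthz is a hypothetical endpoint that your application would have to provide, the 10 second interval is arbitrary, and mod_proxy_hcheck and mod_watchdog need to be loaded.

```apache
# Sketch only: /healthz is a hypothetical endpoint on the back-end servers.
ProxyHCExpr ok234 {%{REQUEST_STATUS} =~ /^[234]/}

<Proxy balancer://dynamic>
        # Probe each member every 10 seconds; ok234 decides pass or fail.
        BalancerMember http://10.0.1.2:8080 hcmethod=GET hcuri=/healthz hcexpr=ok234 hcinterval=10
        BalancerMember http://10.0.1.3:8080 hcmethod=GET hcuri=/healthz hcexpr=ok234 hcinterval=10
        ProxySet lbmethod=byrequests
</Proxy>
```

A check passes when its expression evaluates to true, so ok234 is the one to attach here. The hcpasses and hcfails parameters can be added if you want several consecutive results before a member changes state.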

<Proxy balancer://dynamic> defines a load balance pool. The string following ‘balancer://’ can be anything you like. I suggest giving it a somewhat descriptive name to make understanding the configuration easy for everyone on your team, and for your future self.

BalancerMember http://10.0.1.2:8080 defines a back-end server and port the load balancer is to pass traffic to.

ProxySet lbmethod=byrequests sets the load balance algorithm or method. Other options include ‘bytraffic’, ‘bybusyness’, and ‘heartbeat’. See the documentation for more information.
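
As a rough illustration of how the method interacts with per-member settings, the sketch below weights one server more heavily and adds a standby. The 10.0.1.6 address is a hypothetical spare server, and the weights are arbitrary. Each method also needs its own module loaded (mod_lbmethod_byrequests, mod_lbmethod_bybusyness, and so on).

```apache
# Sketch only: 10.0.1.6 is a hypothetical spare server.
<Proxy balancer://dynamic>
        # loadfactor=2 gives this member roughly twice the share of requests.
        BalancerMember http://10.0.1.2:8080 loadfactor=2
        BalancerMember http://10.0.1.3:8080 loadfactor=1
        # status=+H marks a hot standby, used only when no other member is usable.
        BalancerMember http://10.0.1.6:8080 status=+H
        ProxySet lbmethod=byrequests
        # Alternative: prefer the member with the fewest requests in flight.
        # ProxySet lbmethod=bybusyness
</Proxy>
```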

ProxyPass / balancer://dynamic/ tells Apache to send all traffic for ‘/’ on this host to the load balance pool named “dynamic”. The ProxyPassReverse directive rewrites the Location, Content-Location, and URI headers on redirect responses from the back-end server, so that clients are redirected to the load balancer rather than directly to a back-end address.


Advanced Moves

One of the most popular benefits of using a load balancer, apart from horizontal scalability and high availability, is the routing of different requests to different parts of the infrastructure stack. So let’s talk about how to split static assets among destination nodes — one of the most common such use cases. Here, dynamic requests could be forwarded to web servers running Apache, listening on 8080 (as shown above) and requests for images, stylesheets, JavaScript files, and the like could be sent to static asset cache servers listening on port 6081 (as shown below). In this example, there is an assumption that the static assets to be cached are all in ‘/assets/’.

Splitting requests to an additional load balance pool in Apache is simple. In this example, we have added a second load balance pool, along with an additional set of ProxyPass and ProxyPassReverse directives.

<Proxy balancer://static>
        BalancerMember http://10.0.1.4:6081
        BalancerMember http://10.0.1.5:6081
        ProxySet lbmethod=byrequests
</Proxy>
# Place these before the catch-all "ProxyPass /" lines; the first match wins.
ProxyPass        "/assets" balancer://static/assets
ProxyPassReverse "/assets" balancer://static/assets


I want to draw your attention to the last string at the end of those ProxyPass and ProxyPassReverse directives. Without it, a request for https://example.ca/assets/bowie.jpg would be proxied to the back-end server as http://[back-end_server_ip]/bowie.jpg. If that back-end web server is expecting /assets/bowie.jpg, as most web server configurations would, it will duly return a 404 Not Found HTTP status code. That’s why we add the ‘assets’ string to the end of the ProxyPass and ProxyPassReverse directives: it ensures that the request passed on to the real servers has exactly the same path as the one submitted by the client browser. This is not a characteristic I have seen in any other load balancer. It’s a bit of a quirk in my view.
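
Ordering matters as well. Apache checks ProxyPass rules in the order they appear in the configuration and uses the first one that matches, so the more specific /assets mapping has to come before the catch-all. Here is a sketch of how the two sets of directives might sit together inside the virtual host, assuming both pools are defined as in the examples in this article:

```apache
# Sketch only: assumes balancer://static and balancer://dynamic are
# defined elsewhere in the same virtual host.
# More specific prefix first, because the first matching rule wins.
ProxyPass        "/assets" balancer://static/assets
ProxyPassReverse "/assets" balancer://static/assets
# Catch-all last, for everything that did not match /assets.
ProxyPass        / balancer://dynamic/
ProxyPassReverse / balancer://dynamic/
```

If the catch-all came first, it would match every request and nothing would reach the static pool.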

What if some POST requests to your application include the /assets string in their URIs? You probably don’t want those to go through a static asset cache. They should pass through without issue, but that could produce strange results in some edge cases, and why make those requests go through an extra “hop” anyway? To handle such a situation, use RewriteRule like so:

RewriteEngine On
# Send non-cacheable request methods straight to the dynamic pool
RewriteCond %{REQUEST_METHOD} ^(POST|PUT|PATCH|DELETE|OPTIONS)$
RewriteRule ^/(.*)$ balancer://dynamic/$1 [P,L]


There you have it — a powerful, extensible, and fast load balancer with Apache! I hope you enjoyed reading this article, and perhaps learned a bit more about how cool Apache is. 🙂