Using Hiawatha’s compression method, the “UseGZfile” option

Hiawatha webserver has an interesting approach to HTTP compression. In a typical scenario, the webserver serves up compressible static assets gzip-encoded, and supporting browsers decompress and render the contents. This can often save a substantial amount of bandwidth and reduce page load times. Most webservers accomplish this by essentially piping all content through a GZip module or external binary, compressing everything that passes through before serving it out. Where things get intriguing is that Hiawatha doesn’t do anything like this.

You may be asking yourself, “if GZip encoding can save large amounts of bandwidth, and just about all other webservers support compressing content on the fly, why doesn’t Hiawatha?” The answer is that Hiawatha uses a technique which I find much more efficient and sensible… it simply requires a little more forethought to implement correctly.

EDIT: As of Hiawatha 10, this article is obsolete. While Hiawatha still utilizes static gzip caching of relevant objects, it now does this transparently without the need for direct user intervention. Cool, huh?

Drawbacks of on-the-fly GZip encoding

In the most common scheme where all assets are passed through a compression library before being sent to supported web clients, there is a tradeoff between bandwidth saved and CPU overhead. For a low-traffic site being hosted on relatively powerful server hardware, it may never become an issue. But for high volume sites, or on servers where resources might be constrained, HTTP compression can quickly become a substantial CPU bottleneck.

Apache mod_deflate CPU overhead graph – www.webperformance.com

The above being the case, it’s necessary to carefully examine whether the tradeoff of CPU resources for network bandwidth is worth making. If the content being served is primarily dynamic, compression might be too costly on an already busy CPU. Further, gzipping assets which don’t lend themselves to compression can actually harm performance on the client side: the compressed asset may end up the same size as, or even larger than, the original, yet it still has to be decompressed by the client, incurring extra latency, memory, and CPU overhead.

Fortunately, none of these concessions are necessary with Hiawatha.

The Hiawatha approach

Hiawatha webserver doesn’t actually compress content at all. No deflate modules. No GZip pipes. No loops through 3rd party libraries. Yet, it does support GZip content encoding, and in quite a novel way. From the Hiawatha FAQ:

Hiawatha has no support for on-the-fly GZip content encoding, because there is no need to. Most websites consist of static files, like images and stylesheets, and dynamic content generated by CGI. Images are very hard to compress, so trying to compress them is a waste of CPU time. Stylesheets are often small (a few kilobytes), so there isn’t much to win by compressing them. Specially because they are often requested only once because of browser caching. Most Hiawatha users use PHP which has GZip output support, so that covers the important part of the dynamic content.

However, if a website does contain a large file that can be compressed, a jQuery javascript for example, you can use Hiawatha’s special GZip content encoding support. Set the ‘UseGZfile’ option to ‘yes’ and gzip the file. Hiawatha will upload the gzipped version of the file. This is more efficient than compressing the same file over and over again. Make sure the original file is still present, in case of a browser that doesn’t support GZip content encoding.

Thus, if you enable UseGZfile in Hiawatha and create gzipped versions of your compressible static assets alongside the originals, Hiawatha can transparently serve them to clients which support gzip content encoding. Compress once, serve repeatedly, no runtime overhead. Elegant, no?

Utilizing the UseGZfile function

Hiawatha’s UseGZfile option can be set at the vhost level (i.e. the whole site), or more granularly at the directory level. Here’s the entry from the Hiawatha manual page:

UseGZfile = yes|no
If available, upload .gz with gzip content encoding instead of the requested file.
Default = no, example: UseGZfile = yes

If you’re sending GZip content, it’s also a very good idea to send the correct headers to the client, i.e. “Vary: Accept-Encoding”, which can help you avoid all kinds of potential problems, especially if there are any intermediaries in between (e.g. proxies, CDNs, corporate WAFs, etc.).

Hence, your vhost stanza might look something like this:

VirtualHost {
 Hostname = www.example.tld, *.example.tld
 EnforceFirstHostname = yes
 WebsiteRoot = /srv/www/vhosts/www_example_tld
 SSLcertFile = /etc/ssl/my_certs/www_example_tld.pem
 UseToolkit = cache-control
 ShowIndex = no
 ExecuteCGI = no
 UseGZfile = yes
 PreventXSS = yes
 PreventCSRF = yes
 CustomHeader = Vary: Accept-Encoding
 CustomHeader = X-Frame-Options: sameorigin
 }

Or if you wanted to limit the GZip encoding to a particular directory:

Directory {
	Path = /srv/www/vhosts/www_example_tld/css
	UseGZfile = yes
}

You could also exclude GZip encoding from a directory:

Directory {
	Path = /srv/www/vhosts/www_example_tld/images
	UseGZfile = no
}
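
To confirm that the gzipped copy is actually being served, and that the “Vary” header is going out with it, a quick check with curl might look like this (the hostname and path are placeholders):

# Request the stylesheet with gzip support advertised, then inspect the response headers.
curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' https://www.example.tld/css/style.css \
	| grep -iE 'content-encoding|content-length|vary'

A “Content-Encoding: gzip” line in the output, along with a smaller Content-Length, confirms the .gz copy was picked up.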

Which assets to compress

As a rule, anything containing a lot of repetition and whitespace is usually worth compressing. Good examples are plain text static assets larger than a few kilobytes, JavaScript or larger CSS files for instance. Some font bundles and graphics are also candidates, such as TTF, OTF, and SVG.
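
If you’re unsure whether a given file is worth shipping pre-compressed, a quick before-and-after byte count will tell you (a throwaway check, nothing Hiawatha-specific):

# Original size versus what gzip -9 would produce, without writing anything to disk.
wc -c < style.css
gzip -9 -c style.css | wc -c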

Here are a few real world examples of compression savings:

85K	ramnode-bootstrap.html
14K	ramnode-bootstrap.html.gz
248K	fontawesome-webfont.svg
72K	fontawesome-webfont.svg.gz
139K	fontawesome-webfont.ttf
82K	fontawesome-webfont.ttf.gz
94K	jquery.min.js
33K	jquery.min.js.gz
55K	style.css
12K	style.css.gz

Which assets NOT to compress

Some performance tools out there advocate compressing assets like progressive JPEGs. More often than not, this is the wrong thing to do. Progressive JPEGs can be viewed incrementally as they load, which is especially nice on low-bandwidth links. GZipping these images might save a few kilobytes, or it might not, but it will certainly prevent the image from being loaded progressively in browsers which support this feature. In other words, there is little to no gain from compressing already-compressed images; all you’re really doing is increasing the perceived load time of a page from the user’s perspective. If your graphical assets can be shrunk noticeably by GZip in the first place, the better approach is to optimize them with your standard image editing tools. For example, images with lots of flat color are often best represented as indexed PNGs, and well-optimized progressive JPEGs using a modern DCT method can rarely be compressed any further.

Any CGI scripts should also be excluded from GZipping, as you almost never want to serve the raw source code of your site directly. Rather, they are rendered first by your CGI interpreter engine and the output is delivered to the client as finished HTML. Thus, while you may want to enable GZip compression within the CGI engine itself (this is almost free if the output will be cached anyway), the server-side scripts themselves should be left alone.
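
For example, if PHP is your CGI engine, its built-in zlib output compression covers the dynamic side of the equation. A minimal php.ini fragment might look like this (the compression level shown is just an illustrative choice):

; Compress dynamic PHP output on the fly; static assets are handled by UseGZfile.
zlib.output_compression = On
zlib.output_compression_level = 6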

Here are examples of file types you’ll almost never want to GZip:

*.gz/tgz    	Archives; already compressed
*.bz/bz2    	Archives; already compressed
*.7z        	Archives; already compressed
*.xz        	Archives; already compressed
*.zip       	Archives; already compressed
*.rar       	Archives; already compressed
*.deb       	Archives; already compressed
*.rpm       	Archives; already compressed
*.png       	Image files; optimize source
*.jpg/jpeg  	Image files; optimize source
*.gif       	Image files; optimize source
*.php       	Scripting language
*.py        	Scripting language
*.rb        	Scripting language
*.shtml     	Scripting language
*.cgi       	Scripting language
*.sh        	Scripting language
*.phar      	Scripting library
*.eot       	Fonts; already compressed
*.woff/woff2	Fonts; already compressed

Compressing assets

Before UseGZfile will actually be effective, you must first compress each of the individual assets you’d like to make available for GZip encoding. For instance:

gzip -9 -c style.css > style.css.gz

Note the “-9” flag in the command line. This tells gzip to compress as aggressively as possible to achieve the smallest resulting filesize. While this might not be the best approach when on-the-fly GZipping at the webserver level – it does, after all, require more time and CPU cycles than at lower compression levels – it’s perfectly acceptable for offline GZipping.
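
If your gzip is reasonably recent (GNU gzip 1.6 or later), the -k flag accomplishes the same thing while keeping the original file in place, without the redirection:

# Produces style.css.gz and leaves style.css untouched.
gzip -9 -k style.css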

GZipping each individual file can be extremely tedious on extensive websites, so it’s a good idea to script the process. I’ve written a simple script for my own use which does exactly this. You’re free to utilize it yourself if it meets your needs, or use it as a starting point for your own scripts:

→ Web Content Encoder [source] [download]

content-encode-dir.sh usage examples

While my script is a bit primitive, e.g. it doesn’t test to make sure the compressed file is substantially smaller than the original, it does try to avoid GZipping uncompressible or nonsensical filetypes based on their extensions. It will, however, split up multiple files into a parallel workload based on your detected CPU core count, and run each parallel gzip operation at a reduced process priority so as not to negatively impact system performance (in case it is ever run on a live webserver). It can be targeted at one or more directories (recursively), at individual files, or simply called without arguments. In the latter case, it will recurse from the present working directory, so be mindful of where you are on the filesystem before calling it. With all that out of the way, here are some usage examples.

Recursively compress the contents of a single directory:

content-encode-dir.sh /path/to/assets/static

Compress the contents of multiple directories recursively:

content-encode-dir.sh \
/path/to/assets/css \
/path/to/assets/js \
/path/to/assets/static

GZip a list of individual files:

content-encode-dir.sh \
/path/to/assets/css/style-web.css \
/path/to/assets/css/mobi.css \
/path/to/assets/css/crawler.css \
/path/to/assets/css/legacy.css

Compress recursively from the present working directory:

content-encode-dir.sh
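
If you’d rather not pull in the whole script, the core idea can also be approximated in a single pipeline. This is only a rough sketch, not the author’s script; it assumes GNU find and xargs, and the extension list and path are illustrative:

# Pre-gzip matching files in parallel (one job per CPU core) at low priority,
# leaving the originals in place for clients without gzip support.
find /srv/www/vhosts/www_example_tld -type f \
	\( -name '*.css' -o -name '*.js' -o -name '*.svg' -o -name '*.html' \) -print0 \
	| xargs -0 -P "$(nproc)" -I{} nice -n 19 sh -c 'gzip -9 -c "$1" > "$1.gz"' _ {}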

Speaking of efficiency… if you happen to have an automated continuous integration process in place as part of your development cycle, that’s the best possible place to GZip all of your relevant assets. That way, your assets are pre-compressed before they are ever deployed to your webheads.
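
As a sketch of what that might look like (the paths and hostname here are made up), the compression step simply runs against the build output before the tree is synced out:

# Illustrative deploy step: pre-compress the build output, then push
# the whole tree (originals plus .gz copies) to the webhead.
content-encode-dir.sh build/static
rsync -a build/ deploy@web01.example.tld:/srv/www/vhosts/www_example_tld/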

Results

To demonstrate the effectiveness of this method, I’ve run some real-world performance benchmarks against this very blog. While I’ve taken great effort to make this site as lightweight and minimalist as possible, I was still able to substantially reduce the size of each page, and therefore serve more simultaneous requests with less latency.

Without GZip encoding

To get these results, I’d disabled UseGZfile within Hiawatha and turned off GZip encoding in my WP Super Cache plugin:

Page size: 114.3kb. Your website is faster than 98% of tested websites.
As you can see in the image above, the page is already pretty light at about 114kB and loads in just under 2/5 of a second. Anything under a second is considered by most to be pretty snappy.

6 requests, 114.3 kb, 393 ms
Note the index page at 22.6kB, and the rather chubby style.css tipping the scale at 55.1kB. I realize these are still tiny compared to many sites, where boilerplate stylesheets may contain lots of content that is never even referenced by the browser; I’ve seen a single CSS come in at well over 300kB. Still, can we make it faster?

With GZip encoding

Now Hiawatha’s UseGZfile option is turned on for my vhost, I’m sending the “Vary: Accept-Encoding” header, and I’ve set up my WP Super Cache plugin to gzip its output cache files as well:

Page size: 53.7kb. Your website is faster than 99% of tested websites.
Much better. The page size has been reduced by half, and as a result the site’s gone from top 2% fastest tested to the top 1%.

6 requests, 53.7 kb, 326 ms
You can see that, compared to the example above, the index page has been reduced by roughly 75% to only 5.5kB, and the CSS was compressed even more from 55.1kB down to only 11.6kB.

Just by GZipping, we’ve got the total page size down by more than 50% from the original value to just under 54kB. Now I can effectively serve twice as many web clients with the same amount of bandwidth, and with no additional server-side CPU overhead. Excellent.

