Apache and mod_gzip


Mod_gzip is a wonderful tool that almost every web server should be running.  Which is cheaper in the long run, bandwidth or CPU cycles?  Definitely the CPU cycles: most server CPUs sit mostly idle all day long in the first place.  You can save upwards of 50% of your bandwidth using this simple plug-in.  The mod_gzip module compresses eligible web pages on the fly before sending them to compatible browsers.

Almost all HTTP 1.1 compatible web browsers support compression.  Every request that such a browser sends to the server includes a header like Accept-Encoding: gzip, compress or something similar.  This tells the server that it is safe to compress the data before sending it.  If the client doesn't send the Accept-Encoding header, nothing is compressed and the exchange of data takes place just as it always has.
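
To make the negotiation concrete, here is a minimal Python sketch of the decision described above.  It is not how mod_gzip itself is implemented; the function name maybe_compress is made up for illustration, and the sketch ignores q-values in the header for simplicity.

```python
import gzip
from typing import Dict, Optional, Tuple


def maybe_compress(body: bytes, accept_encoding: Optional[str]) -> Tuple[bytes, Dict[str, str]]:
    """Return (payload, extra response headers) for a given Accept-Encoding value.

    Hypothetical helper for illustration only; q-values such as "gzip;q=0"
    are ignored here, which a real server must not do.
    """
    offered = set()
    for token in (accept_encoding or "").split(","):
        coding = token.split(";")[0].strip().lower()
        if coding:
            offered.add(coding)

    if "gzip" in offered:
        # Client said it understands gzip, so compress and label the response.
        return gzip.compress(body), {"Content-Encoding": "gzip"}

    # No Accept-Encoding header (or no gzip in it): send the body untouched.
    return body, {}


page = b"<html>" + b"<p>hello world</p>" * 200 + b"</html>"
compressed, headers = maybe_compress(page, "gzip, compress")
plain, no_headers = maybe_compress(page, None)
print(len(page), len(compressed), headers, len(plain), no_headers)
```

Repetitive markup like the sample page shrinks dramatically, which is why HTML is the best candidate for this kind of compression.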

Mod_gzip will attempt to use pre-compressed data when possible.  You can control how this is done through settings in the httpd.conf file.

Installation

  • Download mod_gzip.so from http://www.remotecommunications.com/apache/mod_gzip/  (it looks like their site is down, so check out SourceForge at http://sourceforge.net/projects/mod-gzip/ )

  • Install in your apache lib folder, such as /usr/lib/apache

  • Edit your httpd.conf file

  • Restart Apache

  • Please refer to the HSC website you downloaded from for the latest installation instructions.  Here is what I added to my httpd.conf file:

    In the LoadModule section:

    LoadModule userdir_module modules/mod_userdir.so
    LoadModule alias_module modules/mod_alias.so
    LoadModule gzip_module modules/mod_gzip.so
    LoadModule rewrite_module modules/mod_rewrite.so
    LoadModule access_module modules/mod_access.so

    In the AddModule section:

    AddModule mod_userdir.c
    AddModule mod_alias.c
    AddModule mod_gzip.c
    AddModule mod_rewrite.c
    AddModule mod_access.c

    And finally, in its own section:

    <IfModule mod_gzip.c>
    mod_gzip_on yes
    mod_gzip_dechunk yes
    mod_gzip_keep_workfiles No
    mod_gzip_temp_dir /tmp
    mod_gzip_minimum_file_size 1002
    mod_gzip_maximum_file_size 1000000
    mod_gzip_maximum_inmem_size 1000000
    mod_gzip_item_include file \.htm$
    mod_gzip_item_include file \.html$
    mod_gzip_item_include file \.php$
    mod_gzip_item_include mime ^text/.*
    mod_gzip_item_include mime ^application/x-httpd-php
    mod_gzip_item_include mime ^httpd/unix-directory$
    mod_gzip_item_exclude file "\.css$"
    mod_gzip_item_exclude file "\.js$"
    mod_gzip_item_exclude file "\.wml$"
    mod_gzip_item_exclude reqheader "User-Agent: .*Mozilla/4\..*\["
    mod_gzip_item_exclude mime ^image/gif$
    </IfModule>

    A few comments on what some of the various lines are for:

    The mod_gzip_item_include lines control exactly which files are compressed, so you can limit compression to specific file types instead of compressing everything.  The mod_gzip_item_exclude lines matter even more.  Some browsers have a problem with compressed style sheets and JavaScript library files; others have a hard time with compressed images.  I've found I got better results just excluding the Netscape 4.x series browsers, which, judging by the newsgroups and mailing lists, are the main cause of the problem.  That's handled by the "User-Agent: .*Mozilla/4\..*\[" line.  IE does sometimes seem to have an issue with .gif images, so I'm temporarily excluding them until I can confirm the issue.  I haven't heard of any other browsers having issues.  And while Netscape 4.x only represents about 2% of my target audience, that is enough to justify the exclusion.
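
To see which browsers the User-Agent rule catches, here is a small Python sketch that runs the regex portion of that rule against a few User-Agent strings.  The sample strings are illustrative, the function name is_excluded is made up, and I'm assuming the pattern is applied as an unanchored search against the header value.

```python
import re

# Regex portion of the reqheader exclude rule from the configuration above.
NETSCAPE4_PATTERN = re.compile(r".*Mozilla/4\..*\[")


def is_excluded(user_agent: str) -> bool:
    """Hypothetical helper: True if the exclude pattern matches this User-Agent."""
    return NETSCAPE4_PATTERN.search(user_agent) is not None


# Illustrative User-Agent strings (assumed typical, not taken from my logs).
samples = [
    "Mozilla/4.7 [en] (WinNT; U)",                          # Netscape 4.x style
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",   # IE claims Mozilla/4.0
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0) Gecko/20020530",
]

for ua in samples:
    print("excluded" if is_excluded(ua) else "compressed", "-", ua)
```

The trailing \[ is what separates Netscape 4.x from IE here: both announce themselves as Mozilla/4.x, but only the Netscape-style string carries the bracketed language tag.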

    One final addition.  I use AWStats for all my web reporting.  Not only is it free, it produces some of the best reports I've seen.  This and other reporting programs will support compression statistics.  For that, you must define a custom log file type.  This is included below, and I add it under the other LogFormat lines in the httpd.conf file.  Note that this is all one long line:

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" mod_gzip: %{mod_gzip_result}n In:%{mod_gzip_input_size}n Out:%{mod_gzip_output_size}n:%{mod_gzip_compression_ratio}npct." combined-gzip

    Now, tell Apache to use this as your new log file format.  I recommend that you stop Apache, rename your current log file, and start fresh, since the formatting will now change.  Modify the lines found below for both the main log and any virtual hosts to make sure it uses the combined-gzip format you just defined:

    CustomLog /var/log/httpd/access.log combined-gzip
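
If you want to check the new fields yourself, here is a minimal Python sketch that pulls the mod_gzip values back out of a line written in the combined-gzip format.  The sample log line is made up for illustration, the function name parse_gzip_fields is hypothetical, and I'm assuming Apache writes "-" for any value that isn't set.

```python
import re

# Matches the tail of the combined-gzip format:
#   mod_gzip: <result> In:<bytes> Out:<bytes>:<ratio>pct.
GZIP_FIELDS = re.compile(
    r"mod_gzip: (?P<result>\S+) "
    r"In:(?P<in_size>\d+|-) "
    r"Out:(?P<out_size>\d+|-):(?P<ratio>\d+|-)pct\."
)


def _to_int(value):
    """Convert a logged number to int, treating "-" (assumed: not set) as None."""
    return None if value == "-" else int(value)


def parse_gzip_fields(line):
    """Hypothetical helper: return the mod_gzip fields of one log line, or None."""
    match = GZIP_FIELDS.search(line)
    if match is None:
        return None
    return {
        "result": match.group("result"),
        "in_size": _to_int(match.group("in_size")),
        "out_size": _to_int(match.group("out_size")),
        "ratio_pct": _to_int(match.group("ratio")),
    }


# Made-up sample line in the combined-gzip layout (not from a real log).
sample = (
    '192.0.2.10 - - [10/Oct/2002:13:55:36 -0700] "GET /index.html HTTP/1.1" '
    '200 1500 "-" "Mozilla/5.0" mod_gzip: OK In:5000 Out:1500:70pct.'
)
print(parse_gzip_fields(sample))
```

A line logged before the format change has no mod_gzip fields and returns None, which is one more reason to start a fresh log file.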

    Now save and restart Apache.  Please note that in your message log file, you may get the following error, depending on your version, options, and distribution of Apache:

    [warn] Loaded DSO modules/mod_gzip.so uses plain Apache 1.3 API, this module might crash under EAPI! (please recompile it with -DEAPI)

    You can safely ignore this message.  Really.  It simply means that this file was not compiled with SSL support.  Since it doesn't work with SSL anyway, it's no major loss.

    Results

    In just 8 hours of running mod_gzip, here are the results direct from AWStats on one website hosted here:

    Files type - Web compression

    Files type  Hits   Percent  Bandwidth  Compression on  Compression result  Bandwidth saved
    gif         45948  83.6 %   68.91 MB   1.36 MB         1.34 MB             27.16 KB (1%)
    jpg         6619   12 %     45.45 MB   4.19 MB         4.05 MB             146.23 KB (3%)
    htm         1988   3.6 %    36.03 MB   3.75 MB         613.57 KB           3.15 MB (84%)
    Unknown     135    0.2 %    991.42 KB  4.61 KB         2.50 KB             2.11 KB (45%)
    swf         111    0.2 %    21.62 MB
    png         83     0.1 %    14.09 KB
    pl          23     0 %      957.70 KB
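
As a quick sanity check on the htm row, the arithmetic can be reproduced in a few lines of Python.  This sketch assumes AWStats reports binary units (1 MB = 1024 KB); the function name percent_saved is made up for illustration.

```python
KB_PER_MB = 1024  # assumption: AWStats uses binary units


def percent_saved(before_kb: float, after_kb: float) -> float:
    """Hypothetical helper: share of the original size removed by compression."""
    return 100.0 * (before_kb - after_kb) / before_kb


# htm row: 3.75 MB was eligible for compression and went out as 613.57 KB.
htm_before_kb = 3.75 * KB_PER_MB
htm_after_kb = 613.57

saved_mb = (htm_before_kb - htm_after_kb) / KB_PER_MB
print(round(saved_mb, 2))                                   # 3.15
print(round(percent_saved(htm_before_kb, htm_after_kb)))    # 84
```

Both numbers agree with the "3.15 MB (84%)" reported in the Bandwidth saved column.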

    As you can see, I have saved several megabytes of transfer, and this is on a relatively low-volume site.  84% compression on HTML pages is very impressive and definitely worth looking into.  It's easy, and it has no real downside unless you're more worried about CPU cycles than bandwidth.  And if that's the case, you really need to get more hardware.