Apache cgi handlers

15 June 2017
Author: Erik Lievaart

In this article, I will examine Apache handlers and guide you through setting them up for cgi. Cgi scripts can be used to generate or modify responses to requests on the Apache webserver. Handlers can use these scripts to manipulate every request that matches a certain rule. This is a flexible mechanism that turns a static webserver into a full-blown application server. This guide aims to make setting up a cgi handler as simple and painless as possible.

Theoretical Discussion

Feel free to skip to the next header if you are only interested in the hands-on information in this tutorial. This section is purely theoretical: it discusses what handlers are and offers a mental model of Apache. Please note that I am by no means an expert on Apache, nor do I pretend to be.

When web servers and web sites first came into being, they only served static files. When you type a URL into a browser, it translates to a path on the server, and the file located at that path is sent back to the requester. With cgi (common gateway interface) it becomes possible to execute the file rather than simply return its contents. Our mental model of a web server as a file server still works, though; there is just the small nuance that the file might be executed. This model is a great starting point for getting a rudimentary understanding of what a web server does.

For more advanced topics, however, this model falls short. We need a somewhat more complicated model to understand what handlers do. The full URL that is sent to the web server refers to a resource, rather than a file. It is up to the web server to translate that URL into some response, usually in a text-based format, sometimes binary (e.g. an image). Effectively, a web server is a mapper that maps requests to (possibly dynamic) responses.

The most basic example of such a mapping is, as we've seen, returning a file existing at the same relative path. But apache may call a script to generate a response, ask another server application on the same machine (e.g. tomcat) to generate a response, or it may even send the request to another machine, possibly one running another apache instance. For the caller it seems as though all the content is hosted by this single apache instance. It looks as though it is calling a simple single server portal.
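The mapper idea can be illustrated with a toy shell function. This is only a sketch of the mental model, not how Apache is implemented; the function name `route` and the `/app/` prefix are made up for illustration.

```shell
#!/bin/sh
# Toy model of a web server as a mapper from request paths to responses.
# route and the /app/ prefix are hypothetical names used for illustration.
route() {
    case "$1" in
        /cgi-bin/*) echo "execute the script" ;;          # dynamic response
        /app/*)     echo "forward to another server" ;;   # e.g. tomcat
        *)          echo "serve the file" ;;              # plain file server
    esac
}

route /cgi-bin/hello.cgi
route /index.html
```

The caller only sees the response; which of the three branches produced it is invisible from the outside.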

As I said, Apache maps resources based on the URL (and possibly additional information sent with the request, such as cookies and headers). A handler, then, is not simply a file that is executed for one URL; it is a file that is executed for a range of URLs. This makes it possible to manipulate the HTML being served without redefining the logic for every static file, for example by adding headers and footers, or by replacing or removing content depending on security roles.

It is also possible to call a handler for a path that is not backed by a file (for example, to serve a page from a database). It makes sense to be able to configure this distinction in the Apache configuration. Some scripts require the backing file to work properly (such as the footer example). Others need to be called regardless, because they cannot fulfill their purpose otherwise (e.g. serving a page from the database). Apache makes this possible: handlers that should work without a backing file are called "virtual" handlers.

Lastly, a handler might not modify the response at all. Handlers can be used to satisfy non-functional requirements, such as logging or auditing. Note: Apache has more specific modules for most things you would want to do, so check those first. The examples given in this section merely illustrate the possibilities of handlers.

Later in this article, I will give an example where a handler is mapped to a MIME type rather than to a URL. So what is a MIME type? When the browser sends a request to a server, it does not know in advance what format the response will have. For URLs such as www.example.com/index.html it can be obvious, but this is not always the case. URLs ending in .php or .cgi could return HTML, binary data (such as images) or plain text. Some URLs do not have a file extension at all. The server generating the response tells the browser what kind of data it is sending by setting the Content-Type header in the response. This header contains a MIME type such as text/html, text/plain or image/jpeg. These MIME types are standardized and universally understood. If you have scripts that sometimes generate HTML and sometimes binary files, then mapping handlers based on MIME types might make more sense. For more info on MIME types: https://en.wikipedia.org/wiki/MIME
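To see which MIME type a server reports for a URL, you can look at the Content-Type header of the response. The sketch below assumes the headers arrive on standard input (for example from `curl -sI`); the function name `mime_type` is my own.

```shell
#!/bin/sh
# mime_type: print the MIME type found in HTTP response headers on stdin.
# Hypothetical helper; parameters such as "; charset=UTF-8" are dropped.
mime_type() {
    # Remove carriage returns, then find the Content-Type header
    # (header names are case-insensitive) and print its value.
    tr -d '\r' | awk -F': *' '
        tolower($1) == "content-type" { sub(/;.*/, "", $2); print $2; exit }'
}

# Usage against a running server (not executed here):
#   curl -sI http://localhost/index.html | mime_type
```

Separating the parsing from the network call keeps the function usable on saved header dumps as well.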

Handler hello world

In this section we will set up a simple hello world example for a cgi handler. In my previous article, I explained how to install the cgi module on the apache webserver. This article is going to continue, presuming that the steps in the previous article have been completed successfully. When starting this article, the following url should show a simple web page: http://localhost/cgi-bin/hello.cgi

I will be using mod_actions for calling the scripts. To enable this module:

sudo a2enmod actions
Restart apache, to process the changes:
sudo service apache2 restart
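To confirm that the module is loaded after the restart, the module list printed by `apache2ctl -M` can be searched. A minimal sketch, assuming the usual list format (one module per line, such as " actions_module (shared)"); the function name `module_loaded` is my own, and it reads the list from standard input so it can be tried without a running server.

```shell
#!/bin/sh
# module_loaded NAME: succeed if NAME appears in a module list on stdin.
# Hypothetical helper; expects lines like " actions_module (shared)".
module_loaded() {
    grep -q "^[[:space:]]*$1[[:space:]]"
}

# Usage on a live system (not executed here):
#   apache2ctl -M | module_loaded actions_module && echo "mod_actions is loaded"
```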
The cgi handler (https://httpd.apache.org/docs/2.4/handler.html) we define here consists of 2 parts:
  1. An action to invoke instead of simply serving the file located at the url.
  2. The rule which defines for which requests the action should be called.
Both parts are added at the end of the VirtualHost as before:
sudo vi /etc/apache2/sites-enabled/000-default.conf
The general format for defining an action:
Action [name] [exec] virtual?
When creating the rule, you will need the [name] to link the rule to the action. Under [exec] you specify which cgi script needs to be executed for this action. The virtual keyword is optional and makes it possible to run handlers for URLs that are not backed by a file.

The following action is named hello-all and runs the hello.cgi file we created in the previous article.

Action hello-all /cgi-bin/hello.cgi
The following rule would invoke the hello-all action for all html pages:
AddHandler hello-all .html
Alternative means exist for specifying when the handler should be invoked; please refer to the manual: https://httpd.apache.org/docs/2.4/mod/mod_actions.html#action

To process the changes:

sudo service apache2 force-reload
Try to reach the index: http://localhost/index.html

If all went well, then the output of the hello.cgi script will be returned. In other words, the index.html will NOT be rendered. The handler is responsible for returning the original file contents and I will show an example of this after some additional considerations.

The script will NOT be called for the url http://localhost/idonotexist.html. This is because the idonotexist.html file is not present on the filesystem. To "fix" this problem, add the virtual keyword to the action:

Action hello-all /cgi-bin/hello.cgi virtual
After reloading Apache, the handler will be called for any url ending with .html, regardless of whether it is backed by a file. Mind you, the index might still be served unmodified at http://localhost, because that url does not end with the .html extension. One way to fix this is to apply the handler using a rule based on MIME type, rather than on file extension:
Action text/html /cgi-bin/hello.cgi
You don't need to specify an AddHandler rule for this one, because the mapping is already part of the action declaration.
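When an action is declared with the virtual keyword, the script itself has to cope with requests that have no backing file. mod_actions passes the requested URL in the PATH_INFO environment variable and the corresponding file path in PATH_TRANSLATED. The following is a minimal sketch of a script that checks for the file first; the function name `respond` and the wording of the error message are my own.

```shell
#!/bin/sh
# Sketch of a handler that tolerates URLs without a backing file.
# PATH_INFO and PATH_TRANSLATED are set by mod_actions; respond is a
# hypothetical name chosen for this example.
respond() {
    if [ -f "$PATH_TRANSLATED" ]; then
        # The file exists: serve it unchanged.
        printf 'Content-type: text/html\n\n'
        cat "$PATH_TRANSLATED"
    else
        # No backing file: report this with a 404 status.
        # Plain text is used so the echoed path is not interpreted as HTML.
        printf 'Status: 404 Not Found\n'
        printf 'Content-type: text/plain\n\n'
        printf 'No file backs %s\n' "$PATH_INFO"
    fi
}

respond
```

A handler that serves pages from a database would replace the second branch with its own lookup.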

Serving the original file

In this section, I will create a simple handler that serves the original page while making a minor modification. First, let's create a new script:
sudo vi /var/www/cgi-bin/serve-file.cgi
with contents:
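#!/bin/sh
# Send the cgi header, followed by the blank line that ends the headers
echo "Content-type: text/html"
echo ""
# Serve the requested file with a minor text substitution
sed "s/It works/It's broken/g" "$PATH_TRANSLATED"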
Replace the configuration with:
Action text/html /cgi-bin/serve-file.cgi
Don't forget to:
sudo chmod +x /var/www/cgi-bin/serve-file.cgi
sudo service apache2 force-reload
Now you will see that the index page shows up again at http://localhost, but the red bar has been modified to say that it is broken.
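The same approach covers the footer example from the theoretical discussion. The sketch below assumes the page contains a lowercase `</body>` tag on a single line; the function name `add_footer` and the footer text are my own.

```shell
#!/bin/sh
# add_footer FILE: print FILE with a footer inserted before </body>.
# Hypothetical helper; assumes a lowercase </body> tag.
add_footer() {
    sed 's|</body>|<p>Served through a cgi handler</p></body>|' "$1"
}

# Without the virtual keyword the handler is only called for existing
# files, so this check is just a safeguard.
if [ -f "$PATH_TRANSLATED" ]; then
    printf 'Content-type: text/html\n\n'
    add_footer "$PATH_TRANSLATED"
fi
```

Because the footer lives in the handler, it applies to every static page without touching the files themselves.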

External filters

A short note on an alternative way of implementing aspect-like behavior: it is also possible to run your content through filters. I did not investigate this thoroughly and I will not go into the details, but I do want to mention it as a possibility.

You need to enable the ext_filter mod first:

sudo a2enmod ext_filter
To run all content in your VirtualHost through a simple script:
ExtFilterDefine test mode=output cmd=/var/www/cgi-bin/test.cgi
SetOutputFilter test
The output of the script goes to the browser, and the original HTML (or other content) is available on the standard input of the script. The filter is applied to all requests, but my quick test suggests that the filter is only called if there is a valid response. The Apache documentation discourages using external filters in production for performance reasons and recommends natively compiled filters instead. Refer to the original documentation for more information.
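The article does not show the contents of test.cgi, so the following is only a guess at what such a filter could look like: it reads the original response body from standard input and writes the modified body to standard output. The function name `filter_body` and the replacement text are my own.

```shell
#!/bin/sh
# Sketch of an external filter: body in on stdin, modified body out on stdout.
# filter_body is a hypothetical name; unlike a cgi script, a filter does
# not print a Content-type header.
filter_body() {
    sed "s/It works/It's filtered/g"
}

# When saved as the filter script, the last line would call the function:
#   filter_body
```

The call is left commented out so that the function can be tried by hand first, for example with `echo "It works" | filter_body`.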