Creating and Processing Web Forms with CGI (Tutorial)

By Carl Sassenrath
Revised: 12-Mar-2024
Original: 20-Jan-2005

Contents

Introduction
A Simple Web Form
The HTML Code
Testing the Web Form
Processing a Web Form
Seeing All the CGI Data
Showing CGI Form Results
Decoding Web Form Data
Example test.cgi Script
GET and POST Methods
The GET Method
The POST Method
A Handy CGI Read Function
An Improved test.cgi Script
An Expanded Example
The HTML Code
Testing The Form
Helpful Hints
Debugging
Testing Scripts Locally First
Accessing Form Data as an Object
Checking Web Form Values
Trimming Fields
Saving Web Form Data
Saving Web Data to a Unique File
Using REBOL Forms with CGI
CGI Security Issues
Beware of Form Data
Accessing Server Files
Public Directories and Files

Introduction

This article describes how to create and process simple web input forms. It is written as a tutorial for CGI users and webmasters who are new to REBOL.

The fundamental HTML and CGI techniques provided in this article are easy to understand, and you can experiment with the examples by using just a text editor, web browser, web server, and REBOL/Core. No other special tools are required.

If you are new to CGI processing in REBOL, I suggest that you review the previous article Quick and Easy CGI - A Beginner's Tutorial and Guide. It explains the basic concepts of REBOL CGI. They are not repeated here.

The examples shown in this article have been kept simple to make them easy to learn. More elaborate web forms, such as multiple-page forms, will be covered in separate articles.

The CGI examples used here take advantage of the fixes and improvements that are found in REBOL/Core 2.5.5 or better. If you want the examples shown here to work properly, please upgrade to a more recent version of REBOL.

Comments or corrections to this article should be sent to our feedback page.

A Simple Web Form

Web forms are easy to create using a web authoring tool or by typing the necessary HTML tags directly into a text editor and displaying them using a web browser.

The HTML Code

Here is the HTML code for creating a very simple form that includes a single text entry field and a submit button:

<HTML>
<TITLE>Simple Web Form</TITLE>
<BODY>
<b>Simple Web Form</b><p>
<FORM ACTION="http://data.rebol.com/cgi-bin/test-cgi.cgi">
<INPUT TYPE="TEXT" NAME="Field" SIZE="25"><BR>
<INPUT TYPE="SUBMIT" NAME="Submit" VALUE="Submit">
</FORM>
</BODY>
</HTML>

Save this HTML code to a file called simple.html, then preview it in your web browser by clicking on its file icon (in Windows) or by opening it as a file within your browser. It should look like this:

[Rendered form: the bold title "Simple Web Form", a text entry field, and a Submit button]

If your web page does not look like this, check your HTML for errors. If you compare your file to the HTML shown above, you should be able to find your mistakes.

Testing the Web Form

The HTML FORM tag above specifies how the web form is to be processed. The line:

<FORM ACTION="http://data.rebol.com/cgi-bin/test-cgi.cgi">

contains the URL for a simple REBOL CGI script that runs on one of REBOL's servers. It displays a web page that shows you the values of the input fields submitted in the form.

As a test, enter some text into the form and press the Submit button. You should see a response from the www.rebol.com server that looks like this:

CGI Form Data:

Submitted: 12-Apr-2003/12:53:01-7:00
REBOL Version: 2.5.5.4.2

Field     "Testing"
Submit    "Submit"

The test.cgi URL is handy for testing any of the web forms created in the sections that follow.

Processing a Web Form

In the first REBOL CGI article you saw how easy it was to use REBOL for CGI scripting. This section expands on that to show how scripts are used to process web forms.

Seeing All the CGI Data

When the web server runs your CGI script, it passes information to the script within a standard CGI environment. By providing the -c (or --cgi) option to REBOL, you are asking REBOL to get the CGI environment information.

Here is a short REBOL CGI script that is useful to remember:

#!rebol -cs
REBOL []
print "Content-type: text/html^/"
print [<HTML><BODY><PRE> mold system/options/cgi </PRE></BODY></HTML>]

When called by a web server, this script will print all the CGI information that is passed from the web server to REBOL. It gets the information from REBOL's system/options/cgi object.

To see it work, change the FORM tag in the above HTML to this line:

<FORM ACTION="http://data.rebol.com/cgi-bin/probe.cgi">

Now save the file and view it in your web browser. Type some text and press the Submit button. You will see a web page that looks like this:

make object! [
    server-software: {Apache/1.3.42 (Unix) PHP/4.4.9 mod_...}
    server-name: "www.rebol.com"
    gateway-interface: "CGI/1.1"
    server-protocol: "HTTP/1.1"
    server-port: "80"
    request-method: "GET"
    path-info: none
    path-translated: none
    script-name: "/cgi-bin/probe.cgi"
    query-string: ""
    remote-host: none
    remote-addr: "123.1.2.3"
    auth-type: none
    remote-user: none
    remote-ident: none
    Content-Type: none
    content-length: none
    other-headers: ["HTTP_ACCEPT" {application/xml,application/xhtml+xml,...}]
]

What you are seeing here is REBOL's system/options/cgi object. This object stores all of the information that was passed from the web server to REBOL, and you can access it from the CGI scripts that you write. As you can see, the CGI object includes information about the web server name and version, the CGI request method, the CGI script path, the data that was submitted (query-string), the browser's IP address (remote-addr), and much more.
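As a minimal sketch of reading individual fields from that object, the lines below print three of the fields listed in the dump above. The variable name cgi is only a shorthand chosen for this illustration, and the lines are assumed to run after the Content-type header has been printed:

```rebol
; Shorthand for the CGI object (the name "cgi" is an assumption of this sketch)
cgi: system/options/cgi

; Each field is a string, or NONE if the web server did not supply it
print ["Request method:" cgi/request-method]
print ["Script name:" cgi/script-name]
print ["Client address:" cgi/remote-addr]
```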

In the script above and the ones to follow, you will need to change the first line to provide the correct path to your copy of the REBOL/Core program. In other words, everywhere we show:

#!rebol -cs

you will need to provide the correct path for your web server, such as:

#!/home/myaccount/rebol -cs

See Quick and Easy CGI - A Beginner's Tutorial and Guide for more information.

Showing CGI Form Results

The CGI object contains a lot of information. Some of this may be useful to your script, but what you want most is to get the results that were submitted in the web form (as shown earlier).

In the example web form above, the data can be found in the QUERY-STRING field of the CGI object. You can easily revise the above script to show it:

#!rebol -cs
REBOL []
print "Content-type: text/html^/"
print [
    <HTML><BODY><PRE>
    mold system/options/cgi/query-string
    </PRE></BODY></HTML>
]

Decoding Web Form Data

When you see the results of the script above, you will notice that the web form data is specially encoded to allow it to be passed back to the CGI script.

REBOL provides the DECODE-CGI function to make it easy to decode CGI form data. (I will not describe the CGI encoding method here, but if you are interested you can find information about it here.)

The DECODE-CGI function converts the raw form data into a REBOL block that contains words followed by their values. For example the CGI data:

name=Fred&class=101&math=2+%2B+2+%3D+4&

would get converted by DECODE-CGI to:

[name: "Fred" class: "101" math: "2 + 2 = 4"]

You can try it yourself by running REBOL and calling the DECODE-CGI function directly:

>> decode-cgi {name=Fred&class=101&math=2+%2B+2+%3D+4&}
== [name: "Fred" class: "101" math: "2 + 2 = 4"]

In newer versions of REBOL (Core 2.5.5 and better), web input field names that occur more than once will be combined into a block value. For example:

name=Fred&status=good&status=happy&

would return as:

[name: "Fred" status: ["good" "happy"]]

This is useful for items like checkboxes, where more than one result may be selected.

Note that the block returned by DECODE-CGI can be used to create a REBOL object that makes it easier to access the web form results. This will be described in more detail below.

When you use the DECODE-CGI function, you must be sure that the input field names are valid REBOL words. It is safe to use just letters (alphabet) or letters followed by numbers, but do not start a word with a numeric digit (0-9) and avoid using special punctuation (other than a dash "-") within a word.

Right:

<INPUT TYPE="TEXT" NAME="oneway">
<INPUT TYPE="TEXT" NAME="one-way">

Wrong:

<INPUT TYPE="TEXT" NAME="1way">
<INPUT TYPE="TEXT" NAME="1-way">
<INPUT TYPE="TEXT" NAME="one;way">

Example test.cgi Script

Here is a REBOL CGI script that processes the form you submitted above and displays the decoded results within a table:

#!rebol -cs
REBOL []
print "Content-type: text/html^/"

html: make string! 2000
emit: func [data] [repend html data]

emit [
    <HTML><BODY BGCOLOR="#FFC080">
    <b> "CGI Form Data:" </b><p>
    "Submitted: " now <BR>
    "REBOL Version: " system/version <P>
    <TABLE BORDER="1" CELLSPACING="0" CELLPADDING="5">
]

foreach [var value] decode-cgi system/options/cgi/query-string [
    emit [<TR><TD> mold var </TD><TD> mold value </TD></TR>]
]

emit [</TABLE></BODY></HTML>]
print html

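Note that most of this script is used to create the HTML to be displayed by your web browser. In summary, the script prints the CGI content-type header, defines an EMIT function that appends HTML to a string buffer, emits the page heading and the opening TABLE tag, loops over the name and value pairs returned by DECODE-CGI to emit one table row for each, and finally closes the table and prints the HTML.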

If you use this CGI script with the HTML form created above, you should get this result:

CGI Form Data:

Submitted: 12-Apr-2003/12:53:01-7:00
REBOL Version: 2.5.5.4.2

Field     "Testing"
Submit    "Submit"

If you don't see something like this result, double check your script for errors.

GET and POST Methods

The GET Method

It is important to know that the web form examples shown above use the HTTP GET request method to submit data to the web server.

The GET method sends results to the web server by encoding them as part of the URL. You may have noticed these long URLs when you use a web browser to visit various web sites, such as search engines. For example:

http://www.google.com/search?as_q=&num=50&hl=en&btnG=Google+Search&...

The data is then read by your program from the query-string field of the CGI object as shown earlier.

The GET method works fine for sending small amounts of data, but the length of URLs is limited and long URLs will be truncated. The solution to this problem is the subject of the next section.
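To make the GET encoding concrete, here is a small sketch for the simple form shown earlier. It assumes the user typed "Testing" into the text field; the variable names query and url are chosen only for this illustration:

```rebol
; The query string a browser would send for the simple form,
; assuming "Testing" was typed into the text field
query: "Field=Testing&Submit=Submit"

; The full GET URL: the script URL, a "?" separator, then the query string
url: join http://data.rebol.com/cgi-bin/test-cgi.cgi ["?" query]
print url

; DECODE-CGI turns the query string back into name and value pairs
foreach [var value] decode-cgi query [
    print [mold var mold value]
]
```

On the server, the same string is what appears in the query-string field of the CGI object.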

The POST Method

A better approach for larger form submissions is the HTTP POST request method. This method directs the data to the standard input port of the CGI script, allowing your CGI script to read it directly.

The format of the data sent with the POST method is the same as that sent with the GET method. You will need to use DECODE-CGI to decode your web form data (as shown earlier).

To use the POST method, you must add a METHOD attribute with the value POST to the FORM tag that appears in your HTML file. For example:

<FORM ACTION="http://data.rebol.com/cgi-bin/test-cgi.cgi"
METHOD="post">

This tells the web server to redirect the query string to the CGI script's input port, where it can be read.

To read the data from standard input port, your CGI script will need a READ-IO loop such as this:

data: make string! 1020
buffer: make string! 16380
while [positive? read-io system/ports/input buffer 16380][
    append data buffer
    clear buffer
]

This reads the web data from the input port and appends each chunk of data to a DATA string. This string can then be used in the same way as the query-string shown above, calling the DECODE-CGI function to decode the input data.

The input port for CGI scripts is opened in text mode and line terminators will be automatically converted to the standard newline format used by REBOL (the LF character).

If you need to read binary data from the input port, you must change the input port mode by calling the SET-MODES function:

set-modes system/ports/input [binary: true]

Note that this is not normally required for CGI input, because binary characters are sent as encoded hexadecimal text which is decoded by the DECODE-CGI function.

A Handy CGI Read Function

Rather than worry about how data is submitted (GET or POST methods), the function below handles both methods within your script:

read-cgi: func [
    ;Read CGI data. Return data as string or NONE.
    /local data buffer
][
    switch system/options/cgi/request-method [
        "POST" [
            data: make string! 1020
            buffer: make string! 16380
            while [positive? read-io system/ports/input buffer 16380][
                append data buffer
                clear buffer
            ]
        ]
        "GET" [data: system/options/cgi/query-string]
    ]
    data
]

Your script can call READ-CGI to get the query string, regardless of the method used to send it. All of the examples that follow will include this function.

An Improved test.cgi Script

The READ-CGI function defined above can be added to the test.cgi script to make it work for any length of web form submission.

#!rebol -cs
REBOL []
print "Content-type: text/html^/"

html: make string! 2000
emit: func [data] [repend html data]

read-cgi: func [
    ;Read CGI data. Return data as string or NONE.
    /local data buffer
][
    switch system/options/cgi/request-method [
        "POST" [
            data: make string! 1020
            buffer: make string! 16380
            while [positive? read-io system/ports/input buffer 16380][
                append data buffer
                clear buffer
            ]
        ]
        "GET" [data: system/options/cgi/query-string]
    ]
    data
]

emit [
    <HTML><BODY BGCOLOR="#FFC080">
    <b> "CGI Form Data:" </b><p>
    "Submitted: " now <BR>
    "REBOL Version: " system/version <P>
    <TABLE BORDER="1" CELLSPACING="0" CELLPADDING="5">
]

foreach [var value] decode-cgi read-cgi [
    emit [<TR><TD> mold var </TD><TD> mold value </TD></TR>]
]

emit [</TABLE></BODY></HTML>]
print html

An Expanded Example

Here is an expanded web form example that is more typical of what you might need on a web site.

The HTML Code

This HTML form includes text fields, radio buttons, checkboxes, a dropdown menu, a text area, and a submit button.

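[Rendered form: "Full name?" and "Email address?" text fields; "Send free widget?" radio buttons (Yes, No, Maybe); "Status?" checkboxes (Student, Consultant, Programmer); a dropdown menu; a "Notes:" text area; and a Submit button]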

The HTML code for this form is:

<FORM ACTION="http://data.rebol.com/cgi-bin/test-cgi.cgi"
METHOD="POST">
Full name? <BR>
<INPUT TYPE="TEXT" NAME="name" SIZE="40">
<P>
Email address? <BR>
<INPUT TYPE="TEXT" NAME="email" SIZE="40">
<P>
Send free widget? 
<INPUT TYPE="RADIO" NAME="send" VALUE="yes"> Yes
<INPUT TYPE="RADIO" NAME="send" VALUE="no"> No
<INPUT TYPE="RADIO" NAME="send" VALUE="maybe"> Maybe
<P>
Status?
<INPUT TYPE="CHECKBOX" NAME="status" VALUE="student"> Student
<INPUT TYPE="CHECKBOX" NAME="status" VALUE="consult"> Consultant
<INPUT TYPE="CHECKBOX" NAME="status" VALUE="prog"> Programmer
<P>
<SELECT NAME="when">
<OPTION SELECTED>Send it Now</OPTION>
<OPTION>Send it later</OPTION>
<OPTION>Send it next month</OPTION>
<OPTION>Send it next year</OPTION>
</SELECT>
<P>
Notes: <BR>
<TEXTAREA NAME="description" ROWS="8" COLS="45"></TEXTAREA>
<P>
<INPUT TYPE="SUBMIT" NAME="submit" VALUE="Submit">
</FORM>

Testing The Form

The form above can be tested in the same way as the earlier examples by sending it to the test.cgi script. Note that the POST method is used to allow the input fields to be of any length.

Here is an example response from the test.cgi script:

CGI Form Data:

Submitted: 12-Apr-2003/16:20:42-7:00
REBOL Version: 2.5.5.4.2

name           "Luke Lakeswimmer"
email          "luke@rebol.com"
send           "yes"
status         ["student" "prog"]
when           "Send it next month"
description    "I use REBOL for all my programming."
submit         "Submit"

Note that the status result was returned as a block that contains two string values. This indicates that two STATUS checkboxes were selected.

If you try this example but see the word STATUS appear more than once in the left column, then you are not using REBOL/Core 2.5.5 or better. That is not a big problem here, but it may become one if you convert the DECODE-CGI block into an object (to be shown below).

Helpful Hints

Here are a few techniques that can come in handy when processing CGI forms in REBOL.

Debugging

Because CGI scripts run on the web server and must be uploaded each time (or edited remotely using a shell), it can often take a lot of time to work out the bugs.

One helpful technique is to add a line of code to your CGI script that prints out the query string that was sent to it from the web browser. That way you know for sure what the input data is when the script begins to process it.

If you are using the READ-CGI function suggested above, you can simply add a probe to its last line:

probe data

Now, when you run the script, you'll see the "raw" CGI data at the top of your resulting web page. (Note that some web browsers may not show it to you unless you view the HTML source to the page.)

Testing Scripts Locally First

You can also save a lot of time by testing your scripts locally before uploading them to your web server.

To provide test data to your script, you can set the CGI object fields directly within your script. For example, the code below will detect that your script is running locally and supply some test data:

if none? system/options/cgi/request-method [
    system/options/cgi/request-method: "GET"
    system/options/cgi/query-string: "your-string"
]

However, the method that I prefer is to modify the READ-CGI function to provide a default return value if the request-method was not found (which means the script is running locally, not on the server). This is done by adding the test input data to the last line of the READ-CGI function:

any [data "my-test-data-here"]

If the DATA variable is none, then my test data is used.

Note: You can use the PROBE debugging method shown earlier to create a CGI encoded string. Just cut and paste the output as your input for the local test.

Accessing Form Data as an Object

For larger forms that have a lot of fields, you can make it easier to refer to field values by converting the DECODE-CGI result block to an object. This can be done with the CONSTRUCT function (which unlike MAKE or CONTEXT does not evaluate the contents of the object):

request: construct decode-cgi read-cgi

The result is an object, and you can access its data fields directly. For example:

request/name
request/email
request/status
...

If a field has multiple return values for a single name (as commonly done for checkbox fields), the result may be a string or a block. You can use the BLOCK? function to detect multiple values. For example:

if block? request/status [
    ...multiple checkboxes clicked...
]

See the DECODE-CGI section for more information.
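Because a checkbox field may arrive as a string, a block, or nothing at all, it can be convenient to normalize it before looping. The sketch below uses a hypothetical helper named field-values (it is not a REBOL built-in) that always returns a block:

```rebol
; Hypothetical helper: always return a block of values for a form field
field-values: func [value] [
    if none? value [return copy []]       ; field was not submitted
    either block? value [value] [reduce [value]]
]

; In a CGI script the argument would be request/status;
; literal values are used here so the sketch runs on its own
foreach status field-values ["student" "prog"] [
    print ["Status selected:" status]
]
foreach status field-values "student" [
    print ["Status selected:" status]
]
```

With this approach the rest of the script does not need a separate BLOCK? test for each multi-valued field.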

If you want to be sure that the object always has all its required fields (because that may depend on the fields the user submitted in the web form), you can specify a template when you use CONSTRUCT:

template: context [command: name: email: status: none]
request: construct/with decode-cgi read-cgi template

Now your script can check object fields to determine if they have been set before accessing them:

if request/command [
    switch request/command [...]
]

If you did not provide such an object template, and the command field was not provided in the submitted data, your CGI script would error out (quit).

Checking Web Form Values

Probably the most difficult part of writing most CGI scripts is checking the web form fields to make sure that they are valid before processing. For example, you might require a valid email address, integer number, URL, etc.

One way to check fields is to use the PARSE function. For example, if a field must contain only numeric digits, you can use code such as:

digits: charset "0123456789"
if not parse/all value [some digits] [
    print "Error in number" quit
]

(Note that PARSE/ALL is needed if you want to detect the whitespace in the string.)

You can easily test your PARSE code using the REBOL console:

>> digits: charset "0123456789"
>> parse/all "123" [some digits]
== true
>> parse/all "123x" [some digits]
== false
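PARSE rules can also limit how many characters are accepted. The following sketch assumes a hypothetical age field that must contain one to three digits; the function name valid-age? is invented for this illustration:

```rebol
digits: charset "0123456789"

; Hypothetical check: TRUE only if the string is one to three digits
valid-age?: func [value [string!]] [
    parse/all value [1 3 digits]
]

print valid-age? "25"      ; true
print valid-age? "1234"    ; false (too many digits)
print valid-age? "2 5"     ; false (whitespace is not a digit)
print valid-age? ""        ; false (at least one digit is required)
```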

Another way to check results for fixed values, like those returned from radio buttons or dropdown menus, is to use the FIND function:

if not find [
    "send it now"
    "send it later"
    "send it next month"
    "send it next year"
] request/when [
    print "Error in value" quit
]

For more complex types of values, you can also use REBOL's built-in loader. Here is a check for a valid integer:

value: load/all request/age
if not integer? value/1 [
    print "Invalid age"
]

Here is a check for a valid REBOL email address:

value: load/all request/email
if not email? value/1 [
    print "Invalid email"
]

Note that if you are using REBOL 2.5.5 or better, you can safely use LOAD rather than LOAD/ALL because script headers are not evaluated in the newer versions. If you do, you can remove the /1 index from the above code.

Trimming Fields

It is a common practice to trim leading and trailing whitespace from text input fields. This is more of a convenience to your web form users.

An example of trimming a field in REBOL code would be:

trim/head/tail request/email

Note that TRIM modifies its input string value, so no assignment of the result is needed. Add a COPY if you don't want this side effect.
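If a form has many text fields, one possible approach is to trim every string value in the DECODE-CGI block in a single loop. This is only a sketch: the sample query string is made up, and in a real script the block would come from the submitted form data instead.

```rebol
; Sample data for illustration; each "+" decodes to a space
data: decode-cgi "name=+Fred+&email=+fred%40example.com+"

; TRIM changes each string in place, so the block itself is updated.
; Block values (multiple checkboxes) are skipped by the STRING? test.
foreach [name value] data [
    if string? value [trim/head/tail value]
]

probe data
```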

Saving Web Form Data

Depending on your requirements, there are many ways to save your web form data within your CGI script.

The simplest method is to use the SAVE function and write out the REBOL object that was created in the earlier example:

save %cgi-data request

Remember to use the -cs command line options if you use REBOL to write files. See Quick and Easy CGI - A Beginner's Tutorial and Guide.

If you want to keep a history of requests, you could append the request block returned from DECODE-CGI to a file:

data: decode-cgi read-cgi
write/append %cgi-log append mold data newline

This example appends a newline to put each request on a separate line within the file.

You can also add fields like the date, requesting computer's IP address, and anything else you need:

write/append %cgi-log reform [
    now/date
    system/options/cgi/remote-addr
    mold data
    newline
]

Saving Web Data to a Unique File

In CGI scripts, creating a unique file name to store each CGI request is more difficult than it first seems.

The problem comes from the fact that a single CGI script may be run multiple times by the web server at the same time.

If you keep a "counter" in a file to use for generating file names, you create a "race" condition. Two scripts that are running at exactly the same time might read the same counter value before it can be changed.

Another method is to use the current date and time to create a file name. This does not work either, because you have the same race condition. Two scripts may run at exactly the same time and use the same date value.

It turns out, this is not a trivial problem to solve. You have two choices:

  1. If it is acceptable that in some very rare cases data might be lost, then you can use one of the above methods. You can reduce the chances of data loss by using functions like now/precise, but you cannot totally eliminate the possibility. This method may work fine for you, because for most small web sites, you'll never lose data. It is also the easiest to implement.
  2. If you absolutely cannot lose data, then you must add a locking mechanism to your CGI script that prevents the race condition. This can be done in a few ways, and it is the subject of a separate REBOL CGI article.
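As a rough sketch of the first choice, the function below builds a file name from a checksum of the precise time and a random number. The name unique-file and the .r suffix are assumptions made for this illustration. It only reduces the chance of a collision and does not remove the race condition, because another copy of the script could still create the same file between the EXISTS? test and the write:

```rebol
; Hypothetical helper: return a file name in DIR that is unlikely,
; but not guaranteed, to collide with one made by another request
unique-file: func [dir [file!] /local stamp name] [
    random/seed now/precise
    until [
        stamp: form reduce [now/precise random 1000000]
        ; 40 hex characters from the SHA1 checksum of the stamp
        name: join dir [enbase/base checksum/secure stamp 16 %.r]
        not exists? name        ; try again if the file is already there
    ]
    name
]

; Possible use in a CGI script (REQUEST is the object built earlier):
; save unique-file %cgi-data/ request
```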

Using REBOL Forms with CGI

CGI also works well for forms that are created in REBOL. In fact, by using REBOL's graphical user interface you can create forms that are more dynamic and can be validated before sending them to the server (resulting in better performance and less traffic on your server and network).

Unfortunately, this web form article is already too long, so I will describe REBOL-based CGI forms in a separate article (to be published soon, I hope).

CGI Security Issues

When processing forms with CGI, you should be aware of potential security problems.

Beware of Form Data

Watch out for data submitted with web forms. Never assume that the data is what you think it is. For example, a name field may not be a name at all, but just random data or data that is intentionally trying to break your CGI script or hack into your server.

An important point that will help you to avoid problems is to limit the length of the data that you accept. Here is example code that limits a string to 100 characters:

clear skip string 100

To limit CGI post data, you can modify the READ-CGI function to abort if input exceeds a specified length:

read-cgi: func [
    ;Read CGI data. Return data as string or NONE.
    /local data buffer
][
    switch system/options/cgi/request-method [
        "POST" [
            data: make string! 1020
            buffer: make string! 16380
            while [positive? read-io system/ports/input buffer 16380][
                append data buffer
                clear buffer
                if (length? data) > 10000 [print "aborted" quit]
            ]
        ]
        "GET" [data: system/options/cgi/query-string]
    ]
    data
]

Note that you don't need to limit GET data because servers do that by default.

Accessing Server Files

In order to save data from your forms, your CGI script probably has read-write access to one or more file directories on your server. You need to be careful. You do not want to accidentally give a web user the ability to read other data files or, even worse, to write to any file.

As an extreme case imagine what would happen if a web user, by finding a flaw in your CGI script, could write to an executable file in your CGI directory. The next time that file is executed, a serious problem could result.

The best way to avoid such problems is to never allow the data from a web form to provide the file name that is used to access files on the server, unless you check the file path carefully.
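If a file name really must come from form data, one way to check it carefully is to accept only a short run of known-safe characters. The sketch below is one possible check, not a complete defense; the helper name safe-name?, the allowed character set, and the 32-character limit are all assumptions of this example:

```rebol
; Characters accepted in a user-supplied name (letters, digits, dash)
safe-chars: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9" #"-"]

; Hypothetical helper: TRUE only for a string of 1 to 32 safe characters.
; Dots and slashes are rejected, so the name cannot point outside
; the directory that it is later joined to.
safe-name?: func [name] [
    to-logic all [
        string? name
        parse/all name [1 32 safe-chars]
    ]
]

print safe-name? "report-2003"      ; true
print safe-name? "../etc/passwd"    ; false
print safe-name? ""                 ; false
```

A script would call the check before building the path, for example before evaluating join %cgi-data/ name, and quit if it fails.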

If you want additional security, you can use the REBOL SECURE function to restrict file access to one or more specific directories. For example, you can add a few lines to the top of your script (before it accesses files):

secure [
    file quit
    %cgi-data/ [quit all allow write]
]

If for any reason your CGI script attempts to read or write files outside the "cgi-data" directory or tries to read any files in the "cgi-data" directory, the script will automatically terminate.

You can also expand on this if you need to read special data files in your script:

secure [
    file quit
    %zip-codes.r [quit all allow read]
    %prices.r [quit all allow read]
    %cgi-data/ [quit all allow write]
]

Be sure to specify the "s" option in your REBOL CGI command path, or calling the SECURE function will abort your script. For example, as shown earlier, use a command line such as:

#!/home/myaccount/rebol -cs

This tells REBOL to run in CGI mode with security lowered (allowing the SECURE function to change the settings).

Public Directories and Files

If you are sharing your web server with other users (such as on a web hosting service), be aware that CGI directories, scripts, data, and other files may be accessible to other users of the system, depending on your host server's configuration.

If this possibility concerns you, check with your server administrator or web service provider for information on how to properly protect your CGI scripts and data.
