PG2CGI Access to a Database from the Web

ArticleCategory: [Choose a category for your article]

Webdesign

AuthorImage:[Here we need a little image from you]

[Photo of the Author]

TranslationInfo:[Author and translation history]

original in es Luis Colorado

es to en Miguel A Sepulveda

AboutTheAuthor:[A small biography about the author]

Luis Colorado works as administrator of UNIX systems and Internet access for Telefónica Sistemas S.A. in Spain. He graduated in Physics by the Universidad Complutense of Madrid and has developed several open source UNIX utilities.

Abstract:[Here you write a little summary]

We describe a new program that allows one accessing a database from HTML pages. The design goals of the program were to control data access, flexibility in the configuration and independence from layout formats. This program resulted from a number of ideas emerge during my readings of several articles about M4 that appeared recently in the magazines Linux Journal and LinuxFocus during the last 12 months.

ArticleIllustration:[This is the title picture for your article]

[Ilustration]

ArticleBody:[The article body]

Introduction

This article describes the rationale and ideas that lead me to develop a package for accesing a database from the web. It is not my intention to write a manual on the PG2CGI package (a reference guide is already included in the distribution of the software, the URL for this package is given later), rather my goal is to give a brief introduction and to hopefully stimulate readers to use it and send their feedback.

Motivation

I wrote this program as a response to a series of articles that appeared in the Linux Journal and in LinuxFocus concerning the use of M4 to manage HTML sources. These articles demonstrated the advantages and potential of M4, as a standalone tool for the maintenance of webpages and the dynamic generation of content.

On the other hand the availability of numerous web servers and databases contrasts with the lack of utilites that can interface between both environments (most of these interface applications are commercial or have serious deficiencies with regards to the formats that can be handled).

The following software package unifies the database and web environments. It was designed to satisfy these requirements:

The utility presented here satisfies these requirements at the expense of a small penalty in efficiency (it has to invoke M4 several times in the course of execution) but the results are satisfactory in most cases (take into account that in most cases the database query already takes more time than the dynamic generation of HTML text)

M4

M4 is a macro processing tool developed a long time ago. Our software makes heavy use of this macro procesor:

M4 was originally developed to link the CGI (Common Gateway Interface) with the PostgreSQL database. Soon after people realized it was general enough for other interfaces besides CGI (for example a gateway between a LDAP server and other databases like Informix and Oracle).

Later the interface was standarized incorporating all the necessary functionality and the PostgreSQL module was rewritten according to the new interface. Now it is possible to develop new drivers that allow the program to link with other database managers.

The repeated used of the macro processor can mean a lost of efficiency, although our tests with the GNU M4 utility have been quite satisfactory.

Regular Expressions:

Our utility makes heavy use of regular expressions to verify the configuration rules to be applied. Regular expressions are prefered over simple comparisons because they give us better functionality. Regular expressions have among other advantages:

PG2CGI makes use of regular expressions when verifying the configuration rule to be used.

The syntax of expressions and parsing the data for the drivers passed in the regular expressions can be easily done thanks to grouping multiple sub-chains in a single regular expression.

An example: let us supposed the client has to passed some informartion in the query_string, and that such data must fit to the following syntax:

  FIELD=value
   

Furthermore the query_string has to satisfy this format and there should not be any additional information in the string.

Enforcing this syntax can be easily accomplised by writing the following term in the selection rule:

  QUERY_STRING: "^FIELD=[^&]*$";
     

The previous line warrants that the rule will be invoked only if the QUERY_STRING fits the proper syntax. No only that, but if we write parenthesis around the value in the expression above then the program will allow to parse the corresponding value:

  QUERY_STRING: "^FIELD=([^&]*)$";
   

In the process the program also transforms the escape sequences of the form %xx introduced by the navigator

How Does it Work?

This is the moment to describe how the program works. When it is first launched it receives variables from the web server environment and it then configures itself according to these variables. The environment variables are all that the program has to findout: How about the client? Where is the origen of the request? What is the request? What type of information (MIME type) is supported by the client? etc. PG2CGI then selects the rule to be used based on the left-hand side of the rule. At the present time there are three types of rules:

With all this machinery we can start building rules. Next we will group a number of terms that specify a valid left-hand side for a rule. The terms are place between "{}" brackets.

The left and right hand side of a rule are delimited using curly brackets "{}" and are separated by the symbol `->'.

The right-hand side contain terms with the same syntax: a variable name, the character `:', a string of characters and the terminator `;'. All the terms on the right-hand side are asignments of variables to given values to be performed by M4:

Other variables may be used either by the template file or the corresponding driver.

How Does the Information from the User Reach the CGI?

Very easily. The groups created from the regular expressions in the left hand side terms of each rule are transformed into variables (the names used for these new variables have the form `term_<i>_match_<j>', where <i> refers to the ordinal number of the term within the rule (so the first term gets 1, the second 2, etc..) and the <j> refers to the order of the group counting from the left of the regular expression. Then if the query_string passed by the client is:

`NAME=JOSE&FAMNAME1=DE+LA+FUENTE&FAMNAME2=LOPEZ'

and the rule declared in the website is :

     QUERY_STRING: "NAME=([^&]*)";
     QUERY_STRING: "FAMNAME1=([^&]*)";
     QUERY_STRING: "FAMNAME2=([^&]*)";
the result woud be::
     term_0_match_0  <- "NAME=JOSE";
     term_0_match_1  <- "JOSE";
     term_1_match_0  <- "FAMNAME1=DE LA FUENTE";  
           (noticed the replacement of the + characters by ` ')
     term_1_match_1  <- "DE LA FUENTE";
     term_2_match_0  <- "FAMNAME2=LOPEZ";
     term_2_match_1  <- "LOPEZ";
Drivers:

We will not describe in this article the use of the drivers, all that is already explained in the documentation included with the distribution of PG2CGI. The interested reader is encouraged to look at the reference manual included.

For the moment there is only one driver, for the connection to database of the POSTGRESQL. However the author has in plans writing another driver for LDAP type databases.

An Example:

Let us now examine the sources for a full example.

Consider the table of announcements from the database of annnouncements at slug.ctv.es (follow the AVISO A LOS NAVEGANTES link) It is a very simple example that uses two templates to examine individual records from a table or make a full listing.

/etc/html2sql.cfg
{
  PATH_INFO: "^/avisos/?$"; # Selects by PATH_INFO
  [SERVER_ADMIN: ".*"];     # Obtains information from SERVER_ADMIN. Optional
} -> {
  DRIVER:   "POSTGRESQL";
  PGTTY:    "/dev/console";  # Sends logs to the console
  PGDATABASE:     "postgres";
# We make a query (always write the oid that
# will use it internally in the template file to
# link the individual records).
  PGQUERY:  "select oid,ct,titulo,texto,mt"
      " from avisos"
      " where (dt is NULL or dt > 'now')"
      " order by mt desc";
# File containing the template 
  M4FILE:    "/usr/local/etc/httpd/plantillas_m4/avisos.m4";
  WEBMASTER: "term_1_match_0";
  #TESTMODE: "TRUE";
}

# The following selection rule allow to select an announcement (aviso)
# with a known OID (primary key). The information is included in the
# variable  PATH_INFO of the CGI.

{
  PATH_INFO: "^/avisos/([0-9]+)/?$";
  SERVER_ADMIN: ".*";
} -> {
  DRIVER:   "POSTGRESQL";
  PGTTY:    "/dev/console";  # as before, write logs to the console
  PGDATABASE:     "postgres";
  OID:      "term_0_match_1"; # assign an OID

# Once again the selection is important. We include the OID
# at the beginning of the field just in case we wish to write
# a hyperlink to delete this record.

  PGQUERY:  "select oid,ct,titulo,texto,mt,dt,autor"
            " from avisos"
            " where (dt is NULL or dt > 'now') and oid=OID";
# The template is different now
  M4FILE:    "/usr/local/etc/httpd/plantillas_m4/avisos_oid.m4";
  WEBMASTER: "term_1_match_0";
  #TESTMODE: "TRUE";
}

/usr/local/etc/httpd/plantillas_m4/avisos.m4
define(<<<for>>>, <<<dnl
ifelse(eval((<<<$2>>>) <= (<<<$3>>>)), 1,
<<<define(<<<$1>>>,<<<$2>>>)$4<<<>>>dnl
for(<<<$1>>>,eval(<<<$2>>>+1),<<<$3>>>, <<<$4>>>)dnl
>>>)dnl
>>>)dnl
divert(0)dnl
Mime-Version: 1.0
Content-type: text/html

<HTML>
  <BODY BGCOLOR="#ffffff">
    <CENTER>
      <H1>AVISOS A LOS NAVEGANTES QUE PASAN POR SLUG</h1>
    </center>
    <B>Nota:</b> Esta secci&oacute;n ha sido creada para dar a conocer
    cualquier noticia de inter&eacute;s relacionada con
    <A HREF="http://slug.ctv.es/">SLUG</a>,
    <A HREF="http://LuCAS.ctv.es/">LuCAS</a>,
    <A HREF="http://www.HispaLinux.ctv.es/">HispaLinux</a>
    y en general, cualquier servicio prestado por <B>slug.ctv.es</b>.<p>
    <CENTER><HR WIDTH=100></center>
ifelse(PGRES_RESULTSTATUS, <<<PGRES_TUPLES_OK>>>,<<<dnl
ifelse(PGRES_NTUPLES, 0,<<<dnl

      <!-- la tabla est\xe1 vac\xeda -->
      No hay avisos.<p>

>>>,<<<dnl /* PGRES_NTUPLES != 0 )( */
      <CENTER>
        <!-- contenido de la tabla -->
        <TABLE>
          <TR>
            <TH></th>
            <TH ALIGN="LEFT">Fecha-Hora&nbsp;</th>
            <TH ALIGN="LEFT">Asunto&nbsp;</th>
          </tr>

for(<<<i>>>,0,eval(PGRES_NTUPLES-1),<<<dnl
          <TR>
            <TD>
              <A HREF="/cgi-bin/pg2cgi/avisos/cell(i,0)">
                <IMG SRC="/icons/burst.gif">
              </a>
            </td>
            <TD><B>cell(i,1)&nbsp;</b></td>
            <TD>cell(i,2)&nbsp;</td>
          </tr>
>>>)dnl /* for */

        </table>
      </center>

>>>)dnl /* PGRES_NTUPLES */

>>>,<<<dnl /* ifelse PGRES_RESULTSTATUS )(*/

      Error en el resultado: <B>PGRES_RESULTSTATUS</b><BR>
      Mensaje del servidor: PGRES_ERRORMSG<P>

>>>)dnl

      <CENTER><HR WIDTH=100></center>
      <FONT SIZE=-2>
        <A HREF="mailto:WEBMASTER?subject=TABLON DE ANUNCIOS"><CODE>WEBMASTER</code></a>
      </font>
  </body>
</html>

/usr/local/etc/httpd/plantillas_m4/avisos_oid.m4
divert(-1)
$Id: generic_list.m4,v 1.1 1998/07/06 17:13:33 luis Exp $
define(<<<cell>>>, <<<PGRES_CELL_$1_$2>>>)
define(<<<field>>>, <<<PGRES_FNAME_$1>>>)
define(<<<for>>>, <<<dnl
ifelse(eval((<<<$2>>>) <= (<<<$3>>>)), 1,
<<<define(<<<$1>>>,<<<$2>>>)$4<<<>>>dnl
for(<<<$1>>>,eval(<<<$2>>>+1),<<<$3>>>, <<<$4>>>)dnl
>>>)dnl
>>>)dnl
divert(0)dnl
Mime-Version: 1.0
Content-type: text/html

<HTML>
  <BODY BGCOLOR="#ffffff">
    <CENTER>
      <H1>AVISO OID</h1>
    </center>
    <CENTER><HR WIDTH=100></center>

ifelse(PGRES_RESULTSTATUS, <<<PGRES_TUPLES_OK>>>,<<<dnl
ifelse(PGRES_NTUPLES, 0,<<<dnl

      <!-- la tabla est\xe1 vac\xeda -->
      No existe el aviso OID, o ha caducado.<p>
>>>,<<<dnl /* PGRES_NTUPLES != 0 )( */
      <CENTER>
        <!-- contenido de la tabla -->
        <TABLE>
          <TR VALIGN="BASELINE">
           <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Fecha de entrada:</b></font> </td>
            <TD width="80%">cell(0,1)</td>
          </tr><TR VALIGN="BASELINE">
            <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Fecha &uacute;ltima modif:</b></font> </td>
            <TD>cell(0,4)</td>
          </tr><TR VALIGN="BASELINE">
            <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Fecha eliminaci&oacute;n:</b></font> </td>
            <TD WIDTH=*>cell(0,5)</td>
          </tr><TR VALIGN="BASELINE">
            <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Autor:</b></font> </td>
            <TD><font size=+1><a href="mailto:cell(0,6)?subject=[TABLON-SLUG] cell(0,2)">cell(0,6)</a></font></td>
          </tr><TR VALIGN="BASELINE">
            <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Asunto:</b></font> </td>
            <TD><font size=+1><B>cell(0,2)<B></font></td>
          </tr><TR>
            <TD COLSPAN=2 BGCOLOR="#c0ffff"><font color="#404040">cell(0,3)</font></td>
          </tr>
        </table>
      </center>

>>>)dnl /* PGRES_NTUPLES */

>>>,<<<dnl /* ifelse PGRES_RESULTSTATUS )(*/

      Error en el resultado: <B>PGRES_RESULTSTATUS</b><BR>
      Mensaje del servidor: PGRES_ERRORMSG<P>

>>>)dnl

      <CENTER><HR WIDTH=100></center>
      <FONT SIZE=-2>
        <A HREF="mailto:WEBMASTER?subject=TABLON DE ANUNCIOS"><CODE>WEBMASTER</code></a>
      </font>
  </body>
</html>

The results may be viewed in these URLs:
http://slug.ctv.es/cgi-bin/pg2cgi/avisos/
or
http://slug.ctv.es/cgi-bin/pg2cgi/avisos/20384

Download

The PG2CGI program can be dowloaded from the following URLs :

http://slug.ctv.es/~luis/utils/pg2cgi.tar.gz
http://slug.ctv.es/~luis/utils/pg2cgi.README
ftp://slug.ctv.es/pub/slug/luis/pg2cgi.tar.gz
ftp://slug.ctv.es/pub/slug/luis/pg2cgi.README