original in es Luis Colorado
es to en Miguel A Sepulveda
Luis Colorado works as administrator of UNIX systems and Internet access for Telefónica Sistemas S.A. in Spain. He graduated in Physics by the Universidad Complutense of Madrid and has developed several open source UNIX utilities.
We describe a new program that allows one accessing a database from HTML pages. The design goals of the program were to control data access, flexibility in the configuration and independence from layout formats. This program resulted from a number of ideas emerge during my readings of several articles about M4 that appeared recently in the magazines Linux Journal and LinuxFocus during the last 12 months.
This article describes the rationale and ideas that lead me to develop a package for accesing a database from the web. It is not my intention to write a manual on the PG2CGI package (a reference guide is already included in the distribution of the software, the URL for this package is given later), rather my goal is to give a brief introduction and to hopefully stimulate readers to use it and send their feedback.
I wrote this program as a response to a series of articles that appeared in the Linux Journal and in LinuxFocus concerning the use of M4 to manage HTML sources. These articles demonstrated the advantages and potential of M4, as a standalone tool for the maintenance of webpages and the dynamic generation of content.
On the other hand the availability of numerous web servers and databases contrasts with the lack of utilites that can interface between both environments (most of these interface applications are commercial or have serious deficiencies with regards to the formats that can be handled).
The following software package unifies the database and web environments. It was designed to satisfy these requirements:
It is a well known fact that HTML is a formatting language for formatting documents that can be display on a web browser. A common characteristic of a good text generator is a decent language for formatting text with sufficient control over the arrangement of the paragraphs and text items in the final typesetted page. However in most formatting languages there is always a set of macros that allows to handle floating elements in the document (that is elements like tables, images etc. that adjust dynamically on the page during runtime).
Most of the utilities I know for interfacing HTML and databases generate the information on the output text by extending the set of HTML tags; these new tags then take the role of `macros'. However this approach has a serious drawback: most of the time we have to modify the elements within the native HTML tags , and this is a major limitation. Often times these utilities have to `reinvent' tags just in order to allow a certain effect within an existing HTML tag. Mi program uses M4 in depth in order to accomplish macro replacements wherever needed. The program per se does not generate any formatted text and that is why it is independent of the final output format language used. The program generates a set of macros as a result of a query and later on M4 generates an HTML document by using this set of macros and a given HTML template.
The utility presented here satisfies these requirements at the expense of a small penalty in efficiency (it has to invoke M4 several times in the course of execution) but the results are satisfactory in most cases (take into account that in most cases the database query already takes more time than the dynamic generation of HTML text)
M4 is a macro processing tool developed a long time ago. Our software makes heavy use of this macro procesor:
M4 was originally developed to link the CGI (Common Gateway Interface) with the PostgreSQL database. Soon after people realized it was general enough for other interfaces besides CGI (for example a gateway between a LDAP server and other databases like Informix and Oracle). Later the interface was standarized incorporating all the necessary functionality and the PostgreSQL module was rewritten according to the new interface. Now it is possible to develop new drivers that allow the program to link with other database managers. |
The repeated used of the macro processor can mean a lost of efficiency, although our tests with the GNU M4 utility have been quite satisfactory.
Our utility makes heavy use of regular expressions to verify the configuration rules to be applied. Regular expressions are prefered over simple comparisons because they give us better functionality. Regular expressions have among other advantages:
|
The syntax of expressions and parsing the data for the drivers passed in the regular expressions can be easily done thanks to grouping multiple sub-chains in a single regular expression.
An example: let us supposed the client has to passed some informartion in the query_string, and that such data must fit to the following syntax:
FIELD=value
Furthermore the query_string has to satisfy this format and there should not be any additional information in the string.
Enforcing this syntax can be easily accomplised by writing the following term in the selection rule:
QUERY_STRING: "^FIELD=[^&]*$";
The previous line warrants that the rule will be invoked only if the QUERY_STRING fits the proper syntax. No only that, but if we write parenthesis around the value in the expression above then the program will allow to parse the corresponding value:
QUERY_STRING: "^FIELD=([^&]*)$";
In the process the program also transforms the escape sequences of the form %xx introduced by the navigator
This is the moment to describe how the program works. When it is first launched it receives variables from the web server environment and it then configures itself according to these variables. The environment variables are all that the program has to findout: How about the client? Where is the origen of the request? What is the request? What type of information (MIME type) is supported by the client? etc. PG2CGI then selects the rule to be used based on the left-hand side of the rule. At the present time there are three types of rules:
QUERY_STRING: "^FIELD=([^&]*)$";
! HTTP_ADDRESS: "^194\.142\.12\.";(This example would invalidate the rule if the request came from 194.142.12.xxx)
[ QUERY_STRING: "USER=([^&]*)" ];This term allows us to pass the value of the user if the client provides it, but it does not invalidate the rule if the client does not specify it.
With all this machinery we can start building rules. Next we will group a number of terms that specify a valid left-hand side for a rule. The terms are place between "{}" brackets.
The left and right hand side of a rule are delimited using curly brackets "{}" and are separated by the symbol `->'.
The right-hand side contain terms with the same syntax: a variable name, the character `:', a string of characters and the terminator `;'. All the terms on the right-hand side are asignments of variables to given values to be performed by M4:
Other variables may be used either by the template file or the corresponding driver.
Very easily. The groups created from the regular expressions in the left hand side terms of each rule are transformed into variables (the names used for these new variables have the form `term_<i>_match_<j>', where <i> refers to the ordinal number of the term within the rule (so the first term gets 1, the second 2, etc..) and the <j> refers to the order of the group counting from the left of the regular expression. Then if the query_string passed by the client is:
and the rule declared in the website is :
QUERY_STRING: "NAME=([^&]*)"; QUERY_STRING: "FAMNAME1=([^&]*)"; QUERY_STRING: "FAMNAME2=([^&]*)";the result woud be::
term_0_match_0 <- "NAME=JOSE"; term_0_match_1 <- "JOSE"; term_1_match_0 <- "FAMNAME1=DE LA FUENTE"; (noticed the replacement of the + characters by ` ') term_1_match_1 <- "DE LA FUENTE"; term_2_match_0 <- "FAMNAME2=LOPEZ"; term_2_match_1 <- "LOPEZ";Drivers:
We will not describe in this article the use of the drivers, all that is already explained in the documentation included with the distribution of PG2CGI. The interested reader is encouraged to look at the reference manual included.
For the moment there is only one driver, for the connection to database of the POSTGRESQL. However the author has in plans writing another driver for LDAP type databases.
An Example:
Let us now examine the sources for a full example.
Consider the table of announcements from the database of annnouncements at slug.ctv.es (follow the AVISO A LOS NAVEGANTES link) It is a very simple example that uses two templates to examine individual records from a table or make a full listing.
{ PATH_INFO: "^/avisos/?$"; # Selects by PATH_INFO [SERVER_ADMIN: ".*"]; # Obtains information from SERVER_ADMIN. Optional } -> { DRIVER: "POSTGRESQL"; PGTTY: "/dev/console"; # Sends logs to the console PGDATABASE: "postgres"; # We make a query (always write the oid that # will use it internally in the template file to # link the individual records). PGQUERY: "select oid,ct,titulo,texto,mt" " from avisos" " where (dt is NULL or dt > 'now')" " order by mt desc"; # File containing the template M4FILE: "/usr/local/etc/httpd/plantillas_m4/avisos.m4"; WEBMASTER: "term_1_match_0"; #TESTMODE: "TRUE"; } # The following selection rule allow to select an announcement (aviso) # with a known OID (primary key). The information is included in the # variable PATH_INFO of the CGI. { PATH_INFO: "^/avisos/([0-9]+)/?$"; SERVER_ADMIN: ".*"; } -> { DRIVER: "POSTGRESQL"; PGTTY: "/dev/console"; # as before, write logs to the console PGDATABASE: "postgres"; OID: "term_0_match_1"; # assign an OID # Once again the selection is important. We include the OID # at the beginning of the field just in case we wish to write # a hyperlink to delete this record. PGQUERY: "select oid,ct,titulo,texto,mt,dt,autor" " from avisos" " where (dt is NULL or dt > 'now') and oid=OID"; # The template is different now M4FILE: "/usr/local/etc/httpd/plantillas_m4/avisos_oid.m4"; WEBMASTER: "term_1_match_0"; #TESTMODE: "TRUE"; } |
define(<<<for>>>,
<<<dnl ifelse(eval((<<<$2>>>) <= (<<<$3>>>)), 1, <<<define(<<<$1>>>,<<<$2>>>)$4<<<>>>dnl for(<<<$1>>>,eval(<<<$2>>>+1),<<<$3>>>, <<<$4>>>)dnl >>>)dnl >>>)dnl divert(0)dnl Mime-Version: 1.0 Content-type: text/html <HTML> <BODY BGCOLOR="#ffffff"> <CENTER> <H1>AVISOS A LOS NAVEGANTES QUE PASAN POR SLUG</h1> </center> <B>Nota:</b> Esta sección ha sido creada para dar a conocer cualquier noticia de interés relacionada con <A HREF="http://slug.ctv.es/">SLUG</a>, <A HREF="http://LuCAS.ctv.es/">LuCAS</a>, <A HREF="http://www.HispaLinux.ctv.es/">HispaLinux</a> y en general, cualquier servicio prestado por <B>slug.ctv.es</b>.<p> <CENTER><HR WIDTH=100></center> ifelse(PGRES_RESULTSTATUS, <<<PGRES_TUPLES_OK>>>,<<<dnl ifelse(PGRES_NTUPLES, 0,<<<dnl <!-- la tabla est\xe1 vac\xeda --> No hay avisos.<p> >>>,<<<dnl /* PGRES_NTUPLES != 0 )( */ <CENTER> <!-- contenido de la tabla --> <TABLE> <TR> <TH></th> <TH ALIGN="LEFT">Fecha-Hora </th> <TH ALIGN="LEFT">Asunto </th> </tr> for(<<<i>>>,0,eval(PGRES_NTUPLES-1),<<<dnl <TR> <TD> <A HREF="/cgi-bin/pg2cgi/avisos/cell(i,0)"> <IMG SRC="/icons/burst.gif"> </a> </td> <TD><B>cell(i,1) </b></td> <TD>cell(i,2) </td> </tr> >>>)dnl /* for */ </table> </center> >>>)dnl /* PGRES_NTUPLES */ >>>,<<<dnl /* ifelse PGRES_RESULTSTATUS )(*/ Error en el resultado: <B>PGRES_RESULTSTATUS</b><BR> Mensaje del servidor: PGRES_ERRORMSG<P> >>>)dnl <CENTER><HR WIDTH=100></center> <FONT SIZE=-2> <A HREF="mailto:WEBMASTER?subject=TABLON DE ANUNCIOS"><CODE>WEBMASTER</code></a> </font> </body> </html> |
divert(-1) $Id: generic_list.m4,v 1.1 1998/07/06 17:13:33 luis Exp $ define(<<<cell>>>, <<<PGRES_CELL_$1_$2>>>) define(<<<field>>>, <<<PGRES_FNAME_$1>>>) define(<<<for>>>, <<<dnl ifelse(eval((<<<$2>>>) <= (<<<$3>>>)), 1, <<<define(<<<$1>>>,<<<$2>>>)$4<<<>>>dnl for(<<<$1>>>,eval(<<<$2>>>+1),<<<$3>>>, <<<$4>>>)dnl >>>)dnl >>>)dnl divert(0)dnl Mime-Version: 1.0 Content-type: text/html <HTML> <BODY BGCOLOR="#ffffff"> <CENTER> <H1>AVISO OID</h1> </center> <CENTER><HR WIDTH=100></center> ifelse(PGRES_RESULTSTATUS, <<<PGRES_TUPLES_OK>>>,<<<dnl ifelse(PGRES_NTUPLES, 0,<<<dnl <!-- la tabla est\xe1 vac\xeda --> No existe el aviso OID, o ha caducado.<p> >>>,<<<dnl /* PGRES_NTUPLES != 0 )( */ <CENTER> <!-- contenido de la tabla --> <TABLE> <TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Fecha de entrada:</b></font> </td> <TD width="80%">cell(0,1)</td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Fecha última modif:</b></font> </td> <TD>cell(0,4)</td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Fecha eliminación:</b></font> </td> <TD WIDTH=*>cell(0,5)</td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Autor:</b></font> </td> <TD><font size=+1><a href="mailto:cell(0,6)?subject=[TABLON-SLUG] cell(0,2)">cell(0,6)</a></font></td> </tr><TR VALIGN="BASELINE"> <TD ALIGN="RIGHT" NOWRAP><font color="#808000"><B>Asunto:</b></font> </td> <TD><font size=+1><B>cell(0,2)<B></font></td> </tr><TR> <TD COLSPAN=2 BGCOLOR="#c0ffff"><font color="#404040">cell(0,3)</font></td> </tr> </table> </center> >>>)dnl /* PGRES_NTUPLES */ >>>,<<<dnl /* ifelse PGRES_RESULTSTATUS )(*/ Error en el resultado: <B>PGRES_RESULTSTATUS</b><BR> Mensaje del servidor: PGRES_ERRORMSG<P> >>>)dnl <CENTER><HR WIDTH=100></center> <FONT SIZE=-2> <A HREF="mailto:WEBMASTER?subject=TABLON DE ANUNCIOS"><CODE>WEBMASTER</code></a> </font> </body> </html> |
The results may be viewed in these URLs:
http://slug.ctv.es/cgi-bin/pg2cgi/avisos/
or
http://slug.ctv.es/cgi-bin/pg2cgi/avisos/20384
The PG2CGI program can be dowloaded from the following URLs :
http://slug.ctv.es/~luis/utils/pg2cgi.tar.gz