
Using the W3C Reference Library

WIP This paper is still under construction. Comments are welcome at libwww@w3.org

This guide describes the "user's view" of the W3C Reference Library. It concentrates on describing the API and how the application programmer can use the Library. It also describes who is responsible for memory management, how to initialize modules etc. Reading this guide should be sufficient in order to use the Library without being aware of exactly what is going on underneath the interface.

NOTE This document is also available as one big HTML file intended for printout. Please note that not all links in this version work!

Table of Contents

  1. Getting Started
  2. The Core
  3. Application Modules
  4. Utility Modules


Henrik Frystyk, libwww@w3.org, December 1995

Getting Started

This guide assumes that you have already compiled the Library and are ready to use it to build an application. If this is not the case then please read the Installation guide before you continue reading this document. As new versions of the Library are released frequently, it is recommended that you verify that your version is up to date. On a Unix platform you can do this by typing the following command:
	cat Library/Implementation/Version.make
assuming that you are at the top of the WWW tree created when unpacking the distribution file. You can compare your version of the code with the current version which is available from the online documentation from our WWW server.

The Library functionality is divided into a set of modules that each has its own include file. They all have a name starting with WWW, and as they will be referenced throughout this guide, we might as well introduce them from the beginning:

WWWUtil.h
The WWW Utility module contains much of the basic functionality that applications are built on, such as container modules for data objects and basic string handling. This module is the basis for all of the following modules and is used extensively.
WWWCore.h
The WWW Core module is a set of registration modules that glues an application together. It contains no real functionality in itself; it is for example not capable of loading an HTML document. It only provides a large set of hooks which can be used to add functionality to the Library and to give an application real life. We will hear a lot more about the structure of the core, and much of this guide actually describes how to add functionality to the core.
WWWLib.h
This include is the main include file for the Library. It basically consists of the WWWUtil.h and the WWWCore.h so that the application only needs to include this one instead of two.
WWWApp.h
This module contains a huge set of modules that can be hooked into the Core Library and make the application work. In contrast to the Core part, you can pick exactly the modules you want from the WWWApp.h in order to create your special application whether it is a server, a client, a proxy, a robot or any other Web application.
As mentioned, in order to use the core public functions of the Library, your application needs to include the file "WWWLib.h". This file is a container for all the Library include files that together define the public API of the core of the Library. As described in the document Library Architecture, the core is a framework for other modules that provide the actual functionality, for example for parsing HTML documents or getting a document via HTTP or FTP. We will explain later how this functionality can be enabled and used by an application.

The application must explicitly initialize the Library before it can start using it so that the internal file descriptors and variables can be set to their initial values. This only has to be done once while the application is running and is typically done when the application is started. The application should also close down the Library when it has stopped using it, typically when the application is closing down. The Library will then return resources like file descriptors and dynamic memory to the operating system. In practice the initialization and termination are done using the following two functions:

BOOL HTLibInit(const char * AppName, const char * AppVersion)
This function initializes memory, file descriptors, and interrupt handlers etc. By default it also calls initialization functions for all the dynamic modules in the Library. The dynamic modules are described in "Libwww Architecture". A major part of the User's Guide is devoted to describing how the Library can be configured, both at run time and at compile time, and the dynamic modules are an important part of the Library configuration.

The two arguments to the function are the name of the application and the version number respectively. It is not a requirement that these values are unique and they can both be the empty string (""). However, as the strings are used in the HTTP protocol module when communicating with other WWW applications, it is strongly recommended that the values are chosen carefully according to the HTTP specifications. The most important requirement is to use normal ASCII characters without any form of whitespace, as we will see in the example below.

BOOL HTLibTerminate()
This function cleans up the memory, closes open file descriptors, and returns all resources to the operating system. It is essential that HTLibInit(...) is the first call to the Library and HTLibTerminate() is the last as the behavior otherwise is undefined.
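
The naming recommendation for the two HTLibInit(...) arguments can be checked by the application before the Library is initialized. The following is a minimal sketch and not part of the Library: the helper name is_plain_token is hypothetical, and the rule it applies (non-empty, printable ASCII, no whitespace) is a conservative reading of the recommendation above rather than the full HTTP grammar. Note that it rejects the empty string even though the Library itself accepts it.

```c
#include <stddef.h>

/* Hypothetical helper, not a Library function: returns 1 if s is a
 * non-empty string of printable ASCII characters without any whitespace,
 * and 0 otherwise. */
int is_plain_token(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    if (s == NULL || *s == '\0')
        return 0;
    for (; *p != '\0'; p++) {
        /* 0x21 to 0x7E is the printable ASCII range excluding space */
        if (*p < 0x21 || *p > 0x7E)
            return 0;
    }
    return 1;
}
```

An application could call is_plain_token on both strings and fall back to a fixed name and version if either check fails.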

Building your first Application

We have now explained the first few steps in how to initialize the core of the Library. We can now write our first minimal application which will do absolutely nothing but initialize and terminate the Library. In the example we assume that the compiler knows where to look for the Library include file WWWLib.h and also knows where to find the binary library, often called libwww.a on Unix platforms and libwww.lib on Windows. Again, the result might depend on the setup of the dynamic modules, but if no dynamic modules are enabled then the example will generate an executable file. If you are in doubt about how to set up your compiler then you can often get some good ideas by looking into the Line Mode Browser.

PS: You can find the examples directly in form of C files in our example area

#include "WWWLib.h"
int main()
{
    HTLibInit("TestApp", "1.0");
    HTLibTerminate();
    return 0;
}
Some platforms require a socket library when building network applications. This is for example the case when building on Macintosh or Windows machines. The Library uses the GUSI socket library on the Macintosh and the WinSock library on Windows platforms. Please check the documentation on these libraries for how to install them and also whether there are any specific requirements on your platform when building network applications.

Adding Functionality

The core parts of the Library can be thought of as a framework with hooks for adding functionality depending on what the application needs. The core cannot do anything on its own, but once modules are hooked in, the Library can start doing useful work. In the Library distribution file there is already a large set of specific protocol modules and stream modules for handling many common Internet protocols and data formats. We will explain protocol modules and streams in much more detail later, but for the moment it is sufficient to know that protocol modules know how to communicate with remote servers on the Internet and that streams are used to pass documents back and forth between the Internet and the application with the Library as an intermediary.

While the previous application wasn't capable of doing anything we will now add functionality so that we can request a URL from a remote Web server and save the result to a file. To do this we need to register two additional modules after initializing the Library: A protocol module that handles HTTP and a stream that can save data to a local file. Both these modules are already in the distribution file and in this example we show how to enable them. It is also possible to write your own versions of these modules and then register them instead of the ones provided with the Library. This makes no difference to the core part of the Library and is an example of how the functionality can be extended or changed by adding new modules as needed.

#include "WWWLib.h"
#include "HTTP.h"

int main()
{
    HTList *converters = NULL;

    /* Initialize the Library; this must be the first call to the Library */
    HTLibInit("TestApp", "1.0");

    /* Create a list object to hold the converters */
    converters = HTList_new();

    /* Register the HTTP Module */
    HTProtocol_add("http", YES, HTLoadHTTP, NULL);

    /* Add a conversion to our empty list */
    HTConversion_add(converters, "*/*", "www/present", HTSaveLocally, 1.0, 0.0, 0.0);

    /* Register our list with one conversion */
    HTFormat_setConversion(converters);

    /* Delete the list with one conversion */
    HTConversion_deleteAll(converters);

    /* Terminate the Library */
    HTLibTerminate();
    return 0;
}
What is new in this example is the two registration functions. We will explain more about these functions as we go along; for now we will only introduce the functions and their arguments. The interesting part about the two registration functions is that they represent the two ways of registration in the Library: some things are registered directly, like the protocol module, and other things are registered as lists of objects, like the list of converters. The reason for this is to make the registration process easier for the application to handle. Protocol modules are often initialized only once while the application is running, so it is easier to register them directly. As we will see later in this guide, converters, however, can be enabled and disabled on a regular basis depending on what the application is trying to do. It is therefore easier to keep the converters in lists so that they can be enabled and disabled in batch.

Now, let's take a closer look at the two registration functions. The first registers the HTTP protocol module, which enables the Library to access documents using HTTP from any server on the Internet.

extern BOOL HTProtocol_add (const char *       	name,
			    BOOL		preemptive,
			    HTEventCallBack *	callback);
The first argument is a name equivalent to the scheme part in a URL, for example http://www.w3.org, where http is the scheme part. When a request is issued to the Library using a URL, it looks at the URL scheme and sees if it knows how to handle it. If not, an error is issued. The second argument describes whether the protocol module supports non-blocking sockets or not. This is a decision to be made when the module is first designed and can normally not be changed. In the example we register HTTP for using blocking sockets, but all native Library protocol modules including HTTP, FTP, News, Gopher, and access to the local file system support non-blocking sockets. The third argument is the name of the protocol function to be called when the Library is about to hand off the request to the module.

extern void HTConversion_add   (HTList *	conversions,
				CONST char * 	rep_in,
				CONST char * 	rep_out,
				HTConverter *	converter,
				double		quality,
				double		secs, 
				double		secs_per_byte);
This function has many arguments and we will not go into details at this point. The important thing to note is that we build a list of converters. Each call to HTConversion_add creates a new converter object and adds it to the list. A converter object is described by an input format (rep_in), an output format (rep_out), the function name of the converter, and a quality factor describing how good the conversion is. The last two arguments are currently not used but are reserved for future use. The quality factor is described later, where we will see how it can be used to distinguish between multiple conversions in order to pick the best one.

Even though we have now initialized a protocol module and a converter, the program example is still not actively doing anything. It only starts the Library, registers two modules and then terminates the Library again. Our third and last example in this section does the same amount of initialization but also issues a request to the Library for fetching a URL.

Fetching a URL

We now want to create an example which is capable of issuing a request to the Library to fetch a URL. When doing this we must create a request object that contains all information that is necessary in order to handle the request. Then we pass this object to the Library and if the request is valid and the document does exist we get the data back. In the example we read the URL to fetch from the command line. As in the other examples we are not too worried about error checking and error messaging. In a real application you must of course do this, but here we want to keep the examples simple.
#include "WWWLib.h"
#include "HTTP.h"
#include "HTDialog.h"

int main (int argc, char ** argv)
{
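    HTList * converters = NULL;
    HTRequest * request = NULL;

    WWWTRACE = SHOW_ALL_TRACE;
    HTLibInit("TestApp", "1.0");	  /* Must be the first call to the Library */
    converters = HTList_new();		  /* Create a list object */
    request = HTRequest_new();		  /* Create a request object */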
    HTProtocol_add("http", YES, HTLoadHTTP, NULL);
    HTConversion_add(converters, "*/*", "www/present", HTSaveLocally, 1.0, 0.0, 0.0);
    HTFormat_setConversion(converters);
    HTAlert_add(HTPrompt, HT_A_PROMPT);
    if (argc == 2) {
	HTLoadAbsolute(argv[1], request);
    } else
	printf("Type the URL to fetch\n");
    HTRequest_delete(request);			/* Delete the request object */
    HTConversion_deleteAll(converters);
    HTLibTerminate();
    return 0;
}
When this program is run, it will take the argument and call the Library to fetch it. As we haven't given any name for the file which we are creating on our local disk, the Library will prompt the user for a file name. Automatic redirection and access authentication are handled by the HTTP module but might require the user to type in a user name and a password. An example of how to run this program is:
./fetch_url http://www.w3.org/pub/WWW/
The result stored in the file contains the whole message returned by the remote HTTP server except for the status line. This means that if we ask an HTTP/1.0 compliant server then we receive a header and a body where the header contains metainformation about the object, for example content type, content language etc. We shall later see how the MIME parser stream can strip out the header information so that we end up with the body of the response message.

In the next chapters we shall see that protocol modules and converters are only a part of what can be registered in the Library and that the application can specify many other types of preferences and capabilities.



The Library Core

As mentioned earlier in this guide, the Core of the W3C Reference Library is the set of modules that must be included in every application. However, the core contains no functionality for requesting an HTML document using HTTP, for parsing it, or for presenting it to the user. All this functionality is dynamically registered in the core using an open set of application modules which we will describe later.

The Core is basically a set of registration mechanisms that glue together the application modules, and in the following chapter we will look at how to configure the core to contain exactly the functionality we want for our application. If you are interested in a more detailed description of the architecture of the core to see how the glue is designed then please read the chapter on the model behind the Core of the Library in the Architecture document. In this section we will concentrate on the APIs defined by the WWWCore.h include file and how to use the Core in a real application.



Request Preferences

The introductory chapter introduced converters and how they can be set up. Converters are part of a family of preferences in the Library that can be configured by applications, and this chapter explains in more detail how to use these features. The family of preferences includes format converters and presenters, natural languages, content encodings, and character sets.

As mentioned in the previous chapter, when the Library is first initialized, it knows nothing about these preferences. By specifying preferences, the application can tailor the Library to fit the features supported by the application and by the end user. In the following sections we will describe how the application can set up the various preferences. All preferences described in this chapter use lists to group the sets together. As we will see later in this chapter, the reason for this is that lists are an easy way of assigning specific preferences to various requests.

Format Converters

We have already seen an example of how a converter can be set up. Let's take a step back and look at the declaration of the function that adds a converter, HTConversion_add(...):
extern void HTConversion_add   (HTList *	conversions,
				CONST char * 	rep_in,
				CONST char * 	rep_out,
				HTConverter *	converter,
				double		quality,
				double		secs, 
				double		secs_per_byte);
The first argument is a list object. List objects are one of several container objects in the Library and they are explained in more detail in the W3C Library Internals. All we have to know at this point is how to create a list object:
extern HTList *	HTList_new (void);
The next two arguments describe the input format and the output format of the data that is entering and leaving the converter respectively. The syntax for these formats follows the syntax defined by the HTTP Protocol and the MIME specification, which has a type string and a subtype string separated by a slash "/":
	<type> "/" <subtype>
Some of the most common examples are
	text/plain
	text/html
	image/gif
	audio/basic
	*/*
In addition to these "official" MIME types, the Library has a small set of internal representations that uniquely exist within the Library. They are used to describe data formats that are not really formats but an intermediate state of the document. The two most used formats of this type are
	www/present
	www/unknown
The internal formats are characterized by having the type www, which doesn't exist anywhere but in the Library. The first of the two subtypes shown represents the rendered document as presented to the user and the second subtype represents an unknown data format.
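
To make the type/subtype syntax concrete, here is a minimal sketch of how a registered input format such as the full wildcard could be compared against the format of an incoming document. It is not taken from the Library: the function name format_matches is hypothetical, the case-insensitive comparison is an assumption based on MIME conventions, and accepting a wildcard in only one of the two components is an extension that the text above does not mention.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters of a and b.
 * Returns 1 if they are equal, 0 otherwise. */
static int equal_n(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return 0;
    }
    return 1;
}

/* Compare one component (type or subtype). A pattern component that is
 * a lone "*" matches any value. */
static int component_matches(const char *pat, size_t pat_len,
                             const char *val, size_t val_len)
{
    if (pat_len == 1 && pat[0] == '*')
        return 1;
    return pat_len == val_len && equal_n(pat, val, pat_len);
}

/* Hypothetical helper: returns 1 if format (for example "text/html") is
 * covered by pattern, and 0 otherwise. Strings without a "/" separator
 * never match. */
int format_matches(const char *pattern, const char *format)
{
    const char *pat_slash = strchr(pattern, '/');
    const char *fmt_slash = strchr(format, '/');

    if (pat_slash == NULL || fmt_slash == NULL)
        return 0;
    return component_matches(pattern, (size_t) (pat_slash - pattern),
                             format, (size_t) (fmt_slash - format))
        && component_matches(pat_slash + 1, strlen(pat_slash + 1),
                             fmt_slash + 1, strlen(fmt_slash + 1));
}
```

With this sketch, the full wildcard covers image/gif, while www/present does not cover www/unknown because the subtypes differ.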

The converter argument is a pointer to the function that is to be called in order to create a converter object capable of handling the conversion from the input type to the output type. By registering a pointer to the converter, the converter can be set up dynamically. This allows the Library to evaluate the set of registered converters each time a conversion is requested and then choose the most suitable converter on the fly.

The next argument is the quality factor which we will describe in a separate paragraph later in this chapter. The last two arguments are not currently used but are reserved for future use. For now, using a value of 0 is perfectly valid.

Converters are intended to be used when we have our own module to handle the data coming from the remote server. The module can either be one provided by the Library or one made by the application. However, in some cases we would rather hand off the data to an external application for presentation. Often external applications are viewers of some sort, for example a PostScript viewer or an MPEG viewer. The Library lets us register external applications as presenters very much like converters. This will become obvious if we take a look at how we register presenters:

extern void HTPresentation_add (HTList *	conversions,
				CONST char * 	representation,
				CONST char * 	command,
				CONST char * 	test_command,
				double		quality,
				double		secs, 
				double		secs_per_byte);
As was the case with converters, the first argument is a list which we create in exactly the same way as shown before. Presenters only need an input format as we hand off the data to the external application and never see it again. A special thing about presenters and converters is that, as they are very similar, they are also treated very much alike internally in the Library. Therefore a list object can contain both converters and presenters at the same time. This often makes management easier for the application, as it does not have to deal with two separate lists.

The test_command argument is reserved to be used in connection with mailcap parsers as the test field of a mailcap file. The Library does not yet directly support mailcap files but the registration of presenters is foreseen to be able to work with them. The Arena browser is an example of an application having its own mailcap file parser while using the Library. The description of the test field in RFC 1524 is included below:

The "test" field may be used to test some external condition (e.g., the machine architecture, or the window system in use) to determine whether or not the mail cap line applies. It specifies a program to be run to test some condition. The semantics of execution and of the value returned by the test program are operating system dependent, with UNIX semantics specified in Appendix A. If the test fails, a subsequent mail cap entry should be sought. Multiple test fields are not permitted -- since a test can call a program, it can already be arbitrarily complex.

The last three arguments are identical to those of the conversion registration so there is no need to describe them again here. Again, the quality factor will be described in detail later in this chapter.

Natural Languages

The preferred natural language or languages is in almost all situations dependent on the individual user and an application should therefore give the user the opportunity to change the setup. When a natural language preference is specified, the Library will send this preference along with all HTTP requests. The remote server will then (if it supports this feature) look for a version in the language or languages mentioned. If it finds a matching document then it returns this one, otherwise it uses the best alternative. If no language is specified the remote server may return whatever version it finds. You can add an element to the list of natural languages by using the following function:
extern void HTLanguage_add (HTList *		list,
			    CONST char *	lang,
			    double		quality);
The list object containing the set of natural languages is similar to the list elements containing the converters and the presenters. However, in contrast to the former two which actually can be one list, the list of natural languages must be a list on its own.

The semantics of the language argument closely follow the Language tag of the HTTP protocol, which in turn is based on RFC 1766. Some example tags are

	en
	en-US
	en-cockney
	i-cherokee
	x-pig-latin
where any two-letter primary tag is an ISO 639 language abbreviation and any two-letter initial subtag is an ISO 3166 country code.
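
The structure of these tags can be illustrated with a small helper that separates the primary tag from the initial subtag. This is only a sketch under stated assumptions: the function name split_language_tag is hypothetical and not a Library function, and it simply splits on the "-" character without validating the parts against ISO 639 or ISO 3166.

```c
#include <string.h>

/* Copy len characters of src into dst as a NUL-terminated string.
 * Returns 1 on success, 0 if dst (of dst_size bytes) is too small. */
static int copy_part(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len + 1 > dst_size)
        return 0;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 1;
}

/* Hypothetical helper: split tag (for example "en-US") into its primary
 * tag ("en") and its initial subtag ("US"). The subtag is set to the
 * empty string if the tag has none. Returns 1 on success, 0 if the
 * primary tag is empty or a buffer is too small. */
int split_language_tag(const char *tag,
                       char *primary, size_t primary_size,
                       char *subtag, size_t subtag_size)
{
    const char *dash = strchr(tag, '-');
    size_t primary_len = dash ? (size_t) (dash - tag) : strlen(tag);
    const char *rest = dash ? dash + 1 : "";
    size_t subtag_len = strcspn(rest, "-");   /* stop at a second "-" */

    if (primary_len == 0)
        return 0;
    return copy_part(primary, primary_size, tag, primary_len)
        && copy_part(subtag, subtag_size, rest, subtag_len);
}
```

An application could use the two parts to decide whether a two-letter primary tag should be looked up as a language abbreviation and a two-letter subtag as a country code.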

Content Encodings

Some documents are not sent as their original data object but are encoded in some way. On the Web this is mostly some kind of compression, but other encodings, for example base 64, can be encountered when talking to NNTP servers etc. Just as for the other preferences, an application can register a set of supported encoders or decoders as a list. Encoders and decoders are registered in the same way with no differentiation between an encoder and a decoder:
extern void HTEncoding_add (HTList * 		list,
			    CONST char *	encoding,
			    double		quality);
The list argument is the now well-known way of handling these preferences and we will see this many more times throughout the guide. The "encoding" argument is a constant string just like the data format descriptions in the registration of converters and presenters. The values are also inspired strongly by the HTTP Protocol and the MIME specification and some of the most common examples are:
	base64
	compress
	gzip
As with the list of natural languages, the list of encoders and decoders must be a separate list.

Character Sets

As the Web reaches all parts of the Internet there are more and more documents written in languages which contain characters not included in the ISO-8859-1 character set. As a consequence, the character set is often tightly connected with the natural language. The Library does not directly support other character sets, but if an application is capable of handling alternative sets it can register these as preferred character sets along with a quality factor, just like all the other preferences in this section.
extern void HTCharset_add (HTList *		list,
			   CONST char *		charset,
			   double		quality);
The charset argument is also inspired by the HTTP Protocol and the MIME specification. Some of the most common examples of the charset parameter are:
	US-ASCII
	ISO-8859-1
	UNICODE-1-1
Again, the list of preferred character sets must be a separate list.

The Quality Factor

Common to all the preferences above is that there is a quality factor associated with each member. The quality factor is a real number between 0 and 1, with 0 meaning "very bad" and 1 meaning "perfect". By registering a natural language or any other preference in this group together with a quality factor you can specify how well the preference is handled, either by the application or by the user. In the case of the user, the quality factor of a natural language is how well the user understands the language. In my case, the quality factor for Greek, for example, would be close to zero and the one for Danish would be 1 (nothing bad said about Greek!).

It is a bit different for converters, where the quality factor often reflects the application's ability to handle the data format rather than the user's perception. As an example, it is often faster to use a converter than a presenter, as it takes time to launch the external application and the Library cannot use the progressive display mechanisms that are often available with converters. Therefore, if we are capable of handling an image in gif format inline but rely on an external viewer for presenting postscript, we might set up the following list:

HTConversion_add (converters, "image/gif", "www/present", GifPresenter, 1.0, 0.0, 0.0);

HTPresentation_add (presenters, "application/postscript", "ghostview %s", NULL, 0.5, 0.0, 0.0);
where the gif converter is registered with a quality factor of 1.0 and the postscript presenter with a quality factor of 0.5.
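
The role of the quality factor can be illustrated by a small selection routine. This is only a sketch, not the Library's own selection code: the candidate structure and the best_candidate function are hypothetical, and the sketch assumes that all candidates in the array are able to handle the same input format, so that the quality factor is the only thing separating them.

```c
#include <stddef.h>

/* Illustrative record only; this is not the Library's converter object. */
struct candidate {
    const char *name;     /* label for the converter or presenter */
    double      quality;  /* quality factor in the range 0.0 to 1.0 */
};

/* Return the entry with the highest quality factor, or NULL if the
 * array is empty. On a tie the earliest entry wins. */
const struct candidate *best_candidate(const struct candidate *list,
                                       size_t count)
{
    const struct candidate *best = NULL;
    size_t i;

    for (i = 0; i < count; i++) {
        if (best == NULL || list[i].quality > best->quality)
            best = &list[i];
    }
    return best;
}
```

Given one candidate with quality 0.5 and another with quality 1.0, best_candidate returns the second, which mirrors the idea of preferring an inline converter over an external viewer.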

Enabling Preferences

All we have done until now is to show how we can register sets of preferences. However, we still need to define where and when to actually let the Library use the preferences. This can be done in two ways: globally or locally. When assigning a set of preferences, for example the set of natural languages, it can either be assigned to all future requests (globally) or to a specific request (locally). The preferences can also be assigned partly globally and partly locally, so that the most common preferences are registered globally and only the preferences specific to a single request are added by registering the sets locally.

Here we will only show how to enable the preferences globally. Later when we have discussed how to create a request object we will see how to enable the preferences locally and also if they are to be added to the global list or completely override the global list for a particular request.

Converters and Presenters

extern void HTFormat_setConversion	(HTList *list);
extern HTList * HTFormat_conversion	(void);

Content Encodings

extern void HTFormat_setEncoding	(HTList *list);
extern HTList * HTFormat_encoding	(void);

Natural Languages

extern void HTFormat_setLanguage	(HTList *list);
extern HTList * HTFormat_language	(void);

Character Sets

extern void HTFormat_setCharset		(HTList *list);
extern HTList * HTFormat_charset	(void);

Cleaning up Preferences

As the application is responsible for setting up the sets of preferences, it is also responsible for deleting them once they are not needed anymore, for example when the application is closing down or the user has changed them. The Library provides two mechanisms for cleaning up old lists: it can either be done by invoking separate methods on each set of preferences, or it can be done in a batch of all globally registered preferences or all locally registered preferences relative to a single request. In this context, a batch is the total set of registered converters, encoders, charsets, and languages. Here we will only show how to clean up preferences set-wise and as a global batch of preferences. We leave the local cleanup until we have described the request object later in this guide.

Common to the cleanup methods is that once they have been called you can no longer use the lists, as they no longer point to valid places in memory. The first mechanism for cleaning up lists is to call the cleanup method of each preference as indicated below:

Converters

extern void HTConversion_deleteAll	(HTList * list);

Presenters

extern void HTPresentation_deleteAll	(HTList * list);

Content Encodings

extern void HTEncoding_deleteAll	(HTList * list);

Natural Languages

extern void HTLanguage_deleteAll	(HTList * list);

Character Sets

extern void HTCharset_deleteAll		(HTList * list);
The second mechanism, which cleans up all globally registered preferences at once, can often be used in order to simplify the management done by the application. Note, however, that all global lists become inaccessible for future reference. If you want to define new sets of preferences then you need to start all over again and create a new list object.
extern void HTFormat_deleteAll		(void);
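
Because the delete functions shown above take the list pointer by value, the application's own pointer variable is left pointing at freed memory after the call. One defensive habit is to reset that variable immediately. The sketch below illustrates the idea with a self-contained list type; pref_node, pref_add and pref_delete_all are hypothetical names and have nothing to do with the Library's HTList implementation.

```c
#include <stdlib.h>

/* Illustrative singly linked list of preference strings. */
struct pref_node {
    const char       *value;   /* not copied; must outlive the list */
    struct pref_node *next;
};

/* Prepend value to the list at *head. Returns 1 on success, 0 if memory
 * could not be allocated. */
int pref_add(struct pref_node **head, const char *value)
{
    struct pref_node *node = malloc(sizeof *node);

    if (node == NULL)
        return 0;
    node->value = value;
    node->next = *head;
    *head = node;
    return 1;
}

/* Free every node. Taking a pointer to the caller's variable lets the
 * function leave *head as NULL, so stale use is easy to detect. */
void pref_delete_all(struct pref_node **head)
{
    while (*head != NULL) {
        struct pref_node *next = (*head)->next;
        free(*head);
        *head = next;
    }
}
```

With the Library functions the same effect can be had by assigning NULL to the application's list variable right after the delete call.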

Getting Help on Initializing Converters and Presenters

The Library has a special module called HTInit which helps the application initialize all the converters and other preferences supported internally by the Library. This module is not called directly from the Library and must be invoked explicitly by the application. HTInit is a part of the WWWApp.h include file described in the previous section, so if you include this in your application then you have direct access to the following functions. The first initializes all the converters supported natively by modules in the Library distribution file:
extern void HTConverterInit	(HTList * conversions);
There is a similar function for registering a common set of presenters that can be found on many (especially Unix) platforms:
extern void HTPresenterInit	(HTList * conversions);
In order to show the similarity between how converters and presenters are handled in the Library, there is also a single function that does the work of the two previous functions at once:
extern void HTFormatInit	(HTList * conversions);

Summary

We have now seen how to enable multiple sets of preferences using a very similar naming scheme and registration process. As mentioned in the beginning, lists are easy to handle when the preferences are likely to be changed often as the application is executed. In the next section, we will take a look at a slightly different registration mechanism which is more suited for topics that only rarely are registered multiple times during the execution of an application.



Enabling Access Modules

The Library comes with a wide set of access modules that give access to most popular Internet protocols including HTTP, FTP, Gopher, telnet, rlogin, NNTP and WAIS. However, as mentioned in the beginning, when the Library is first initialized it knows nothing about how to access the Internet. In fact it doesn't even know how to access the local file system. It is for the application to tell the Library what it can handle and where to find the functionality. This is very much the same mechanism as we saw described in the previous chapter, so a lot of what is going on will hopefully become clear as we go along.

All protocol modules are dynamically bound to an access scheme. Take for example the following URL:

	http://www.w3.org/
It has the access scheme http and if we have a protocol module capable of handling HTTP then we can make the binding between http and this module. As mentioned in the introduction to this chapter, the Library already comes with a large set of protocol modules, including HTTP, so all we have to do in this case is to register the HTTP module with the Library as being capable of handling http URLs.

Let's see how we can register a protocol module. The support for this is provided by the protocol manager which exports the following function:

extern BOOL HTProtocol_add (CONST char *       	scheme,
			    BOOL		preemptive,
			    HTEventCallBack *	callback);
This function follows exactly the same naming scheme as we have seen many times before. The first argument is the access scheme which the protocol module is capable of handling. This can for example be http, but it can also be a non-existent scheme used for an experimental protocol implementation, for example whois. In case a protocol module is capable of handling more than one access scheme, it can be registered multiple times with different schemes. This is the case with the Telnet access module which can also handle rlogin and tn3270 terminal sessions.

The preemptive argument tells the Library whether the protocol module is capable of handling non-blocking sockets or not. This is normally a design decision when implementing the protocol module in that a module implemented for using blocking sockets normally can't use non-blocking sockets. However, the other way is often possible, and in some situations it is advantageous to use blocking sockets. The Library allows this to happen on a per request basis as explained in the section "The Request Object". The Library Architecture document discusses in more detail how a protocol module can be designed to support non-blocking sockets.

The last argument is the function to call when a request has been issued and a protocol module has been found associated with the access scheme used. Even though it may not be clear at this point, HTEventCallBack is the type of function that the event handler uses in order to initiate requests in the Library.

A protocol module can be disabled at any time during execution. In most cases this is not used very often but the dynamic nature of the binding leaves this choice free to the application. In case it is desired, you can do so by calling the following function:

extern BOOL HTProtocol_delete (CONST char * scheme);
The argument is exactly the same scheme as described above. One special case is the support for access to WAIS databases. WAIS has its own code library called freeWAIS which is required in order to directly access wais URLs. We shall not describe in detail here how this can be enabled as it is described in the WWW-WAIS gateway documentation.
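
To make the idea of a dynamic binding between an access scheme and a protocol module more concrete, here is a minimal, self-contained sketch. It does not use the Library at all: every name starting with sketch_ or Sketch is hypothetical, the table has an arbitrary fixed size, and the handler type is only a stand-in for a real protocol module entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_SCHEMES 16          /* arbitrary limit for this sketch */

/* Stand-in for the entry point of a protocol module (hypothetical type). */
typedef int SketchHandler(const char *url);

typedef struct {
    const char    *scheme;             /* for example "http"; not copied */
    int            preemptive;         /* only stored, mirrors the flag above */
    SketchHandler *handler;
} SketchBinding;

static SketchBinding sketch_bindings[SKETCH_MAX_SCHEMES];
static size_t        sketch_binding_count = 0;

/* Bind a scheme to a handler; returns 1 on success, 0 if the table is full. */
int sketch_protocol_add(const char *scheme, int preemptive,
                        SketchHandler *handler)
{
    assert(scheme != NULL && handler != NULL);
    if (sketch_binding_count == SKETCH_MAX_SCHEMES) return 0;
    sketch_bindings[sketch_binding_count].scheme = scheme;
    sketch_bindings[sketch_binding_count].preemptive = preemptive;
    sketch_bindings[sketch_binding_count].handler = handler;
    sketch_binding_count++;
    return 1;
}

/* Remove the binding for a scheme; returns 1 if one was found. */
int sketch_protocol_delete(const char *scheme)
{
    size_t i;
    for (i = 0; i < sketch_binding_count; i++) {
        if (strcmp(sketch_bindings[i].scheme, scheme) == 0) {
            /* Fill the hole with the last entry; order is not preserved. */
            sketch_bindings[i] = sketch_bindings[sketch_binding_count - 1];
            sketch_binding_count--;
            return 1;
        }
    }
    return 0;
}

/* Find the handler for a URL by comparing the text before the first ':'. */
SketchHandler *sketch_protocol_find(const char *url)
{
    const char *colon = strchr(url, ':');
    size_t len, i;
    if (colon == NULL) return NULL;
    len = (size_t)(colon - url);
    for (i = 0; i < sketch_binding_count; i++) {
        if (strlen(sketch_bindings[i].scheme) == len &&
            strncmp(sketch_bindings[i].scheme, url, len) == 0)
            return sketch_bindings[i].handler;
    }
    return NULL;
}

/* One handler can serve several schemes, as the Telnet module does. */
int sketch_terminal_handler(const char *url)
{
    (void)url;
    return 0;
}
```

Registering sketch_terminal_handler under both "telnet" and "rlogin" gives two bindings that point at the same function, and deleting one of them leaves the other untouched.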



Bindings to the local File system

The preferences that we described in section Request Preferences did not mention what the Library should do if it doesn't know the data format of a document. In many protocols this information is provided by the remote server. Typical examples are MIME-like protocols where metainformation such as the Content-Type and the Content-Language is provided together with the document. However, applications often have access to the local file system using file URLs, and a file system in general keeps little or no information about the file type. It is therefore required to have some kind of binding between the file system and the preferences registered in the Library which provides this metainformation about the object.

Often files in a file system are classified by some sort of suffix, for example GIF files often end in .gif, text files in .txt etc. This binding is not static and it is therefore required to have a dynamic binding just like the preferences themselves. An example of the latter is HTML files, which on most Unix systems end in .html whereas on many MS-DOS based systems they end in .htm.

The HTBind module provides a generic binding mechanism between a file and its representation internally in the Library. It is not limited to simple file suffix classification but can also be used in more advanced environments using data bases etc. However, at this point we are interested in how we can register bindings between file suffixes and for example content types, content languages etc.

Before starting a more detailed description of how to register file suffixes, it might be required to define what actually is a file suffix and what is the set of delimiters separating them on a particular platform. The Bind manager is born with a certain knowledge about the set of delimiters but more can be added to provide the functionality desired. First, whether suffixes are matched case sensitively can be controlled using the following function:

extern void HTBind_caseSensitive	(BOOL sensitive);
where sensitive can either be YES or NO. Also the set of delimiters can be defined using the following function:
extern CONST char *HTBind_delimiters	(void);
extern void HTBind_setDelimiters	(CONST char * new_suffixes);
Examples of delimiter sets are
	"._"
	"."
	"._-"
Note that the suffixes chosen do not have to be connected with what is available on a particular platform. However, a certain coupling will probably make maintenance of the file system easier for all parties. In the following we will show the API for adding bindings between the preferences and the file system. You can add a binding between a Content type and a suffix by using the following function:
extern BOOL HTBind_addType	(CONST char *	suffix,
				 CONST char *	format,
				 double		value);
Calling this with suffix set to "*" will set the default representation which is used in case no other suffix fits the actual file. Using a suffix set to "*.*" will set the default representation for unknown suffix files which contain a ".". The format argument is exactly as described in the section Request Preferences. In exactly the same way you can add a binding between an encoding and a file suffix using the following function:
extern BOOL HTBind_addEncoding	(CONST char *	suffix,
				 CONST char *	encoding,
				 double		value);
Bindings can also be made between a file suffix and a specific natural language:
extern BOOL HTBind_addLanguage	(CONST char *	suffix,
				 CONST char *	language,
				 double		value);
In all cases, it should be mentioned that any of the suffixes can contain characters that normally must be escaped in a URL, for example space, < and >. However, they should not be encoded when passed as the suffix parameter but left as is.
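
The interplay between the delimiter set, case sensitivity and the "*" default can be illustrated with a small stand-alone sketch. It is not the HTBind implementation: all Sketch and sketch_ names are hypothetical, only the last suffix of a file name is considered, the "*.*" default is left out, and the table entries (including the application/octet-stream default) are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *suffix;   /* includes its delimiter, for example ".html" */
    const char *format;   /* content type bound to the suffix */
    double      value;    /* quality value, only stored in this sketch */
} SketchTypeBinding;

/* Example table; "*" marks the default used when nothing else fits. */
const SketchTypeBinding sketch_table[] = {
    { ".html", "text/html",                1.0 },
    { ".htm",  "text/html",                1.0 },
    { ".gif",  "image/gif",                1.0 },
    { "*",     "application/octet-stream", 0.5 }
};

/* Return the last suffix of name, starting at its delimiter, or NULL. */
const char *sketch_last_suffix(const char *name, const char *delimiters)
{
    const char *last = NULL;
    const char *p;
    assert(name != NULL && delimiters != NULL);
    for (p = name; *p != '\0'; p++)
        if (strchr(delimiters, *p) != NULL) last = p;
    return last;
}

/* Compare two strings, optionally ignoring case; returns 1 when equal. */
static int sketch_equal(const char *a, const char *b, int case_sensitive)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        int ca = (unsigned char)*a, cb = (unsigned char)*b;
        if (!case_sensitive) { ca = tolower(ca); cb = tolower(cb); }
        if (ca != cb) return 0;
    }
    return *a == '\0' && *b == '\0';
}

/* Look up the format for a file name, falling back to the "*" entry. */
const char *sketch_lookup_type(const SketchTypeBinding *table, size_t count,
                               const char *name, const char *delimiters,
                               int case_sensitive)
{
    const char *suffix = sketch_last_suffix(name, delimiters);
    const char *fallback = NULL;
    size_t i;
    for (i = 0; i < count; i++) {
        if (suffix != NULL &&
            sketch_equal(table[i].suffix, suffix, case_sensitive))
            return table[i].format;
        if (strcmp(table[i].suffix, "*") == 0)
            fallback = table[i].format;
    }
    return fallback;
}
```

With case-insensitive matching a file called INDEX.HTM resolves to text/html in this sketch, whereas with case-sensitive matching it falls through to the default entry.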



Registering Protocol Headers

The Library provides a few powerful mechanisms to handle document metainformation and how to generate and parse additional header information coming across the network. This section describes how to handle metainformation and headers and how this can be used to experiment with existing protocols by means of additional headers.

Header Generation

Outgoing metainformation describing preferences in requests or entities to be sent to a remote server is handled in two ways: The Library supports a "native" set of headers (called known headers) which can be manipulated directly, but it also provides support for header extensions defined by the application. This section describes how both the existing set of headers and the extensions can be handled.

Generating Known Headers

The Library manages a "native" set of protocol headers which we will introduce in this section. The default behavior for the Library is to use a representative set of headers on each request but all headers can be explicitly enabled or disabled on a per request basis by the application. Here we will mainly describe the set of native headers but leave the description of how to manipulate them for the section on managing Request objects. The native set of headers falls into the following three categories:
General Headers
There are a few header fields which have general applicability for both request and response messages, but which do not apply to the communication parties or the entity being transferred. This mask enables and disables these headers. If the bit is not turned on they are not sent. All headers are optional and the default value is not to use any of these headers at all.
Request Headers
The request header fields allow the client to pass additional information about the request (and about the client itself) to the server. All headers are optional but the default behavior is to use all request headers except From and Pragma. The reason is that the former in general requires permission by the user and the latter has special meanings for proxy servers.
Entity Headers
The entity headers contain information about the object sent in the HTTP transaction. See the anchor module, for the storage of entity headers. This flag defines which headers are to be sent in a request together with an entity body. All headers are optional but the default value is to use as many as possible.
As mentioned, the set of native headers is equivalent to the set of headers defined by the HTTP/1.1 protocol specification. The Library also provides functionality for registering additional headers which we will have a look at in the next section.

Generating Additional Headers

The API for handling extra headers is provided by the Header Manager. The API is built in exactly the same way as we have seen in the section Request Preferences, that is it uses lists of objects as the main component. This time the elements registered are callback functions which the application provides the definition of. Each time a request is to be generated, the Library looks to see if a list of callback functions has been registered to provide additional metainformation to send along with the request. If this is the case then each of these callback functions will be called in turn and the resulting request is then the sum of the original request and the information provided by the callback functions.

It should be mentioned, however, that this API is simple to use if you have a relatively small amount of extra metainformation to provide and it easily fits into an existing protocol. It is not suited for building entirely new protocols, or for providing a massive amount of new information. In this case you need a more powerful model which the Library also provides: building your own stream. Actually this is exactly the way the Library implements large parts of itself, but it normally requires a bit more work before you can get an application put together.

Let us jump right in to it and have a closer look at the API. Exactly as for the request preferences you can add and delete an element, which in this case is a callback function. This function has a special definition which is given by

typedef int HTPostCallback (HTRequest *request, HTStream * target);
We have already seen the Request object before, but the Stream object is new. Or actually it isn't, it has just not been mentioned explicitly in the previous sections. We will hear a lot more about the stream object later in this guide. For now it is sufficient to know that a stream is an object that accepts streams of characters - much like an ANSI file stream object does. The return value of the callback function is currently not used but is reserved for future use. We can register a callback function of type HTPostCallback by using the following function:
extern BOOL HTGenerator_add (HTList * gens, HTPostCallback * callback);
The first argument is the well-known list object and the second is the address of the function that we want to be called each time a request is generated. When the callback function is called by the Library it must generate its metainformation and send it down the stream which eventually will end up on the network as part of the final request. In exactly the same way you can unregister a callback function at any time by calling the following function:
extern BOOL HTGenerator_delete (HTList * gens, HTPostCallback * callback);
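
The generator mechanism described above can be sketched without the Library as follows. The stream is reduced to a fixed character buffer, the callbacks take only the stream (the request argument is left out), and the header names X-Sketch-Label and X-Sketch-Version are invented for the example. All Sketch and sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a stream: a fixed buffer that collects characters. */
typedef struct {
    char   data[256];
    size_t used;
} SketchStream;

/* Append text to the stream; returns 1 on success, 0 if it does not fit. */
int sketch_stream_put(SketchStream *target, const char *text)
{
    size_t len;
    assert(target != NULL && text != NULL);
    len = strlen(text);
    if (target->used + len >= sizeof target->data) return 0;
    memcpy(target->data + target->used, text, len + 1);  /* copies the NUL */
    target->used += len;
    return 1;
}

/* A generator writes its extra header lines to the target stream. */
typedef int SketchGenerator(SketchStream *target);

int sketch_gen_label(SketchStream *target)
{
    return sketch_stream_put(target, "X-Sketch-Label: demo\r\n");
}

int sketch_gen_version(SketchStream *target)
{
    return sketch_stream_put(target, "X-Sketch-Version: 1\r\n");
}

/* The "list" of registered generators and the stream they write to. */
SketchGenerator *const sketch_gens[] = { sketch_gen_label, sketch_gen_version };
SketchStream sketch_out;   /* file scope, so it starts out zeroed */

/* Call every generator in turn; return values are ignored, as in the text. */
void sketch_generate(SketchGenerator *const *gens, size_t count,
                     SketchStream *target)
{
    size_t i;
    for (i = 0; i < count; i++)
        (void)gens[i](target);
}
```

Calling sketch_generate(sketch_gens, 2, &sketch_out) leaves both header lines in sketch_out.data, in the order in which the generators appear in the list.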

Header Parsing

The MIME parser stream parses MIME metainformation, for example generated by MIME-like protocols, such as HTTP, NNTP, and soon SMTP as well. For HTTP it handles all headers as defined in the HTTP/1.1 specification. When a MIME header is parsed, the obtained metainformation about the document is stored in the anchor object where it can be accessed by the application using the methods of the Anchor module. The metainformation in an anchor object can also be used to describe a data object that is to be sent to a remote location, for example using HTTP or NNTP, but we will describe this in more detail later in this guide. In this case the order is reversed as the application provides the metainformation and the appropriate headers are generated instead of generating the entries in the anchor object by parsing the headers.

Parsing Known Headers

For the exact set of headers handled directly by the internal MIME parser, the reader is referred to the actual implementation. However, some of the more special headers are:

Allow
Builds a list of allowed methods for this entity
ContentEncoding
ContentLanguage
Builds a list of natural languages
ContentLength
This parameter is now parsed
ContentType
The ContentType header now supports the charset parameter and the level parameter; however, neither of them is used by the HTML parser
Date, Expires, RetryAfter, and LastModified
All date and time headers are parsed understanding the following formats: RFC 1123, RFC 850, ANSI C's asctime(), and delta time. The latter is a non-negative integer indicating seconds after the message was received. Note that it is always up to the application to issue a new request as a function of any of the date and time headers.
DerivedFrom, Version
For handling version control when managing collaborative works using HTTP.

Parsing Additional Headers

In many cases, if you have registered an extra set of headers to be generated, you are also in a situation where you would like to handle the result that is returned by the remote server. As we will describe in this section, the Library provides a very similar interface to the one presented above for generating extra headers.

Again, the API for handling extra headers is provided by the Header Manager and is based on managing list objects, just like we have seen many times before. Each time a request is received, and an unknown header is encountered by the internal MIME parser, the Library looks to see if a list of callback functions has been registered to parse additional metainformation. In case a parser is found for this particular header, the callback is called with the header and all parameters that might follow it. As MIME headers can contain line wrappings, the MIME parser canonicalizes the header line before the callback function is called which makes the job easier for the callback function.

Exactly as for the header generators you can add and delete an element, which also in this case is a callback function. This function has a special definition which is given by

typedef int HTParserCallback (HTRequest * request, CONST char * token);
The request object is the current request being handled and the token is the header that was encountered together with all parameters following it. The callback can return a value to the Library by using the return code of the callback function. Currently there are two return values recognized by the Library: While in the callback function, the application can start other requests or even kill the current request if required. We can register a callback function by using the following function:
extern BOOL HTParser_add (HTList *		parsers,
			  CONST char *       	token,
			  BOOL			case_sensitive,
			  HTParserCallback *	callback);
Again, the first argument is a list as we have seen before. The token is a specific token by which the callback function should be called. This token can contain a wild card (*) which will match zero or more arbitrary characters. You can also specify whether the token should be matched using a case sensitive or case insensitive matching algorithm. Let's look at an example of how to register a parser callback function:
HTParser_add(mylist, "PICS-*", NO, myparser);
This registers the myparser function as being capable of handling all tokens starting with "PICS", "PiCs", "pics", for example:
	PICS-start
	pics-Token
	PICS
As for header generators, you can unregister a callback function by using the following function:
extern BOOL HTParser_delete (HTList * parsers, CONST char * token);
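
The token matching used when looking up a parser callback can be sketched as a small stand-alone function. This is an assumption about how such a match could work, not the Library's own code: sketch_token_match is a hypothetical name, and the sketch only supports a wild card at the end of the pattern (everything after the first * in the pattern is ignored).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Return 1 if token matches pattern, 0 otherwise. A '*' in the pattern
 * matches the rest of the token, including the empty rest.
 */
int sketch_token_match(const char *pattern, const char *token,
                       int case_sensitive)
{
    assert(pattern != NULL && token != NULL);
    while (*pattern != '\0' && *pattern != '*' && *token != '\0') {
        int p = (unsigned char)*pattern;
        int t = (unsigned char)*token;
        if (!case_sensitive) { p = tolower(p); t = tolower(t); }
        if (p != t) return 0;
        pattern++;
        token++;
    }
    if (*pattern == '*') return 1;               /* wild card takes the rest */
    return *pattern == '\0' && *token == '\0';   /* otherwise exact match */
}
```

In this sketch the pattern "PICS-*" matches PICS-start and, when matching is case insensitive, pics-Token as well. The hyphen is treated as a literal character here; how the Library itself treats a token without it is not something this sketch settles.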

Enabling Preferences

Exactly as for Request Preferences, all we have done until now is to show how we can register sets of preferences. However, we still need to define where and when to actually let the Library use the preferences. Again, this can be done in two ways: globally or locally. When assigning a set of preferences, for example the set of natural languages, it can either be assigned to all future requests (globally) or to a specific request (locally). The preferences can also partly be assigned globally and partly locally so that the most common preferences are registered globally and only some preferences specific to a single request are then added by registering the sets locally.

Here we will only show how to handle the global registration as the local registration is part of the description of the request object.

Additional Header Parsers

extern void HTHeader_setParser (HTList * list);
extern HTList * HTHeader_parser (void);

Additional Header Generators

extern void HTHeader_setGenerator (HTList * list);
extern HTList * HTHeader_generator (void);

Cleaning up Preferences

As for request preferences, the application is responsible for setting up the sets of additional header generators and parsers, and it is also responsible for deleting them once they are not needed anymore, for example when the application is closing down, or the user has changed them. The Library provides two mechanisms for cleaning up old lists: It can either be done by invoking separate methods on each set of preferences, or it can be done in a batch of all globally registered preferences or all locally registered preferences relative to a single request. In this context, a batch is the total set of registered converters, encoders, character sets, and languages. Here we will only show how to clean up preferences set-wise and as a global batch of preferences. We leave the local cleanup until we have described the request object later in this guide.

As for the other deletion methods, once they have been called you can no longer use the lists as they no longer point to valid places in memory. The first mechanism for cleaning up lists is by calling the cleanup method of each preference as indicated below:

Header Parsers

extern BOOL HTParser_delete (HTList * parsers, CONST char * token);
extern BOOL HTParser_deleteAll (HTList * parsers);

Header Generators

extern BOOL HTGenerator_delete (HTList * gens, HTPostCallback * callback);
extern BOOL HTGenerator_deleteAll (HTList * gens);
The easy way of cleaning up all global lists at once is calling the following function
extern void HTHeader_deleteAll (void);



Request Callback functions

As we have seen in the previous chapters, the core part of the Library knows nothing about how to access, for example, an HTML document from a remote server. All this depends on what the application has registered. In many situations there are a number of things to do before a request is actually sent over the wire. For example we might already have the document in a cache or we might have some translation of the URL so that we don't go directly to the remote server. The latter case includes redirection of a request to go through a proxy server or a gateway. Likewise, when a request is terminated, we might want to keep a log of the request and the result, update history lists etc.

The Library does provide a large amount of such pre- and post processing modules. However, the exact amount used by an application depends on the purpose of the application. Simple script-like applications typically do not need any history mechanism etc. Therefore these modules are not a part of the core but instead they can be registered as all other preferences. The Net Manager provides functionality for registering a set of callback functions that can be called before and after a request has been executed. Of course, the result of a pre-processing might be that the request does not have to be executed at all in which case the request can be terminated before the protocol module is called to execute the request.

Generic Handling of Callbacks

The registration of callback functions is handled by the HTNet Manager and it is (of course) based on lists as we have seen so many times before. A callback function can be added to a list by using the following function:
extern BOOL HTNetCall_add (HTList * list, HTNetCallback * cbf, int status);
The callback function has to be of type HTNetCallback which is defined as
typedef int HTNetCallback (HTRequest * request, int result);
This means that a callback function is called with the current request object and the result of the request. Now, if the callback is registered as a pre callback then we obviously do not yet have a result and the function is called with the code HT_OK. However, if it is a post callback function then the result code may take any of the following values:
HT_ERROR
An error occurred
HT_INTERRUPTED
The request was interrupted
HT_LOADED
The document was loaded
HT_NO_DATA
OK, but no data
HT_RETRY
Retry the request at a later time
HT_PERM_REDIRECT
The request has been permanently redirected and we send back the new URL
HT_TEMP_REDIRECT
The request has been temporarily redirected and we send back the new URL
HT_NO_ACCESS
The request could not be fulfilled because it didn't contain sufficient credentials
When a callback function is registered, it may be registered with a status code for which it is to be called. This means that there may be different callback functions to handle error situations, redirections etc. The status code may also take any of the values above, or HT_ALL if it is to be called always.

A callback function may return any code it likes, but if the return code is different from HT_OK, then the callback loop is stopped. If we are in the before loop and a function returns anything other than HT_OK then we immediately jump to the after loop, passing the last return code from the before loop.
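
The rules for status filtering and for stopping a callback loop can be summarized in a short stand-alone sketch. It does not use the Net Manager: the SK_ constants are hypothetical stand-ins for HT_OK, HT_ERROR, HT_LOADED and HT_ALL with values chosen for this example, lists are plain arrays, and the request object is reduced to two counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the result codes; the values are assumptions. */
enum { SK_OK = 0, SK_ERROR = -1, SK_LOADED = 200, SK_ALL = 1 };

typedef struct {
    int calls;    /* how many callbacks have run for this request */
    int loaded;   /* set when the "protocol module" was actually called */
} SketchRequest;

typedef int SketchNetCallback(SketchRequest *request, int result);

typedef struct {
    SketchNetCallback *cbf;
    int                status;   /* result code to react to, or SK_ALL */
} SketchCall;

/* Run matching callbacks in order; stop at the first non-SK_OK return. */
int sketch_run_callbacks(const SketchCall *calls, size_t count,
                         SketchRequest *request, int result)
{
    size_t i;
    assert(request != NULL);
    for (i = 0; i < count; i++) {
        int ret;
        if (calls[i].status != SK_ALL && calls[i].status != result) continue;
        ret = calls[i].cbf(request, result);
        if (ret != SK_OK) return ret;
    }
    return SK_OK;
}

/* Before loop, then the load unless a before callback stopped the loop,
   then the after loop. Returns the result handed to the after loop. */
int sketch_execute(const SketchCall *before, size_t nbefore,
                   const SketchCall *after, size_t nafter,
                   SketchRequest *request, int (*load)(SketchRequest *))
{
    int result = sketch_run_callbacks(before, nbefore, request, SK_OK);
    if (result == SK_OK) result = load(request);
    (void)sketch_run_callbacks(after, nafter, request, result);
    return result;
}

/* Example callbacks and lists. */
int sketch_count(SketchRequest *r, int result)
{
    (void)result;
    r->calls++;
    return SK_OK;
}

int sketch_cache_hit(SketchRequest *r, int result)
{
    (void)result;
    r->calls++;
    return SK_LOADED;   /* pretend the document was found in a cache */
}

int sketch_load(SketchRequest *r)
{
    r->loaded = 1;
    return SK_LOADED;
}

const SketchCall sketch_before[] = {
    { sketch_count, SK_ALL }, { sketch_cache_hit, SK_ALL }, { sketch_count, SK_ALL }
};
const SketchCall sketch_after[] = {
    { sketch_count, SK_ERROR }, { sketch_count, SK_ALL }
};
SketchRequest sketch_request;   /* file scope, so both counters start at 0 */
```

When all three before callbacks are used, sketch_cache_hit stops the before loop, the load step is skipped, and only the after callback registered with SK_ALL runs, since the one registered for SK_ERROR does not match the SK_LOADED result.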

Likewise, a callback function can be removed from a list using the following function:

extern BOOL HTNetCall_delete (HTList * list, HTNetCallback *cbf);
or if you simply want to remove all functions from a list then you can use
extern BOOL HTNetCall_deleteAll (HTList * list);

Pre-Request Callbacks

When a request has been issued there are a number of things that an application might want to do with the request before it actually goes on the wire, for example to talk to a remote HTTP server. Examples are checking if the object already is kept in a cache managed by the application, if the request should be redirected to a proxy or a gateway, or if there is some other kind of translation of the URL taking place. The Library provides a variety of modules that handle many commonplace translations such as redirection of URLs and caching. The full list of modules is:

Rule File Management

An application can be set up by using a rule file, also known as a configuration file. This is for example the case with the W3C httpd and the W3C Line Mode Browser. The Rules module provides basic support for configuration file management and the application can use this if desired. The module is not referred to by the Library. Reading a rule file is implemented as a stream converter so that a rule file can come from anywhere, even across the network!

Proxies and Gateways

Applications do not have to provide native support for all protocols, they can in many situations rely on the support of proxies and gateways to help doing the job. Proxy servers are often used to carry client requests through a firewall where they can provide services like corporate caching and other network optimizations. Both Proxy servers and gateways can serve as "protocol translators" which can convert a request in the main Web protocol, HTTP, to an equivalent request in another protocol, for example NNTP, FTP, or Gopher. In a later section we will see how to set up the Proxy servers and gateways using the Proxy module.

As an HTTP request looks different when it is directed to a proxy server than to an origin server, the HTTP module needs to know whether it is talking to a proxy for this particular request or not. You can specify in a request object whether a proxy is being used or not by using the following methods:

extern void HTRequest_setProxing (HTRequest * request, BOOL proxying);
extern BOOL HTRequest_proxying (HTRequest * request);

Cache Manager

Caching is a required part of any efficient Internet access application as it saves bandwidth and improves access performance significantly in almost all types of accesses. The Library supports two different types of cache: the memory cache and the file cache. The two types differ in several ways which reflect their two main purposes: The memory cache is for short term storage of graphic objects whereas the file cache is for intermediate term storage of data objects. Often it is desirable to have both a memory and a file version of a cached document, so the two types do not exclude each other. The HTCache module provides a basic cache that can be used by an application.

Register a list of BEFORE Callbacks

Until now, we have only described how to build a list of callback functions. We will now describe how to set up a list as either a pre processing set of callback functions (the BEFORE loop) or a post processing set (the AFTER loop). A set of callback functions can be registered to be called before the request is started by using the following function:
extern BOOL HTNet_setBefore	(HTList * list);
In many cases you know when you register a callback function that this is a function that you always want to be called when either a request starts up or terminates. In the former case you can simply register the callback directly using the following function:
extern BOOL HTNetCall_addBefore	(HTNetCallback *cbf, int status);

Post-Request Callbacks

When a request is terminated, the application often has to do some action as a result of the request (and of the result of the request). The Application part of the Library provides the following two modules to handle logging and history management. You can register a POST request handler in the Net Manager as described in the User's Guide. The set of modules provided by the Library is:

Logging

Often it is required to log the requests issued to the Library. This can either be the case if the application is a server or it can also be useful in a client application. The HTLog Module provides a simple logging mechanism which can be enabled if needed.

History Management

Another type of logging is keeping track of which documents a user has visited when browsing along on the Web. The HTHistory module provides a basic set of functionality to keep track of multiple linear history lists.

Register a list of AFTER Callbacks

The registration of a set of callback functions to be called when a request has terminated is handled in very much the same way:
extern BOOL HTNet_setAfter	(HTList * list);
extern BOOL HTNetCall_addAfter (HTNetCallback *cbf, int status);



Error Messages

The Library deals with two categories of errors: user errors and application errors. The former are all the errors that can occur when the Library is used in a "real" environment where network connections can disappear, disks can get full, and memory can get exhausted. The latter type of errors are at a lower level, including invalid arguments passed to procedures, missing function calls, or any other application misbehavior. In this section we will examine the Library Error Manager which handles user errors. In a later section we will have a look at how to control the application errors at an application debug level using trace messages.

As a part of the core Library, the error object is intended to pass information about errors and messages occurring in the Library back to the application. Each error is kept as an object so multiple errors can be nested together using the well-known HTList object. Nested error management can be used to build complicated error messages with an arbitrary level of detail, for example:

	This URL could not be retrieved: http://www.foo.com
	    Reason: The host name could not be resolved
	        Reason: DNS service is not available
The principle behind the error manager is exactly like any other registration module in the Library in that it creates an object and binds it to a list that the caller provides. Often, errors are related to a specific request object and each request object will therefore keep its own list of errors. However, errors can also be maintained as separate lists which are not directly related to a request, for example, the application can keep its own list of errors independent of any Library errors.

Errors are roughly categorized into two classes: system errors and other errors. System errors include all errors that occur while interacting with the operating system. Often these errors occur because a system resource is unavailable or because the application lacks authorization to use it. In many operating systems, the system provides a set of error messages, each associated with an error code made available to the application via the errno variable or equivalent. All other errors are registered with an error message belonging to the Library Error manager. Note that there is no difference in how system errors and other errors are treated; they are the same data objects and can be registered together with no exception.

Registering Errors

Now let's take a look at how a generic error list is maintained. Normal errors can be registered using the following function:
extern BOOL HTError_add		(HTList *	list,
				 HTSeverity	severity,
				 BOOL		ignore,
				 int		element,
				 void *		parameter,
				 unsigned int	length,
				 char *		where);
The first argument is a list object and as always, we need to create a list object using the HTList_new method. The next argument is an indication of how serious the error is in the situation where it occurred. Classification of errors is known from many operating systems, for example VMS, and it gives the application the opportunity to decide whether the current operation should be continued or aborted. The Library provides four severity categories:
typedef enum _HTSeverity {
    ERR_FATAL,
    ERR_NON_FATAL,
    ERR_WARN,
    ERR_INFO
} HTSeverity;
It is not always that an error is an error immediately when it occurs. In some situations it might first become an error later in the process depending on the outcome of other factors - or it might be circumvented so that no special action is required. The ignore flag provides this functionality in that an error can be registered at any time with the notion: "Register this error but ignore it for now".

The element argument is an index into a table of all error messages. This table is maintained in the HTError Module and contains an error message together with a URL that might be included in an error message presented to the user. The values of the element argument are given by the HTErrorElement enumeration definition in the HTEvntrg Module.

The next two arguments are used to register any parameters associated with the error. This can for example be the file name of a file which could not be opened, a URL which could not be accessed etc. By letting the parameter be a void pointer together with a length indication, the parameter can be an arbitrary data object. The last argument is a location description to indicate where the error occurred. Often this is the name of the function or a module.

One thing we didn't mention when describing the request object is that it provides a similar function for directly associating an error object with a request object. These functions use request objects and not a list as the basic data object, and hence the caller does not have to worry about creating or assigning the list to the request object; this is done automatically. The request version of how to register an error looks very much like its more generic companion, and it should not be necessary to explain the arguments any further.

extern BOOL HTRequest_addError (HTRequest * 	request,
				HTSeverity	severity,
				BOOL		ignore,
				int		element,
				void *		par,
				unsigned int	length,
				char *		where);
System errors can be registered in very much the same way as described above, but the set of parameters is a bit smaller and hopefully a bit easier to handle. The registration function is defined as:
extern BOOL HTError_addSystem (HTList *		list,
			       HTSeverity 	severity,
			       int		errornumber,
			       BOOL		ignore,
			       char *		syscall);
The only difference is the errornumber argument which, as described above, in many situations is provided by the operating system, for example as an errno variable. The syscall argument is simply the name of the system function that was called. This function also has a mirror function in the HTRequest object, and again they look very much alike:
extern BOOL HTRequest_addSystemError (HTRequest * 	request,
				      HTSeverity 	severity,
				      int		errornumber,
				      BOOL		ignore,
				      char *		syscall);
Let's take a look at two examples of registering errors. The first example registers an informational error message explaining that the HTTP module received a redirection notification from the remote HTTP server. The first example uses the Request versions of the error registration functions, and the second example uses the generic versions:
BOOL HTTPRedirect (HTRequest * request, int status, char * location)
{
    if (location) {
	if (status == 301) {
	    HTRequest_addError(request, ERR_INFO, NO, HTERR_MOVED,
			       location, strlen(location), "HTTPRedirect");
	} else if (status == 302) {
	    HTRequest_addError(request, ERR_INFO, NO, HTERR_FOUND,
			       location, strlen(location), "HTTPRedirect");
	}
	return YES;
    } else {
	HTRequest_addError(request, ERR_FATAL, NO, HTERR_BAD_REPLY,
			   NULL, 0, "HTTPRedirect");
	return NO;
    }
}
The second example shows how to register a system error:
BOOL HTReadDir (HTRequest * request, const char * directory)
{ 
    DIR *dp;
    if ((dp = opendir(directory))) {
	STRUCT_DIRENT * dirbuf;
	while ((dirbuf = readdir(dp))) {

	    /* Read Directory */

	}
	closedir(dp);
	return YES;
    } else {
	/* errorlist is assumed to be an HTList created earlier with HTList_new() */
	HTError_addSystem(errorlist, ERR_FATAL, errno, NO, "opendir");
	return NO;
    }
}
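One detail worth keeping in mind when registering system errors is that errno is only meaningful immediately after the failing call; a later library call may overwrite it. The following sketch shows the pattern on a POSIX system. The function demo_probe_dir is a hypothetical helper and not part of the Library; it returns the saved errno value, which is what would be passed as the errornumber argument.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper (not a Library function): try to open a directory.
 * Returns 0 on success, otherwise the errno value captured right after
 * opendir() failed. */
int demo_probe_dir(const char *path)
{
    DIR *dp = opendir(path);
    if (dp == NULL) {
        int saved = errno;   /* copy before any other call can change it */
        return saved;
    }
    closedir(dp);
    return 0;
}
```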

Error Messages

Until now we have concentrated on how to register a set of errors in a list and how to associate errors with a request object. Another important thing about errors is that they often are to be presented to the user. The error manager can be configured to show almost any combination of the parameters in an error object, and all the flags are put together in a big enumeration:
typedef enum _HTErrorShow {
    HT_ERR_SHOW_FATAL,		/* Show only fatal errors */
    HT_ERR_SHOW_NON_FATAL,	/* Show non fatal and fatal errors */
    HT_ERR_SHOW_WARNING,	/* Show warnings, non fatal, and fatal errors */
    HT_ERR_SHOW_INFO,		/* Show all errors */
    HT_ERR_SHOW_PARS,		/* Show any parameters (if any) */
    HT_ERR_SHOW_LOCATION,	/* Show the location where the error occurred */
    HT_ERR_SHOW_IGNORE,		/* Show errors even if they are ignored */
    HT_ERR_SHOW_FIRST,		/* Show only the first registered error */
    HT_ERR_SHOW_LINKS,		/* Show any HTML links (if any) */
    HT_ERR_SHOW_DEFAULT,	/* Default level of details */
    HT_ERR_SHOW_DETAILED,	/* Somewhat detailed level */
    HT_ERR_SHOW_DEBUG		/* Very detailed */
} HTErrorShow;
The last three entries in the enumeration list are only for the convenience of the application. They provide some useful default values for how error messages can be presented to the user. The setup can be modified using the following functions:
extern HTErrorShow HTError_show (void);
extern BOOL HTError_setShow (HTErrorShow mask);
The actual generation of error messages often involves a platform dependent interface including special windows etc. In order to keep the error manager itself completely platform independent, the error presentation functionality is part of the Messaging Module which is described in detail later in this guide.

Data Methods

The Error manager contains a large set of configuration options and methods for accessing information about registered lists. This guide is not intended to describe every single public function, so here we will only present the methods in a list. Often they are self explanatory, so you can probably get a clue of what is going on anyway!
extern BOOL HTError_doShow		(HTError * info);
extern BOOL HTError_ignoreLast		(HTList * list);
extern BOOL HTError_setIgnore		(HTError * info);
extern int HTError_index		(HTError * info);
extern HTSeverity HTError_severity	(HTError * info);
extern int HTError_parameter		(HTError * info, void *parameter);
extern CONST char * HTError_location	(HTError * info);

Cleaning up Errors

If you are using the generic error interface (HTError_add and HTError_addSystem), the cleanup is done exactly as for all other list-based registration mechanisms in the Library. If you are using the request-specific version, the request manager handles both creation and deletion of error lists, so you do not have to do anything. The generic interface for cleaning up looks like:
extern BOOL HTError_deleteAll (HTList * list);
extern BOOL HTError_deleteLast (HTList * list);
In the next section, we will see how to display errors in the application along with other user information such as progress reports etc.



The Format Manager

MORE
  • setting up request streams
  • memory management
  • examples
  • explain why converters have an input and an output format: chains



Using Anchors

MORE
  • what do they represent
  • where and when to use
  • How to create and handle
  • relations between links
  • memory management



The Request Object

The request object contains all the information needed to define a request: the parameters to be used when requesting a resource from the network or local file system. When a request is handled, all kinds of information about it need to be passed along together with the request.

Request a resource

This is an internal routine, which has an address AND a matching anchor. (The public routines are called with one OR the other.)
extern BOOL HTLoad (HTRequest * request, HTPriority priority, BOOL recursive);

Creation and Deletion Methods

The request object is intended to live as long as the request is still active, but can be deleted as soon as it has terminated, for example in one of the request termination callback functions as described in the Net Manager. Only the anchor object stays around after the request itself is terminated.

Create new Object

Creates a new request object with a default set of options -- in most cases it will need some information added which can be done using the methods in this module, but it will work as is for a simple request.
extern HTRequest * HTRequest_new (void);

Delete Object

This function deletes the object and cleans up the memory.
extern void HTRequest_delete (HTRequest * request);

Bind an Anchor to a Request Object

Every request object has an anchor associated with it. The anchor normally lives until the application terminates but a request object only lives as long as the request is being serviced.
extern void HTRequest_setAnchor (HTRequest *request, HTAnchor *anchor);
extern HTParentAnchor * HTRequest_anchor (HTRequest *request);

Set the Method

The Method is the operation to be executed on the requested object. The default set is the set of operations defined by the HTTP protocol, that is "GET", "HEAD", "PUT", "POST", "LINK", "UNLINK", and "DELETE", but many of these can be used in other protocols as well. The important thing is to think of the requested element as an object on which you want to perform an operation. It is then up to the specific protocol implementation to try to carry this operation out. However, not all operations can be implemented (or make sense) in all protocols.

Methods are handled by the Method Module, and the default value is "GET".

extern void HTRequest_setMethod (HTRequest *request, HTMethod method);
extern HTMethod HTRequest_method (HTRequest *request);

Update, Reload, or Refresh a Document

The Library has two concepts of caching: in memory and on file. When loading a document, this flag can be set in order to define who can give a response to the request. IMS means that an "If-Modified-Since" header is used in an HTTP request.
typedef enum _HTReload {
    HT_ANY_VERSION	= 0x0,		/* Use any version available */
    HT_MEM_REFRESH	= 0x1,		/* Reload from file cache or network */
    HT_CACHE_REFRESH	= 0x2,		/* Update from network with IMS */
    HT_FORCE_RELOAD	= 0x4		/* Update from network with no-cache */
} HTReload;

extern void HTRequest_setReloadMode (HTRequest *request, HTReload mode);
extern HTReload HTRequest_reloadMode (HTRequest *request);
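To make the four modes easier to compare, the following sketch tabulates what each one permits. The HTReload enumeration is copied from above. The DemoReloadPlan structure and the demo_plan function are hypothetical and not part of the Library; the table follows the enumeration comments together with the description of the modes in the Cache Manager section, and it is an interpretation of that text rather than the Library's actual logic.

```c
#include <stdbool.h>

/* Copied from the HTReload definition above. */
typedef enum _HTReload {
    HT_ANY_VERSION   = 0x0,
    HT_MEM_REFRESH   = 0x1,
    HT_CACHE_REFRESH = 0x2,
    HT_FORCE_RELOAD  = 0x4
} HTReload;

/* Hypothetical summary of what a reload mode permits. */
typedef struct {
    bool use_memory;      /* the copy in memory may answer the request */
    bool use_file_cache;  /* the local file cache may answer the request */
    bool send_ims;        /* a network request carries If-Modified-Since */
    bool send_pragma;     /* a network request carries a Pragma directive */
} DemoReloadPlan;

DemoReloadPlan demo_plan(HTReload mode)
{
    DemoReloadPlan p = { true, true, false, false };   /* HT_ANY_VERSION */
    switch (mode) {
    case HT_MEM_REFRESH:          /* skip memory; file cache or network */
        p.use_memory = false;
        p.send_ims = true;
        break;
    case HT_CACHE_REFRESH:        /* conditional update from the network */
        p.use_memory = false;
        p.use_file_cache = false;
        p.send_ims = true;
        p.send_pragma = true;
        break;
    case HT_FORCE_RELOAD:         /* unconditional reload from the network */
        p.use_memory = false;
        p.use_file_cache = false;
        p.send_pragma = true;
        break;
    default:
        break;
    }
    return p;
}
```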

Max Number of Retries for a Download

Automatic reload can happen in two situations:
  • The server sends a redirection response
  • The document has expired
In order to avoid the Library going into an infinite loop, it is necessary to keep track of the number of automatic reloads. Loops can occur if the server redirects to the same document or if the server sends back an Expires header which has already expired. The default maximum number of automatic reloads is 6.
extern BOOL HTRequest_setMaxRetry (int newmax);
extern int  HTRequest_maxRetry (void);
extern BOOL HTRequest_retry (HTRequest *request);
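The loop protection described above amounts to a per-request counter checked against a global maximum. The sketch below illustrates the idea with stand-in names: DemoRequest, demo_retry and demo_count_reloads are hypothetical and do not show how the Library implements HTRequest_retry.

```c
#include <stdbool.h>

#define DEMO_MAX_RETRY 6   /* the default maximum quoted above */

/* Stand-in for the request object; only the counter matters here. */
typedef struct {
    int retries;
} DemoRequest;

/* Returns true and counts the attempt if another automatic reload is
 * allowed; returns false once the limit has been reached. */
bool demo_retry(DemoRequest *req, int max_retry)
{
    if (req->retries >= max_retry)
        return false;
    req->retries++;
    return true;
}

/* Simulates a server that always redirects back to the same document and
 * returns how many automatic reloads happen before the limit stops it. */
int demo_count_reloads(int max_retry)
{
    DemoRequest req = { 0 };
    int reloads = 0;
    while (demo_retry(&req, max_retry))
        reloads++;
    return reloads;
}
```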

Retry Request After

Some services, for example HTTP, can, in case they are unavailable at the time the request is issued, send back a time and date stamp to the client telling when they are expected to be back online. In case a request results in a HT_RETRY status, the application can use any time indicated in this field to retry the request at a later time. The Library does not initiate any request on its own; this is for the application to do. The time returned by this function is in calendar time, or -1 if not available.
extern time_t HTRequest_retryTime (HTRequest * request);
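Since the retry time is a calendar time that may be -1, an application will typically convert it into a delay before scheduling the retry. A minimal sketch, assuming the value has already been obtained from HTRequest_retryTime; the helper demo_seconds_until_retry is hypothetical and not part of the Library.

```c
#include <time.h>

/* Hypothetical helper: number of seconds to wait before retrying.
 * Returns -1 if no retry time is available, 0 if the time has already
 * passed, and the remaining number of seconds otherwise. */
long demo_seconds_until_retry(time_t retry_at, time_t now)
{
    double wait;
    if (retry_at == (time_t) -1)
        return -1;                     /* server gave no indication */
    wait = difftime(retry_at, now);    /* seconds from now to retry_at */
    return wait > 0.0 ? (long) wait : 0;
}
```

In an application, the second argument would normally be time(NULL).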

Accept Headers

The Accept family of headers is an important part of HTTP handling the format negotiation. The Library supports both a global set of accept headers that are used in all HTTP requests and a local set of accept headers that are used in specific requests only. The global ones are defined in the Format Manager.

Each request can have its own local set of accept headers that are either added to the global set or replace the global set of accept headers. None of the headers have to be set. If the global set is sufficient for all requests then this is perfectly fine. If the parameter "override" is set then only local accept headers are used, else both local and global headers are used.

Content Types

The local list of specific conversions which the format manager can do in order to fulfill the request. It typically points to a list set up at initialisation time, for example by HTInit(). There is also a global list of conversions which contains a generic set of possible conversions.
extern void HTRequest_setFormat	(HTRequest *request, HTList *type, BOOL override);
extern HTList * HTRequest_format (HTRequest *request);

Content Encodings

The list of encodings acceptable in the output stream.
extern void HTRequest_setEncoding (HTRequest *request, HTList *enc, BOOL override);
extern HTList * HTRequest_encoding (HTRequest *request);

Content-Languages

The list of (human) language values acceptable in the response. The default is all languages.
extern void HTRequest_setLanguage (HTRequest *request, HTList *lang, BOOL override);
extern HTList * HTRequest_language (HTRequest *request);

Charset

The list of charsets accepted by the application.
extern void HTRequest_setCharset (HTRequest *request, HTList *charset, BOOL override);
extern HTList * HTRequest_charset (HTRequest *request);
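The effect of the override parameter shared by the four setters above can be pictured as follows. This is only a sketch using plain string arrays instead of HTList objects; demo_effective_accept and demo_count are hypothetical names, and the order in which local and global values are combined is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: collect the accept values that would apply to a
 * request. With override set, only the local values are used; otherwise
 * the local values are followed by the global ones.
 * Returns the number of entries written to out (at most cap). */
size_t demo_effective_accept(const char **local, size_t nlocal,
                             const char **global, size_t nglobal,
                             int override,
                             const char **out, size_t cap)
{
    size_t n = 0, i;
    for (i = 0; i < nlocal && n < cap; i++)
        out[n++] = local[i];
    if (!override)
        for (i = 0; i < nglobal && n < cap; i++)
            out[n++] = global[i];
    return n;
}

/* Demo with one local and two global content types. */
size_t demo_count(int override)
{
    const char *local[]  = { "text/html" };
    const char *global[] = { "text/plain", "image/gif" };
    const char *out[8];
    return demo_effective_accept(local, 1, global, 2, override, out, 8);
}
```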

Handling Metainformation (RFC822 Headers)

The Library supports a large set of headers that can be sent along with a request (or a response for that matter). All headers can be either disabled or enabled using bit flags that are defined in the following.

General HTTP Header Mask

There are a few header fields which have general applicability for both request and response messages, but which do not apply to the communication parties or the entity being transferred. This mask enables and disables these headers. If the bit is not turned on they are not sent. All headers are optional and the default value is NO GENERAL HEADERS.
typedef enum _HTGnHd {
    HT_DATE		= 0x1,
    HT_FORWARDED	= 0x2,
    HT_MESSAGE_ID	= 0x4,
    HT_MIME		= 0x8,
    HT_NO_CACHE		= 0x10					   /* Pragma */
} HTGnHd;

#define DEFAULT_GENERAL_HEADERS		0

extern void HTRequest_setGnHd (HTRequest *request, HTGnHd gnhd);
extern void HTRequest_addGnHd (HTRequest *request, HTGnHd gnhd);
extern HTGnHd HTRequest_gnHd (HTRequest *request);

Request Headers

The request header fields allow the client to pass additional information about the request (and about the client itself) to the server. All headers are optional but the default value is all request headers if present except From and Pragma.
typedef enum _HTRqHd {
    HT_ACCEPT_TYPE	= 0x1,
    HT_ACCEPT_CHAR	= 0x2,
    HT_ACCEPT_ENC	= 0x4,
    HT_ACCEPT_LAN	= 0x8,
    HT_FROM		= 0x10,
    HT_IMS		= 0x20,
    HT_ORIG_URI		= 0x40,
    HT_REFERER		= 0x80,
    HT_USER_AGENT	= 0x200
} HTRqHd;

#define DEFAULT_REQUEST_HEADERS \
HT_ACCEPT_TYPE+HT_ACCEPT_CHAR+HT_ACCEPT_ENC+HT_ACCEPT_LAN+HT_REFERER+HT_USER_AGENT

extern void HTRequest_setRqHd (HTRequest *request, HTRqHd rqhd);
extern void HTRequest_addRqHd (HTRequest *request, HTRqHd rqhd);
extern HTRqHd HTRequest_rqHd (HTRequest *request);
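The header masks are ordinary bit flags, so individual headers can be combined with bitwise OR. The sketch below copies the HTRqHd enumeration and shows how the From header, which is not part of the default mask, could be switched on. DemoRequest and the demo_ functions are hypothetical stand-ins; that the set function replaces the mask while the add function ORs into it is an assumption based on the function names.

```c
/* Copied from the HTRqHd definition above. */
typedef enum _HTRqHd {
    HT_ACCEPT_TYPE = 0x1,
    HT_ACCEPT_CHAR = 0x2,
    HT_ACCEPT_ENC  = 0x4,
    HT_ACCEPT_LAN  = 0x8,
    HT_FROM        = 0x10,
    HT_IMS         = 0x20,
    HT_ORIG_URI    = 0x40,
    HT_REFERER     = 0x80,
    HT_USER_AGENT  = 0x200
} HTRqHd;

/* Same set of flags as the default above, written with bitwise OR. */
#define DEMO_DEFAULT_REQUEST_HEADERS \
    (HT_ACCEPT_TYPE | HT_ACCEPT_CHAR | HT_ACCEPT_ENC | HT_ACCEPT_LAN | \
     HT_REFERER | HT_USER_AGENT)

/* Stand-in for the request object; only the mask matters here. */
typedef struct {
    int rqhd;
} DemoRequest;

/* Assumed semantics: "set" replaces the whole mask ... */
void demo_setRqHd(DemoRequest *r, int mask) { r->rqhd = mask; }

/* ... and "add" turns on additional bits without clearing any. */
void demo_addRqHd(DemoRequest *r, int mask) { r->rqhd |= mask; }

/* Start from the default mask and additionally enable the From header. */
int demo_default_plus_from(void)
{
    DemoRequest r;
    demo_setRqHd(&r, DEMO_DEFAULT_REQUEST_HEADERS);
    demo_addRqHd(&r, HT_FROM);
    return r.rqhd;
}
```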

Entity Header Mask

The entity headers contain information about the object sent in the HTTP transaction. See the Anchor module for the storage of entity headers. This flag defines which headers are to be sent in a request together with an entity body. All headers are optional but the default value is ALL ENTITY HEADERS IF PRESENT.
typedef enum _HTEnHd {
    HT_ALLOW		= 0x1,
    HT_CONTENT_ENCODING	= 0x2,
    HT_CONTENT_LANGUAGE	= 0x4,
    HT_CONTENT_LENGTH	= 0x8,
    HT_CTE		= 0x10,			/* Content-Transfer-Encoding */
    HT_CONTENT_TYPE	= 0x20,
    HT_DERIVED_FROM	= 0x40,
    HT_EXPIRES		= 0x80,
    HT_LAST_MODIFIED	= 0x200,
    HT_LINK		= 0x400,
    HT_TITLE		= 0x800,
    HT_URI		= 0x1000,
    HT_VERSION		= 0x2000
} HTEnHd;

#define DEFAULT_ENTITY_HEADERS		0xFFFF			      /* all */

extern void HTRequest_setEnHd (HTRequest *request, HTEnHd enhd);
extern void HTRequest_addEnHd (HTRequest *request, HTEnHd enhd);
extern HTEnHd HTRequest_enHd (HTRequest *request);

Referer Field

If this parameter is set then a "Referer: <parent address>" header can be generated in the request to the server; see HTTP Protocol.
extern void HTRequest_setParent (HTRequest *request, HTParentAnchor *parent);
extern HTParentAnchor * HTRequest_parent (HTRequest *request);

Extra Headers

Extra header information can be sent along with a request using this variable. The text is sent as is, so it must be preformatted with <CRLF> line terminators. This will get changed at some point so that you can register a header together with a handler in the MIME parser.
extern void HTRequest_setExtra (HTRequest *request, char *extra);
extern char *HTRequest_extra (HTRequest *request);
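Because the extra text is sent unmodified, the application has to supply the CRLF terminator itself. A minimal sketch of formatting one header line follows; demo_header_line is a hypothetical helper and the header name used in the usage note is made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "name: value" followed by CRLF.
 * Returns a pointer to a static buffer (so the function is not
 * reentrant), or NULL if the line does not fit. */
const char *demo_header_line(const char *name, const char *value)
{
    static char line[128];
    int n = snprintf(line, sizeof line, "%s: %s\r\n", name, value);
    if (n < 0 || (size_t) n >= sizeof line)
        return NULL;                 /* formatting error or truncated */
    return line;
}
```

For example, demo_header_line("X-Demo", "1") yields the string "X-Demo: 1" terminated by CR and LF, which is the form expected by HTRequest_setExtra.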

Streams From Network to Application

Default Output Stream

The output stream is the stream down which data is put as it comes in from the network on its way back to the application. The default value is NULL, which means that the stream goes to the user (display).
extern void HTRequest_setOutputStream (HTRequest *request, HTStream *output);
extern HTStream *HTRequest_OutputStream (HTRequest *request);
The desired format of the output stream. This can be used to get unconverted data etc. from the library. If NULL, then WWW_PRESENT is the default value.
extern void HTRequest_setOutputFormat (HTRequest *request, HTFormat format);
extern HTFormat HTRequest_OutputFormat (HTRequest *request);

Debug Stream

All object bodies sent from the server with status codes different from 200 OK will be put down this stream. This can be used for redirecting body information in status codes different from "200 OK" to for example a debug window. If the value is NULL (default) then the stream is not set up.
extern void HTRequest_setDebugStream (HTRequest *request, HTStream *debug);
extern HTStream *HTRequest_DebugStream (HTRequest *request);
The desired format of the error stream. This can be used to get unconverted data etc. from the library. The default value is WWW_HTML, as a character-based application only has one WWW_PRESENT.
extern void HTRequest_setDebugFormat (HTRequest *request, HTFormat format);
extern HTFormat HTRequest_DebugFormat (HTRequest *request);

Context Swapping

In multi threaded applications it is often required to keep track of the context of a request so that when the Library returns a result of a request, it can be put into the context it was in before the request was first passed to the Library. This call back function allows the application to do this.
typedef int HTRequestCallback (HTRequest * request, void *param);

extern void HTRequest_setCallback (HTRequest *request, HTRequestCallback *cb);
extern HTRequestCallback *HTRequest_callback (HTRequest *request);
The callback function can be passed an arbitrary pointer (the void part) which can describe the context of the current request object. If such context information is required then it can be set using the following methods:
extern void HTRequest_setContext (HTRequest *request, void *context);
extern void *HTRequest_context (HTRequest *request);
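The following sketch shows the general pattern of handing an application context back through a callback with a void pointer parameter. All names are stand-ins: DemoRequest mimics only the two fields relevant here, and demo_finish is a hypothetical picture of what happens when a request terminates, not the Library's code.

```c
#include <stddef.h>

/* Stand-ins for HTRequest and HTRequestCallback. */
typedef struct DemoRequest DemoRequest;
typedef int DemoRequestCallback(DemoRequest *request, void *param);

struct DemoRequest {
    DemoRequestCallback *callback;   /* set by the application */
    void *context;                   /* arbitrary application data */
};

/* Application-side context: which window the request belongs to. */
typedef struct {
    int window_id;
    int completed;
} DemoWindow;

/* Application callback: recover the window from the void pointer. */
int demo_on_done(DemoRequest *request, void *param)
{
    DemoWindow *w = (DemoWindow *) param;
    (void) request;
    w->completed++;
    return w->window_id;
}

/* Hypothetical termination step: pass the stored context back. */
int demo_finish(DemoRequest *request)
{
    if (request->callback == NULL)
        return -1;
    return request->callback(request, request->context);
}

/* Issue one request for a window and check that the callback ran once. */
int demo_run(int window_id)
{
    DemoWindow w;
    DemoRequest r;
    int id;
    w.window_id = window_id;
    w.completed = 0;
    r.callback = demo_on_done;
    r.context = &w;
    id = demo_finish(&r);
    return (w.completed == 1) ? id : -1;
}
```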

Preemptive or Non-preemptive Access

An access scheme is registered with a default for using either preemptive (blocking I/O) or non-preemptive (non-blocking I/O) access. This is basically a result of the implementation of the protocol module itself and is explained in the section Registering Access Schemes. However, if non-blocking I/O is the default then sometimes it is nice to be able to set the mode to blocking instead. For example, when loading the first document (the home page), blocking can be used instead of non-blocking.
extern void HTRequest_setPreemptive (HTRequest *request, BOOL mode);
extern BOOL HTRequest_preemptive (HTRequest *request);

Format Negotiation

When accessing the local file system, the Library is capable of performing content negotiation as described by the HTTP protocol. This is mainly for server applications, but some client applications might also want to use content negotiation when accessing the local file system. This method enables or disables content negotiation; the default value is ON.
extern void HTRequest_setNegotiation (HTRequest *request, BOOL mode);
extern BOOL HTRequest_negotiation (HTRequest *request);

Error Manager Information

The error manager keeps a list (called an error stack) of all errors and warnings that occurred during a request. The list of errors can be accessed for generating an error message by the following function.
extern HTList *HTRequest_errorStack (HTRequest *request);

Bytes Read in Current Request

This function returns the bytes read in the current request. For a deeper description of what the current request is, please read the user's guide. This function can be used in for example the HTAlert module to give the number of bytes read in a progress message.
extern long HTRequest_bytesRead(HTRequest * request);



The Access Manager

At this point most of the design issues have been addressed and it is now possible to use the Library to exchange information between the application and the Internet. The Library provides a set of functions that can be used to request a URI either on a remote server or on the local file system. The access method binds the URL with a specific protocol module as described in section Access Methods, and the stream chains define the data flow for incoming and outgoing data.

Searching a URL

MORE

Receiving an Entity

MORE

Sending an Entity

MORE

Return Codes from the Access manager

The access manager has a standard set of return codes that the application can use for diagnostics. They should only be used as indications of the result, as a more detailed description of any error situation is registered in the error handler. The set of codes is:

HT_LOADED
A generic success code that indicates that the request has been fulfilled
HT_NO_DATA
Partly a success code, but no document has been retrieved and a client application is encouraged to maintain the previous document view as the current view. A HT_NO_DATA code might be the result when a telnet session is started etc.
HT_ERROR
An error has occurred and the request could not be fulfilled
HT_RETRY
The remote server is temporarily unavailable and no more requests should be issued to the server before the calendar time indicated in HTRequest->retry_after field. No action is taken by the Library to automatically retry the request; this is uniquely for the application to decide.
HT_WOULD_BLOCK
An I/O operation would block and the request must pause. As the request is not yet terminated, the operation will continue at a later time when the blocking situation has ceased to exist.
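An application will usually dispatch on these codes to decide what to do next. The sketch below shows one possible mapping. The DemoStatus enumeration mirrors the five codes above but its numeric values are placeholders, not the Library's definitions; DemoAction and demo_next_action are likewise hypothetical.

```c
/* Placeholder mirror of the return codes; the values are NOT the
 * Library's actual definitions. */
typedef enum {
    DEMO_LOADED,
    DEMO_NO_DATA,
    DEMO_ERROR,
    DEMO_RETRY,
    DEMO_WOULD_BLOCK
} DemoStatus;

/* What the application does in response. */
typedef enum {
    ACT_SHOW_NEW_DOCUMENT,   /* request fulfilled */
    ACT_KEEP_CURRENT_VIEW,   /* success, but nothing new to display */
    ACT_REPORT_ERROR,        /* consult the error list for details */
    ACT_SCHEDULE_RETRY,      /* application retries at the given time */
    ACT_WAIT_FOR_EVENT       /* request continues when I/O is ready */
} DemoAction;

DemoAction demo_next_action(DemoStatus status)
{
    switch (status) {
    case DEMO_LOADED:      return ACT_SHOW_NEW_DOCUMENT;
    case DEMO_NO_DATA:     return ACT_KEEP_CURRENT_VIEW;
    case DEMO_RETRY:       return ACT_SCHEDULE_RETRY;
    case DEMO_WOULD_BLOCK: return ACT_WAIT_FOR_EVENT;
    case DEMO_ERROR:
    default:               return ACT_REPORT_ERROR;
    }
}
```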

Context Swapping

In a multithreaded environment it is necessary to keep track of the context of each simultaneous request issued to the Library, as the responses might return in another order than the one in which the requests were issued. The Library allows such a context registration in the HTRequest object by providing the registration mechanism of a call back function and a pointer to an arbitrary data object to be passed to that call back function.



The Event Manager

The W3C Reference Library can be used in either a single-threaded or a multi-threaded programming style. In this section we will have a look at how to enable this functionality and what the API is for applications to use it. We will not describe the underlying design model as this is described in detail in the Library Architecture documentation.

If you are working on an MS Windows platform then you have the possibility of using asynchronous socket management (proactive mode) instead of typical Unix select based I/O (reactive mode). Please read the Windows documentation for more details.

Event Handlers

The application registers a set of event handlers to be used on a specified set of sockets. An event handler is a function of type
typedef int HTEventCallback (SOCKET, HTRequest *, SockOps);

Register a TTY Event Handler

Register the tty (console) as having events. If the TTY is select()-able (as is true under Unix), then we treat it as just another socket. Otherwise, take steps depending on the platform. This is the function to use to register user events!
extern int HTEvent_RegisterTTY	(SOCKET, HTRequest *, SockOps, HTEventCallback *, HTPriority);

Unregister a TTY Event Handler

Unregisters TTY I/O channel. If the TTY is select()-able (as is true under Unix), then we treat it as just another socket.
extern int HTEvent_UnRegisterTTY (SOCKET, SockOps);

Register an Event Handler

For a given socket, register a request structure, a set of operations, an HTEventCallback function, and a priority. For this implementation, we allow only a single HTEventCallback function for all operations, and the priority field is ignored.
extern int HTEvent_Register	(SOCKET, HTRequest *, SockOps, HTEventCallback *, HTPriority);

Unregister an Event Handler

Remove the registered information for the specified socket for the actions specified in ops. If no actions remain after the unregister, the registered info is deleted, and, if the socket has been registered for notification, the HTEventCallback will be invoked.
extern int HTEvent_UnRegister	(SOCKET, SockOps);

Unregister ALL Event Handlers

Unregister all sockets. N.B. we just remove them from our internal data structures: it is up to the application to actually close the socket.
extern int HTEvent_UnregisterAll (void);

Handler for Timeout on Sockets

This function sets the timeout for sockets in the select() call and registers a timeout function that is called if select() times out. This only works on non-Windows platforms, as we need to poll for the console on Windows. If tp = NULL then the timeout is disabled. The default is no timeout. If always=YES then the callback is called at all times; if NO then only when Library sockets are active. Returns YES if OK, else NO.
typedef int HTEventTimeout (HTRequest *);

extern BOOL HTEvent_registerTimeout (struct timeval *tp, HTRequest * request,
				     HTEventTimeout *tcbf, BOOL always);
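The timeout is passed as a struct timeval, which holds seconds and microseconds separately. As a small sketch, the hypothetical helper below (not part of the Library) builds such a value from a non-negative number of milliseconds on a POSIX system; a pointer to the result could then serve as the tp argument.

```c
#include <sys/time.h>

/* Hypothetical helper: convert a non-negative number of milliseconds
 * into the seconds/microseconds pair used by select(). */
struct timeval demo_timeout(long milliseconds)
{
    struct timeval tv;
    tv.tv_sec  = milliseconds / 1000;            /* whole seconds */
    tv.tv_usec = (milliseconds % 1000) * 1000;   /* remainder in usec */
    return tv;
}
```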

Start the Event Loop

That is, we wait for activity from one of our registered channels, and dispatch on that. Under Windows/NT, we must treat the console and sockets as distinct. That means we can't avoid a busy wait, but we do our best.
extern int HTEvent_Loop (HTRequest * request);

Stop the Event Loop

Stops the (select based) event loop. The function does not guarantee that all requests have terminated. This is for the application to do.
extern void HTEvent_stopLoop (void);



Application modules

Until now we have described the Library core and the utilities. You might think: "This is all OK, but how can I get the Library to do something, for example accessing a remote HTTP server?". In the Library distribution file, you will find many modules that actually can do this and much more. However, they are not part of the core but of the application. The application can register the set of modules that provides the desired functionality which is basically what we have seen in the description of the core. This section discusses the application modules and what functionality they provide. The Library provides a special include file called WWWApp.h which includes all the modules mentioned in this section. This include file may be included in an application but is not required.

OK, let's continue and get an overview of the functionality provided by the application modules.



The Cache Manager

Caching is a required part of any efficient Internet access application as it saves bandwidth and improves access performance significantly in almost all types of accesses. The Library supports two different types of cache: the memory cache and the file cache. The two types differ in several ways which reflect their two main purposes: the memory cache is for short term storage of graphic objects whereas the file cache is for intermediate term storage of data objects. Often it is desirable to have both a memory and a file version of a cached document, so the two types do not exclude each other. The following paragraphs explain how the two caches can be maintained in the Library.

Memory Cache

The memory cache is largely managed by the application as it simply consists of keeping the graphic objects described by the HyperDoc object in memory as the user keeps requesting new documents. The HyperDoc object is only declared in the Library; the real definition is left to the application, as it is for the application to handle graphic objects. The Line Mode Browser has its own definition of the HyperDoc object called HText. Before a request is processed over the net, the anchor object is searched for a HyperDoc object, and a new request is issued only if this is not present or the Library has explicitly been asked to reload the document, which is described in the section Short Circuiting the Cache.

As the management of the graphic object is handled by the application, it is also for the application to handle the garbage collection of the memory cache. The Line Mode Browser has a very simple memory management of how long graphic objects stay around in memory. It is determined by a constant in the GridText module and is by default set to 5 documents. This approach can be much more advanced and the memory garbage collection can be determined by the size of the graphic objects, when they expire etc., but the API is the same no matter how the garbage collector is implemented.

File Cache

The file cache is intended for intermediate term storage of documents or data objects that can not be represented by the HyperDoc object which is referenced by the HTAnchor object. As the definition of the HyperDoc object is done by the application, there is no explicit rule for which graphic objects can not be described by the HyperDoc object, but often they are binary objects such as images.

The file cache in the Library is a very simple implementation in the sense that no intelligent garbage collection has been defined. It has been the goal to collect experience from the file cache in the W3C proxy server before an intelligent garbage collector is implemented in the Library. Currently the following functions can be used to control the cache, which is disabled by default:

HTCache_enable(), HTCache_disable(), and HTCache_isEnabled()
Use these functions to enable and disable the cache
HTCache_setRoot() and HTCache_getRoot()
Use these functions to set and get the value of the cache root
An important difference between the memory cache and the file cache is the format of the data. In the memory cache, the cached objects are graphic objects ready to be displayed to the user. In the file cache the data objects are stored along with their metainformation so that important header information like Expires, Last-Modified, Language etc. is a part of the stored object.

Mode for Cache Refresh

In situations where a cached document is known to be stale it is desired to flush any existent version of a document in either the memory cache or the file cache and perform a reload from the authoritative server. This can for example be the case if an expires header has been defined for the document when returned from the origin server. Forcing a refresh from either the memory cache, the file cache, or both can be done using the following function:
void HTRequest_setReloadMode (HTRequest *request, HTReload mode);
HTReload HTRequest_reloadMode (HTRequest *request);
where HTReload can be either of the values
HT_ANY_VERSION
Use any version available, either from memory cache or from local file cache
HT_MEM_REFRESH
Non-authoritative update of any version stored in memory. The new version can either come from the local file cache, a proxy cache or the network. If the request falls through to the network, the Library issues a conditional GET using an If-Modified-Since header. There are two main purposes for this mode:
  1. If the disk cache is private to exactly one application then a version stored in the local disk cache does normally not differ in time from a version in memory - they have been created at the same time. However, in a shared cache environment, the two versions can differ and this flag can be used to force an update to the latest version in the file cache.
  2. If the application wants to see the metainformation as received from the network, then the object in the file cache provides this information whereas the version in memory does not.
HT_CACHE_REFRESH
Authoritative update of any version stored in the local file cache or a proxy cache. The Library issues a conditional GET using an If-Modified-Since header and a Pragma: no-cache to ensure that the response is authoritative.
HT_FORCE_RELOAD
Unconditional reload from the network using the Pragma: no-proxy directive in order to ensure that the reload is passed through any proxy server on the way to the origin server
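The four modes above can be summarized as a small decision table. The following is a minimal, self-contained sketch and not the Library's implementation: the type and function names (ReloadMode, ReloadPlan, reload_plan) are hypothetical, and the assumption that a cache refresh also bypasses the memory cache is ours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of the four reload modes; not the Library's own enum. */
typedef enum { ANY_VERSION, MEM_REFRESH, CACHE_REFRESH, FORCE_RELOAD } ReloadMode;

/* Which sources a request may use and which headers it would carry. */
typedef struct {
    bool use_memory_cache;  /* may be served from the memory cache */
    bool use_file_cache;    /* may be served from the local file cache */
    bool conditional;       /* If-Modified-Since is sent if the network is used */
    const char *pragma;     /* Pragma directive to send, or NULL for none */
} ReloadPlan;

ReloadPlan reload_plan(ReloadMode mode)
{
    ReloadPlan plan = { true, true, false, NULL };

    switch (mode) {
    case ANY_VERSION:               /* any cached version will do */
        break;
    case MEM_REFRESH:               /* skip memory, file cache still allowed */
        plan.use_memory_cache = false;
        plan.conditional = true;
        break;
    case CACHE_REFRESH:             /* skip local caches, ask authoritatively */
        plan.use_memory_cache = false;   /* assumption, see lead-in */
        plan.use_file_cache = false;
        plan.conditional = true;
        plan.pragma = "no-cache";
        break;
    case FORCE_RELOAD:              /* unconditional reload */
        plan.use_memory_cache = false;
        plan.use_file_cache = false;
        plan.pragma = "no-proxy";   /* directive as named in the text above */
        break;
    default:
        assert(0 && "unknown reload mode");
    }
    return plan;
}
```

An application would select the mode per request with HTRequest_setReload(); the table only illustrates what each mode implies for that request.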
If the Library receives either an authoritative or non-authoritative "304 Not Modified" response upon any of the requests above, it

Handling Expired Documents

There are various ways of handling an Expires header when an expired document is met in a history list. It can be ignored altogether, the user can be notified with a warning, or the document can be reloaded automatically. The Library supports all three, as it should be up to the user to decide. The default action is HT_EXPIRES_IGNORE, but the other modes are to notify the user that a document is stale without reloading it, and to do an automatic reload of the document. The functions to use in this case are:
void HTAccess_setExpiresMode (HTExpiresMode mode, char *  notify);
HTExpiresMode HTAccess_expiresMode ();
where HTExpiresMode can take any of the values:
    HT_EXPIRES_IGNORE
    HT_EXPIRES_NOTIFY
    HT_EXPIRES_AUTO
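As an illustration of how the three modes differ, here is a minimal sketch of the decision an application might make when the user returns to a document in the history list. It is self-contained and does not call the Library; the names ExpiresMode, ExpiresAction and expires_action are hypothetical, and the convention that (time_t)-1 means "no Expires header" is an assumption of this sketch.

```c
#include <assert.h>
#include <time.h>

/* Local stand-ins for the three modes; not the Library's own enum. */
typedef enum { EXPIRES_IGNORE, EXPIRES_NOTIFY, EXPIRES_AUTO } ExpiresMode;

/* What to do with a document taken from the history list. */
typedef enum { SHOW_AS_IS, SHOW_WITH_WARNING, RELOAD_FIRST } ExpiresAction;

/*
 * Decide how to present a document whose Expires header carried the time
 * `expires`. Pass (time_t)-1 if the document had no Expires header.
 */
ExpiresAction expires_action(ExpiresMode mode, time_t expires, time_t now)
{
    int stale;

    assert(now != (time_t)-1);      /* the current time must be valid */
    stale = expires != (time_t)-1 && difftime(now, expires) >= 0.0;
    if (!stale)
        return SHOW_AS_IS;          /* fresh documents are never touched */

    switch (mode) {
    case EXPIRES_NOTIFY: return SHOW_WITH_WARNING;
    case EXPIRES_AUTO:   return RELOAD_FIRST;
    case EXPIRES_IGNORE: break;
    }
    return SHOW_AS_IS;              /* default: ignore the Expires header */
}
```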



Registering Proxy Servers and Gateways

Applications do not have to provide native support for all protocols; in many situations they can rely on proxies and gateways to do the job. Proxy servers are often used to carry client requests through a firewall, where they can also provide services like corporate caching and other network optimizations. Both proxy servers and gateways can serve as "protocol translators" which convert a request in the main Web protocol, HTTP, to an equivalent request in another protocol, for example NNTP, FTP, or Gopher. If a proxy server or a gateway is available, the application can therefore use HTTP to forward all requests to, for example, a proxy server, which then handles the communication with the remote server (for example using FTP to retrieve the document) and returns the result to the application (the proxy client) using HTTP.

The Library supports both proxies and gateways through the HTProxy module and all requests can be redirected to a proxy or a gateway, even requests on the local file system. Of course, the Library can also be used in proxy or gateway applications which in turn can use other proxies or gateways so that a single request can be passed through a series of intermediate agents.

There is one main mechanism for registering both proxies and gateways but there are two different APIs to follow. The application is free to choose whichever suits it best; the functionality provided by the Library is the same in both cases. The first API is based on a set of registration functions as we have seen so often throughout this guide. Regardless of the registration mechanism used, proxy servers are always rated higher than gateways, so if both a proxy server and a gateway are registered for the same access method, the proxy server will be used.

Registration of Proxies

A proxy server is registered with a corresponding access method, for example http, ftp etc. The `proxy' parameter should be a fully valid name, like http://proxy.w3.org:8001 but domain name is not required. If an entry already exists for this access method, it is deleted and the new one is used instead.
extern BOOL HTProxy_add		(CONST char * access, CONST char * proxy);
In addition to the proxy list, the Library supports a list of servers for which a proxy should not be consulted. This can be useful in order to avoid going via a proxy server for servers inside a firewall, if the server is known to be as well connected as the proxy, or if the remote server is in fact itself a proxy server.
extern BOOL HTNoProxy_add	(CONST char * host, CONST char * access, unsigned port);
The servers registered using this function are host names and domain names for which we don't contact a proxy even though a proxy is in fact registered for this particular access method. When registering a server as a noproxy element, you can specify a specific port for this access method, in which case the entry is valid only for requests to this port. If `port' is '0' then it applies to all ports and if `access' is NULL then it applies to all access methods. Examples of host names are:
	w3.org
	www.fastlink.com
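The matching rules for a noproxy entry (optional access method, optional port, host or domain name) can be sketched as follows. This is a self-contained illustration rather than the Library's code: NoProxyEntry, host_in_domain and noproxy_matches are hypothetical names, and matching a domain name as a suffix on a label boundary is an assumption about what "domain names" means here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One registered noproxy element (hypothetical mirror of the arguments). */
typedef struct {
    const char *host;    /* host or domain name, e.g. "w3.org" */
    const char *access;  /* access method, or NULL for all access methods */
    unsigned port;       /* port number, or 0 for all ports */
} NoProxyEntry;

/* True if `host` is `domain` itself or ends in "." followed by `domain`.
 * Comparison is case sensitive to keep the sketch short. */
bool host_in_domain(const char *host, const char *domain)
{
    size_t hlen = strlen(host), dlen = strlen(domain);

    if (dlen == 0 || dlen > hlen)
        return false;
    if (strcmp(host + (hlen - dlen), domain) != 0)
        return false;
    return hlen == dlen || host[hlen - dlen - 1] == '.';
}

/* True if a request for access://host:port should bypass the proxy. */
bool noproxy_matches(const NoProxyEntry *e, const char *access,
                     const char *host, unsigned port)
{
    assert(e != NULL && e->host != NULL && access != NULL && host != NULL);
    if (e->access != NULL && strcmp(e->access, access) != 0)
        return false;               /* entry is for another access method */
    if (e->port != 0 && e->port != port)
        return false;               /* entry is for another port */
    return host_in_domain(host, e->host);
}
```

With an entry for w3.org (access NULL, port 0), a request to www.w3.org on any port and with any access method would bypass the proxy, whereas notw3.org would not match.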

Registration of Gateways

Gateways are registered exactly like proxy servers: each gateway is registered with a corresponding access method, for example http, ftp etc. The `gate' parameter should be a fully valid name, like http://gateway.w3.org:8001 but domain name is not required. If an entry already exists for this access method, it is deleted and the new one is used instead.
extern BOOL HTGateway_add	(CONST char * access, CONST char * gate);
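Two rules from this section lend themselves to a small example: a new registration replaces an existing one for the same access method, and a proxy takes precedence over a gateway. The sketch below models both with plain arrays. It is not the Library's implementation; Table, table_add, table_find and find_intermediary are hypothetical names, and the fixed table size is chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef struct { const char *access; const char *url; } Entry;
typedef struct { Entry items[MAX_ENTRIES]; size_t count; } Table;

/* Register `url` for `access`, replacing any existing entry.
 * Stores the pointers, not copies. Returns 1 on success, 0 if full. */
int table_add(Table *t, const char *access, const char *url)
{
    size_t i;

    assert(t != NULL && access != NULL && url != NULL);
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->items[i].access, access) == 0) {
            t->items[i].url = url;      /* replace the old registration */
            return 1;
        }
    }
    if (t->count == MAX_ENTRIES)
        return 0;
    t->items[t->count].access = access;
    t->items[t->count].url = url;
    t->count++;
    return 1;
}

/* Return the URL registered for `access`, or NULL if there is none. */
const char *table_find(const Table *t, const char *access)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->items[i].access, access) == 0)
            return t->items[i].url;
    return NULL;
}

/* Proxies are rated higher than gateways for the same access method. */
const char *find_intermediary(const Table *proxies, const Table *gateways,
                              const char *access)
{
    const char *url = table_find(proxies, access);
    return url != NULL ? url : table_find(gateways, access);
}
```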

Backwards Compatibility with Environment Variables

Proxy servers and gateways have historically been registered using environment variables, which is a Unix'ism and not especially portable. However, in order to support this way of registration, the Library provides the following function to read the environment variables that define proxy servers and gateways.
extern void HTProxy_getEnvVar	(void);
There is no standard for the format of the environment variables, but the most accepted convention is the format described here:
WWW_<access>_GATEWAY
Definition of a gateway. Note that a WAIS gateway can be defined this way to change the default gateway at wais://www.w3.org:8001/.
<access>_proxy
Definition of a proxy server
no_proxy
This is a comma separated list of remote servers where a proxy server should not be consulted for handling the request. An example is
	no_proxy="cern.ch,ncsa.uiuc.edu,some.host:8080"
	export no_proxy
<access> is the specific access scheme and it is case sensitive as access schemes in URIs are case sensitive. Proxy servers have precedence over gateways, so if both a proxy server and a gateway have been defined for a specific access scheme, the proxy server is selected to handle the request.
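To make the no_proxy format concrete, the following sketch splits a comma separated value into host names with an optional port. It is only an illustration of the convention described above, not the parser used by HTProxy_getEnvVar(); NoProxyHost and parse_no_proxy are hypothetical names, and the sketch assumes that items contain no surrounding white space and that a missing port is represented as 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HOST_MAX 64

typedef struct {
    char host[HOST_MAX];    /* host or domain name */
    unsigned port;          /* 0 if the item carried no ":port" */
} NoProxyHost;

/* Split `value` ("host[:port],host[:port],...") into at most `max` entries.
 * Empty or overlong items are skipped. Returns the number stored. */
size_t parse_no_proxy(const char *value, NoProxyHost *out, size_t max)
{
    size_t n = 0;

    assert(value != NULL && out != NULL);
    while (*value != '\0' && n < max) {
        size_t len = strcspn(value, ",");             /* length of this item */
        const char *colon = memchr(value, ':', len);  /* optional port part */
        size_t hlen = colon != NULL ? (size_t)(colon - value) : len;

        if (hlen > 0 && hlen < HOST_MAX) {
            memcpy(out[n].host, value, hlen);
            out[n].host[hlen] = '\0';
            out[n].port = colon != NULL
                ? (unsigned)strtoul(colon + 1, NULL, 10) : 0;
            n++;
        }
        value += len;
        if (*value == ',')
            value++;                                  /* skip the separator */
    }
    return n;
}
```

The value would typically be obtained with getenv("no_proxy"), and each resulting entry could then be registered with HTNoProxy_add().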

It is important to note that the usage of proxy servers or gateways is an extension to the binding between an access scheme and a protocol module. An application can be set up to redirect all URLs with a specific access scheme without knowing about the semantics of this access scheme or how to access the information directly. That way, powerful client applications can be built having direct support for, for example, HTTP only.

Finding Proxies and Gateways

Registering a proxy server or a gateway does not mean that the request is automatically redirected to the new location instead of the origin server. In order to actually redirect a request, you can register a Pre-Request Callback function which will be called before a request is actually sent over the wire. You can find more information on how to redirect a request to, for example, a proxy server in the section Proxies and Gateways.



Rule Files

The W3C Library provides this module for handling configuration files (a.k.a. rule files). Rule files can be used to initialize as much as the application desires, including setting up new protocol modules etc. Also, a rule file does not have to be a file: it can be a database or any other way of storing information. This implementation is not used by the Library at all and is part of the WWWApp.h interface.

Parsing a whole rule file is done using a converter stream. This means that a rule file can come from anywhere, even across the network. We have defined a special content type for rule files called WWW_RULES in HTFormat.

In some situations, a set of rules comes from a subset of a file or some other origin, for example INI files for X resources. In that case, you can also parse a single line from a rules file using the following function:

extern BOOL HTRule_parseLine (HTList * list, CONST char * config);
You can add a rule to a list of rules in the same way as any other preference. The `pattern' parameter is a string containing a single "*". The `replace' parameter points to the equivalent string, with "*" marking the place where the text matched by "*" in the pattern goes.
typedef enum _HTRuleOp {
    HT_Invalid, 
    HT_Map, 
    HT_Pass, 
    HT_Fail,
    HT_DefProt,
    HT_Protect,
    HT_Exec,
    HT_Redirect,
    HT_UseProxy
} HTRuleOp;

extern BOOL HTRule_add (HTList * list, HTRuleOp op, CONST char * pattern, CONST char * replace);
And as normal you can delete a set of rules by using this function:
extern BOOL HTRule_deleteAll (HTList *list);

Global Rules

Rules are handled as a list, like everything else that has to do with preferences. We provide two functions for getting and setting the global rules:
extern HTList * HTRule_global	(void);
extern BOOL HTRule_setGlobal	(HTList * list);

Translate by rules

This function walks through the list of rules and translates the reference when matches are found. The list is traversed in order starting from the head of the list. It returns the address of the equivalent string allocated from the heap which the CALLER MUST FREE. If no translation occurred, then it is a copy of the original.
extern char * HTRule_translate (HTList * list, CONST char * token, BOOL ignore_case);
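The combination of a single "*" in the pattern, a "*" in the replacement, and a heap-allocated result can be sketched as follows. This is a simplified, self-contained model and not HTRule_translate() itself: MapRule, same_text, apply_rule and translate are hypothetical names, every rule is treated as a plain mapping, traversal continues with the rewritten string after a match, and the per-operation semantics of HTRuleOp are not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *pattern; const char *replace; } MapRule;

/* Compare n characters, optionally ignoring case. */
int same_text(const char *a, const char *b, size_t n, int ignore_case)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int x = (unsigned char)a[i], y = (unsigned char)b[i];
        if (ignore_case) { x = tolower(x); y = tolower(y); }
        if (x != y) return 0;
    }
    return 1;
}

/* Apply one rule to `token`. Returns a malloc'ed string on a match,
 * or NULL if the rule does not match (or memory ran out). */
char *apply_rule(const MapRule *r, const char *token, int ignore_case)
{
    const char *star = strchr(r->pattern, '*');
    const char *rstar = strchr(r->replace, '*');
    size_t tlen = strlen(token);
    size_t pre = star ? (size_t)(star - r->pattern) : strlen(r->pattern);
    size_t post = star ? strlen(star + 1) : 0;
    size_t rpre = rstar ? (size_t)(rstar - r->replace) : strlen(r->replace);
    const char *rtail = rstar ? rstar + 1 : r->replace + rpre;
    size_t mid;
    char *out;

    if (star ? tlen < pre + post : tlen != pre)
        return NULL;                            /* length cannot match */
    if (!same_text(r->pattern, token, pre, ignore_case) ||
        !same_text(r->pattern + pre + (star ? 1 : 0),
                   token + tlen - post, post, ignore_case))
        return NULL;                            /* prefix or suffix differs */
    mid = rstar ? tlen - pre - post : 0;        /* text matched by "*" */
    out = malloc(rpre + mid + strlen(rtail) + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, r->replace, rpre);
    memcpy(out + rpre, token + pre, mid);
    strcpy(out + rpre + mid, rtail);
    return out;
}

/* Walk the rules in order; the caller must free() the result. */
char *translate(const MapRule *rules, size_t count, const char *token,
                int ignore_case)
{
    char *current = malloc(strlen(token) + 1);
    size_t i;

    assert(rules != NULL || count == 0);
    if (current == NULL) return NULL;
    strcpy(current, token);                     /* default: copy of original */
    for (i = 0; i < count; i++) {
        char *next = apply_rule(&rules[i], current, ignore_case);
        if (next != NULL) { free(current); current = next; }
    }
    return current;
}
```

With the two rules "/old/*" to "/new/*" and "/new/index.html" to "/new/home.html", the token "/old/index.html" is first rewritten to "/new/index.html" and then to "/new/home.html".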



The Log Module

It is possible to log the result of a request to the Library regardless of what type of application is using the Library. The current implementation of the log manager is very simple but it is straightforward to replace the implementation with a more sophisticated one. The current log format is defined as follows:
	<HOST> <DATE> <METHOD> <URI> <RESULT> <CONTENT LENGTH>
where the date and time stamp can be either in local time or GMT. Logging is turned off by default, but the application can enable it at any time. However, it is also up to the application to disable logging again in order to close any open file descriptors etc. The exact log API is described in the Log Manager.
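As an illustration of the log format above, the following sketch formats a single log line with a time stamp in either local time or GMT. It is not the log manager's code: format_log_line is a hypothetical name, and the exact date layout (day/month/year:hour:minute:second) is an assumption, since the text only says that a date and time stamp is written.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Write "<HOST> <DATE> <METHOD> <URI> <RESULT> <CONTENT LENGTH>" into buf.
 * Returns the number of characters written (excluding the terminating
 * null character), or -1 on error or if the line did not fit.
 */
int format_log_line(char *buf, size_t size, const char *host, time_t when,
                    int use_gmt, const char *method, const char *uri,
                    int result, long content_length)
{
    char stamp[32];
    struct tm *tm;
    int n;

    assert(buf != NULL && host != NULL && method != NULL && uri != NULL);
    tm = use_gmt ? gmtime(&when) : localtime(&when);
    if (tm == NULL ||
        strftime(stamp, sizeof stamp, "%d/%b/%Y:%H:%M:%S", tm) == 0)
        return -1;
    n = snprintf(buf, size, "%s %s %s %s %d %ld",
                 host, stamp, method, uri, result, content_length);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```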



Keeping Track of History

The Library supports client applications in keeping track of which locations the user has visited while browsing the Web. The internal history list is implemented in the HTHist module. This module is completely autonomous: it is not used by any other module in the Library, so if it is not referred to in the application code then it will not be linked into the application. This means that if the application does not need to record history then no action is required at all.

The purpose of the history module is to try not to impose any particular history mechanism policy but instead to allow various different history mechanisms. The basic features of the history module are:

  • The module can handle multiple history lists
  • The underlying data model is linear lists
  • The module keeps a position pointer into this list
  • The application can refer to an element in the list by an index
Some of the navigation steps supported by the module are "back", "forward", and jump to a position in the list. The details of the module are listed in the declaration part of the HTHist module.
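The data model listed above (a linear list, a position pointer, and access by index) can be illustrated with a small self-contained sketch. It does not use the HTHist API: History and the history_* functions are hypothetical names, indices are zero-based here, and dropping the "forward" part of the list when a new location is recorded is an assumption about policy, which the module deliberately leaves open.

```c
#include <assert.h>
#include <stddef.h>

#define HISTORY_MAX 32

typedef struct {
    const char *items[HISTORY_MAX];  /* visited locations, oldest first */
    size_t count;                    /* number of elements in the list */
    size_t pos;                      /* current element, valid if count > 0 */
} History;

/* Record a visit; anything "forward" of the current position is dropped.
 * Stores the pointer, not a copy. Returns 1 on success, 0 if full. */
int history_record(History *h, const char *location)
{
    assert(h != NULL && location != NULL);
    if (h->count > 0)
        h->count = h->pos + 1;
    if (h->count == HISTORY_MAX)
        return 0;
    h->items[h->count] = location;
    h->pos = h->count++;
    return 1;
}

/* Step back; returns the new current location or NULL at the start. */
const char *history_back(History *h)
{
    if (h->count == 0 || h->pos == 0)
        return NULL;
    return h->items[--h->pos];
}

/* Step forward; returns the new current location or NULL at the end. */
const char *history_forward(History *h)
{
    if (h->count == 0 || h->pos + 1 >= h->count)
        return NULL;
    return h->items[++h->pos];
}

/* Jump to an element by index; returns NULL if the index is out of range. */
const char *history_jump(History *h, size_t index)
{
    if (index >= h->count)
        return NULL;
    h->pos = index;
    return h->items[index];
}
```

Because the module can handle multiple history lists, an application with several windows could keep one such structure per window.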



Presentation Modules

The HTML parser has three different levels of APIs in order to make the implementation as flexible as possible. Depending on which API is used by the application, the output can be a stream, a structured stream or a set of callback functions as indicated in the figure below:

[Figure: HTMLParser]

SGML Stream Interface
This interface provides the most basic API consisting of the output from a stream without any form of structure imposed on the data. The internal SGML parser parses the data sequence, identifies SGML markup tags, and passes the information on to the HTML parser. However, if the application has its own SGML parser and HTML parser, the internal parsers can be disabled by removing the internal HTML converter called HTMLPresent(), which is used to present a graphic object on the screen, from both the global and the local list of converters and presenters.
HTML Structured Stream Interface
If the application has its own HTML parser that understands the structured output from the internal SGML parser then the second API can be used. The current HTML parser in the Library is very basic and does not understand many of the new features in HTML 2 and 3.
HText Call Back Interface
The last API can be used in case the application prefers to use the internal HTML parser and only wants to provide a platform dependent definition of the callback functions defined in the HText module. Here, the parsing is all done internally in the Library and the application is only called with segments of fully parsed HTML. The callback functions are all defined as prototypes in the HText module but the client must provide the actual code that defines the presentation method used for a specific HTML tag.
Due to the limited functionality of the internal HTML parsing module, many applications have chosen to implement their own HTML parser. Therefore many regard the HTML parser module as being an application specific module instead of a dynamic module. This will be alleviated in the next version of the Library, which hopefully will ease the use of the internal HTML parser. The current parser can be overridden as described in section Application Specific Modules.



The Library Utilities

The Library contains a set of basic utility modules which are used throughout all the Library code. The APIs defined by the WWWUtil.h include file provide basic memory management and container modules. You can see a list of the utility modules in the Library Internals.



Dynamic Memory Management

The Library makes use of a dynamic memory manager which handles allocation and deallocation of dynamic memory. The methods used for allocation and deallocation are wrapper functions around the native malloc, calloc, realloc, and free methods. Hence, the functionality of the module is very similar to the native C interface but it allows for structured error recovery and application termination in case of failure. It covers especially the following three situations:
  • Handling of allocation and deallocation of dynamic memory
  • Recovering from temporary lack of available memory
  • Panic handling in case a new allocation fails

Memory Freer Functions

The dynamic memory freer functions are typically functions that are capable of freeing large chunks of memory. In case a new allocation fails, the allocation method looks for any freer functions to call. There can be multiple freer functions and after each call, the allocation method tries again to allocate the desired amount of dynamic memory. The freer functions are called in reverse order, meaning that the last one registered gets called first. That way, it is easy to add temporary freer functions which are then guaranteed to be called first if an allocation fails.

Add a Freer Function

You can add a freer function by using the following method. The Library itself registers a set of freer functions during initialization. If the application does not register any freer functions then the Library looks at how it can free internal memory.
typedef void HTMemoryCallback(size_t size);

extern BOOL HTMemoryCall_add (HTMemoryCallback * cbf);

Delete a Freer Function

Freer functions can be deleted at any time in which case they are not called anymore.
extern BOOL HTMemoryCall_delete (HTMemoryCallback * cbf);
extern BOOL HTMemoryCall_deleteAll (void);
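The retry behaviour described in this section (call the freer functions in reverse order of registration and retry the allocation after each call) can be sketched as below. This is a self-contained model, not the HTMemory implementation: all names are hypothetical, the raw allocator is replaceable only so that failure can be simulated, and the "budget" allocator and the two example freer functions exist purely to make the call order observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FREERS 8

typedef void FreerCallback(size_t size);   /* asked to release memory */
typedef void *RawAllocator(size_t size);

FreerCallback *freers[MAX_FREERS];
size_t freer_count = 0;
RawAllocator *raw_alloc = malloc;          /* replaceable for simulation */

int freer_add(FreerCallback *cbf)
{
    assert(cbf != NULL);
    if (freer_count == MAX_FREERS) return 0;
    freers[freer_count++] = cbf;
    return 1;
}

int freer_delete(FreerCallback *cbf)
{
    size_t i, j;
    for (i = 0; i < freer_count; i++) {
        if (freers[i] == cbf) {
            for (j = i + 1; j < freer_count; j++) freers[j - 1] = freers[j];
            freer_count--;
            return 1;
        }
    }
    return 0;
}

/* Allocate; on failure call freers last-registered-first, retrying after
 * each call. Returns NULL only if every freer has been tried. */
void *mem_alloc(size_t size)
{
    void *ptr = raw_alloc(size);
    size_t i;
    for (i = freer_count; ptr == NULL && i > 0; i--) {
        freers[i - 1](size);
        ptr = raw_alloc(size);
    }
    return ptr;
}

/* --- Simulation harness (hypothetical, to observe the behaviour) --- */
size_t budget = 0;                  /* bytes the simulated system has left */
char order[MAX_FREERS];             /* one letter per freer call */
size_t calls = 0;

void note(char c) { if (calls < MAX_FREERS) order[calls++] = c; }

void *budget_alloc(size_t size)
{
    if (size > budget) return NULL;
    budget -= size;
    return malloc(size);
}

void free_documents(size_t size) { (void)size; note('D'); budget += 16; }
void free_images(size_t size)    { (void)size; note('I'); budget += 64; }
```

If free_documents is registered before free_images and an allocation fails, free_images is called first; free_documents is only called if the retry still fails.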

Panic Handling

If the freer functions are not capable of deallocating enough memory then the application must have an organized way of closing down. This is done using the panic handler. In libwww, each allocation is tested and HT_OUTOFMEM is called if NULL was returned. HT_OUTOFMEM is a macro which calls HTMemory_outofmem. This function calls an exit function defined by the application in a call to HTMemory_setExit. If the application has not defined this function, HTMemory_outofmem prints the error message using TTYPrint and calls exit(1).
typedef void HTMemoryExitCallback(char * name, char * file, unsigned long line);

extern void HTMemory_setExit(HTMemoryExitCallback * pExit);

extern HTMemoryExitCallback * HTMemory_exit(void);
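The chain from a failed allocation to the panic handler can be sketched in a few lines. The code below is a self-contained model with hypothetical names (ExitCallback, memory_set_exit, memory_outofmem, OUTOFMEM, checked_malloc), not the Library's source. The example handler only records the failure so that the flow can be observed; a real handler would be expected to save state and terminate the application.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef void ExitCallback(const char *name, const char *file,
                          unsigned long line);

ExitCallback *exit_handler = NULL;      /* set by the application, if at all */

void memory_set_exit(ExitCallback *handler) { exit_handler = handler; }

/* Called when an allocation has failed even after all freers were tried. */
void memory_outofmem(const char *name, const char *file, unsigned long line)
{
    assert(name != NULL && file != NULL);
    if (exit_handler != NULL) {
        exit_handler(name, file, line); /* application decides how to stop */
        return;
    }
    fprintf(stderr, "%s:%lu: out of memory allocating %s\n", file, line, name);
    exit(1);                            /* default: report and give up */
}

/* Records where in the source the failing allocation was made. */
#define OUTOFMEM(name) \
    memory_outofmem((name), __FILE__, (unsigned long)__LINE__)

/* An allocation that is tested the way the text describes. */
void *checked_malloc(size_t size, const char *name)
{
    void *ptr = malloc(size);
    if (ptr == NULL)
        OUTOFMEM(name);
    return ptr;
}

/* --- Example handler (hypothetical): records instead of exiting --- */
int failure_count = 0;
const char *last_failure_name = NULL;
unsigned long last_failure_line = 0;

void remember_failure(const char *name, const char *file, unsigned long line)
{
    (void)file;
    failure_count++;
    last_failure_name = name;
    last_failure_line = line;
}
```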



Trace Messages and Preprocessor Defines

The Library has a large number of trace messages that are very useful when debugging an application. In this section we will have a look at how to use the trace messages and also at which preprocessor defines can be used to modify the behavior of the Library.

Trace Messages

The Library has a large set of trace messages that can be enabled in various ways. They are often an important help to the application programmer when debugging an application, and this is the reason why they are treated in this User's Guide.

MORE

Preprocessor Defines

Most of the preprocessor defines in the Library are platform dependent and are determined as a result of the BUILD script. However, there are a few defines that can change the default behavior of the Library on a platform independent basis.

HT_REENTRANT
This boolean define should be enabled if the reentrant versions ("*_r") of the system calls should be used. The names of these system calls currently follow the pattern "*_r", for example strtok_r. The default value is OFF.
HT_SHARED_DISK_CACHE
If the cache can be shared between several clients, this will have an effect on the way a document is updated. The default implementation of the cache manager does not support this, so by default this flag is NOT defined.
HT_DIRECT_WAIS
This boolean define is enabled by the Makefile.include file as described in section Access Methods. The default value is OFF.
HT_DEFAULT_WAIS_GATEWAY
A constant string value specifying which WAIS gateway to contact if HT_DIRECT_WAIS is not defined and no gateway has been defined using environment variables
HT_FTP_NO_PORT
The FTP module can handle both PASV and PORT when requesting a document from an FTP server. If the application is a proxy server running on top of a firewall machine then PORT is normally not allowed as a firewall does not accept incoming connections on arbitrary ports. This define will disable the use of PORT. The default value is to use PORT if PASV fails.
WWWLIB_SIG
The Library has a very small set of signal handlers whose action most often is simply to ignore the signal. However, due to a bug in the TCP kernel on Solaris and other SVR4 platforms, which causes a SIGPIPE signal to be returned, some kind of handling is required on these platforms, and the signal handling is enabled by default on these platforms.
HT_TMP_ROOT
The default destination for temporary files if no other destination has been given by the application. Temporary files include files created for external presenters etc. The default value is /tmp which obviously is not suited for large amounts of data.
HT_CACHE_ROOT
If the cache is enabled and no cache root directory has been specified then use this as the location. The default value is again /tmp.
HT_NO_RULES
If this flag is enabled then no configuration or rule file is searched for map rules when handling a request even if a rule file has been specified by the application. The default value is OFF
HT_NO_PROXY
Enable this flag if no environment variables are to be searched for gateways or proxies for a request. The default value is OFF
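The way such defines typically take effect can be shown with a short sketch. The macro names HT_TMP_ROOT, HT_CACHE_ROOT and HT_FTP_NO_PORT are those listed above, but the functions tmp_root, cache_root and ftp_may_use_port are hypothetical and only illustrate the pattern of a compile-time default that the application can still override at run time; the Library's own code may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time defaults; override with e.g. -DHT_TMP_ROOT='"/var/tmp"'. */
#ifndef HT_TMP_ROOT
#define HT_TMP_ROOT "/tmp"
#endif

#ifndef HT_CACHE_ROOT
#define HT_CACHE_ROOT "/tmp"
#endif

/* Directory for temporary files: the application's choice if it made one,
 * otherwise the compile-time default. */
const char *tmp_root(const char *app_choice)
{
    assert(app_choice == NULL || app_choice[0] != '\0');
    return app_choice != NULL ? app_choice : HT_TMP_ROOT;
}

/* Cache root directory, resolved in the same way. */
const char *cache_root(const char *app_choice)
{
    assert(app_choice == NULL || app_choice[0] != '\0');
    return app_choice != NULL ? app_choice : HT_CACHE_ROOT;
}

/* Whether the FTP module may fall back to PORT when PASV fails. */
int ftp_may_use_port(void)
{
#ifdef HT_FTP_NO_PORT
    return 0;       /* e.g. a proxy running on a firewall machine */
#else
    return 1;       /* default: use PORT if PASV fails */
#endif
}
```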

