The previous section discusses public annotation authoring in my public annotation system paper.

Public Annotation System Publishing

Making an annotation available to the public is called publishing the annotation.

There are several issues that have to be decided upon prior to implementing public annotations:

In-line versus Floating: Will the annotations be inserted at specific locations in the original document (i.e. in-line annotations), or will they be inserted at the beginning or end of the original document (i.e. floating annotations)?
Embedded versus Indirect: Will the text of the annotation be displayed alongside the original document (i.e. embedded annotations), or will hypertext links to the annotation text be displayed (i.e. indirect annotations)?
Local versus Remote: Where will the public annotations be stored? They can be stored either on the same machine as the original document being annotated (i.e. local annotations) or on a different machine than the original document (i.e. remote annotations).

(More goes here.)

In-line versus Floating Annotations

While I defined the terms in-line annotation and floating annotation in the introductory section, it cannot hurt to reiterate the definitions here. I call an annotation that is attached to the original document as a whole a floating annotation; conversely, I call an annotation that is attached to a specific character, word, sentence, paragraph, figure, or section in the original document an in-line annotation. In general, annotation authors want to be able to have both floating and in-line annotations.

So why is there any distinction made between in-line and floating annotations? In general, floating annotations are much easier to implement than in-line annotations. For example, Mosaic currently only supports floating annotations.

So why are in-line annotations harder to implement? The answer to this question depends upon whether the contents of the original document can be changed after an in-line annotation has been attached to it. If the original document cannot have its contents modified, in-line annotations are easy to implement, since their location can be recorded as a simple character offset from the beginning of the document. However, if the original document can have its contents modified, the annotation support system must be prepared to deal with having the text to which the annotation is attached be either moved or outright deleted. It is this text movement/deletion problem that makes in-line annotations more difficult to implement than floating annotations. Since the World Wide Web encourages people to continually update their original documents, any annotation system that wishes to support in-line annotations for the Web must have a viable solution to the text movement/deletion problem.

I can think of at least four defensible solutions to the text movement/deletion problem associated with in-line annotation attachments and they are listed below:

Notification Only Solution

The notification only solution is essentially a cop-out. Basically, any time the original document is modified, all in-line annotations are converted to floating annotations and the annotation authors are required to reattach the annotations to the correct location in the original document.

The primary problem with the notification only solution is that it really discourages annotation authors from using in-line annotations at all.

Named Anchor Solution

The named anchor solution requires that an HTML named anchor (i.e. <A Name=anchor_name>) be inserted into the original document at the annotation attachment point. Whenever the original document is updated, care is taken never to delete the named anchors.

While the simplicity of the named anchor solution is quite appealing at first, it has the following drawbacks:

The annotation system needs the ability to modify the original document.
Annotations can only be made to HTML documents. Annotation of plain text files is not feasible.
Annotations can not be made while the original document is being modified. This means that there needs to be some sort of synchronization lock that prevents any in-line annotations from being made while the original document is being modified.
Many HTML documents are generated via converter programs that would all have to be modified to preserve the named anchors.
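
To make the named anchor mechanism concrete, here is a minimal Python sketch, assuming the attachment point is given as a character offset into the HTML source. The function names insert_anchor and anchor_offset are hypothetical, and this is an illustration of the idea rather than code from any implementation discussed in this paper.

```python
import re

def insert_anchor(html, offset, name):
    """Insert an empty named anchor at the given character offset."""
    return html[:offset] + f'<A NAME="{name}"></A>' + html[offset:]

def anchor_offset(html, name):
    """Return the offset of the named anchor, or None if it was deleted."""
    pattern = r'<A\s+NAME\s*=\s*"?' + re.escape(name) + r'"?\s*>'
    match = re.search(pattern, html, re.IGNORECASE)
    return match.start() if match else None

doc = "<P>First.</P><P>Second.</P>"
marked = insert_anchor(doc, 13, "note1")
# The document owner later edits the first paragraph but keeps the anchor.
edited = marked.replace("First.", "A longer first paragraph.")
print(anchor_offset(edited, "note1"))
```

As long as the anchor survives editing, the attachment point can be recovered no matter how the surrounding text moves, which is what makes the approach appealing despite the drawbacks listed above.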

Editor Marker Solution

The editor marker solution is actually a variant of the named anchor solution above.

An editor that supports markers is used to modify the original document. A marker is an invisible character that an editor can insert into a document. Sophisticated editors such as EMACS and FrameMaker have marker support. With markers, each annotation attachment point is converted to a marker inside the document. As the document is modified, the editor keeps the markers properly positioned relative to the document text. Finally, when the document is written out, the marker positions are read to determine the new annotation attachment positions.
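
The bookkeeping an editor performs to keep markers positioned can be sketched as follows. This is a toy model in Python, not the behavior of any particular editor: the class name MarkedBuffer is hypothetical, markers are stored as character offsets, and the choice that a marker moves to the right of text inserted exactly at its position is an assumption of the sketch.

```python
class MarkedBuffer:
    """Toy text buffer that keeps named markers attached to their text."""

    def __init__(self, text, markers):
        self.text = text
        self.markers = dict(markers)  # marker name -> character offset

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        for name, m in self.markers.items():
            if m >= pos:              # marker is at or after the insertion
                self.markers[name] = m + len(s)

    def delete(self, start, end):
        self.text = self.text[:start] + self.text[end:]
        for name, m in self.markers.items():
            if m >= end:              # marker is after the deleted span
                self.markers[name] = m - (end - start)
            elif m > start:           # marker was inside the deleted span
                self.markers[name] = start

buf = MarkedBuffer("Hello world.", {"a1": 6})  # a1 is attached to "world."
buf.insert(0, "Oh, ")
print(buf.markers["a1"], buf.text[buf.markers["a1"]:])
```

When the document is written out, the offsets in buf.markers are the new attachment positions.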

The editor marker solution solves the first two problems of the named anchor solution, but still has the lock synchronization problem and the HTML converter problem. In addition, it introduces the problem of requiring document authors to use only a restricted set of editors for document modification.

Pattern Match Solution

The pattern match solution associates a pattern of characters with each in-line annotation. Whenever the original document is modified, it is rescanned to locate the new positions for each in-line annotation.

The pattern match solution has the following problems:

The person modifying the original document can unintentionally cause an in-line annotation to become attached to the wrong position by inserting the same pattern into the original document more than once.
The annotation author is required to specify a pattern with each in-line annotation.

While I could probably come up with more alternative solutions and variants, there is not much point in doing so, since I decided to go with the pattern match solution. I spend the remainder of the section discussing some problems with the pattern match solution and my solutions to these problems.

A reasonable strategy for supporting in-line annotations is for the user to specify a pattern to search for in the original document. Since HTML is mostly insensitive to the placement of white space, the pattern matching algorithm should also be insensitive to white space. Since HTML mark up is not displayed by Web browsers, the pattern matching algorithm should also be insensitive to HTML mark up.
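
A minimal Python sketch of a matcher with these two properties follows. It rests on simplifying assumptions that are not part of the design above: markup is taken to be anything between < and >, any run of white space is treated as a single space, and character entities are not handled at all. The names visible_text and find_pattern are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")

def visible_text(html):
    """Return (text, offsets): the text with markup removed and white space
    collapsed, plus the original offset of each character kept."""
    chars, offsets = [], []
    pos, in_space = 0, False
    while pos < len(html):
        tag = TAG.match(html, pos)
        if tag:                       # skip markup entirely
            pos = tag.end()
            continue
        ch = html[pos]
        if ch.isspace():
            if chars and not in_space:
                chars.append(" ")     # one space stands for the whole run
                offsets.append(pos)
            in_space = True
        else:
            chars.append(ch)
            offsets.append(pos)
            in_space = False
        pos += 1
    return "".join(chars), offsets

def find_pattern(html, pattern):
    """Return the original-document offset of every match of pattern."""
    needle = " ".join(pattern.split())
    if not needle:
        return []
    text, offsets = visible_text(html)
    hits, start = [], text.find(needle)
    while start != -1:
        hits.append(offsets[start])
        start = text.find(needle, start + 1)
    return hits

html = "<p>The <b>quick</b>\n brown fox</p>"
print(find_pattern(html, "quick   brown"))
```

Because the matcher keeps the original offset of every visible character, a match found in the cleaned text can be mapped back to a position in the HTML source.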

What happens when the user specifies a pattern that has more than one match? In this case, the user should also specify which match they meant out of the total number of matches (e.g. the third match of seven total matches).

What happens when the original document is modified such that a particular pattern no longer matches? In this case, the annotation should be turned into a floating annotation that moves to the top or the bottom of the document. Whenever this occurs, the owner of the public annotation needs to be notified that their annotation has become detached. Notification is discussed in the chapter on notification.
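
The handling of multiple matches and of detached annotations can be sketched together in Python. The sketch uses plain substring search for brevity, and it makes one assumption that goes beyond the text: if the total number of matches differs from the total recorded when the annotation was made, the annotation is treated as detached, since the recorded match number can no longer be trusted. The names Attachment, find_all, and resolve are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Attachment:
    pattern: str
    ordinal: int  # 1-based: "third match" is 3
    total: int    # number of matches when the annotation was made

def find_all(text, pattern):
    """Return the offsets of all (possibly overlapping) matches."""
    if not pattern:
        return []
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits

def resolve(text, att) -> Optional[int]:
    """Return the attachment offset, or None if the annotation should
    become a floating annotation and its owner should be notified."""
    hits = find_all(text, att.pattern)
    if len(hits) != att.total or not 1 <= att.ordinal <= len(hits):
        return None
    return hits[att.ordinal - 1]

att = Attachment(pattern="cat", ordinal=3, total=3)
print(resolve("cat dog cat bird cat", att))  # still attached
print(resolve("cat dog bird cat", att))      # detached
```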

Once a pattern has been specified, where is the annotation placed relative to the pattern?

What about entities?

Embedded versus Indirect Annotations

(This needs to be written.)

Local versus Remote Annotations

(This needs to be written.)

My Implementation

My public annotation publishing system consists of the following components:

A modified NCSA HTML server that will serve up an annotated version of a document in preference to the original document, if an annotated document exists.
A CGI program that a remote site can use to cause a public annotation to be added to a document.
A small database per annotated document that keeps track of all public annotations and annotation voting information.
A CGI program that allows a remote user to vote on the relevance of a given public annotation.
A program that is run occasionally from a cron job to verify the continued existence of public annotations. (Not implemented yet!)
A program that allows someone to use Mosaic private annotations for public annotation authoring.
A CGI program that allows a user to `register' a document as being available for public annotation.

Each of these components is discussed in greater detail in the sections below.

HTTPD Server Modifications

When I was originally thinking about this project, I was hoping that it would be unnecessary to do any server modifications at all. Unfortunately, as I explored the problem space I discovered that there were some very unattractive characteristics associated with a solution that did not use server modifications. I will discuss solutions that did not require server modifications as motivation for why I chose a server modification based solution.

The fundamental problem that needs to be thought about is the following -- if a remote document, R, has a hypertext link to original document, O, and a public annotation, P, is subsequently attached to O, how does someone who follows the hypertext link from R to O find the public annotation P?

If no server modifications are permitted, either document R or O is going to have to be modified to find P. I will consider the modification of R and O in turn:

Modify R

One extremely impractical solution is to find all hypertext links to O and update them to somehow reference P. Since the Web does not support the operation of finding all hypertext links to a document, a complete Web scan is required to find all hypertext links. Besides taking too long to be practical, a complete Web scan would not find any links that are embedded in inaccessible documents. Also, many document owners would not want their documents modified to reference P.

A more restricted solution would be to scan and modify only the documents on the local server. However, given the prevalence of relative URLs, a public annotation might require the recursive modification of a significant number of documents on the server (in the worst case, all of them). This partial solution will result in some people stumbling into the public annotations on the server and other people missing them entirely.

I will not waste any more time talking about solutions that require either local or global web scanning.

Modify O

A much more practical solution is to modify O so that it contains a single hypertext link from O to P. Thus, the user traverses R to O to P to get to the publicly annotated version of O. This is quite easy, but it has the following problems:

Only HTML documents can be annotated. No public annotation of text files is feasible.
It does not work well with HTML documents that are generated by converters.
A synchronization lock is needed to prevent the simultaneous update of O by the author and the installation of a new public annotation.
If a hypertext link jumps into the middle of a document, there is no way for the user to know whether a publicly annotated version of the document is available.

The combination of all of these problems convinced me to reexamine solutions that require server modifications.

So what is the problem with modifying the server? The problem is that there are many HTTP server implementations in use. In order for public annotation systems to become widespread, all of the popular servers would have to be modified to support public annotations, and the modified servers would have to be widely deployed. While deploying modified servers is not an insurmountable problem, it does slow down the adoption of my particular solution for public annotations. To ameliorate this problem, I have attempted to make it as easy as possible to adopt my HTTP server modifications.

The server modifications required to support my implementation of public annotations are to:

Copy information from the HTTP configuration file to a file in /tmp so that the CGI scripts will know the fully qualified DNS name of the machine and the path to the root of the document tree.
On each document access, look first for a publicly annotated version of the document in a mirror directory structure. If the publicly annotated version exists, return it to the user; otherwise return the unannotated version.
Provide an escape hatch so that there is some way for people to access the original unannotated document even when an annotated document exists.

These modifications add up to approximately 70 lines of changes to the send_node() routine in the file http_get.c in the NCSA HTTPD server. Given how easy it was to modify the NCSA HTTPD server, I suspect that most other HTTP servers could be modified to support public annotations with a similar level of effort.
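
The lookup logic can be sketched in a few lines of Python. This is an illustration only; the actual change is C code inside the NCSA HTTPD server and is not reproduced here. The sketch assumes that the mirror directory is a subdirectory named Annote under the document root (the name used in the example later in this section) and models the escape hatch as a simple boolean flag, since the mechanism is not specified here. The function name choose_file is hypothetical.

```python
import os

def choose_file(doc_root, rel_path, want_original=False, mirror="Annote"):
    """Return the path of the file to serve for rel_path.

    The annotated copy in the mirror tree is preferred unless the caller
    asks for the original through the escape hatch (want_original)."""
    rel_path = rel_path.lstrip("/")
    original = os.path.join(doc_root, rel_path)
    annotated = os.path.join(doc_root, mirror, rel_path)
    if not want_original and os.path.isfile(annotated):
        return annotated
    return original
```

For a request for docs/a.html, the sketch returns the file under Annote/docs/a.html if one exists, and falls back to docs/a.html otherwise.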

One interesting issue with public annotation systems is deciding when a document is available for public annotation. With my solution, all documents accessible from the modified server are available for public annotation. In practice, most people will not know that a document is available for public annotation until the document owner marks it as available. However, people in the know will be able to publicly annotate a document even if the owner has not marked it as available for public annotation. This is a policy issue that can easily be flipped the other way by minor changes to the public annotation scripts.

Even with my current implementation, a user can easily disable the ability to publicly annotate selected documents and directories by changing the protections in the public annotation mirror directory structure. Thus, if the user wishes to prevent the public annotation of all files in ~/public_html/private, the user can simply change the protection of ~/public_html/Annote/private to be 000. To make life easier, I will probably create a command to do this at some time in the future.
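
Such a command could be as small as the following Python sketch, assuming a POSIX file system. The function name disable_public_annotation is hypothetical, and creating the mirror directory when it does not yet exist is an assumption of the sketch rather than documented behavior.

```python
import os

def disable_public_annotation(public_html, subdir, mirror="Annote"):
    """Set the protection of the mirror directory for subdir to 000."""
    path = os.path.join(public_html, mirror, subdir)
    os.makedirs(path, exist_ok=True)  # assumption: create it if missing
    os.chmod(path, 0o000)             # no read, write, or search access
    return path
```

Calling disable_public_annotation(os.path.expanduser("~/public_html"), "private") corresponds to the example in the preceding paragraph.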

One nice characteristic of my public annotation solution is that it only consumes disk space as documents are publicly annotated. The drawback of my solution is that if most documents on a server are publicly annotated, over twice the disk space is consumed. From my point of view, paying a 2x disk space cost for richly interconnected documents is a small price to pay.

The post_update Program

The post_update program is a CGI program that is installed on the local server to allow remote sites to register a public annotation. The post_update command is the workhorse of the public annotation system in that it does the following:

It takes a URL for a document at a remote site specified via the CGI forms facility and fetches the remote document via HTTP.
It scans the remote document for all hypertext links to documents on the local machine.
For each document on the local machine, it creates or updates a publicly annotated version of the document and a small ancillary file of information about the public annotations.
Via the forms interface, it sends a response back to the user about the success or failure of the public annotation request.
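
The second step, scanning the remote document for hypertext links to the local machine, can be sketched in Python as follows. This is an illustration under stated assumptions, not the post_update source: only <A HREF=...> links with the http scheme are considered, relative links are resolved against the remote document's own URL, and the names LinkCollector and local_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def local_links(remote_url, remote_html, local_host):
    """Return the paths of documents on local_host linked from the page."""
    parser = LinkCollector()
    parser.feed(remote_html)
    parser.close()
    paths = []
    for href in parser.hrefs:
        parts = urlsplit(urljoin(remote_url, href))
        if parts.scheme == "http" and parts.hostname == local_host.lower():
            if parts.path not in paths:  # report each document once
                paths.append(parts.path)
    return paths

page = ('<A HREF="http://local.example/docs/paper.html#sec2">x</A> '
        '<a href="other.html">y</a> '
        '<a href="http://LOCAL.example/docs/paper.html">z</a>')
print(local_links("http://remote.example/notes/a.html", page, "local.example"))
```

Each path returned would then be a candidate for the third step, creating or updating the publicly annotated version of that document.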

An interesting feature of the post_update program is that it takes only the URL of a document at a remote site. The URL of the document to be publicly annotated is kept in the remote document. An alternative is to specify both the local and remote documents to the post_update program. The problem with this alternative is that the owner of the remote document could, over time, update the remote document so that it is no longer pertinent to the local publicly annotated document. By requiring that the remote document always contain a hypertext link to the local document, the remote document owner is consistently reminded that there is some relevance to the local publicly annotated document. The cron_update program is used to periodically ensure that the remote documents both continue to exist and continue to reference the local document.

While the CGI forms interface was designed to let people fill out forms locally and send them to a remote site, there is no reason why other programs cannot use exactly the same facility. In fact, the post_public program uses the post_update program to accomplish its task. The interesting thing to note here is that the CGI forms protocol is beginning to evolve into an open extensible protocol. Thus, any time someone needs to create two-way communication between two machines on the network, they can just use the CGI forms mechanism to implement that protocol. Currently, any errors encountered by a CGI program are reported back to the user via human-readable error messages. The problem with this is that computers have a very difficult time understanding human-readable error messages. Given that the response from a CGI script has a MIME header, it is easy to encode any errors as simple numbers. For example,

	...
	Content-type: text/html
	Error: 12 -- Document has no title.
	...

is a portion of a MIME header that contains a field named `Error', which is followed by a decimal number and a human-readable error message. A computer program can use the number in the `Error' field to determine what went wrong.
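
A client program might read the field as in the following Python sketch. It assumes the layout shown in the example above (a decimal number, optionally followed by -- and a message); the function name parse_error is hypothetical, and the `Error' field itself is the convention proposed here, not a standard header.

```python
import re
from email.parser import HeaderParser

def parse_error(response_text):
    """Return (code, message) from the Error field, or None if absent."""
    headers = HeaderParser().parsestr(response_text)
    value = headers.get("Error")
    if value is None:
        return None
    match = re.match(r"\s*(\d+)\s*(?:--\s*(.*))?$", value)
    if match is None:  # field present but not in the expected layout
        return None
    return int(match.group(1)), (match.group(2) or "").strip()

response = ("Content-type: text/html\n"
            "Error: 12 -- Document has no title.\n"
            "\n"
            "<HTML></HTML>\n")
print(parse_error(response))
```

The calling program can branch on the number and show the message to the user only when it has no better way to recover.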

The post_public Program

The post_public program is used to simplify the task of authoring public annotations using Mosaic. The post_public program does the following:

It searches the user's ~/.mosaic-personal-annotations directory looking for the personal annotation that has most recently been modified.
It searches the LOG file in the user's ~/.mosaic-personal-annotations directory to determine the URL of the document associated with the personal annotation.
It copies the personal annotation from ~/.mosaic-personal-annotations to ~/public_html/annotations and installs a symbolic link in its place. A hypertext link to the document to be publicly annotated is inserted into the annotation.
It uses the post_update CGI script to register the existence of a new public annotation.
It uses the Mosaic remote control facility to force Mosaic to display the response back from the post_update CGI script.
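
The first and third steps above are simple file operations, sketched here in Python for a POSIX system. The sketch is not the post_public source: the function names newest_annotation and publish_copy are hypothetical, the LOG file is skipped on the assumption that it is not itself an annotation, and the steps that read the LOG file and insert the hypertext link are omitted because their formats are not described here.

```python
import os
import shutil

def newest_annotation(directory):
    """Return the path of the most recently modified file, ignoring LOG."""
    newest, newest_mtime = None, None
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file() or entry.name == "LOG":
                continue
            mtime = entry.stat().st_mtime
            if newest_mtime is None or mtime > newest_mtime:
                newest, newest_mtime = entry.path, mtime
    return newest

def publish_copy(annotation_path, public_dir):
    """Copy the annotation into public_dir and leave a symbolic link
    to the copy in its original place."""
    public_dir = os.path.abspath(public_dir)
    os.makedirs(public_dir, exist_ok=True)
    target = os.path.join(public_dir, os.path.basename(annotation_path))
    shutil.copy2(annotation_path, target)
    os.remove(annotation_path)
    os.symlink(target, annotation_path)
    return target
```

With the directories named in the steps above, the calls would be newest_annotation(os.path.expanduser("~/.mosaic-personal-annotations")) followed by publish_copy on the result with ~/public_html/annotations as the destination.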

The Public Annotation Database

{More goes here.}

The post_solicit Program

{More goes here.}

The cron_update Program

{More goes here.}

The setup_public_annote Program

The setup_public_annote program is used to simplify the installation of my public annotation system. The setup_public_annote program does the following:

It creates a file called /tmp/httpd.config that contains information fetched from the HTTP server configuration file. This information is needed by the post_update program to find the location of files on the local machine.
It writes the man pages and forms in the annote_docs directory. This way every version of my public annotation system always has local up-to-date documentation for how it is used.
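
The first step can be sketched in Python as follows. The sketch assumes that the needed values are the ServerName and DocumentRoot directives of the server configuration, written one directive per line with # comments, and it matches directive names exactly as spelled. The layout of /tmp/httpd.config is not described in this paper, so the one-pair-per-line output produced by format_summary is invented purely for illustration, as are the function names.

```python
def read_directives(config_text, wanted=("ServerName", "DocumentRoot")):
    """Extract the wanted directives from server configuration text."""
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # skip blank lines and comments
        parts = line.split(None, 1)     # directive name, then its value
        if len(parts) == 2 and parts[0] in wanted:
            found[parts[0]] = parts[1].strip()
    return found

def format_summary(directives):
    """Render the directives one per line (hypothetical file layout)."""
    return "".join(f"{key} {value}\n" for key, value in sorted(directives.items()))

config = ("# Example server configuration\n"
          "ServerName www.example.org\n"
          "Port 80\n"
          "DocumentRoot /usr/local/etc/httpd/htdocs\n")
print(format_summary(read_directives(config)), end="")
```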

The post_vote Program

{More goes here.}

This file, version 1.4 of publishing.html, was last updated at 21:24:53 on 95/09/15.