The previous section discusses public annotation authoring in my public annotation system paper.

Public Annotation System Publishing

Making an annotation available to the public is called publishing the annotation.

There are several issues that have to be decided prior to implementing a public annotation system:

In-line versus Floating
Will the annotations be inserted into specific locations in the original document (i.e. in-line annotations), or will they be inserted at the beginning or end of the original document (i.e. floating annotations)?
Embedded versus Indirect
Will the text of the annotation be displayed alongside the original document (i.e. embedded annotations), or will hypertext links to the annotation text be displayed (i.e. indirect annotations)?
Local versus Remote
Where will the public annotations be stored? They can be stored either on the same machine as the original document being annotated (i.e. local annotations) or on a different machine (i.e. remote annotations).
(More goes here.)

In-line versus Floating Annotations

While I defined the terms in-line annotation and floating annotation in the introductory section, it cannot hurt to reiterate the definitions here. I call an annotation that is attached to the original document as a whole a floating annotation; conversely, I call an annotation that is attached to a specific character, word, sentence, paragraph, figure, or section in the original document an in-line annotation. In general, annotation authors want to be able to use both floating and in-line annotations.

So why is there any distinction made between in-line and floating annotations? In general, floating annotations are much easier to implement than in-line annotations. For example, Mosaic currently only supports floating annotations.

So why are in-line annotations harder to implement? The answer depends upon whether the contents of the original document can be changed after an in-line annotation has been attached to it. If the original document cannot be modified, in-line annotations are easy to implement, since their location can be recorded as a simple character offset from the beginning of the document. However, if the original document can be modified, the annotation support system must be prepared to deal with the text to which the annotation is attached being either moved or deleted outright. It is this text movement/deletion problem that makes in-line annotations more difficult to implement than floating annotations. Since the World Wide Web encourages people to continually update their original documents, any annotation system that wishes to support in-line annotations for the Web must have a viable solution to the text movement/deletion problem.

I can think of at least four defensible solutions to the text movement/deletion problem associated with in-line annotation attachments and they are listed below:

Notification Only Solution
The notification only solution is essentially a cop-out. Basically, any time the original document is modified, all in-line annotations are converted to floating annotations and the annotation authors are required to reattach the annotations to the correct location in the original document.

The primary problem with the notification only solution is that it really discourages annotation authors from using in-line annotations at all.

Named Anchor Solution
The named anchor solution requires that an HTML named anchor (i.e. <A Name=anchor_name>) be inserted into the original document at the annotation attachment point. Whenever the original document is updated, care is taken never to delete the named anchors.

While the simplicity of the named anchor solution is quite appealing at first, it has the following drawbacks:

Editor Marker Solution
The editor marker solution is actually a variant of the named anchor solution above.

An editor that supports markers is used to modify the original document. A marker is an invisible character that the editor can insert into a document. Sophisticated editors such as EMACS and FrameMaker have marker support. With markers, each annotation attachment point is converted to a marker inside the document. As the document is modified, the editor keeps the markers properly positioned relative to the document text. Finally, when the document is written out, the marker positions are read to determine the new annotation attachment positions.
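The bookkeeping an editor performs to keep markers positioned can be sketched as follows. This is only an illustration under stated assumptions: the class MarkedText and its methods are hypothetical names, markers are modelled as plain character offsets rather than invisible characters, and the sketch does not claim to reproduce the exact marker semantics of EMACS or FrameMaker.

```python
class MarkedText:
    """Hypothetical model of a document whose markers track edits."""

    def __init__(self, text, markers):
        self.text = text
        self.markers = dict(markers)  # marker name -> character offset

    def insert(self, pos, s):
        """Insert s at pos; markers at or after pos move right."""
        self.text = self.text[:pos] + s + self.text[pos:]
        for name, off in self.markers.items():
            if off >= pos:
                self.markers[name] = off + len(s)

    def delete(self, pos, length):
        """Delete length characters starting at pos."""
        end = pos + length
        self.text = self.text[:pos] + self.text[end:]
        for name, off in self.markers.items():
            if off >= end:
                # Marker was after the deleted text: shift it left.
                self.markers[name] = off - length
            elif off > pos:
                # Marker was inside the deleted text: collapse to the cut.
                self.markers[name] = pos


doc = MarkedText("Hello world", {"note1": 6})
doc.insert(0, "Oh, ")
print(doc.text[doc.markers["note1"]:])  # the marker still precedes "world"
```

When the document is written out, the offsets in the markers dictionary would be read back as the new attachment positions.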

The editor marker solution solves the first two problems of the named anchor solution, but still has the lock synchronization problem and the HTML converter problem. In addition, it introduces the problem of requiring document authors to use only a restricted set of editors for document modification.

Pattern Match Solution
The pattern match solution associates a pattern of characters with each in-line annotation. Whenever the original document is modified, it is rescanned to locate the new positions for each in-line annotation.

The pattern match solution has the following problems:

While I could probably come up with more alternative solutions and variants, there is not much point in doing so, since I decided to go with the pattern match solution. I spend the remainder of the section discussing some problems with the pattern match solution and my solutions to these problems.

A reasonable strategy for supporting in-line annotations is for the user to specify a pattern to search for in the original document. Since HTML is mostly insensitive to the placement of white space, the pattern matching algorithm should also be insensitive to white space. Since HTML mark up is not displayed by Web browsers, the pattern matching algorithm should also be insensitive to HTML mark up.
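A minimal sketch of such a matcher is shown here, assuming a deliberately naive treatment of mark up: everything between "<" and ">" is skipped and every run of white space is collapsed to a single space. The function names normalize and find_matches are hypothetical, and the sketch does not handle entities, comments, or ">" characters inside attribute values.

```python
def normalize(html):
    """Return (text, offsets): the visible text with tags removed and
    white space collapsed, plus the original offset of each kept character."""
    chars, offsets = [], []
    in_tag = False
    for i, ch in enumerate(html):
        if in_tag:
            if ch == ">":
                in_tag = False
            continue
        if ch == "<":
            in_tag = True
            continue
        if ch.isspace():
            # Keep at most one space between words, none at the start.
            if chars and chars[-1] != " ":
                chars.append(" ")
                offsets.append(i)
            continue
        chars.append(ch)
        offsets.append(i)
    return "".join(chars), offsets


def find_matches(html, pattern):
    """Return the original-document offsets where pattern starts."""
    text, offsets = normalize(html)
    needle = " ".join(pattern.split())  # collapse white space in the pattern too
    hits = []
    start = text.find(needle)
    while start != -1:
        hits.append(offsets[start])
        start = text.find(needle, start + 1)
    return hits


page = "<p>The quick\n  <b>brown</b> fox.</p> The quick brown fox again."
print(find_matches(page, "quick   brown fox"))
```

Because the offsets list maps each normalized character back to the original document, a match found in the normalized text can still be reported as a position in the unmodified HTML.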

What happens when the user specifies a pattern that has more than one match? In this case, the user should also specify which match they meant out of the total number of matches (e.g. the third match of seven total matches).

What happens when the original document is modified such that a particular pattern no longer matches? In this case, the annotation should be turned into an annotation that floats to the top or the bottom of the document. Whenever this occurs, the owner of the public annotation needs to be notified that their annotation has become detached. Notification is discussed in the chapter on notification.
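The match-number and detachment rules can be sketched as below. The names Attachment and resolve are hypothetical, and the text passed in is assumed to have already had mark up removed and white space collapsed. One policy here is my assumption rather than something settled above: if the number of matches no longer equals the recorded total, the sketch treats the annotation as detached instead of guessing which match was meant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attachment:
    pattern: str
    match_number: int    # 1-based, e.g. 3 for "third match"
    total_matches: int   # e.g. 7 for "of seven total matches"


def resolve(text: str, att: Attachment) -> Optional[int]:
    """Return the offset of the intended match, or None if the
    annotation should become a floating annotation."""
    hits = []
    start = text.find(att.pattern)
    while start != -1:
        hits.append(start)
        start = text.find(att.pattern, start + 1)
    # Assumption: a changed match count means the attachment is ambiguous.
    if len(hits) != att.total_matches:
        return None
    if not 1 <= att.match_number <= len(hits):
        return None
    return hits[att.match_number - 1]


att = Attachment(pattern="cat", match_number=2, total_matches=3)
print(resolve("cat dog cat bird cat", att))  # offset of the second "cat"
print(resolve("cat dog bird cat", att))      # None: annotation floats
```

A None result is the point at which the annotation owner would be notified that the annotation has become detached.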

Once a pattern has been specified, where is the annotation placed relative to the pattern?

What about entities?

Embedded versus Indirect Annotations

(This needs to be written.)

Local versus Remote Annotations

(This needs to be written.)

My Implementation

My public annotation publishing system consists of the following components: Each of these components is discussed in greater detail in the sections below.

HTTPD Server Modifications

When I was originally thinking about this project, I was hoping that it would be unnecessary to do any server modifications at all. Unfortunately, as I explored the problem space I discovered that there were some very unattractive characteristics associated with a solution that did not use server modifications. I will discuss solutions that did not require server modifications as motivation for why I chose a server modification based solution.

The fundamental problem that needs to be thought about is the following -- if a remote document, R, has a hypertext link to an original document, O, and a public annotation, P, is subsequently attached to O, how does someone who follows the hypertext link from R to O find the public annotation P?

If no server modifications are permitted, either document R or O is going to have to be modified to find P. I will consider the modification of R and O in turn:

Modify R
One extremely impractical solution is to find all hypertext links to O and update them to somehow reference P. Since the Web does not support the operation of finding all hypertext links to a document, a complete Web scan is required to find them. Besides taking too long to be practical, a complete Web scan would not find any links that may be embedded in inaccessible documents. Also, many document owners would not want their documents modified to reference P.

A more restricted solution would be to scan and modify only the documents on the local server. However, given the prevalence of relative URLs, a public annotation might require the recursive modification of a significant number of the documents on the server (in the worst case, all of them). This partial solution will result in some people stumbling into the public annotations on the server and other people missing them entirely.

I will not waste any more time talking about solutions that require either local or global web scanning.

Modify O
A much more practical solution is to modify O so that it contains a single hypertext link from O to P. Thus, the user traverses R to O to P to get to the publicly annotated version of O. This is quite easy, but it has the following problems. The combination of all of these problems convinced me to reexamine solutions that require server modifications.

So what is the problem with modifying the server? The problem is that there are many HTTP server implementations in use. In order for public annotation systems to become widespread, all of the popular servers would have to be modified to support public annotations, and the modified servers would have to become widespread. While deploying modified servers is not an insurmountable problem, it does slow down the adoption of my particular solution for public annotations. To ameliorate this problem, I have attempted to make it as easy as possible to adopt my HTTP server modifications.

The server modifications required to support my implementation of public annotations are to:

These modifications add up to approximately 70 lines of changes to the send_node() routine in the file http_get.c in the NCSA HTTPD server. Given how easy it was to modify the NCSA HTTPD server, I suspect that most other HTTP servers could be modified to support public annotations with a similar level of effort.

One interesting issue with public annotation systems is deciding when a document is available for public annotation. With my solution, all documents accessible from the modified server are available for public annotation. In practice, most people will not know that a document is available for public annotation until the document owner marks it as available. However, people in the know will be able to publicly annotate a document even if the owner has not marked it as available for public annotation. This is a policy issue that can easily be flipped the other way by minor changes to the public annotation scripts.

Even with my current implementation, a user can easily disable the ability to publicly annotate selected documents and directories by changing the protections in the public annotation mirror directory structure. Thus, if the user wishes to prevent the public annotation of all files in ~/public_html/private, the user can simply change the protection of ~/public_html/Annote/private to be 000. To make life easier, I will probably create a command to do this at some time in the future.
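The protection change described above might be scripted roughly as follows. This is a sketch only: the function name disable_annotation is hypothetical, and creating the mirror directory when it does not yet exist is my assumption, not a described behaviour of the system.

```python
import os
from pathlib import Path


def disable_annotation(public_html: Path, subdir: str,
                       mirror_name: str = "Annote") -> Path:
    """Set the mirror directory for subdir to mode 000 so that no
    publicly annotated versions can be read from or written to it."""
    mirror = public_html / mirror_name / subdir
    # Assumption: create the mirror directory if it is not there yet.
    mirror.mkdir(parents=True, exist_ok=True)
    os.chmod(mirror, 0o000)
    return mirror


# Example (not run here): block annotation of ~/public_html/private
# disable_annotation(Path.home() / "public_html", "private")
```

Restoring the original protection on the mirror directory re-enables public annotation for that subtree.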

One nice characteristic of my public annotation solution is that it only consumes disk space as documents are publicly annotated. The drawback of my solution is that if most documents on a server are publicly annotated, over twice the disk space is consumed. From my point of view, paying a 2x disk space cost for richly interconnected documents is a small price to pay.

The post_update Program

The post_update program is a CGI program that is installed on the local server to allow remote sites to register a public annotation. The post_update command is the workhorse of the public annotation system in that it does the following:
  1. It takes a URL for a document at a remote site specified via the CGI forms facility and fetches the remote document via HTTP.
  2. It scans the remote document for all hypertext links to documents on the local machine.
  3. For each document on the local machine, it creates or updates a publicly annotated version of the document and a small ancillary file of information about the public annotations.
  4. Via the forms interface, it sends a response back to the user about the success or failure of the public annotation request.
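Step 2 above, scanning the fetched remote document for links back to the local machine, can be sketched with the Python standard library. The names LinkCollector and local_links and the host names in the example are hypothetical; the sketch takes the already-fetched HTML as a string and does not perform the HTTP fetch, the file updates, or the forms response.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(remote_url, remote_html, local_host):
    """Return absolute URLs in remote_html that point at local_host."""
    collector = LinkCollector()
    collector.feed(remote_html)
    found = []
    for href in collector.hrefs:
        # Relative links are resolved against the remote document's URL.
        absolute = urljoin(remote_url, href)
        if urlparse(absolute).hostname == local_host and absolute not in found:
            found.append(absolute)
    return found


html = ('<A HREF="http://local.example/doc.html">original</A> '
        '<a href="other.html">elsewhere</a>')
print(local_links("http://remote.example/notes/p.html", html, "local.example"))
```

Each URL returned would then be a candidate for step 3, the creation or update of a publicly annotated version.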

An interesting feature to note about the post_update program is that it only takes the URL of a document at a remote site. The URL of the document to be publicly annotated is kept in the remote document. An alternative is to specify both the local and remote documents to the post_update program. The problem with this alternative is that the owner of the remote document could, over time, update the remote document so that it is no longer pertinent to the local publicly annotated document. By requiring that the remote document always contain a hypertext link to the local document, the remote document owner is consistently reminded that there is some relevance to the local publicly annotated document. The cron_update program is used to periodically ensure that the remote documents both continue to exist and continue to reference the local document.

While the CGI forms interface was designed to provide the ability for people to fill out forms locally and send them to a remote site, there is no reason why other programs cannot use exactly the same facility. In fact, the post_public program uses the post_update program to accomplish its task. The interesting thing to note here is that the CGI forms protocol is beginning to evolve into an open extensible protocol. Thus, any time someone needs to create two-way communication between two machines on the network, they can just use the CGI forms mechanism to implement that protocol.

Currently, any errors encountered by a CGI program are reported back to the user via human readable error messages. The problem with this is that computers have a very difficult time understanding human readable error messages. Given that the response back from a CGI script has a MIME header, it is easy to encode any errors as simple numbers. For example,

	...
	Content-type: text/html
	Error: 12 -- Document has no title.
	...
								
is a portion of a MIME header that contains a field named `Error' which is followed by a decimal number and a human readable error message. A computer program can use the number in the `Error' field to determine what went wrong.
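A client program could read such a field along the following lines. This is a sketch, assuming the response headers arrive as text in the layout of the example above (a decimal number, then "--", then the message); the function name parse_error is hypothetical.

```python
from email import message_from_string


def parse_error(response_text):
    """Return (code, message) from the Error field, or None if absent."""
    headers = message_from_string(response_text)
    value = headers.get("Error")  # header lookup is case-insensitive
    if value is None:
        return None
    # Split "12 -- Document has no title." at the first "--".
    code, _, message = value.partition("--")
    return int(code.strip()), message.strip()


response = ("Content-type: text/html\n"
            "Error: 12 -- Document has no title.\n"
            "\n"
            "<html>...</html>\n")
print(parse_error(response))
```

A field whose leading part is not a decimal number would raise ValueError in this sketch; a real client would need to decide how to treat such a response.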

The post_public Program

The post_public program is used to simplify the task of authoring public annotations using Mosaic. The post_public program does the following:
  1. It searches the user's ~/.mosaic-personal-annotations directory looking for the personal annotation that has most recently been modified.
  2. It searches the LOG file in the user's ~/.mosaic-personal-annotations directory to determine the URL of the document associated with the personal annotation.
  3. It copies the personal annotation from ~/.mosaic-personal-annotations to ~/public_html/annotations and installs a symbolic link in its place. A hypertext link to the document to be publicly annotated is inserted into the annotation.
  4. It uses the post_update CGI script to register the existence of a new public annotation.
  5. It uses the Mosaic remote control facility to force Mosaic to display the response back from the post_update CGI script.
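Step 1 above can be sketched as a search for the newest file in the annotation directory. The function name newest_annotation is hypothetical, and the sketch assumes that every regular file other than LOG is a personal annotation; it does not attempt to parse the LOG file, whose format is not described here.

```python
import os
from pathlib import Path
from typing import Optional


def newest_annotation(directory: Path) -> Optional[Path]:
    """Return the most recently modified annotation file, or None."""
    # Assumption: any regular file except LOG is a personal annotation.
    candidates = [p for p in directory.iterdir()
                  if p.is_file() and p.name != "LOG"]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


# Example (not run here):
# newest_annotation(Path.home() / ".mosaic-personal-annotations")
```

The path returned would be the file that step 3 copies into ~/public_html/annotations and replaces with a symbolic link.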

The Public Annotation Database

{More goes here.}

The post_solicit Program

{More goes here.}

The cron_update Program

{More goes here.}

The setup_public_annote Program

The setup_public_annote program is used to simplify the installation of my public annotation system. The setup_public_annote program does the following:

The post_vote Program

{More goes here.}


This file, version 1.4 of publishing.html, was last updated at 21:24:53 on 95/09/15.

Copyright (c) 1994,1995 -- Wayne C. Gramlich. All rights reserved.