The previous section discusses
public annotation authoring in my
public annotation system paper.
Public Annotation System Publishing
Making an annotation available to the public
is called publishing the annotation.
Several issues have to be decided before
implementing a public annotation system:
-
In-line versus Floating
-
Will the annotations be inserted at specific
locations in the original document (i.e.
in-line annotations), or will they be inserted
at the beginning or end of the original
document (i.e. floating annotations)?
-
Embedded versus Indirect
-
Will the text of the annotation be displayed
alongside the original document (i.e. embedded
annotations), or will hypertext links to the
annotation text be displayed (i.e. indirect
annotations)?
-
Local versus Remote
-
Where will the public annotations be stored?
They can be stored either on the same machine
as the original document being annotated
(i.e. local annotations) or on a different
machine than the original document (i.e.
remote annotations).
(More goes here.)
In-line versus Floating Annotations
While I defined the terms in-line and floating annotation
in the
introductory section, it cannot hurt to reiterate
the definitions of these two terms. I call an
annotation that is attached to the original document
as a whole a floating annotation; conversely, I call
an annotation that is attached to a specific character,
word, sentence, paragraph, figure, or section in the
original document an in-line annotation. In general,
annotation authors want to be able to have both floating
and in-line annotations.
So why is there any distinction made between in-line
and floating annotations? In general, floating
annotations are much easier to implement than in-line
annotations. For example, Mosaic currently only
supports floating annotations.
So why are in-line annotations harder to implement?
The answer to this question depends upon whether the
contents of the original document can be changed after
an in-line annotation has been attached to it. In fact,
if the original document cannot have its contents modified,
in-line annotations are easy to implement, since their
location can be recorded as a simple character offset
from the beginning of the document. However, if the
original document can have its contents modified,
the annotation support system must be prepared to deal
with having the text to which the annotation is attached
be either moved or outright deleted. It is the text
movement/deletion problem that causes the implementation
of in-line annotations to be more difficult than
floating annotations. Since the World Wide Web encourages
people to continually update their original documents,
any annotation system that wishes to support in-line annotations
for the Web must have a viable solution to the text
movement/deletion problem.
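The text movement problem can be illustrated with a small sketch. The
sample strings and the helper name text_at are hypothetical and are not
part of my implementation; the sketch only shows a recorded character
offset going stale after the document is edited.

```python
def text_at(document, offset, length):
    """Return the text that a recorded character offset currently points at."""
    return document[offset:offset + length]


original = "Annotations are useful."
offset = original.index("useful")      # attachment recorded as an offset

# The document owner later inserts text ahead of the attachment point.
updated = "Public " + original

# The offset is correct for the original document ...
still_valid = text_at(original, offset, 6) == "useful"
# ... but points at unrelated text once the document has changed.
now_stale = text_at(updated, offset, 6) != "useful"
```

Any edit ahead of the attachment point shifts the annotated text, so a
bare offset is only safe for documents that never change.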
I can think of at least four defensible solutions to the
text movement/deletion problem associated with in-line
annotation attachments and they are listed below:
-
Notification Only Solution
-
The notification only solution is essentially
a cop-out. Basically, any time the
original document is modified, all in-line
annotations are converted to floating
annotations and the annotation authors
are required to reattach the annotations
to the correct location in the original
document.
The primary problem with the notification
only solution is that it really discourages
annotation authors from using in-line
annotations at all.
-
Named Anchor Solution
-
The named anchor solution requires that an HTML
named anchor (i.e.
<A Name=anchor_name>) be
inserted into the original document at the
annotation attachment point. Whenever the
original document is updated, care is
taken to never delete the named anchors.
While the simplicity of the named anchor
solution is quite appealing at first, it
has the following drawbacks:
-
The annotation system needs the ability
to modify the original document.
-
Annotations can only be made to HTML
documents. Annotation of plain text
files is not feasible.
-
Annotations can not be made while the
original document is being modified.
This means that there needs to be some
sort of synchronization lock that prevents
any in-line annotations from being made
while the original document is being
modified.
-
Many HTML documents are generated via
converter programs that would all have
to be modified to preserve the named
anchors.
-
Editor Marker Solution
-
The editor marker solution is actually a
variant of the named anchor solution above.
An editor that supports markers is used to
modify the original document. A marker
is an invisible character that an editor can
insert into a document. Sophisticated editors
such as EMACS and FrameMaker have marker
support. With markers, each annotation
attachment point is converted to a marker
inside the document. As the document is
modified, the editor keeps the markers
properly positioned relative to the document
text. Finally, when the document is
written out, the marker positions are read
to determine the new annotation attachment
positions.
The editor marker solution solves the first
two problems of the named anchor solution,
but still has the lock synchronization problem
and HTML converter problem. In addition,
it introduces the problem of requiring
document authors to only use a restricted
set of editors for document modification.
-
Pattern Match Solution
-
The pattern match solution associates a
pattern of characters with each in-line
annotation. Whenever the original document
is modified, it is rescanned to locate the
new positions for each in-line annotation.
The pattern match solution has the following
problems:
-
The person modifying the original
document can unintentionally cause
an in-line annotation to become
attached to the wrong position by
inserting the same pattern into the
original document more than once.
-
The annotation author is required to specify
a pattern with each in-line annotation.
While I could probably come up with more alternative
solutions and variants, there is not much point in
doing so, since I decided to go with the pattern match
solution. I spend the remainder of the section
discussing some problems with the pattern match solution
and my solutions to these problems.
A reasonable strategy for supporting in-line annotations
is for the user to specify a pattern to search for
in the original document. Since HTML is mostly insensitive
to the placement of white space, the pattern matching
algorithm should also be insensitive to white space.
Since HTML mark up is not displayed by Web browsers,
the pattern matching algorithm should also be insensitive
to HTML mark up.
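As a minimal sketch of such a matcher, assuming that mark up can be
approximated as anything between "<" and ">" and that entities are left
untouched, the following strips mark up, collapses white space, and
remembers where each visible character came from. The names
visible_text and find_pattern are hypothetical and are not taken from
my implementation.

```python
import re

TAG = re.compile(r"<[^>]*>")   # rough approximation of HTML mark up


def visible_text(html):
    """Return (text, offsets): mark up removed, white space runs collapsed
    to one space, and offsets[i] = index in html of the character text[i]."""
    chars, offsets = [], []
    pos = 0
    while pos < len(html):
        tag = TAG.match(html, pos)
        if tag:                          # skip mark up entirely
            pos = tag.end()
            continue
        ch = html[pos]
        if ch.isspace():
            # Keep at most one space, and none at the very start.
            if chars and chars[-1] != " ":
                chars.append(" ")
                offsets.append(pos)
        else:
            chars.append(ch)
            offsets.append(pos)
        pos += 1
    return "".join(chars), offsets


def find_pattern(html, pattern):
    """Return the offsets in html at which each match of pattern begins."""
    text, offsets = visible_text(html)
    needle = " ".join(pattern.split())   # normalize the pattern the same way
    hits = []
    start = text.find(needle) if needle else -1
    while start != -1:
        hits.append(offsets[start])
        start = text.find(needle, start + 1)
    return hits
```

For example, the pattern "quick brown" would match the text
"quick\n  <B>brown</B>" even though the two words are separated by a
line break and a tag in the document source.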
What happens when the user specifies a pattern that
has more than one match? In this case, the user should
also specify which match they meant
out of the total number (e.g. the third match of seven
total matches).
What happens when the original document is modified
such that a particular pattern no longer matches?
In this case, the annotation should be turned into
an annotation that floats to the top or the bottom
of the document. Whenever this occurs, the owner
of the public annotation needs to be notified that
their annotation has become detached. Notification is
discussed in the
chapter on notification.
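The two rules above can be sketched as a single function that takes the
offsets of the current matches together with the match number and total
recorded when the annotation was authored. The name resolve_attachment
is hypothetical. Treating a changed total as a detachment is my
assumption for this sketch; the text above only covers the case where
the pattern no longer matches at all.

```python
def resolve_attachment(match_offsets, ordinal, total):
    """Decide where an annotation attaches after the document has changed.

    match_offsets: offsets of the current pattern matches, in document order.
    ordinal, total: recorded at authoring time, e.g. 3 and 7 for
                    "third match of seven total matches" (1-based).
    Returns ("in-line", offset) or ("floating", None).
    """
    if not match_offsets:
        return ("floating", None)        # pattern no longer matches
    if len(match_offsets) != total or not 1 <= ordinal <= total:
        # Assumption: the recorded ordinal can no longer be trusted,
        # so detach rather than risk attaching to the wrong match.
        return ("floating", None)
    return ("in-line", match_offsets[ordinal - 1])
```

A caller that receives "floating" would also be the natural place to
trigger notification of the annotation owner.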
Once a pattern has been specified, where is the
annotation placed relative to the pattern?
What about entities?
Embedded versus Indirect Annotations
(This needs to be written.)
Local versus Remote Annotations
(This needs to be written.)
My Implementation
My public annotation publishing system consists
of the following components:
-
A modified NCSA HTML server that will serve
up an annotated version of a document in
preference to the original document, if an
annotated document exists.
-
A CGI program that a remote site can use
to cause a public annotation to be added to
a document.
-
A small database per annotated document that
keeps track of all public annotations and
annotation voting information.
-
A CGI program that allows a remote user to
vote on the relevance of a given public
annotation.
-
A program that is run occasionally from a cron
job to verify the continued existence of
public annotations. (Not implemented yet!)
-
A program that allows someone to use
Mosaic private annotations for public annotation
authoring.
-
A CGI program that allows a user to `register'
a document as being available for public
annotation.
Each of these components is discussed in greater
detail in the sections below.
HTTPD Server Modifications
When I was originally thinking about this project,
I was hoping that it would be unnecessary to do
any server modifications at all. Unfortunately,
as I explored the problem space I discovered that
there were some very unattractive characteristics
associated with a solution that did not use server
modifications. I will discuss solutions that did
not require server modifications as motivation for
why I chose a server modification based solution.
The fundamental problem that needs to be thought
about is the following -- if a remote document,
R, has a hypertext link to original document, O,
and a public annotation, P, is subsequently
attached to O, how does someone who follows
the hypertext link from R to O find the public
annotation P?
If no server modifications are permitted, either
document R or O is going to have to be modified
to find P. I will consider the modification of
R and O in turn:
Modify R
One extremely impractical solution is to
find all hypertext links to O and update
them to somehow reference P. Since the
Web does not support the operation of
finding all hypertext links to a document,
a complete Web scan is required to find
all hypertext links. Besides taking too
long to be practical, a complete Web scan
would not find any links that are embedded
in inaccessible documents. Also, many
document owners would not want their
documents modified to reference P.
A more restricted solution would be to only
scan the documents on the local server and
modify them. However, given the prevalence
of relative URLs, a public annotation might
require the recursive modification of a
significant number (in the worst case, all)
of the documents on the server. This partial
solution will result in some people stumbling
into the public annotations on the server
and other people missing them entirely.
I will not waste any more time talking about
solutions that require either local or
global web scanning.
Modify O
A much more practical solution is to modify
O so that it contains a single hypertext link
from O to P. Thus, the user traverses R to O
to P to get to the publicly annotated version
of O. This is quite easy, but it has the following
problems:
-
Only HTML documents can be annotated. No
public annotation of text files is feasible.
-
It does not work well with HTML documents
that are generated by converters.
-
A synchronization lock is needed to prevent
the simultaneous update of O by the author
and the installation of a new public annotation.
-
If a hypertext link jumps into the middle
of the document, there is no way that the user
will know whether there is a publicly
annotated version of the document available.
The combination of all of these problems convinced
me to reexamine solutions that require server
modifications.
So what is the problem with modifying the server?
The problem is that there are many HTTP server
implementations in use. In order for public
annotation systems to become widespread, all
of the popular servers would have to be modified
to support public annotations and the modifications
would have to be widely adopted. While deploying
modified servers is not an insurmountable problem,
it does slow down the adoption of my particular
solution for public annotations. To ameliorate
this problem, I have attempted to make it as easy
as possible to adopt my HTTP server modifications.
The server modifications required to support my
implementation of public annotations are to:
-
Copy information from the HTTP configuration file
to a file in /tmp so that the CGI scripts will
know the fully qualified DNS name of the machine
and the path to the root of the document tree.
-
On each document access, the HTTP server first
looks for a publicly annotated version of the
document in a mirror directory structure.
If the publicly annotated version exists,
it is returned to the user; otherwise the
non-annotated version is returned.
-
Provide an escape hatch so that there is some way
for people to access the original unannotated
document even when an annotated document exists.
These modifications add up to approximately 70 lines
of changes to the send_node() routine in the file
http_get.c
in the NCSA HTTPD server. Given how easy it was to
modify the NCSA HTTPD server, I suspect that most other
HTTP servers could be modified to support public
annotations with a similar level of effort.
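The lookup performed on each document access can be sketched as
follows. This is a Python illustration under stated assumptions, not
the C code in send_node(). It assumes the mirror directory structure
lives in a directory named Annote under the document root. The function
name choose_file and the want_original flag are hypothetical; how a
request actually signals the escape hatch is not shown here.

```python
import os


def choose_file(doc_root, rel_path, want_original=False, mirror_name="Annote"):
    """Return the path of the file the server should send for rel_path.

    doc_root:      root of the document tree
    rel_path:      requested document, relative to doc_root
    want_original: hypothetical escape hatch; True forces the
                   unannotated document even if an annotated one exists
    mirror_name:   assumed name of the mirror directory under doc_root
    """
    original = os.path.join(doc_root, rel_path)
    annotated = os.path.join(doc_root, mirror_name, rel_path)
    if not want_original and os.path.isfile(annotated):
        return annotated      # publicly annotated version takes preference
    return original           # otherwise fall back to the original
```

Because the check is a single file existence test, documents that have
never been annotated cost one extra lookup and no extra disk space.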
One interesting issue with public annotation systems
is deciding when a document is available for public
annotation. With my solution, all documents accessible
from the modified server are available for public
annotation. In practice, most people will not know
that a document is available for public annotation
until the document owner marks it as available.
However, people in the know will be able to publicly
annotate a document even if the owner has not marked
it as available for public annotation. This is a
policy issue that can easily be flipped the other
way by minor changes to the public annotation scripts.
Even with my current implementation, a user can easily
disable the ability to publicly annotate selected
documents and directories by changing the protections
in the public annotation mirror directory structure.
Thus, if the user wishes to prevent the public annotation
of all files in ~/public_html/private, the user can
simply change the protection of ~/public_html/Annote/private
to be 000. To make life easier, I will probably create
a command to do this at some time in the future.
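Such a command might look like the following sketch, which assumes a
POSIX file system. The name disable_public_annotation is hypothetical,
and creating the mirror directory when it does not yet exist is my
assumption, so that the protection can be applied before any annotation
has been made.

```python
import os


def disable_public_annotation(mirror_root, rel_dir):
    """Set the mirror directory for rel_dir to protection 000.

    mirror_root: the public annotation mirror, e.g. ~/public_html/Annote
    rel_dir:     directory to protect, relative to the document tree
    """
    target = os.path.join(mirror_root, rel_dir)
    # Assumption: create the mirror directory if it is missing.
    os.makedirs(target, exist_ok=True)
    os.chmod(target, 0o000)
    return target
```

For the example above, the call would be
disable_public_annotation(os.path.expanduser("~/public_html/Annote"), "private").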
One nice characteristic of my public annotation
solution is that it only consumes disk space as
documents are publicly annotated. The drawback
of my solution is that if most documents on a
server are publicly annotated, over twice the disk
space is consumed. From my point of view, paying
a 2x disk space cost for richly interconnected
documents is a small price to pay.
The post_update Program
The post_update program is a CGI program that is
installed on the local server to allow remote
sites to register a public annotation. The
post_update command is the workhorse of the
public annotation system in that it does the
following:
-
It takes a URL for a document at a remote
site specified via the CGI forms facility
and fetches the remote document via HTTP.
-
It scans the remote document for all hypertext
links to documents on the local machine.
-
For each document on the local machine,
it creates or updates a publicly annotated
version of the document and a small
ancillary file of information about the
public annotations.
-
Via the forms interface, it sends a response
back to the user about the success or failure
of the public annotation request.
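The link scanning step can be sketched with the Python standard
library. This is an illustration only. The names LinkCollector and
local_links are hypothetical, the remote document is passed in as text
rather than fetched, and a link is treated as local when its host name
equals the local machine's fully qualified DNS name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(remote_url, remote_html, local_host):
    """Return the paths of all documents on local_host that the
    remote document links to, without duplicates."""
    parser = LinkCollector()
    parser.feed(remote_html)
    paths = []
    for href in parser.hrefs:
        # Resolve relative links against the remote document's own URL.
        absolute = urlparse(urljoin(remote_url, href))
        if absolute.hostname == local_host.lower() and absolute.path not in paths:
            paths.append(absolute.path)
    return paths
```

Each returned path identifies one local document whose publicly
annotated version would then be created or updated.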
An interesting feature to note about the post_update
program is that it only takes the URL of a document
at a remote site. The URL of the document to be
publicly annotated is kept in the remote document.
An alternative is to specify both the local and
remote documents to the post_update program. The
problem with this alternative is that the owner of the
remote document could over time update the remote
document so that it was no longer pertinent to
the local publicly annotated document. By requiring
that the remote document always contain a hypertext
link to the local document, the remote document owner
is consistently reminded that there is some relevance
to the local publicly annotated document. The
cron_update program is used to periodically
ensure that the remote documents both continue to
exist and continue to reference the local document.
While the CGI forms interface was designed to provide
the ability for people to fill out forms locally and
send them to a remote site, there is no reason why
other programs cannot use exactly the same facility.
In fact, the post_public program uses the post_update
program to accomplish its task. The interesting thing
to note here is that the CGI forms protocol is beginning
to evolve into an open extensible protocol. Thus,
any time someone needs to create two-way communication
between two machines on the network, they can just
use the CGI forms mechanism to implement that protocol.
Currently, any errors encountered by a CGI program are
relayed back to the user via human readable error messages.
The problem with this is that computers have a very
difficult time understanding human readable error messages.
Given that the response back from a CGI script has a
MIME header, it is easy to encode any errors as simple
numbers. For example,
...
Content-type: text/html
Error: 12 -- Document has no title.
...
is a portion of a MIME header that contains a field
named `Error' which is followed by a decimal number
and a human readable error message. A computer program
can use the number in the `Error' field to determine
what went wrong.
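A client program could pick the number out of the header with a few
lines of code. The following is a minimal sketch that assumes the field
has the form shown above (the name Error, a decimal number, and an
optional "--" followed by the message); the name parse_error is
hypothetical.

```python
import re

# "Error: 12 -- Document has no title."
ERROR_FIELD = re.compile(r"^Error:\s*(\d+)\s*(?:--\s*(.*))?$", re.IGNORECASE)


def parse_error(header_text):
    """Return (code, message) from the first Error field in the header,
    or None if the header contains no Error field."""
    for line in header_text.splitlines():
        line = line.strip()
        if line == "":
            break                      # a blank line ends the header
        match = ERROR_FIELD.match(line)
        if match:
            return int(match.group(1)), (match.group(2) or "").strip()
    return None
```

The caller can then branch on the numeric code and show the message
text to a human only when needed.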
The post_public Program
The post_public program is used to simplify the task
of authoring public annotations using Mosaic. The
post_public program does the following:
-
It searches the user's
~/.mosaic-personal-annotations directory
looking for the personal annotation
that has most recently been modified.
-
It searches the LOG file in the user's
~/.mosaic-personal-annotations directory to
determine the URL of the document associated
with the personal annotation.
-
It copies the personal annotation from
~/.mosaic-personal-annotations to
~/public_html/annotations and installs
a symbolic link in its place. A hypertext
link to the document to be publicly annotated
is inserted into the annotation.
-
It uses the post_update CGI script to register
the existence of a new public annotation.
-
It uses the Mosaic remote control facility to
force Mosaic to display the response back from
the post_update CGI script.
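The first step, finding the most recently modified personal annotation,
can be sketched as follows. The name newest_annotation is hypothetical,
and skipping the LOG file by name is my assumption for the sketch,
since the LOG file lives in the same directory. Reading the LOG file is
not shown because its format is not described here.

```python
import os


def newest_annotation(directory, skip=("LOG",)):
    """Return the path of the most recently modified regular file in
    directory, ignoring the names in skip, or None if there is none."""
    candidates = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name in skip or not os.path.isfile(path):
            continue
        candidates.append((os.path.getmtime(path), path))
    if not candidates:
        return None
    return max(candidates)[1]          # largest modification time wins
```

A caller would pass os.path.expanduser("~/.mosaic-personal-annotations")
as the directory.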
The Public Annotation Database
{More goes here.}
The post_solicit Program
{More goes here.}
The cron_update Program
{More goes here.}
The setup_public_annote Program
The setup_public_annote program is used to simplify
the installation of my public annotation system.
The setup_public_annote program does the following:
-
It creates a file called /tmp/httpd.config that
contains information fetched from the HTTP
server configuration file. This information
is needed by the post_update program to find
the location of files in the local machine.
-
It writes the man pages and forms in the annote_docs
directory. This way every version of my public
annotation system always has local up-to-date
documentation for how it is used.
The post_vote Program
{More goes here.}
This file, version 1.4 of publishing.html, was last updated at
21:24:53 on 95/09/15.
Copyright (c) 1994,1995 --
Wayne C. Gramlich. All rights reserved.