This is the design documentation for the SVMS (Source Version Management System).

SVMS Design Notes

This document contains the design notes for SVMS. These design notes are broken into the following sections:

Introduction
Glossary
Overall Design
Directory Layout
History File Format
Make Integration
Summary

The design goals and rationale for SVMS are kept in separate documents.

Introduction

Before digging into the design issues of SVMS, it is worth discussing "Why write any design notes in the first place?" There are lots of reasons for not writing any design notes:

It takes time that could otherwise be spent writing code.

The primary reason why I write design notes is to clarify the issues prior to writing the code. It is really painful to write some code, discover some overlooked issues, and have to rewrite it to cope with them. In the end, doing an up-front design is almost always faster.

The design notes should be expressed as comments in the code so that they are maintained over time and do not become out-of-date.

My answers to that statement are:

There is no obvious place in the code for most design notes. The whole point of design notes is that they cover issues that are pervasive throughout the entire program.
Even if there were obvious places in the code to stick chunks of design notes, it is painful to have to read a whole bunch of source files to get the design overview for a program.
Anybody who is professional enough to update comments in code when they become out-of-date is also professional enough to update the design notes when they become out-of-date.

After the code is finished, design notes are no longer needed.

Most useful code is never `finished'. I frequently find myself coming back to code after a hiatus measured in years to add some new features or to fix a bug that has gone undetected for a long time. After such a hiatus, I find reading the design notes to be extremely useful.

This is a personal programming project; nobody other than Wayne is going to read the design notes.

These design notes may never be read by anybody other than Wayne Gramlich, but on the World Wide Web, with its Web spiders, you never know who is going to read about your projects.

Enough on why I write design notes.

Before diving into design issues, I want to make my opinions about SCCS, RCS, CVS, and SunPro/SunSoft Code Manager perfectly clear. I think all of these programs are quite good. Indeed, I find the functionality provided by Code Manager so compelling in my day-to-day coding activities that I feel obliged to duplicate those capabilities for Linux. In the design below, there are comments about design flaws and/or bugs in the above-mentioned programs; this is only to be expected of programs whose inception dates back to the late 1970's and early 1980's. Since I am setting out to replace some of the functionality of these programs, I think it is appropriate for me to try to avoid some of the mistakes of the past and to improve things as I go along. So please interpret any critical comments about these programs as constructive comments rather than non-constructive ones.

The next section of this document provides a glossary of SVMS terms and their corresponding definitions.

Glossary

There is enough terminology and jargon associated with SVMS that it is worth providing a glossary of terms and their definitions. These terms and definitions are listed alphabetically below:

binary file: A binary file is a file that is treated as a sequence of raw binary bytes. Binary files have the same representation on all file systems.
bringover: A bringover is the operation of transferring the source file changes from a parent project area to a child project area. Bringovers are usually initiated by the developer that `owns' the child project area. Any newly deleted files in the parent project area are also deleted in the child project area. Any files that have changed in both the parent and child project areas are said to be conflicting files and need to be resolved.
child project area: A child project area is a project area that has a child relationship to a parent project area. The parent/child relationship between two project areas is only important for the bringover and putback operations. A reparent operation can be used to change the parent of a child project area to a different parent project area.
common ancestor: The common ancestor for a conflicting file is the most recent version which is in common between a parent project area and a child project area.
conflicting file: A conflicting file is a source file that has changed in both a parent project area and a child project area. Another interesting form of conflicting file is one where a file has been deleted in one project area and modified in another. The resolve operation is used to fix up conflicting files.
deleted file: A deleted file is a source file that has been explicitly removed from a project area. Deleted files can be propagated between project areas.
derived file: A derived file is a file that is constructed by either translating (e.g. compiling) another source file or composing (e.g. linking) one or more other derived files. The important thing about derived files is that they can always be recreated from the source files.
developer: A developer is one person who is working on a project. In general, only one person is permitted to make modifications to a given project area at a time. A single developer may `own' one or more project areas that are all part of the same project.
directory tree: A directory tree consists of a directory, the files contained therein, and, recursively, all of its sub-directories and their contained files. Project areas are directory trees.
file system: A file system is a place where the files that make up a project area are stored. Different file systems have different ways of representing text files.
lazy bringover: A lazy bringover is a bringover operation that does not copy all version files from the parent project area to the child project area; instead, version files are only copied from the parent project area to the child project area when the corresponding source file is modified or deleted. Lazy bringovers are efficient in terms of both time and disk space. Recursive lazy bringovers are permitted in the sense that a lazy bringover is permitted on a project area that was in turn lazily brought over from yet another project area. Doing a reparent operation on a project area that is a result of a lazy bringover can result in many version files being brought over to the child project area.
locked project area: A locked project area is a project area that has been locked. Whenever either a bringover or a putback operation is in progress, the corresponding parent and child project areas are locked.
parent project area: A parent project area is a project area that has a parent relationship to a child project area. The parent/child relationship between two project areas is only important for the bringover and putback operations. A reparent operation can be used to move a child project area to a different parent project area.
project: A project is a group of source files and directories that one or more developers work on in parallel.
project area: A project area is a directory tree that contains one development version of a project. Project areas are typically owned by a single developer. Project areas are typically arranged in a tree-like fashion where each project area has one parent project area and zero, one, or more child project areas. The root project area is the project area at the root of the project tree and it has no parent project area. Each project area has a project nickname that is assigned at the time the project area is created.
project nickname: A project nickname is a name that is given to a project area when it is created. Care needs to be taken to ensure that no two project areas for the same project have the same project nickname.
putback: A putback is the operation of taking source file changes from a child project area back to the corresponding parent project area. A putback will only succeed if there are no conflicting files. If there are conflicting files, the entire putback will fail, and it will be necessary to do both a bringover and a resolve operation before retrying the putback operation.
RCS: RCS stands for Revision Control System. RCS is a freely available version control system that runs on many operating systems (including Linux). RCS has roughly the same capabilities as SCCS.
reparent: A reparent is the operation of causing a child project area to have a different parent project area.
resolve: A resolve is the operation of merging the changes from a child project area and its corresponding parent project area for all conflicting files.
SCCS: SCCS stands for Source Code Control System. SCCS is shipped with many different versions of the Unix operating system (except Linux). SCCS has roughly the same capabilities as RCS.
source file: A source file is a file that is directly manipulated by a developer using some sort of editor. Source files are kept under version control.
SVMS: SVMS stands for Source Version Management System.
symbolic link: A symbolic link is an indirect pointer to another file. Not all file systems support symbolic links. Unlike SCCS and RCS, SVMS version files are able to directly store symbolic links.
text file: A text file is a file that consists of zero, one or more lines of text separated by a file-system-specific new-line sequence. There is no limit on the length of lines in a text file. Also, text files do not require that the last line be terminated by a new-line.
timestamp: A timestamp is a time measured in GMT (Greenwich Mean Time) or UTC (Coordinated Universal Time). Timestamps are measured with a resolution down to the second. SVMS makes no attempt to keep track of time zones.
version: A version is one instance of a source file. Multiple versions of a source file are kept in a version file.
version file: A version file is a file that records all of the various versions that a file has had. A version file usually has a v. prefix.

Other words and phrases to add to the glossary are:

full bringover
partial bringover
project version
remote bringover
remote file system
remote putback
reparent
root project area

The next section (finally) delves into the overall design issues for SVMS.

Some Overall Design Issues

An SVMS project area, or project for short, is a directory tree that contains a directory named SVMS, where a directory tree is defined as all of the files within a given directory and its sub-directories. When a project area is created, it is given a reasonably short project nickname (e.g. gramlich-fix and gramlich-new-feature.) If two project areas are given the same project nickname, it is not possible to readily move files between them; instead, it will be necessary to give one of the projects a new project nickname. Giving a project a new project nickname is a very expensive operation that requires rewriting almost every history file within a project directory tree. Thus, it behooves people to choose project nicknames that are unlikely to ever collide.

Probably the most important design point for SVMS is that an older version of a project area can be recreated in its entirety just by knowing the project nickname and the older version timestamp. At any given timestamp, the same file in different project areas can have differing contents; indeed, it will be quite common for a new file to exist in only one project area when it is first created, before it propagates to other project areas. An important side-effect of the timestamp design point is that history files are never moved, renamed, or deleted; this is quite different from SunSoft's Code Manager product, which requires SCCS s. files to be renamed on a regular basis. The SVMS history file format is able to keep track of multiple versions of a file on a per-project and per-timestamp basis.
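To make the timestamp design point concrete, here is a minimal Python sketch of the lookup it implies. It assumes a history file can be modeled as a flat list of (project nickname, timestamp, contents) records, with None standing for the deleted state; the Entry record and the version_at function are hypothetical illustrations, not the actual SVMS history file format.

```python
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    """One hypothetical history-file record."""
    project: str             # project nickname, e.g. "gramlich-fix"
    timestamp: int           # seconds since the epoch, GMT/UTC
    contents: Optional[str]  # None models the deleted state

def version_at(history: list[Entry], project: str, timestamp: int) -> Optional[str]:
    """Return the contents current in `project` at `timestamp`.

    None means the file did not exist in, or had been deleted from,
    that project area at that time.
    """
    best_time, current = None, None
    for entry in history:
        if entry.project != project or entry.timestamp > timestamp:
            continue  # another project area, or a version from the future
        if best_time is None or entry.timestamp >= best_time:
            best_time, current = entry.timestamp, entry.contents
    return current
```

Recreating a whole project area at an older timestamp would then amount to calling version_at on every history file in the SVMS directory tree and writing out the results that are not None.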

Given the importance of timestamps in recreating older versions of project areas, it is important that a consistent notion of time be kept for each project. SVMS is designed to work with NFS. Since the system clock on a user's local machine can be quite different from the system clock on the remote NFS server, SVMS will always use timestamps derived from files (i.e. the NFS server system clock.) Huge projects can be spread across multiple file systems served by different NFS servers, but SVMS will always attempt to detect whether any significant clock skew has occurred between the systems; if so, SVMS will mandate that the various system clocks be brought back into synchronization.
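One way to do the skew detection, sketched in Python under the assumption that the file server stamps a file's modification time when it is written: create a scratch file in the project directory, read back its modification time, and compare it against the local clock sampled around the write. The function names, the scratch-file prefix, and the two-second tolerance are all hypothetical.

```python
import os
import tempfile
import time

MAX_SKEW = 2.0  # hypothetical tolerance, in seconds

def clock_skew(directory: str) -> float:
    """Estimate (file server clock - local clock) in seconds for `directory`."""
    before = time.time()
    fd, path = tempfile.mkstemp(dir=directory, prefix=".svms-skew-")
    try:
        os.write(fd, b"x")
        os.fsync(fd)                        # make sure the server has seen the write
        after = time.time()
        server_time = os.fstat(fd).st_mtime  # stamped by the file system, not by us
    finally:
        os.close(fd)
        os.unlink(path)
    return server_time - (before + after) / 2.0

def check_clock(directory: str) -> None:
    """Refuse to continue if the file server and local clocks disagree."""
    skew = clock_skew(directory)
    if abs(skew) > MAX_SKEW:
        raise RuntimeError(
            f"clock skew of {skew:.1f}s for {directory}; resynchronize the clocks")
```

For a project spread over several NFS servers, running check_clock once per file system root would catch a server whose clock has drifted.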

When SVMS transfers files between project areas, care is taken to ensure that the current relative timestamp order is preserved. This can be critical for not breaking programs that depend upon relative file ages (like make.)
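A minimal sketch of one way to preserve relative timestamp order, assuming transferred files are restamped from the target side's clock rather than keeping their source times: each distinct source modification time is mapped to a consecutive whole second starting at some target time, so ties stay ties and an older file never becomes newer. Both function names are hypothetical.

```python
import os

def order_preserving_times(source_mtimes: dict[str, float], start: int) -> dict[str, int]:
    """Assign whole-second times, beginning at `start`, that keep the
    relative order (including ties) of the source modification times."""
    distinct = sorted(set(source_mtimes.values()))
    rank = {t: i for i, t in enumerate(distinct)}
    return {name: start + rank[t] for name, t in source_mtimes.items()}

def apply_times(root: str, new_times: dict[str, int]) -> None:
    """Stamp the transferred files (paths relative to `root`) with the new times."""
    for name, t in new_times.items():
        os.utime(os.path.join(root, name), (t, t))  # (access time, modification time)
```

Spacing the times one second apart means a large transfer can stamp its newest file ahead of the server clock; choosing `start` far enough in the past avoids that.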

The timestamp design point permits lazy bringovers, since all that must be recorded for a lazy bringover is the parent project root directory and the bringover timestamp. For a project that has been brought over in a lazy fashion, history files are copied into the project only as they are modified. For large projects, lazy bringovers save a great deal of disk space and time. Recursive lazy bringovers are permitted.
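The following Python sketch shows why recording only the parent and the bringover timestamp is enough. It assumes a toy model in which each project keeps only the history files it has actually copied, as a map from timestamp to contents; the Project class and lookup function are hypothetical. A file that has not been copied locally is read from the parent as of the bringover time, recursively.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Project:
    nickname: str
    # History files copied into this project: path -> {timestamp: contents}.
    histories: dict[str, dict[int, str]] = field(default_factory=dict)
    parent: Optional["Project"] = None
    bringover_time: Optional[int] = None  # set for lazy bringovers

def lookup(project: Project, path: str, timestamp: int) -> Optional[str]:
    """Find `path` as of `timestamp`, following lazy bringovers upward."""
    history = project.histories.get(path)
    if history is not None:
        times = [t for t in history if t <= timestamp]
        return history[max(times)] if times else None
    if project.parent is None:
        return None
    # Not copied locally yet: read the parent as it was at bringover time.
    return lookup(project.parent, path, min(timestamp, project.bringover_time))
```

Once a file is modified in the child, its full history is copied in and the local branch of lookup takes over; until then the child costs no disk space for that file.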

In many development shops, desktop machines are not backed up. This drives many people to develop their code on remote NFS file systems that are slower to access but are backed up on a regular basis. SVMS provides a means of allowing all development to proceed on a significantly faster local disk. On a regular basis (e.g. at logout, nightly, etc.), the user can cause the contents of all open files and/or changed history files to be backed up to another machine. This can save quite a bit of day-to-day programmer time and network bandwidth.

SVMS permits files to be in one of six states:

Text: Lines of text
Binary: Arbitrary binary bytes
Directory: A sub-directory
Symbolic Link: A UNIX symbolic link
Deleted: The file is conceptually deleted
Derived: The file is produced as the output of some program. (This is not implemented yet!)

When a file is in the deleted state, there is no version of the file in the regular project directories; however, the history file continues to exist. Note that UNIX hard links, named pipes, and special devices are not directly supported by SVMS.
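As an illustration, the Python sketch below maps a path in the regular project tree onto these states and rejects the unsupported file kinds. The FileState enum and the classify function are hypothetical, and treating a NUL byte in the first 8 KB as the mark of a binary file is an assumed heuristic, not something these notes specify.

```python
import enum
import os
import stat

class FileState(enum.Enum):
    TEXT = "text"
    BINARY = "binary"
    DIRECTORY = "directory"
    SYMBOLIC_LINK = "symbolic link"
    DELETED = "deleted"
    # DERIVED is described above but not implemented yet.

def classify(path: str) -> FileState:
    """Map a path in the regular project tree onto an SVMS file state.

    Assumes the caller already knows a history file exists for this name,
    so a missing file means the deleted state.
    """
    try:
        st = os.lstat(path)  # lstat: do not follow symbolic links
    except FileNotFoundError:
        return FileState.DELETED
    if stat.S_ISLNK(st.st_mode):
        return FileState.SYMBOLIC_LINK
    if stat.S_ISDIR(st.st_mode):
        return FileState.DIRECTORY
    if not stat.S_ISREG(st.st_mode):
        raise ValueError(f"{path}: named pipes and special devices are not supported")
    with open(path, "rb") as f:
        # Assumed heuristic: a NUL byte near the start means binary.
        return FileState.BINARY if b"\0" in f.read(8192) else FileState.TEXT
```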

The basic type of a file name can evolve as a project evolves. For example, early in a project a given file name might refer to a single binary file and later in the same project, the file name might refer to a directory. The SVMS history file format supports the ability for file names to change their basic type over time.

The SVMS history file format directly supports the concept of a deleted file in order to avoid a serious design flaw associated with SunPro/SunSoft Code Manager. The problem with SunPro/SunSoft Code Manager is that there is no reliable way to delete files from one workspace and propagate that deletion to other workspaces. Whenever someone attempts to delete an SCCS history file (i.e. an s. file) in their workspace, very frequently the next bringover/putback operation with another workspace that still has the SCCS history file will cause the file to reappear. Thus, experienced users of SunPro/SunSoft Code Manager no longer attempt to delete SCCS history files; instead, they rename the history files into a directory that exclusively contains deleted files (usually named DELETED.) It is really unfortunate that SunPro/SunSoft Code Manager forces people to rename SCCS history files, since that precludes providing automatic support for restoring previous versions of the workspace; instead, users have to anticipate that they will want a particular prior version of the workspace, record a snapshot file, and keep track of the snapshot file. All the hairy code required to support SCCS history file renaming in SunPro/SunSoft Code Manager is avoided in SVMS by directly supporting deleted files in SVMS history files.

In addition to deleted files, SVMS supports the concept of directories in history files in order to avoid another less serious design defect in SunPro/SunSoft Code Manager concerning the deletion of directories. In Code Manager, the only way to delete a directory from a workspace is to rename all of the SCCS history files to another location and propagate the renames to other workspaces; while this does not directly cause the directory to be deleted in the other workspaces, the next time a workspace is built from scratch, the directory will no longer be present. In SVMS, directories are both explicitly added to a project area and explicitly deleted from a project area. When one project area is merged with another project area, the appropriate directory creations and deletions occur automatically. For safety purposes, SVMS will not delete a directory unless it is empty. Thus, a bringover/putback of a deleted directory will very frequently generate an SVMS error mandating that the user manually clear the directory to be deleted and retry the operation.
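The safety rule for propagated directory deletions is small enough to show directly. In this Python sketch, delete_directory and the SVMSError exception are hypothetical names; the point is simply that a non-empty directory produces an error for the user rather than a recursive delete.

```python
import os

class SVMSError(Exception):
    """Hypothetical error for conditions the user must fix by hand."""

def delete_directory(path: str) -> None:
    """Propagate a directory deletion, refusing to remove anything non-empty."""
    if not os.path.isdir(path):
        return  # already gone in this project area
    if os.listdir(path):
        raise SVMSError(f"{path}: directory is not empty; clear it and retry")
    os.rmdir(path)  # os.rmdir itself also refuses to remove a non-empty directory
```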

One of the goals of SVMS is to have bringovers and putbacks take time proportional to the differences between the two workspaces rather than requiring a complete workspace comparison. This goal is met by computing a signature for each history file. The signatures for all of the source files in a directory are assembled into a directory signature file, and the directory signature files are rolled up to the project root. By comparing signatures, it is possible to rapidly identify the differences between any two project workspaces.
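The rolled-up signatures behave like a hash tree (Merkle tree). The Python sketch below assumes SHA-1 as the signature function and walks plain directory trees instead of real h. files and p.listing files; tree_signatures and changed are hypothetical names. The comparison only descends into directories whose signatures differ, which is what keeps the cost proportional to the differences.

```python
import hashlib
import os
from typing import Optional

# rel path -> (signature, sorted child names) for a directory,
#             (signature, None) for a file.
Signatures = dict[str, tuple[str, Optional[list[str]]]]

def tree_signatures(root: str) -> Signatures:
    """Sign every file and directory under `root`; changes roll up to the root."""
    sigs: Signatures = {}

    def walk(rel: str) -> str:
        full = os.path.join(root, rel)
        if os.path.isdir(full):
            names = sorted(os.listdir(full))
            listing = "".join(
                f"{name} {walk(os.path.join(rel, name) if rel else name)}\n"
                for name in names)
            # The "D"/"F" tag keeps an empty directory distinct from an empty file.
            sigs[rel] = (hashlib.sha1(b"D" + listing.encode()).hexdigest(), names)
        else:
            with open(full, "rb") as f:
                sigs[rel] = (hashlib.sha1(b"F" + f.read()).hexdigest(), None)
        return sigs[rel][0]

    walk("")
    return sigs

def changed(a: Signatures, b: Signatures, rel: str = "") -> list[str]:
    """Paths that differ between two trees, skipping subtrees with equal signatures."""
    sa, sb = a.get(rel), b.get(rel)
    if sa is not None and sb is not None and sa[0] == sb[0]:
        return []  # identical subtree: no need to look inside
    if sa is None or sb is None or sa[1] is None or sb[1] is None:
        return [rel]  # added, removed, changed file, or file/directory type change
    out: list[str] = []
    for name in sorted(set(sa[1]) | set(sb[1])):
        out += changed(a, b, os.path.join(rel, name) if rel else name)
    return out
```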

For text and binary files, the executable bits are preserved in the history files. This allows binary executables and executable shell scripts to be stored directly in history files without having to do any mucking around inside of a Makefile.

{Actually, at the moment, there only seem to be two kinds of symbolic links -- Good and other.} Symbolic links merit further discussion. There are three kinds of symbolic links:

Good: A relative pointer to a text/binary file or another symbolic link that is contained within the project.
OK: A relative pointer to a directory or non-existent file that is contained within the project.
Bad: A pointer to any file/directory/symbolic link that is outside the project.

All forms of symbolic links can be stored in a history file, but the OK and Bad forms require a little extra effort to override error messages.
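A Python sketch of the Good/OK/Bad classification follows. Two choices here are assumptions rather than rules from these notes: the link target is resolved lexically relative to the link's directory (symbolic links in intermediate path components are not followed), and an absolute pointer is always treated as Bad, since only relative pointers are described as Good or OK. The classify_link name is hypothetical.

```python
import os

def classify_link(project_root: str, link_path: str) -> str:
    """Classify the symbolic link at `link_path` as "Good", "OK" or "Bad"."""
    target = os.readlink(link_path)
    if os.path.isabs(target):
        return "Bad"  # assumption: only relative pointers qualify as Good or OK
    root = os.path.realpath(project_root)
    link_dir = os.path.realpath(os.path.dirname(os.path.abspath(link_path)))
    resolved = os.path.normpath(os.path.join(link_dir, target))  # lexical resolution
    if os.path.commonpath([root, resolved]) != root:
        return "Bad"  # points outside the project
    if os.path.islink(resolved) or os.path.isfile(resolved):
        return "Good"  # a text/binary file or another symbolic link
    return "OK"  # a directory, or a file that does not exist (yet)
```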

SVMS is a little different from SCCS and RCS when it comes to editing files. Both RCS and SCCS have the concept of a file being checked-in or checked-out. In SVMS, files are either open (read-write) or closed (read-only). When a file is closed, care is taken to change only the mode bits (access control bits) of the file from read-write to read-only. Since the file contents are not modified when it is closed, make will not perform a whole bunch of unnecessary rebuilding when a file is closed. From a design point of view, this means that any new identification strings are inserted into a file when it is opened, so that when the file is subsequently closed, none of its contents need to be changed. A further implication of this design decision is that when a file is closed, the modification time of its SVMS history file is set back to one second prior to the last modification time of the source file; this means that make will not do a gratuitous update of the source file because the history file timestamp is more recent than the source file timestamp. {By the way, I am nervous about resetting the history timestamp; this is generally considered to be an impolite thing to do.}
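The close operation described above boils down to two system calls. In this Python sketch, close_file is a hypothetical name, and leaving the history file's access time untouched is an assumption; the notes only say the modification time is set back.

```python
import os
import stat

def close_file(source_path: str, history_path: str) -> None:
    """Close an open file: drop its write bits without touching its contents,
    then backdate the history file to one second before the source's mtime."""
    st = os.stat(source_path)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    # chmod updates only the inode change time, not the modification time,
    # so make sees no reason to rebuild anything.
    os.chmod(source_path, stat.S_IMODE(st.st_mode) & ~write_bits)
    backdated = int(st.st_mtime) - 1
    history_atime = os.stat(history_path).st_atime
    os.utime(history_path, (history_atime, backdated))  # (atime, mtime)
```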

SVMS supports identification strings. The goal is to support free-format strings the way that SCCS supports them. Given the desire not to modify the file contents when it is closed, SVMS has to go through a bunch of contortions to keep track of what is really going on. Here are the identification strings that can be inserted:

Authors: The list of authors that have contributed to the document. The intention is that this can be stuck into a standard copyright notice.
Years: The years that the files were modified (e.g. 1993-4, 1996). The intention is that this can be stuck into a standard copyright notice.
Date: The date in "YYYY/MM/DD" format. If the file is opened and closed on the same day, this identifier will not be rewritten. Conversely, if the file is opened on one day and closed on a subsequent day, the file will be rewritten.
Filename: The base filename for the file
Time: The time in "hh:mm:ss" format. This identifier will always cause the file to be modified during a close.
Timestamp: The timestamp in "YYYY/MM/DD@hh:mm:ssGMT" format. This identifier will always cause the file to be modified during a close.
Version: The latest version number.
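The identifier formats above translate directly into strftime patterns. This Python sketch renders them in GMT, in line with the glossary's statement that SVMS does not track time zones. The function names are hypothetical, and so is the Years abbreviation rule: the notes give only "1993-4, 1996" as an example, so the sketch assumes a range's end year drops the leading digits it shares with the start year.

```python
import time

def format_years(years: list[int]) -> str:
    """Collapse years into the compact style above, e.g. "1993-4, 1996"."""
    ys = sorted(set(years))
    parts, i = [], 0
    while i < len(ys):
        j = i
        while j + 1 < len(ys) and ys[j + 1] == ys[j] + 1:
            j += 1  # extend the run of consecutive years
        start, end = str(ys[i]), str(ys[j])
        if i == j:
            parts.append(start)
        else:
            # Assumed rule: drop the leading digits shared with the start year.
            k = 0
            while k < len(start) - 1 and start[k] == end[k]:
                k += 1
            parts.append(f"{start}-{end[k:]}")
        i = j + 1
    return ", ".join(parts)

def id_strings(t: float) -> dict[str, str]:
    """Date, Time and Timestamp identifiers for time `t`, always in GMT."""
    g = time.gmtime(t)
    return {
        "Date": time.strftime("%Y/%m/%d", g),
        "Time": time.strftime("%H:%M:%S", g),
        "Timestamp": time.strftime("%Y/%m/%d@%H:%M:%SGMT", g),
    }

def date_needs_rewrite(opened: float, closed: float) -> bool:
    """The Date identifier forces a rewrite only if open and close fall on different days."""
    return id_strings(opened)["Date"] != id_strings(closed)["Date"]
```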

In order to fully support parallel development, the SVMS history files need to be able to represent development branches. This is accomplished by naming histories with a unique project identifier and a history number that is unique for the project. This design decision allows for the easy merging of histories during a project merge operation.
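The reason (project identifier, number) naming makes merging easy is that each project area can allocate numbers on its own without coordinating with anyone, and two histories can later be merged by simple union. The Python sketch below is hypothetical throughout (DeltaId, next_delta_id, merge_histories); it also shows how a project nickname collision would surface as two different deltas sharing an identifier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeltaId:
    project: str  # project nickname, unique across the project tree
    number: int   # sequence number, unique within that project

def next_delta_id(history: dict[DeltaId, str], project: str) -> DeltaId:
    """Allocate the next number for `project`; no other project is consulted."""
    used = [d.number for d in history if d.project == project]
    return DeltaId(project, max(used, default=0) + 1)

def merge_histories(a: dict[DeltaId, str], b: dict[DeltaId, str]) -> dict[DeltaId, str]:
    """Union two histories of the same file; a shared id must mean the same delta."""
    merged = dict(a)
    for delta_id, body in b.items():
        if delta_id in merged and merged[delta_id] != body:
            raise ValueError(f"{delta_id}: same id, different contents (nickname collision?)")
        merged[delta_id] = body
    return merged
```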

It is an SVMS goal to support the development of software distributed around the Internet. In order to support this goal, a parent project can be accessed via the HTTP protocol. While protocols other than HTTP could be considered (e.g. TCP/NFS), HTTP is far more likely to work through most corporate firewalls than other, probably superior, protocols. The SVMS system is designed to be able to perform a bringover exclusively via HTTP GET operations. This will be most efficient if HTTP 1.1 is implemented at the remote site and persistent connections (i.e. reusing the same TCP connection for multiple GETs) are supported all the way from end to end. There will be an explicit way to specify a corporate proxy server for HTTP access. Two forms of HTTP putback will be supported: a pullback executed at the parent site running the HTTP server, or a putback via an HTTP POST operation initiated at the child site.

An HTTP POST putback is performed by putting all of the changes into a tar file, unpacking the tar file at the server, and running a local putback command there. Any error messages are echoed back as part of the POST output. This is very similar to the media putback described below.

It is a goal of SVMS to allow history files to be encrypted and/or compressed. Both encryption and compression technology have a number of legal issues. For this reason, SVMS does not implement any specific algorithm to support either encryption or compression; instead, a more general mechanism is provided that allows someone to plug in a generic program. Thus, if you have an appropriate license for the latest LZW compression algorithm, you are free to use it; alternatively, you are welcome to use the gzip/gunzip algorithms that appear to be legally unencumbered. Similarly, if you can find an encryption algorithm that can be used without violating export controls (in the USA, it is fairly common to get an export license for 40-bit RC4), you can plug that in as well. If you cannot find a legal encryption algorithm whose security you are comfortable with, you will have to live without encryption.
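A sketch of the plug-in mechanism in Python, assuming the simplest possible contract: the user-supplied program reads the data on standard input and writes the transformed data on standard output. The run_filter name is hypothetical; SVMS itself never needs to know which algorithm the program implements.

```python
import subprocess

def run_filter(argv: list[str], data: bytes) -> bytes:
    """Pipe `data` through an external compression or encryption program.

    Raises subprocess.CalledProcessError if the program exits with a
    non-zero status, so a failed filter never yields a truncated file.
    """
    result = subprocess.run(argv, input=data, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, check=True)
    return result.stdout
```

For example, a project could be configured (hypothetically) with ["gzip", "-c"] as its compression filter and ["gzip", "-dc"] as the matching expansion filter, since gzip reads standard input and writes standard output when given no file names.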

One of the SVMS goals is to allow secure bringovers and putbacks via the unsecured public internet. This is accomplished by simply encrypting the SVMS directory tree using some legal encryption algorithm and then making the encrypted SVMS directory tree accessible via an HTTP server. An HTTP server is just as happy to deliver encrypted bits as unencrypted bits. The files are encrypted using a password that is exchanged via some other channel (e.g. snail-mail, E-mail, FAX, a telephone call, a face-to-face meeting, public key cryptography, etc.) In order to keep the password slightly secure, SVMS will prompt for a password and then pass the password to the encryption/decryption program via standard input; the alternative of passing the password in via a command line option is not as secure, since the ps command can display the command line options of any running process. Most HTTP servers will return a directory listing if there is no index.html file present in a directory. In order to prevent people from deducing information about a project from the directory structure, an encrypted project will have an empty index.html file present in each d. directory.
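Placing the empty index.html files is a simple tree walk. This Python sketch (add_blank_indexes is a hypothetical name) assumes the mirrored project directories are exactly the d. sub-directories of the SVMS directory, and it also covers the top-level SVMS directory itself.

```python
import os

def add_blank_indexes(svms_root: str) -> int:
    """Put an empty index.html in the SVMS directory and every d. sub-directory,
    so an HTTP server will not generate a directory listing.
    Returns the number of index.html files created."""
    created = 0
    for dirpath, dirnames, _filenames in os.walk(svms_root):
        # Only descend into d. sub-directories (the mirrored project directories).
        dirnames[:] = [d for d in dirnames if d.startswith("d.")]
        index = os.path.join(dirpath, "index.html")
        if not os.path.exists(index):
            open(index, "w").close()
            created += 1
    return created
```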

One of the goals is to allow bringovers and putbacks via physical media transfer such as a floppy disk or magnetic tape (i.e. sneaker net.) The sub-goals of media transfer mode are:

The amount of storage required to transfer the project must be roughly proportional to the amount of changes between the child and parent projects. The storage required must not be proportional to the size of the entire project.
The entire task must be accomplished with a total of two trips -- one from the child site to the parent site and one from the parent site to the child site.

Media mode is accomplished as follows:

The parent site creates a child project for the purpose of interacting with the remote child.
The child site creates a parent project for the purpose of interacting with the remote parent.

For a media bringover, the following steps occur:

We start at the child site and create a single file that lists every file in the child project along with the timestamps, hash values and whatever else is needed. We'll call this the overview file.
The overview file is written to the media, physically transferred over to the parent site, and read in at the parent site.
At the parent site, in the child project, a bringover is performed using the overview file. With the overview file, the files that changed are identified and transferred into the child project. A tar file is created of the child project, written out to the media, and physically transferred to the child site, where the media is read in.
The child site unpacks the tar file in the parent project, goes to the child project and performs a bringover. All of the needed files are present, so the bringover succeeds.
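The overview file is what keeps the media traffic proportional to the changes. The Python sketch below assumes a tab-separated overview line per file (relative path, modification time, SHA-1), which is one possible layout rather than a defined SVMS format; all function names are hypothetical. The child site writes the overview; the parent site ships only files whose hashes differ or that the child lacks. Propagating deletions is left out for brevity.

```python
import hashlib
import os
import tarfile

def scan(root: str) -> dict[str, tuple[int, str]]:
    """Map each file's path (relative to root) to (mtime, SHA-1)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            result[os.path.relpath(full, root)] = (int(os.stat(full).st_mtime), digest)
    return result

def write_overview(root: str, overview_path: str) -> None:
    """Child site: one line per file (names must not contain tabs or newlines)."""
    with open(overview_path, "w") as out:
        for rel, (mtime, digest) in sorted(scan(root).items()):
            out.write(f"{rel}\t{mtime}\t{digest}\n")

def files_to_ship(root: str, overview_path: str) -> list[str]:
    """Parent site: files the child lacks or holds with different contents."""
    theirs = {}
    with open(overview_path) as f:
        for line in f:
            rel, _mtime, digest = line.rstrip("\n").split("\t")
            theirs[rel] = digest
    return sorted(rel for rel, (_m, digest) in scan(root).items()
                  if theirs.get(rel) != digest)

def pack(root: str, rels: list[str], tar_path: str) -> None:
    """Write just the needed files (with their mtimes) to a tar file for the trip back."""
    with tarfile.open(tar_path, "w") as tar:
        for rel in rels:
            tar.add(os.path.join(root, rel), arcname=rel)
```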

For a media putback, the following steps occur:

We start at the parent site, create an overview file, and write it to media.
We do a putback at the child project in the child site with the overview file. The media putback puts back just the files that are needed.
The parent project at the child site is tar'ed off to the media and transferred to the parent site.
The media is untarred in the child project at the parent site and a putback command is executed. As long as the parent project has not changed since the overview file was created, the putback will succeed.

The bringover and putback commands are basically the same command with a few minor differences in behavior:

bringover: A bringover merges the parent workspace into the child workspace. If there are any conflicts between files in the parent and files in the child, the conflicts need to be resolved using the resolve command. In general, a bringover command can never fail. A bringover may bring over only a subset of the files that changed in the parent.
putback: A putback merges the child workspace back up into the parent workspace. This command will only succeed if all of the changes in the child are `compatible' with changes in the parent. In general, this command will fail if there are any changes in the parent workspace that have not been brought down to the child workspace. A putback is `atomic' in the sense that it will either update all of the files in the parent or none of them; there is no concept of a partial putback (except for slices.)

The way that a putback is normally performed is that the user does a bringover first, compiles and tests everything, and then does a putback. If two people are trying to do a putback at the same time, one of them is going to get their changes in before the other, and the other person will have to do an additional bringover before doing their putback.

For a bringover, the source project is the parent and the target project is the child. Conversely, for a putback, the source project is the child and the target project is the parent. A workspace merge works as follows:

The workspaces are compared for differences. Usually the slices :/... are used, but the user can specify a smaller set of slices. This is accomplished by recursively comparing p.listing files and looking for files that have different hash values.
Any missing directories in the target project are created next (both in the SVMS directory tree and in the regular directory tree.)
If there are any open files in the parent or the child, the conflict is kept pending until the user closes the file. An open file in either the parent or the child during a putback will cause the entire putback to fail.
Files are compared to see what changed. If the entire contents of the target history file are contained in the source history file, the source history file is copied over to the target history file. If the file type changed, the appropriate action is performed (e.g. creating a sub-directory, deleting the file, etc.) For a bringover, any appropriate b. file is created to record when the file showed up in the child project. For a putback, a delta is entered into the updated parent history file to record when it arrived in the parent; this delta will have both a regular and a merge ancestor. All putback deltas in all of the updated history files are given exactly the same timestamp so that they conceptually show up at exactly the same time in the parent project.
If the source and target history files each contain changes that the other lacks, the two files are said to be `in conflict.' For a putback, the first conflict discovered will cause the entire operation to terminate unsuccessfully. For a bringover, the conflicting file is entered into a list for subsequent processing. When this resolve list is emptied out, the bringover is complete; until then the bringover is `partial'.
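The per-file decision can be sketched in Python by modeling each history file as the set of its delta identifiers (an assumption made for illustration; Action, classify_merge and merge_workspace are hypothetical). If the target's deltas are a subset of the source's, the source can be copied over; if the source's are a subset of the target's, there is nothing to do; otherwise the file is in conflict. Classifying every file before copying any keeps a putback all-or-nothing.

```python
from enum import Enum

class Action(Enum):
    NOTHING = "target already contains every source delta"
    COPY = "copy the source history file over the target"
    CONFLICT = "each side has deltas the other lacks"

def classify_merge(source: set[str], target: set[str]) -> Action:
    """Decide what to do with one history file, given the delta ids on each side."""
    if source <= target:
        return Action.NOTHING
    if target <= source:
        return Action.COPY
    return Action.CONFLICT

def merge_workspace(files: dict[str, tuple[set[str], set[str]]], putback: bool):
    """Classify every file before touching any of them.

    Returns (files to copy, resolve list). For a putback, the first conflict
    aborts the whole operation before anything has been copied.
    """
    to_copy, resolve_list = [], []
    for path, (source, target) in sorted(files.items()):
        action = classify_merge(source, target)
        if action is Action.COPY:
            to_copy.append(path)
        elif action is Action.CONFLICT:
            if putback:
                raise RuntimeError(f"putback failed: {path} conflicts; bring over and resolve first")
            resolve_list.append(path)
    return to_copy, resolve_list
```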

A conflict is resolved as follows:

If the file types have changed, the user is going to have to select one history file over the other. The merged history file will have an orphaned branch within it.
If both files are binary files, the user will again have to pick one over the other. Again, this will result in an orphaned branch in the history file.
If both files are text files, the differences between the files are determined. A difference can be automatically merged into the result file whenever it is a simple insertion or deletion. Whenever one or more lines have changed in both the source and the target at the same location, the user will have to resolve the differences manually.
After a conflict has been resolved, there will be a new result version and a corresponding result delta in the history file. The result delta will reference both the target and source deltas as ancestors. It should go without saying that any b. file associated with the conflict is absorbed as part of the resolution process.
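The text-file case is a three-way merge against the common ancestor. The Python sketch below uses difflib.SequenceMatcher from the standard library in place of whatever diff algorithm SVMS ends up using, and borrows the RCS merge/diff3 conflict-marker style; merge3 and its helpers are hypothetical names. A change made on only one side is taken automatically, an identical change on both sides is taken once, and overlapping different changes are left for the user between markers.

```python
import difflib

def _changes(ancestor: list[str], other: list[str]):
    """Non-equal regions of `other` relative to `ancestor`: (start, end, new lines)."""
    sm = difflib.SequenceMatcher(a=ancestor, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def _apply(ancestor, lo, hi, edits):
    """Rebuild ancestor[lo:hi] with one side's (sorted) edits applied."""
    out, pos = [], lo
    for i1, i2, lines, _side in edits:
        out += ancestor[pos:i1] + lines
        pos = i2
    return out + ancestor[pos:hi]

def merge3(ancestor, source, target):
    """Three-way merge of line lists; returns (merged lines, number of conflicts)."""
    edits = [e + ("source",) for e in _changes(ancestor, source)]
    edits += [e + ("target",) for e in _changes(ancestor, target)]
    edits.sort(key=lambda e: (e[0], e[1]))
    merged, conflicts, pos, k = [], 0, 0, 0
    while k < len(edits):
        # Cluster edits that start at the same point or strictly inside the region.
        cluster, lo, hi = [edits[k]], edits[k][0], edits[k][1]
        k += 1
        while k < len(edits) and (edits[k][0] < hi or edits[k][0] == lo):
            cluster.append(edits[k])
            hi = max(hi, edits[k][1])
            k += 1
        merged += ancestor[pos:lo]
        src = [e for e in cluster if e[3] == "source"]
        tgt = [e for e in cluster if e[3] == "target"]
        mine, theirs = _apply(ancestor, lo, hi, src), _apply(ancestor, lo, hi, tgt)
        if not tgt or (src and mine == theirs):
            merged += mine    # only the source changed here, or both made the same change
        elif not src:
            merged += theirs  # only the target changed here
        else:
            conflicts += 1    # both changed the same place differently
            merged += ["<<<<<<< source", *mine, "=======", *theirs, ">>>>>>> target"]
        pos = hi
    return merged + ancestor[pos:], conflicts
```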

Each conflicting history file from the parent is stored in the SVMS directory tree as a c. file in the same directory, with the same base name. For example, if main.c has a conflict, the conflicting history file from the parent is stored as c.main.c in the same directory as h.main.c.

The next important area to discuss is the organization of the SVMS directory.

SVMS Directory Design

For SVMS, a project is contained in a single directory tree whose root is called the project root. The project root is identified to the rest of the SVMS system by installing a directory called SVMS in the project root directory.

The SVMS directory contains a mirror image of all the files and directories in a project. The various files in an SVMS directory are tagged with prefix characters. Prefixes are used so that there is no danger of accidental collision in the future. The prefix tags are:

b.file_name: b.file_name is a bringover file in tagged file format. The b. file format is discussed in greater detail in the bringover file format section below.
c.conflict_file_name: c.conflict_file_name is a conflict file. It is simply a history file from the parent that has not been resolved.
d.directory_name: d.directory_name is a sub-directory. For every d. sub-directory, there is a corresponding h. history file.
h.file_name: h.file_name is a history file in tagged file format. The h. file format is discussed in greater detail in the history file format section below.
l.file_name: l.file_name is a lock file in tagged file format. The l. file format is discussed in greater detail in the lock file format section below.
p.listing: A p.listing file is an alphabetical listing of all h. files in a directory. In addition, if there are any d. sub-directories or b. bringover files, that information is recorded in the p.listing file as well. The p.listing file format is discussed in greater detail in the p.listing file format section below.
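The mirroring rule can be sketched as a path mapping, following the d.a/d.b/d.c pattern used for DELTA_DIR in the make integration notes. The function name, the use of forward-slash project-relative paths, and the SVMS root argument are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Per-file prefixes from the table above (p. files are per-directory, not per-file).
FILE_PREFIXES = {"b", "c", "h", "l"}


def svms_path(project_relative: str, prefix: str, svms_root: str = "SVMS") -> str:
    """Hypothetical helper: map 'src/main.c' plus prefix 'h' to 'SVMS/d.src/h.main.c'."""
    if prefix not in FILE_PREFIXES:
        raise ValueError(f"unknown per-file prefix: {prefix!r}")
    parts = PurePosixPath(project_relative).parts
    # Every directory level becomes a d. sub-directory; the file gets the prefix.
    dirs = [f"d.{name}" for name in parts[:-1]]
    return str(PurePosixPath(svms_root, *dirs, f"{prefix}.{parts[-1]}"))
```

Because every d. sub-directory has a corresponding h. history file, svms_path("src", "h") yields SVMS/h.src.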

The top level SVMS directory contains some additional files that are preceded by a p. prefix:

p.notify: If present, this file contains a list of E-mail addresses. The people on this list will be notified any time a bringover or putback operation succeeds on the project. The format of this file is TBD.
p.project: This file contains information about the project in tagged file format. The p.project file format is discussed in greater detail in the p.project file format section below.
p.releases: This file contains a list of named releases and their corresponding timestamps. The format of this file is TBD.

Many of the files are in a format called tagged file format. Tagged file format is discussed next.

Tagged File Format

A tagged file is a human readable ASCII file that basically consists of a bunch of tagged lines. Each line consists of a tag character, followed by zero or more values, followed by the appropriate new-line sequence for the file system.

The data types that can occur on a tagged line are:

Strings: A string is enclosed in double quotes and consists of characters between space and tilde, inclusive. All characters outside that range are escaped using %XX, where each X is a hexadecimal digit. For obvious reasons, the percent character and the double quote character are escaped as well.
Unsigned Numbers: An unsigned number consists of a number followed by a decimal point. The purpose of the decimal point is to make it possible to implement a reader without having to go through the hassle of doing look-ahead.
Logicals: A logical value is represented as either the letter T (for True) or F (for False).
Timestamps: A timestamp is represented as a string of the form "YYYY/MM/DD@hh:mm:ssGMT".
Hash Values: A hash value is represented as a decimal number marked with a hash character (i.e. `#'). The reason for distinguishing hash values from ordinary unsigned numbers is the SVMS test suite, where hash values are converted to {hash} prior to comparing golden files.
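A sketch of writers for the string, unsigned number, logical, and timestamp types. Two details are assumptions not fixed by these notes: non-ASCII text is UTF-8 encoded before escaping, and the %XX escapes use upper-case hexadecimal digits.

```python
from datetime import datetime, timezone


def encode_string(text: str) -> str:
    """Quote a string, escaping bytes outside space..tilde, plus '%' and '"', as %XX."""
    out = []
    for byte in text.encode("utf-8"):  # assumption: UTF-8 for non-ASCII input
        ch = chr(byte)
        if 0x20 <= byte <= 0x7E and ch not in '%"':
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return '"' + "".join(out) + '"'


def encode_unsigned(value: int) -> str:
    """Unsigned numbers end with a decimal point, e.g. 42 -> '42.'."""
    if value < 0:
        raise ValueError("unsigned numbers cannot be negative")
    return f"{value}."


def encode_logical(flag: bool) -> str:
    return "T" if flag else "F"


def encode_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as the quoted "YYYY/MM/DD@hh:mm:ssGMT" string."""
    utc = moment.astimezone(timezone.utc)
    return encode_string(utc.strftime("%Y/%m/%d@%H:%M:%SGMT"))
```

For example, encode_string('say "hi" 100%') produces "say %22hi%22 100%25" (including the surrounding quotes).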

There are two `standard' header lines and they are:

H "type" major minor: A header line specifies the file type and a major/minor version number pair. The major/minor version numbers are discussed further below.
E: An end line specifies that there is no more data in the tagged data file.
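A reader for one tagged line can be sketched as follows. Because an embedded double quote is always escaped as %22, a quoted string simply runs to the next double quote, and the trailing decimal point lets each token be classified without look-ahead. The hash form accepted here ("#123.") is an assumption taken from the p.listing example; the Hash type and function names are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple, Union


class Hash(NamedTuple):
    value: int


Value = Union[str, int, bool, Hash]

# A quoted string ends at the next '"' because embedded quotes are escaped as %22.
TOKEN = re.compile(r'"[^"]*"|\S+')


def decode_string(token: str) -> str:
    body = token[1:-1]
    raw = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "%":
            raw.append(int(body[i + 1:i + 3], 16))  # %XX escape
            i += 3
        else:
            raw.append(ord(body[i]))
            i += 1
    return raw.decode("utf-8")


def decode_value(token: str) -> Value:
    if token.startswith('"'):
        return decode_string(token)
    if token in ("T", "F"):
        return token == "T"
    if token.startswith("#") and token.endswith("."):
        return Hash(int(token[1:-1]))  # assumption: '#123.' form
    if token.endswith(".") and token[:-1].isdigit():
        return int(token[:-1])
    raise ValueError(f"unrecognized token: {token!r}")


def parse_line(line: str) -> Tuple[str, List[Value]]:
    """Split a tagged line into its tag character and its decoded values."""
    tokens = TOKEN.findall(line.rstrip("\r\n"))
    return tokens[0], [decode_value(t) for t in tokens[1:]]
```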

Whenever SVMS encounters a tagged data file with a major version number that does not match the expected major version number, everything screeches to a halt. Whenever SVMS encounters a tagged data file with a minor version number that is less than the expected one, everything halts with a fatal error message. Whenever SVMS encounters a tagged data file with a minor version number that is greater than the expected one, SVMS goes into a mode where it silently skips over lines that are tagged with a letter that is not expected.
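The three version rules can be sketched as a check on the H line followed by a record filter. One behavior here is an assumption not stated in these notes: when the minor versions match exactly, an unexpected tag is treated as an error rather than skipped.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def check_version(found_major: int, found_minor: int,
                  expected_major: int, expected_minor: int) -> bool:
    """Return True when unexpected tag letters should be silently skipped.

    Raises RuntimeError for a major mismatch or an older minor version."""
    if found_major != expected_major:
        raise RuntimeError(f"major version {found_major} != {expected_major}")
    if found_minor < expected_minor:
        raise RuntimeError(f"minor version {found_minor} < {expected_minor}")
    return found_minor > expected_minor


def filter_records(records: Iterable[Tuple[str, List]], known_tags: Set[str],
                   skip_unknown: bool) -> Iterator[Tuple[str, List]]:
    """Yield known records; skip or reject the rest depending on skip_unknown."""
    for tag, values in records:
        if tag in known_tags:
            yield tag, values
        elif not skip_unknown:
            raise RuntimeError(f"unexpected tag {tag!r}")
```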

Whenever it is time to update the version numbers on the tagged file format files, there is a special SVMS command, called svms regen, that visits all of the SVMS files in a project and updates any that are out of date to the latest version number.

Whenever it is time to update a file in tagged file format, the new file is carefully written out in its entirety into a temporary file in the same directory. After the file is closed, it is renamed to overwrite the previous file contents. This ensures that a system crash will never result in an incomplete tagged file format file.
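The write-then-rename procedure can be sketched with the Python standard library. The temporary file prefix is a hypothetical choice. Writing in text mode turns each "\n" into the platform's new-line sequence, and os.replace() is atomic on POSIX when both names are in the same directory.

```python
import os
import tempfile
from typing import Iterable


def write_tagged_file(path: str, lines: Iterable[str]) -> None:
    """Replace `path` so that readers see either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays local.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".svms-tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            for line in lines:
                handle.write(line + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```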

Now it is time to talk about the design of the history file format.

History File Format

A history file is a file in tagged file format that contains a bunch of versions and deltas. A version corresponds to a particular instance of a source file. A version does not have any timestamp, project, or ancestor information. A delta specifies a particular version and adds in project, timestamp, and ancestor information. Thus, a history file consists of one or more version segments, where each version segment is referred to by one or more delta segments. In general, whenever a file is propagated to another project, a simple delta is generated to record when the delta showed up in the project (there is one exception.) Whenever a file is modified within a project, a new version is generated.

The entire history of a file is kept in its h. history file and any associated b. bringover file. Bringover files are discussed in greater detail in the section on bringover file format. Basically, a bringover file can contain some extra deltas that indicate when a particular file migrated from the parent project to the child project. At various times, a file's b. bringover file is merged into the file's h. history file and the b. bringover file is deleted. In summary, the entire history of a file can be thought of as the union of its h. history file and its b. bringover file.

To simplify development, history files are going to be stored in a human readable text format (i.e. tagged file format.) While this means that history files will not be as compact as possible, the current low cost of disk space does not warrant getting maximum compression. If necessary, a publicly available compression program like gzip/gunzip can be used to get additional compression.

The history file format is meant to be thought of logically as an append only file format. In practice, a whole new history file is created and then renamed into place in order to ensure file consistency. Project merges are the only exception to the append only rule and are discussed below.

A history file contains the following tagged lines:

H "SVMS_History_File" major minor

This is the standard header file line.

N "basename" "timestamp"

This is the second line of every history file where "basename" is a string that specifies the base file name and "timestamp" specifies a creation timestamp for the history file.

V offset "type" "nickname" "nickname_timestamp" executable "version_timestamp"

This is a version line that specifies a particular instance of the source file. The fields have the following meaning:

offset

This is an unsigned number that corresponds to the number of the version. The version records are numbered consecutively starting from zero.

"type"

This is the file type and it is one of the following:

Binary: The version segment represents a binary file.
Deleted: The version segment represents a deleted file. There is no content associated with a deleted file.
Directory: The version segment represents a directory file. There is no content associated with a directory.
Link: The version segment represents a symbolic link. The content is the value of the symbolic link.
Text: The version segment represents a text file.

"nickname"

This is the project nickname where the version was first created.

"nickname_timestamp"

This is the project timestamp where the version was first created.

executable

This is a logical that specifies whether the execute bit is to be turned on.

"version_timestamp"

This is a timestamp that specifies when this version first occurred.

All version records are sorted by "version_timestamp", "nickname", and "nickname_timestamp" as the primary, secondary and tertiary keys, respectively.
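A sketch of the three-key ordering. Because the timestamp form "YYYY/MM/DD@hh:mm:ssGMT" is fixed-width and big-endian, plain string comparison already sorts it chronologically. The Version type holds only the fields used for ordering and is illustrative; the sketch does not renumber offsets after sorting.

```python
from typing import List, NamedTuple


class Version(NamedTuple):
    offset: int
    nickname: str
    nickname_timestamp: str
    version_timestamp: str


def version_sort_key(v: Version):
    # Primary, secondary, and tertiary keys, in that order.
    return (v.version_timestamp, v.nickname, v.nickname_timestamp)


def sort_versions(versions: List[Version]) -> List[Version]:
    return sorted(versions, key=version_sort_key)
```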

L line_count

{Frankly, this belongs as part of the V record.} line_count specifies the number of T, B, and R records that follow.

T "line_string"

This specifies one line of text followed by a file system specific new-line sequence.

B "binary_string"

This specifies a number of bytes with no new-line sequence.

R "count ranges..."

This specifies a bunch of lines to be expanded. The format of each range is OpVersion:Offset!Span, where Op is either `|' or `@', Version specifies a version number, Offset specifies a line offset within a version, and Span specifies the number of consecutive lines to use. None of this will make much sense without a worked example.
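A hedged sketch of R record expansion under several assumptions that these notes do not pin down: the leading count is the number of ranges, ranges are separated by spaces, a range may carry a trailing decimal point, and the distinction between the `|' and `@' operators is recorded but not interpreted. All names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed range syntax: Op Version ':' Offset '!' Span, optionally ending in '.'.
RANGE = re.compile(r"([|@])(\d+):(\d+)!(\d+)\.?$")


def parse_ranges(text: str) -> List[Tuple[str, int, int, int]]:
    count, *items = text.split()
    parsed = []
    for item in items:
        match = RANGE.match(item)
        if match is None:
            raise ValueError(f"bad range: {item!r}")
        op, version, offset, span = match.groups()
        parsed.append((op, int(version), int(offset), int(span)))
    if int(count.rstrip(".")) != len(parsed):
        raise ValueError("range count does not match")
    return parsed


def expand(ranges, versions: Dict[int, List[str]]) -> List[str]:
    """Copy Span lines starting at Offset from each referenced version."""
    lines: List[str] = []
    for _op, version, offset, span in ranges:  # Op is not interpreted here
        chunk = versions[version][offset:offset + span]
        if len(chunk) != span:
            raise ValueError("range runs past the end of the version")
        lines.extend(chunk)
    return lines
```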

C range

...

D range

...

X delta_offset version_offset "delta_type" "project_name" "project_timestamp" "delta_timestamp" delta_number "user_name" {other type specific fields}

This is a delta line that specifies a file delta -- a version with additional ancestor and timestamp information.

delta_offset

This is the number of the delta in the history file. Delta numbers start at 0.

version_offset

This is the version referenced by this delta. Version numbers start at 0.

delta_type

delta_type is one of:

"Deleted": The `file' is deleted.
"Directory": The `file' is actually a sub-directory.
"File": The `file' is either a textual or a binary file.
"Link": The `file' is a symbolic link.

"project_name"

project_name is the project name that this delta is part of.

"project_timestamp"

project_timestamp is the project timestamp for the project that this delta is part of.

"delta_timestamp"

delta_timestamp is the timestamp for when this `file' showed up.

delta_number

delta_number is the monotonically increasing delta number for the delta. It is always one larger than the larger of its ancestors' delta numbers.

"user_name"

user_name is the user name of the person that created the delta. (This seems wrong! user_name should be part of the version record!)

The following additional fields are present for files:

ancestor_flag: If ancestor_flag is T, this delta has an ancestor and its offset follows immediately in the ancestor_offset field; otherwise, there is no ancestor_offset field. In general, every delta except the first one in a history file has an ancestor.
[ ancestor_offset ]: ancestor_offset is the offset for the ancestor delta. This field is only present if the preceding ancestor_flag is T.
merge_flag: If merge_flag is T, this delta is the result of a merge of two previous deltas. The first delta is specified by the ancestor_offset and the second delta is specified by the immediately following merge_offset field; otherwise, there is no merge_offset field. In general, merge deltas occur as a result of a bringover operation.
[ merge_offset ]: merge_offset is the offset for the merge delta. This field is only present if the preceding merge_flag is T.
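A sketch of the delta_number rule and of the optional ancestor and merge fields at the end of an X record for a file. Numbering a delta with no ancestors as 0 is an assumption; the notes do not say where delta numbers start. The function names are hypothetical.

```python
from typing import List, Optional


def next_delta_number(ancestor_numbers: List[int]) -> int:
    """One larger than the largest ancestor delta_number (assumed 0 with no ancestors)."""
    return max(ancestor_numbers, default=-1) + 1


def ancestry_fields(ancestor_offset: Optional[int], merge_offset: Optional[int]) -> List[str]:
    """Encode ancestor_flag [ancestor_offset] merge_flag [merge_offset]."""
    if merge_offset is not None and ancestor_offset is None:
        raise ValueError("a merge delta needs a first ancestor")
    fields: List[str] = []
    for offset in (ancestor_offset, merge_offset):
        if offset is None:
            fields.append("F")  # flag only, no offset field follows
        else:
            fields.extend(["T", f"{offset}."])  # flag, then the unsigned offset
    return fields
```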

The following additional field is present for symbolic links:

"symbolic_link": symbolic_link is the value of the symbolic link. {The symbolic link should be encoded as a data line.}

Each version segment has a unique name that consists of a project nickname and a history number. Each time a new version segment is created, it uses the project nickname of the current project to identify the version segment. The new history number is equal to the greatest history number in the history file plus one. Version segments are kept in a history file sorted by timestamp followed by project nickname. It is this sorting rule that causes the append only rule to be broken when projects are merged; any intermediate versions from other projects are merged into place properly in the history file to maintain the sort order.
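A sketch of why merging breaks the append only rule: versions arriving from another project are interleaved by (timestamp, nickname) and can land between existing versions. Dropping versions already present on both sides and returning 0 for an empty history are assumptions; the VersionName type and functions are hypothetical.

```python
import heapq
from typing import Iterable, List, NamedTuple


class VersionName(NamedTuple):
    timestamp: str        # "YYYY/MM/DD@hh:mm:ssGMT"
    nickname: str
    history_number: int


def next_history_number(versions: Iterable[VersionName]) -> int:
    """Greatest history number in the file plus one (0 for an empty file)."""
    return max((v.history_number for v in versions), default=-1) + 1


def sort_key(v: VersionName):
    return (v.timestamp, v.nickname)


def merge_versions(local: List[VersionName], incoming: List[VersionName]) -> List[VersionName]:
    """Interleave two already-sorted lists; shared versions appear only once."""
    merged, seen = [], set()
    for v in heapq.merge(local, incoming, key=sort_key):
        if v not in seen:
            seen.add(v)
            merged.append(v)
    return merged
```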

Bringover File Format

A bringover file (i.e. b. file) provides some additional information that belongs in the history file (i.e. h. file), but has not been merged into the history yet. A bringover file basically records whenever a version of a file has migrated from the parent project to the child project. Whenever multiple versions of a file have been brought over from the parent project to the child project via multiple bringover commands, for each bringover, there will be a corresponding record in the bringover file.

The reason for having a bringover file is a little subtle. Basically, I want a bringover followed by a putback to result in only the files that changed being put back. If I record when the file shows up in a project as a delta in the appropriate history file, its contents will change and the next putback will have to merge the changed contents back up to the parent project; this would occur for every file in the project (not what I had in mind!) So, instead, I record when the bringover occurred in a small separate file called the bringover file that does not affect the contents of the history file.

The next question is `why not store the timestamp in the p.listing file and forget about a separate b. file?' The answer is that I want to be able to regenerate p.listing files by inspecting the SVMS directory tree. That way, if a p.listing file is damaged (a not unthinkable occurrence), there is still a viable recovery technique.

When do b. files get deleted? Basically, b. files get deleted whenever the associated history file contents are augmented. This can occur when a file is opened and closed for editing, whenever a merge occurs, etc. Also, there will be an operation (probably "svms parent -e" and/or "svms parent -u") that allows the project owner to manually force all bringover files to be merged into their associated history files. The reason for forcing the bringover files to be eliminated is to ensure that complete `snapshots' of the child project exist in the parent project. If the parent and child projects exist on different machines, this is a very effective form of backup.

A b. bringover file is a tagged file containing the following records:

H "SVMS Bringover" major minor

This is the standard header file record.

F "file_name" count

This is the file record and it has the following fields:

"file_name": file_name is the file name for the file.
count: count is the number of bringover records in the bringover file.

B parent_history_hash "parent_project_name" "parent_project_timestamp" parent_delta_number "parent_delta_timestamp" "child_project_name" "child_project_timestamp" child_delta_number "child_delta_timestamp"

This is the bringover record; there is one of these records for each bringover that brings over a different file version. This record has the following fields:

parent_history_hash: parent_history_hash is the hash value of the parent's h. history file.
"parent_project_name": parent_project_name is the project name for the parent.
"parent_project_timestamp": parent_project_timestamp is the parent project timestamp.
parent_delta_number: parent_delta_number is the delta number (not offset!) for the delta that is being brought over.
"parent_delta_timestamp": parent_delta_timestamp is the timestamp of the delta that is being brought over.
"child_project_name": child_project_name is the child project name.
"child_project_timestamp": child_project_timestamp is the timestamp of the child project.
child_delta_number: child_delta_number is the new delta number (not offset!) for the child delta.
"child_delta_timestamp": child_delta_timestamp is the new timestamp of the delta that was brought over.

E

This is the standard end record.

Lock File Format

An l. lock file is used to indicate when a file is open (i.e. writable) for editing. It basically just contains a timestamp for when the open operation occurred and the name of the person who did the opening.

A lock file consists of the following records:

H "SVMS Lock File" major minor

This is the standard header file record.

U "file_name" "user_name" "open_timestamp"

This is the user record that identifies who opened the file and when. The user record has the following fields:

"file_name": file_name is the file name of the open file.
"user_name": user_name is the user name of the person who opened the file.
"open_timestamp": open_timestamp specifies when the file was opened for editing.

E

An example l. lock file looks as follows:

H "SVMS Lock File" 1. 0.
U "Makefile" "Wayne Gramlich" "1997/01/04@22:14:13GMT"
E

`p.listing` File Format

A p.listing file is basically a directory listing. The purpose of a p.listing file is three-fold:

The p.listing file supports bringover/putback operations via HTTP. HTTP has no well defined directory format. By storing a directory format in a well defined file format, it is possible to fetch the information via HTTP.
The p.listing file supports the rapid determination of differences between projects. This is accomplished by computing a hash value associated with each history file and sub-directory. Whenever the hash value is different between two projects, there is work to be done by the bringover/putback command.
The p.listing file supports lazy bringover by marking an entry as being lazy. If the lazy flag is present, it means that the actual history file is located in the parent chain somewhere. This can save an enormous amount of disk space.
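The rapid difference check can be sketched by reducing each p.listing file to a mapping from file name to hash value. Reducing an entry to a single hash and the function name are simplifying assumptions; a full implementation would presumably compare the directory_hash of each d. sub-directory before descending into it.

```python
from typing import Dict, List


def entries_needing_work(parent: Dict[str, int], child: Dict[str, int]) -> List[str]:
    """Names whose hash differs between the two listings, or that exist on one side only."""
    names = set(parent) | set(child)
    return sorted(name for name in names if parent.get(name) != child.get(name))
```

Only the returned entries need to be fetched or compared in detail; every entry with matching hashes can be skipped.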

A p.listing file contains the following tagged lines:

H "SVMS Listing" major minor

This is the standard header file line.

C count

This is the number of F (file) records in the file.

F "file_name" "file_type" history_hash history_present bringover_hash conflict_hash lock_hash directory_hash [ bringover_total conflict_total directory_total history_total lock_total ]

The file record specifies information about a particular file name. File records are sorted alphabetically by the file_name field. The file record has the following fields:

"file_name"

file_name is the file and/or directory name.

"file_type"

file_type is the latest file type for the file as specified in the h.file_name history file. It may have one of the following values:

"Derived"
"Directory"
"Deleted"
"File"
"Link"

history_hash

history_hash is the hash value of the h.file_name history file contents. If there is no history file present, history_hash will be the hash value of the appropriate history file from the parent chain.

history_present

history_present is T if there is a history file present; otherwise, it is F and the history file is somewhere in the parent chain.

bringover_hash

bringover_hash is the hash value of the b.file_name bringover file contents. When a b.file_name file exists, it indicates that the corresponding h.file_name eventually needs to have a bringover delta inserted into it. If bringover_hash is zero, there is no b.file_name.

conflict_hash

conflict_hash is the hash value of the c.file_name conflict file. A c.file_name file is a history file from the parent that needs to be merged into the corresponding h.file_name history file. If conflict_hash is zero, there is no c.file_name conflict file.

lock_hash

lock_hash is the hash value of the l.file_name lock file contents. When a l.file_name file exists, it indicates that the corresponding h.file_name is open for editing. If lock_hash is zero, there is no l.file_name lock file.

directory_hash

directory_hash is the hash value of the p.listing file in the d.file_name sub-directory. If directory_hash is non-zero, the bringover_total, conflict_total, directory_total, history_total, and lock_total fields immediately follow. If there is no d.file_name sub-directory, directory_hash will be zero.

[ bringover_total ]

bringover_total is the total number of b. bringover files in the d.file_name sub-directory tree.

[ conflict_total ]

conflict_total is the total number of c. conflict files in the d.file_name sub-directory tree.

[ directory_total ]

directory_total is the total number of d. sub-directories in the d.file_name sub-directory tree.

[ history_total ]

history_total is the total number of h. history files in the d.file_name sub-directory tree.

[ lock_total ]

lock_total is the total number of l. lock files in the d.file_name sub-directory tree.

S bringover_count conflict_count directory_count history_count lock_count

This is the summary record; it comes immediately before the T record, which is the last record before the E record. The summary record has the following fields:

bringover_count: bringover_count is the number of b. files in the directory only.
conflict_count: conflict_count is the number of c. files in the directory only.
directory_count: directory_count is the number of d. files in the directory only.
history_count: history_count is the number of h. files in the directory only.
lock_count: lock_count is the number of l. files in the directory only.

T bringover_total conflict_total directory_total history_total lock_total

This is a total record and is the last record before the E record. The contents of the total record are reflected one level up in the corresponding file record in the parent p.listing file. The total record has the following fields:

bringover_total: bringover_total is the number of b. files in the directory tree.
conflict_total: conflict_total is the number of c. files in the directory tree.
directory_total: directory_total is the number of d. files in the directory tree.
history_total: history_total is the number of h. files in the directory tree.
lock_total: lock_total is the number of l. files in the directory tree.
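The T record can be computed from the S record plus the totals of each d. sub-directory (the totals carried in that sub-directory's F record). The function name is hypothetical. With the numbers from the example p.listing file below, S 0. 0. 2. 3. 0. plus the include and src totals gives T 0. 0. 4. 10. 3.

```python
from typing import Iterable, Tuple

# (bringover, conflict, directory, history, lock)
Counts = Tuple[int, int, int, int, int]


def total_record(summary: Counts, child_totals: Iterable[Counts]) -> Counts:
    """T record = this directory's S counts plus every d. sub-directory's totals."""
    totals = list(summary)
    for child in child_totals:
        totals = [a + b for a, b in zip(totals, child)]
    return tuple(totals)
```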

E

This is the standard end record.

An example p.listing file looks as follows:

H "SVMS Listing" 1. 3.
C 3.
F "include" "Directory" #123. T #0. #0. #0. #234. 0. 0. 0. 3. 1.
F "src" "Directory" #345. T #0. #0. #0. #456. 0. 0. 2. 4. 2.
F "Makefile" "File" #567. T #0. #0. #0. #0.
S 0. 0. 2. 3. 0.
T 0. 0. 4. 10. 3.
E

{Talk about why the p.listing file contains the information that it does.}

`p.project` File Format

{p.project file format goes here.}

make Integration

I am not a big fan of make; however, until I get around to replacing make with something better, I have to live with make's peculiarities. The make integration described below is not intended to be elegant; instead, the integration is intended to be just sufficient to get the job done.

There is a problem that I call the `cold start problem' where make does not interact well with a project management system. It is desirable to be able to go into an empty project sub-directory, fire off make, and have everything be properly checked out and built. Unless care is taken to design the Makefile properly, a cold start rarely succeeds.

The `cold start problem' has three sub-problems:

Direct Dependencies: The direct dependency problem is to ensure that the source file needed to build a target is properly checked out. For SCCS, some special code is built into make to automatically (via magic) get a source file prior to building it. Since SVMS cannot assume the ability to make changes to make, another alternative is needed.
Implicit Dependencies: The far nastier problem concerns header files. Most compilers will die horribly if a needed header file is not present at compile time. Thus, it is necessary to identify all needed header files prior to initiating a compile. By far the simplest solution to the header file cold start problem is to use a programming language that does not use them (e.g. Java or STIPPLE.)
Makefile Fragment Dependencies: Some of the problems with make can be worked around by generating makefile fragments that can be included into the Makefile via an include directive. However, the make program will fail if these makefile fragments are not present when make is run.

SVMS uses the same basic strategy to solve the first two cold start sub-problems -- namely, a makefile fragment is generated that lists the necessary dependencies. The user then just includes the makefile fragment at the beginning of the Makefile and the regular dependency tracking code inside of make does the rest. The solution to the makefile fragment dependency problem is to automatically (via magic) generate the makefile fragment at directory creation time.

The svms makefile command is used to generate a makefile fragment. The basic file format looks as follows:

    SVMS=svms
    PROJECT_DIR=../../..
    SVMS_DIR=$(PROJECT_DIR)/SVMS
    DELTA_DIR=$(SVMS_DIR)/d.a/d.b/d.c

    # Source file dependencies:
    file1: $(DELTA_DIR)/f.file1 .../fileA.h .../fileB.h ... .../fileC.h
	$(SVMS) get $@
    file2: $(DELTA_DIR)/f.file2 .../fileD.h .../fileE.h ... .../fileF.h
	$(SVMS) get $@
    ...
    fileN: $(DELTA_DIR)/f.fileN .../fileX.h .../fileY.h ... .../fileZ.h
	$(SVMS) get $@

    # Implicit dependencies:
    .../fileA.h: $(SVMS_DIR)/.../f.fileA.h
	$(SVMS) get -R $@
    .../fileB.h: $(SVMS_DIR)/.../f.fileB.h
	$(SVMS) get -R $@
    ...
    .../fileZ.h: $(SVMS_DIR)/.../f.fileZ.h
	$(SVMS) get -R $@

Since SVMS history format is being designed from the ground up to directly support projects, it has a place to store the implicit dependency file list each time a source file is checked in. {These records need to be defined.} The implicit dependencies are determined by examining .make.state, if it is present, and by running makedepend otherwise.

Finally, the svms get command will attempt to preserve the relative order of modification timestamps so that make does not get terribly confused.
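One possible way to preserve the relative order, sketched under an assumption not stated in these notes: svms get stamps each fetched file with the timestamp of the delta it came from, so older deltas keep older modification times. The function names are hypothetical.

```python
import os
from datetime import datetime, timezone


def parse_timestamp(text: str) -> float:
    """Convert "YYYY/MM/DD@hh:mm:ssGMT" to seconds since the epoch."""
    moment = datetime.strptime(text, "%Y/%m/%d@%H:%M:%SGMT").replace(tzinfo=timezone.utc)
    return moment.timestamp()


def stamp_file(path: str, delta_timestamp: str) -> None:
    """Set both the access and modification times to the delta's timestamp."""
    seconds = parse_timestamp(delta_timestamp)
    os.utime(path, (seconds, seconds))
```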

Summary

That is about it for the design of SVMS. It certainly meets the primary requirement of working on Linux.

New Command Syntax Notes

Below are my notes on improving the command line syntax for SVMS.


Project Commands

    svms new [-R] {path}
    svms parent {path}
    svms signature
    svms bringover {slice}...
    svms putback {slice}...

    Example "sneaker-net" bringover:
	# On child machine:
	rm -rf /tmp/sneaker
	svms create -R /tmp/sneaker sneaker
	cd /tmp/sneaker; svms parent -d /tmp/sneaker -s /tmp/signature
	cd {child}; svms bringover -???
	cd /tmp; tar cvf /dev/rst4 signature

	# On parent machine:
	rm -rf /tmp/sneaker
	svms create -R /tmp/sneaker sneaker
	cd /tmp/sneaker; svms parent -d {parent} -s /tmp/signature
	cd /tmp/sneaker; svms bringover :/...
	cd /tmp/sneaker; tar cvf /tmp/sneaker.tar .

	# On child machine:
	cd /tmp/sneaker; tar xvf /dev/rst4
	cd {child}; svms bringover :/...

    Example "sneaker-net" putback:
	...


    Example "HTTP" bringover:
	svms parent -h http://host_url/path/ -p http://proxy_url:8080/
	svms bringover {slice}...

    Example "NFS" bringover
	svms parent -d /net/foo.org/path/ -e
	svms bringover

File Commands:

    svms create {file}...
	[-c {file} | -C previous | -C pipe | -C empty | -C prompt*]
	[-d {file} | -D previous* | -D pipe | -D empty | -D prompt]
	[-i {file} | -I previous | -I pipe | -I empty | -I file*]
	[-m binary | -m text | -m link | -m guess*] [-f]
	[-t {timestamp} | -T now*]
	[-u {user} | -U current*]
	[-f] [-k]
	{file}...
    svms close {...}

    svms open
	[-c {file} | -C pipe | -C none*]
	[-d {file} | -D pipe | -D none*]
	[-o {file} | -O pipe | -O file*]
	[-v {number} | -V latest*]
	[-n {nickname} | -N current* | -N parent]
	[-k] {file}...
    svms get


    svms delete file...
    svms undo file...

    svms rename [] old_file new_file
    svms copy [] old_file new_file

    svms history file...
    svms diff file...

    Options:
	-f		Force acceptance of symbolic link.
	-k		Do not substitute SVMS variables.

	-m binary	Force all files into binary mode.
	-m text		Force all files into text mode.
	-m link		Force all files into symbolic link mode.
	-m guess	Guess what type of file it is.

	-n {nickname}	Project nickname
	-v {number}	Obtain version number {number}.

Directory Commands:

    svms mkdir [-xxx] directory...
    svms rmdir [-xxx] directory...


Other commands:

    svms tell {slice}...
    svms generate [-m html | -m text] {slice}...