The Separated HTML Writing Style

Until someone produces a good WYSIWYG (What You See Is What You Get) HTML (HyperText Markup Language) editor, most HTML will be created manually via an editor with a fixed width font. While the commands associated with HTML are fairly spartan, when they are intermixed with the text being marked up, the result tends to be somewhat difficult to read and understand. In order to address the problem of understanding the underlying HTML file, I write my HTML documents in a style that I call separated HTML.

As the name indicates, separated HTML tries to keep most of the HTML commands separate from the text being marked up. In separated HTML, most (but not all) HTML commands are placed 8 tab stops over, in columns 64 through 80, and the text being marked up is kept in columns 1 through 63. The choice of ending HTML commands in column 80 can ultimately be traced back to the fact that the Hollerith punched card, which was invented over a century ago, had 80 columns. In fact, a large number of people size their fixed-width editors so that no line wrapping occurs on lines of 80 characters or less. Whenever an HTML command is too long to fit in columns 64 through 80, it is right justified so that the last character of the command occurs in column 80.
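The placement rule above is mechanical enough to automate. The following Python sketch is one possible helper, not part of any existing tool: `place_command' is a hypothetical name, columns are counted from 1, and padding uses spaces rather than tabs. Columns 64 through 80 hold 17 characters, so anything longer is right justified to end in column 80.

```python
COMMAND_COLUMN = 64  # first column reserved for HTML commands (1-indexed)
LAST_COLUMN = 80     # last usable column

def place_command(command: str) -> str:
    """Return a line holding `command` in the separated-HTML position.

    Commands of up to 17 characters start in column 64; longer commands
    are right justified so that their last character lands in column 80.
    Padding with spaces is an assumption; tabs would work equally well.
    """
    width = LAST_COLUMN - COMMAND_COLUMN + 1  # 17 characters fit
    if len(command) <= width:
        return " " * (COMMAND_COLUMN - 1) + command
    return command.rjust(LAST_COLUMN)

print(place_command("<P>"))
print(place_command('<A Name="a-rather-long-anchor-name">'))
```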

Obviously, authors are free to write and format their HTML any way that they want. If you do not like the separated HTML style, please publish your own HTML style document. Remember, the whole issue of HTML formatting styles becomes moot the moment a decent WYSIWYG HTML editor becomes available.

A further goal of this document is to provide a fairly stand-alone document about HTML. Indeed, someone who is already fairly comfortable with document formatting via a marked up text file might be able to learn HTML from this document alone.

Without any further discussion, the rest of this document describes the separated HTML writing style.

A Review of SGML

If you are already familiar with SGML (Standard Generalized Markup Language), please go directly to the next section.

Although SGML was invented over 20 years ago, it is rapidly being adopted by the document generation and publication industry as the standard way of representing the structure of documents. SGML specifies the syntax for the markup commands in the document. While the syntax of SGML commands is standardized, the commands themselves vary between document production systems. The set of commands that a particular document production system accepts is defined by a DTD (Document Type Definition). This document will only describe the commands associated with HTML.

A document in SGML format consists of text with SGML commands and entities intermixed. Both SGML commands and SGML entities are discussed shortly. With a few exceptions, spaces, tabs, and new-line characters are treated as white space that separates words. Indeed, the reason separated HTML works at all is that the tabs used to push the HTML commands over to column 64 are almost always ignored.
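To see this in action, the Python sketch below uses the standard `html.parser' module to read the title out of a head laid out in the separated style, with eight tabs in front of each command. The `TitleReader' class and `read_title' function are hypothetical names; the last step collapses runs of white space the way a viewer does, so the tabs and new-lines vanish and only the title words remain.

```python
from html.parser import HTMLParser

class TitleReader(HTMLParser):
    """Collect the text that appears between <Title> and </Title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # tag names arrive lower-cased
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def read_title(source):
    reader = TitleReader()
    reader.feed(source)
    reader.close()
    # Collapse spaces, tabs, and new-lines into single spaces.
    return " ".join("".join(reader.parts).split())

pad = "\t" * 8  # eight tab stops push each command over
separated = "\n".join([
    pad + "<Head>",
    pad + "<Title>",
    "\tSeparated HTML",
    pad + "</Title>",
    pad + "</Head>",
])
print(repr(read_title(separated)))
```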

SGML commands are enclosed in angle brackets (i.e. <Command>.) SGML is insensitive to the case of the command (e.g. <Command>, <COMMAND>, <command>, and <CoMmAnD> are all treated as equivalent.) To improve legibility, it is useful to adopt a standard capitalization strategy for your SGML commands. While I usually capitalize the first letter of each word in my SGML commands, you are free to use whatever looks pleasing to you.

Many SGML commands are used to bracket a region of text. These commands occur in pairs of the following form -- <Command> some text </Command>.

Some SGML commands require additional information, called command attributes. Command attributes occur after the command name, but before the closing angle bracket. An example of a command with attributes is <Command Attribute1 Attribute2=Word Attribute3="String">.
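Python's standard `html.parser' module follows the same conventions, which makes it a convenient way to check how a command and its attributes are read. In this sketch the class name `CommandLogger' is hypothetical. The parser folds command and attribute names to lower case but leaves attribute values alone, and it reports an attribute given without a value as `None'.

```python
from html.parser import HTMLParser

class CommandLogger(HTMLParser):
    """Record every start command together with its attributes."""

    def __init__(self):
        super().__init__()
        self.commands = []

    def handle_starttag(self, tag, attrs):
        # `tag` and attribute names arrive lower-cased; values keep their case.
        self.commands.append((tag, attrs))

logger = CommandLogger()
logger.feed('<Command Attribute1 Attribute2=Word Attribute3="String">')
logger.feed("<COMMAND><command><CoMmAnD>")
logger.close()
for tag, attrs in logger.commands:
    print(tag, attrs)
```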

Finally, SGML has a syntax, called entity syntax, for representing characters that are not in the ASCII character set. The syntax of an entity is `&Entity;', not including the quotes. The trailing semi-colon can be dropped if the following character is not a letter. Entities are also needed to express ampersands and less-than characters, since these characters are used by HTML itself.
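As a quick illustration of the entity syntax, Python's standard `html' module converts in both directions: `escape' writes the entities that HTML needs for `&', `<', and `>' (and, by default, for quotes), while `unescape' reads them back, including a legacy entity whose trailing semi-colon has been dropped before a non-letter.

```python
import html

raw = 'if (a < b && c > d) print("ok")'
marked_up = html.escape(raw)  # quote=True (the default) also converts " and '
print(marked_up)

# Reading the entities back restores the original text.
assert html.unescape(marked_up) == raw

# The semi-colon may be omitted when a non-letter follows.
print(html.unescape("a &lt b"))
```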

This concludes the brief review of SGML. The details of separated HTML follow.

The Document Head

An HTML document has both a head and body. This section discusses the head of an HTML document and the next section starts discussing the body of an HTML document.

The head of an HTML document consists of a <Head> command followed by a document title and a closing </Head> command. The beginning of the document title is marked with the <Title> command and the end is marked with the </Title> command. For example, the head of the document you are reading is specified by the following lines:

						<Head>
						<Title>
	Separated HTML
						</Title>
						</Head>

Please note that all of the HTML commands have been moved over to column 64.

The non-separated HTML way of representing the same document head is:

	<Head>
	<Title>Separated HTML</Title>
	</Head>

The example above is the only example of non-separated HTML you will see in this document, because this document is about separated HTML.

Most HTML document viewers display the document title irrespective of which portion of the document is visible. Given that the document title is almost always visible, it is generally a good idea to keep the titles short. The first heading of the document (described in the next section) is a good place to put a more descriptive title.

The Document Body and Headings

The document body starts with the <Body> command.

Most people start the body of their documents with a heading which is either identical to the document title or somewhat more descriptive. The <H1> command specifies the beginning of the heading and the </H1> command marks the end of the heading.

For example, the first few lines of the document body for the document you are reading are:

						<Body>
						<H1>
	The Separated HTML Writing Style
						</H1>

Note that `The Separated HTML Writing Style' is a bit more descriptive than the document title -- `Separated HTML'.

HTML has six levels of heading commands, from <H1> through <H6>. In general, I only use one <H1> command at the beginning of my document and use the <H2> through <H6> commands for all my other headings.
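The one-<H1> convention is easy to check mechanically. This Python sketch (the names `HeadingLister' and `check_headings' are hypothetical) lists the heading levels in a document and reports a problem if <H1> is missing, repeated, or not the first heading.

```python
from html.parser import HTMLParser

HEADINGS = {f"h{n}": n for n in range(1, 7)}  # <H1> through <H6>

class HeadingLister(HTMLParser):
    """Record the level of every heading in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.levels.append(HEADINGS[tag])

def check_headings(source):
    """Return a list of complaints about the document's heading levels."""
    lister = HeadingLister()
    lister.feed(source)
    lister.close()
    levels = lister.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one <H1>, found {levels.count(1)}")
    elif levels[0] != 1:
        problems.append("<H1> is not the first heading")
    return problems

good = "<Body>\n<H1>\nTitle\n</H1>\n<H2>\nPart\n</H2>\n"
bad = "<Body>\n<H2>\nPart\n</H2>\n<H1>\nTitle\n</H1>\n<H1>\nAgain\n</H1>\n"
print(check_headings(good))
print(check_headings(bad))
```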

Named Anchors

The anchor command (i.e. <A>) is used for hypertext links. In general, there are two forms of the <A> command: the named anchor and the hypertext reference. This section is about the named anchor form of the <A> command; the hypertext reference form of the <A> command is discussed in a section much further below.

What is a named anchor? The short answer to this question is that a named anchor is the potential target of zero, one, or more hypertext references. A named anchor allows a hypertext link to jump into the middle of a document, rather than just to its beginning.

A named anchor is one that has the `Name=...' attribute in it. A named anchor looks as follows:

				    <A Name="...">

The <A Name=...> command is used to anchor the ends of a hypertext link. The `Name=...' attribute provides a name for the anchor. In separated HTML, a named anchor is always on a line of its own and it is started in column 64. If you like long descriptive names in your named anchors, indenting them over to column 64 will not work, so instead they can be indented over to column 4.
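A hypertext reference reaches a named anchor by appending `#' and the anchor's name to the document's URL. The Python sketch below gathers the named anchors in a document and checks whether a link's fragment would land on one of them; the `AnchorCollector' class, the `resolves' function, and the example.com URLs are hypothetical, while `urllib.parse.urldefrag' is the standard function that splits a URL at the `#'.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

class AnchorCollector(HTMLParser):
    """Collect the Name=... values of all named anchors."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "name" and value:
                    self.names.add(value)

def resolves(url, source):
    """True if the URL's fragment names an anchor in `source`."""
    _, fragment = urldefrag(url)
    collector = AnchorCollector()
    collector.feed(source)
    collector.close()
    return fragment in collector.names

doc = '<A Name="intro">\nFirst paragraph.\n<P>\n<A Name="details">\nMore.\n'
print(resolves("http://example.com/style.html#details", doc))  # True
print(resolves("http://example.com/style.html#missing", doc))  # False
```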

The whole need for named anchors would be pretty much obviated if URLs (Uniform Resource Locators) were extended to permit search strings. For example, http://info.cern.ch#"CERN" would search for the first occurrence of the word `CERN' in the http://info.cern.ch document.

Titles, Headings, Paragraphs, Itemized Lists, Etc.

Paragraph breaks are generated by placing `<P>' at the end of a paragraph in column 64 immediately followed by a named anchor for the next paragraph or heading on the next line. An example paragraph break looks as follows:

	...
	Last sentence of
	previous paragraph.
						<P>
	First sentence of
	next paragraph.
	...

Remember, paragraph breaks are only needed to separate paragraphs; all other headings, lists, etc., have an implicit paragraph break.

A section heading is done as follows:

						<H1>
	Heading
						</H1>
	The first sentence of
	first paragraph in new heading.
	...

Since a paragraph always follows a heading, there is always a named anchor for the paragraph. Any one of <H2>, <H3>, <H4>, <H5>, or <H6> can be substituted for <H1> to get successively smaller section headings. In general, there should only be one <H1> header at the very beginning of the document.

An unnumbered list (with bullets) is done as follows:

	...
	Last sentence of previous paragraph.
						<UL><LI>
		First item/paragraph.		<LI>
		Second item/paragraph.		<LI>
		...
		Last item/paragraph.		</UL>
						<P>
	First sentence of next paragraph.
	...

Note that all lines of each item/paragraph are indented by a tab to set the list off from other surrounding paragraphs. An ordered (i.e. numbered) list is accomplished by substituting <OL> for <UL>. A more compact menu list is achieved by substituting <MENU> for <UL>. A multi-column list is achieved by substituting <DIR> for <UL>.
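The unnumbered-list layout above can be produced by a small formatter. This Python sketch is hypothetical (`with_command' and `separated_list' are made-up names); it assumes one tab stop is written as eight spaces and that commands start in column 64, and it moves a command to a line of its own when an item runs into the command column. Only the list kinds named above are accepted.

```python
COMMAND_START = 63  # 0-based offset of column 64
LAST_COLUMN = 80
TAB = " " * 8       # one tab stop, written as spaces (an assumption)

def with_command(text, command):
    """Lay out `text` followed by `command` in the command column.

    Returns a list of lines; if the text runs into the command column,
    the command moves to a line of its own.
    """
    if len(command) <= LAST_COLUMN - COMMAND_START:
        start = COMMAND_START
    else:
        start = max(LAST_COLUMN - len(command), 0)  # right justify to column 80
    if text and len(text) >= start:
        return [text, " " * start + command]
    return [text.ljust(start) + command]

def separated_list(items, kind="UL"):
    """Format items as a separated-HTML <UL>, <OL>, <MENU>, or <DIR> list."""
    if kind not in ("UL", "OL", "MENU", "DIR"):
        raise ValueError(f"unsupported list kind: {kind}")
    if not items:
        raise ValueError("a list needs at least one item")
    lines = with_command("", f"<{kind}><LI>")
    for i, item in enumerate(items):
        # Each item is followed by the <LI> that opens the next one;
        # the last item is followed by the closing list command.
        closer = "<LI>" if i < len(items) - 1 else f"</{kind}>"
        lines += with_command(TAB + item, closer)
    return "\n".join(lines)

print(separated_list(["First item.", "Second item.", "Last item."]))
```

The <P> that follows the list in the layout above is left to the caller, since it belongs to the next paragraph rather than to the list.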

A definition list looks as follows:

	...
	Last sentence of previous paragraph.	<DL><DT>
	    First term				<DD>
		First description.		<DT>
	    Second term				<DD>
		Second description.		<DT>
	    ...
	    Last term				<DD>
		Last description.		</DL>
	First sentence of next paragraph.
	...

Note that each term is indented by four spaces and each descriptive paragraph is indented by a tab to set the terms and descriptions off from the surrounding paragraphs.

Preformatted text, displayed in a fixed-width font with spaces, tabs, and new-lines preserved, is done as follows:

	...
	Last sentence of previous paragraph.
						<Pre>
		Some fixed width text
		...
		Some more fixed width text.
						</Pre>
	First sentence of next paragraph.
	...

As usual, the example is indented by one tab stop to set it off from the surrounding paragraphs. A quotation is achieved by substituting <BlockQuote> for <Pre>. An address that is indented to the center of the page is obtained by substituting <Address> for <Pre>. As is proper HTML etiquette, each document should end with an <Address> command that contains enough information to get in touch with the author to correct any errors; this is usually accomplished via a hypertext link to the author's signature page.
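Even inside <Pre>, the characters `&', `<', and `>' keep their special meaning, so an example of HTML shown this way must be written with entities. The hypothetical Python helper `pre_block' below escapes the example text, indents each line by one tab stop as described above, and places the <Pre> commands in column 64 using space padding; `quote=False' leaves double quotes alone, since they are harmless outside attribute values.

```python
import html

COMMAND_PAD = " " * 63  # pad to column 64, the command column
EXAMPLE_INDENT = "\t"   # examples are indented by one tab stop

def pre_block(example: str) -> str:
    """Escape `example` and wrap it in separated-HTML <Pre> commands."""
    body = [
        EXAMPLE_INDENT + html.escape(line, quote=False) if line else ""
        for line in example.splitlines()
    ]
    return "\n".join([COMMAND_PAD + "<Pre>", *body, COMMAND_PAD + "</Pre>"])

print(pre_block("<Title>\nA & B\n</Title>"))
```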

An HTML comment that will not be displayed is done as follows:

						<!--
	Comment text goes here.
	More comment text.			-->
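To confirm that comment text never reaches the reader, this sketch feeds the comment layout above (closed with the standard `-->') through Python's `html.parser', which reports comments separately from displayable text. The `TextAndComments' class name is hypothetical.

```python
from html.parser import HTMLParser

class TextAndComments(HTMLParser):
    """Separate displayable text from comment text."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.comments = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

    def handle_comment(self, data):
        # Collapse the comment's tabs and new-lines for easy comparison.
        self.comments.append(" ".join(data.split()))

parser = TextAndComments()
parser.feed(
    "Visible sentence.\n"
    + " " * 63 + "<!--\n"
    + "\tComment text goes here.\n"
    + "\tMore comment text.\t\t\t-->\n"
    + "Another visible sentence.\n"
)
parser.close()
print(parser.text)
print(parser.comments)
```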

Non-right justified HTML

Some HTML commands are not worth right justifying since they occur in the middle of sentences. If these commands were right justified, the resulting text would frequently be very ragged and annoying to read. These HTML commands are the special character entities and the font control commands. They are listed below:

<B> text </B>: A bold font.
<I> text </I>: An italic font.
<U> text </U>: Underlined characters.
<TT> text </TT>: A typewriter (i.e. fixed-width) font.
<EM> text </EM>: An emphasis (i.e. italics) font.
<Dfn> text </Dfn>: A definition (i.e. bold) font.
<Strong> text </Strong>: A strong emphasis (i.e. bold-italics) font.
<Code> text </Code>: An in-line code (i.e. fixed-width) font.
<Samp> text </Samp>: An in-line sample (i.e. fixed-width) font.
<Kbd> text </Kbd>: A font for representing keyboard input (i.e. fixed-width font) in-line.
<Var> text </Var>: A font for representing a variable (i.e. an italic font) in-line.
<Cite> Citation </Cite>: A font for representing a citation (i.e. an italic font) in-line.
&lt;: A less-than (i.e. `<') character.
&gt;: A greater-than (i.e. `>') character.
&amp;: An ampersand (i.e. `&') character.
&quot;: A double quote (i.e. `"') character.
&latin-1;: A Latin-1 character. For example, &ouml; provides an `ö' umlaut (i.e. an `o' with two dots over it.)

Whenever possible you should use the logical font controls (e.g. <Em> and <Dfn>) instead of the manual controls (e.g. <B> and <I>.) Since `<', `>', and `&' are treated specially by HTML, these characters should be entered into the document using the entity syntax (e.g. `&amp;'.) Since many text editors have still not been modified to accept Latin-1 characters, HTML provides entity syntax for entering them in ASCII. Not all HTML viewers are able to display Latin-1 characters. The list of Latin-1 entities is part of the HTML specification.
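For authors whose editors cannot handle Latin-1 directly, the conversion can be done by a small filter. This Python sketch (the `to_ascii_entities' name is hypothetical) uses the standard `html.entities.codepoint2name' table, which holds the Latin-1 entity names plus the additions made in HTML 4, and falls back to a numeric character reference such as `&#257;' for characters that have no name. It also converts `&', `<', and `>' so the output is safe to paste into a document.

```python
from html.entities import codepoint2name

def to_ascii_entities(text: str) -> str:
    """Replace non-ASCII characters (and &, <, >) with HTML entities.

    Named entities such as &ouml; are used where HTML defines one;
    otherwise a numeric reference such as &#257; is written.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128 and ch not in "&<>":
            out.append(ch)  # plain ASCII passes through unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")
    return "".join(out)

print(to_ascii_entities("Göteborg, Zürich, Köln"))
```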

Adding links to other documents

HTML hypertext links to other documents tend to be fairly long; in addition, URLs (i.e. Uniform Resource Locators) cannot be broken across lines. Thus, hypertext links tend to be difficult to right justify.

Hypertext links in the middle of sentences and paragraphs are done as follows:

	...
	Soon most HTML viewers will be capable
	of displaying all of the
	    <A HRef="http://info.cern.ch/...">
	HTML Latin-1 entity bindings </A>.  It
	will be much longer before HTML can deal
	with Asian character sets.
	...

The anchor (i.e. hypertext link) reference containing the URL is on a line of its own, indented by four spaces. The closing `</A>', which marks the end of the text that is sensitive to the hypertext link, is simply inserted into the text like any of the font control HTML commands.
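Since the new-line after the anchor reference line is ordinary white space, a viewer sees the same link however the lines are broken. The Python sketch below pulls each link's target and its sensitive text out of a separated-style paragraph; the `LinkCollector' class is hypothetical, and http://example.com/latin1.html is a made-up URL standing in for the elided one above.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (URL, link text) pairs from <A HRef=...> ... </A> spans."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse white space the way a viewer displays it.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

paragraph = (
    "\tSoon most HTML viewers will be capable\n"
    "\tof displaying all of the\n"
    '\t    <A HRef="http://example.com/latin1.html">\n'
    "\tHTML Latin-1 entity bindings </A>.  It\n"
    "\twill be much longer before HTML can deal\n"
)
collector = LinkCollector()
collector.feed(paragraph)
collector.close()
print(collector.links)
```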

Conclusions

This document has gone through the HTML commands and shown how they can be arranged according to the Separated HTML style guidelines to improve the legibility of the text contained in the underlying raw HTML file.