This is one of the STIPPLE Documentation pages.

STIPPLE Lexical Issues

Before proceeding with a top-down definition of the language, it is necessary to address the lexical conventions of STIPPLE. STIPPLE has six kinds of lexical tokens: white-space, comments, identifiers, keywords, constants, and operators.

White-space

White-space is defined as spaces and tabs outside of string constants. White-space may occur between any two lexical tokens. White-space at the beginning of a line determines the indentation level of the first token on that line (see Why use indentation).

New-line characters ("\n") are used to terminate statements and declarations. Lines with no visible characters are completely ignored provided they do not occur in the middle of a continuation line. A statement or declaration can be continued across multiple lines provided that:

all lines except the last one end in either a binary operator (e.g. ",", "+", "*", etc.) or an open bracket (e.g. "(", "["), and
all lines after the first one have an indentation that is strictly greater than that of the first line.

For example,

    a := (b + c) *	# Legal multi-line statement
	(d + e)

is a legal multi-line statement, but

    a := func(a, 	# Illegal multi-line statement
b, c)			# Indentation must be greater (not lesser)

and

    a := (b + c) *	# Illegal multi-line statement
    (d + e)		# Indentation must be greater

and

    a := (b + c) *	# Illegal multi-line statement

	(d + e)		# Blank line between continuation lines

are illegal multi-line statements.
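
The two continuation rules can be sketched as a small check. This is a minimal Python sketch, not the STIPPLE compiler's own logic: the function names, the tab width of 8, and the short list of line-ending operators are assumptions made for illustration, and string constants containing "#" are not handled.

```python
# Operators and open brackets that may end a continued line (illustrative subset).
CONTINUERS = (",", "+", "-", "*", "/", "(", "[")


def indentation(line: str, tab_width: int = 8) -> int:
    """Return the column of the first visible character (tabs advance to tab stops)."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            column += tab_width - column % tab_width
        else:
            break
    return column


def strip_comment(line: str) -> str:
    """Drop an end-of-line comment and trailing white-space (strings are ignored here)."""
    return line.split("#", 1)[0].rstrip()


def is_legal_statement(lines: list[str]) -> bool:
    """Check the continuation rules for one statement spread over `lines`."""
    if any(not line.strip() for line in lines):
        return False  # blank line in the middle of a continuation
    for line in lines[:-1]:
        if not strip_comment(line).endswith(CONTINUERS):
            return False  # every line but the last must end in an operator or bracket
    first_indent = indentation(lines[0])
    return all(indentation(line) > first_indent for line in lines[1:])
```

Applied to the examples above, the first statement passes and the other three fail, each on the rule named in its comment.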

When the STIPPLE compiler is being used to convert a program from one language to another, all white-space is preserved exactly.

Comments

STIPPLE supports both in-line and end-of-line comments. In-line comments may occur anywhere white-space may occur, and end-of-line comments may occur anywhere a new-line may occur. All comments start with a sharp ("#") character, followed by a comment control character (see Why use sharp ("#") for comments).

{In-line comments are not implemented yet.}

In-line comments start with "#<" and end with ">#" and must reside entirely on one line. The "#<" and ">#" delimiters may neither be nested nor used as translation hints.

End-of-line comments have one of the following comment control characters:

A "#" followed by either a space, tab, or new-line is treated as a regular comment.
Translation hint (`~'): When an application is being internationalized, translation hint comments ("#~") are extracted by the compiler along with string literals to be translated. The person performing localization uses the translation hints to help translate the string literals.
Documentation (`:'): When the compiler is used in documentation mode, documentation comments ("#:") are extracted into the resulting documentation.
Error (`!'): When the compiler is being used with some word processors, errors are merged back into the source code using "#!" comments.
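
The comment control characters listed above amount to a dispatch on the character that follows the "#". The following Python sketch illustrates that dispatch. The function name and the returned labels are hypothetical, and rejecting any other control character is an assumption; the text above does not say how unknown control characters are treated.

```python
# Control characters named in the list above, mapped to illustrative labels.
COMMENT_KINDS = {"~": "translation hint", ":": "documentation", "!": "error"}


def classify_comment(comment: str) -> str:
    """Classify a comment by the character that follows the leading '#'."""
    if not comment.startswith("#"):
        raise ValueError("not a comment")
    control = comment[1:2]  # empty string when '#' is the last character
    if control in ("", " ", "\t", "\n"):
        return "regular"
    if control == "<":
        return "in-line"
    if control in COMMENT_KINDS:
        return COMMENT_KINDS[control]
    # Assumption: anything else is rejected.
    raise ValueError(f"unknown comment control character {control!r}")
```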

When a comment does not conveniently fit on one line, it is continued across multiple lines by having the additional lines start at a greater indentation than the first line of the comment. Blank lines in the middle of comment continuations are completely ignored.

Comments frequently make references to identifiers in the program. When the compiler is being used in translation mode, it is desirable to treat the program identifiers in comments separately from the rest of the comment (see Why identify program identifiers in comments). Program identifiers inside of comments are bracketed with braces (`{', `}') with no intervening white-space. In word processors, the program identifiers will probably be additionally emphasized by a font change.

Here are some examples of good comments:

    # Normal vanilla single line comment
    # Reference to program variable {widgets_remaining} in a comment
    #~ Translation hint comment
    #: Documentation comment
    # A comment containing
      three
      lines.
    #~Multi-line
      translation hint
    #: Multi-line
       documentation comment
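
Picking the braced program identifiers out of a comment can be sketched with a regular expression. This Python sketch is illustrative only: the function name is hypothetical, and the pattern accepts any run of non-white-space, non-brace characters between the braces rather than checking the full identifier rules.

```python
import re

# A brace pair with no white-space (and no nested braces) between the braces.
IDENTIFIER_REFERENCE = re.compile(r"\{([^\s{}]+)\}")


def identifier_references(comment: str) -> list[str]:
    """Return the program identifiers bracketed with braces in a comment."""
    return IDENTIFIER_REFERENCE.findall(comment)
```

For the second example comment above, this yields the single name widgets_remaining.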

Identifiers

STIPPLE identifiers are a sequence of characters consisting of letters ("a"-"z", "A"-"Z", and any other Latin-1 letters), underscores ("_"), and digits ("0"-"9"). The first character of an identifier must be a letter. Underscores are meant to be used as word separators in multi-word identifiers; for this reason, identifiers are further restricted so that they do not end in an underscore and do not have two or more adjacent underscores.

STIPPLE requires that an identifier always be capitalized the same way. Whenever STIPPLE detects that an identifier has been capitalized differently, the identifier is flagged with a warning message.
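
One way to picture the capitalization check is to remember the first spelling of each identifier and compare later occurrences against it. The Python sketch below assumes that the first spelling seen is the reference one and that comparison is done on the lower-cased name; neither detail is stated above, and the function name is hypothetical.

```python
def capitalization_warnings(identifiers: list[str]) -> list[tuple[str, str]]:
    """Return (offending spelling, first spelling) pairs for inconsistent capitalization."""
    first_seen: dict[str, str] = {}
    warnings = []
    for name in identifiers:
        # Assumption: the first spelling encountered is the reference spelling.
        original = first_seen.setdefault(name.lower(), name)
        if original != name:
            warnings.append((name, original))
    return warnings
```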

Here are examples of good identifiers:

    red
    widgets_remaining
    a_very_very_very_very_very_very_long_identifier
    index_2
    Pepé		# Variable with embedded Latin-1 letter

Here are some examples of bad identifiers:

    1_2_3		    # Identifiers must start with a letter
    double__underscore	    # Variable with two underscores in a row
    trailing_underscore_    # Variable with trailing underscore
    _leading_underscore	    # Variable with leading underscore
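
The identifier rules can be condensed into one regular expression. This is a Python sketch under stated assumptions: "other Latin-1 letters" is taken to mean the code points U+00C0 through U+00FF except the multiplication and division signs, and the names LETTER, IDENTIFIER, and is_identifier are hypothetical.

```python
import re

# ASCII letters plus the accented Latin-1 letters (an assumption about "Latin-1 letters").
LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF"

# A letter, then letters or digits, then zero or more groups of a single
# underscore followed by at least one letter or digit. This rules out
# leading, trailing, and adjacent underscores.
IDENTIFIER = re.compile(rf"[{LETTER}][{LETTER}0-9]*(?:_[{LETTER}0-9]+)*")


def is_identifier(text: str) -> bool:
    """Return True when `text` satisfies the identifier rules sketched above."""
    return IDENTIFIER.fullmatch(text) is not None
```

The good identifiers listed above match this pattern, and each of the four bad ones is rejected.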

Keywords

STIPPLE has different keywords depending upon the native language being used. The keywords are always full words and phrases (see Why use full word keywords). For English, the keywords are:

	body
	break
	case
	continue
	default
	define
	else
	else_if
	enumeration
	evaluate
	extract
	if
	iterator
	loop
	module
	procedure
	record
	return
	routine
	signal
	signals
	switch
	tag
	takes
	type
	until
	variant
	variables
	while
	yield
	yields

Keywords in STIPPLE are not reserved; all keywords may also be used as variable and routine names (see Why keywords are not reserved.) This is accomplished by structuring the STIPPLE grammar so that it can always determine whether an identifier is being used as a keyword or not.

Constants

STIPPLE has four kinds of constants: integers, floating-point numbers, characters, and strings.

Integer constants

Integer constants may be specified in decimal, octal, or hexadecimal. Decimal constants start with a digit from one through nine ("1"-"9") followed by a sequence of decimal digits ("0"-"9"). Octal constants start with the digit zero ("0") followed by a sequence of octal digits ("0"-"7"). Hexadecimal constants start with the digit zero ("0"), followed by either the letter "x" or "X", followed by a sequence of hexadecimal digits ("0"-"9", "a"-"f", "A"-"F"). A variable cannot immediately follow an integer constant without some separating white-space. All integer constants are non-negative and are of type unsigned64. The unary plus operator ("+") is used to automatically cast integer constants to other types with lower resolution.

Some examples of valid integer constants are:

    0		# Zero
    189		# Decimal
    067		# Octal
    0xaef	# Hexadecimal
    0xAEF	# Hexadecimal
    0XaEf	# Hexadecimal (pretty ugly)

Some examples of invalid integer constants are:

    089		# Bad octal digits
    89af		# Bad decimal digits
    0x89cat		# No white-space separating number from variable {cat}
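
The three integer forms can be recognized with one regular expression and converted by base. The following Python sketch is an illustration, not the compiler's scanner: the names are hypothetical, and rejecting values that do not fit in 64 bits is an assumption drawn from the unsigned64 type.

```python
import re

INTEGER = re.compile(
    r"(?P<hex>0[xX][0-9a-fA-F]+)"   # 0x or 0X followed by hexadecimal digits
    r"|(?P<octal>0[0-7]*)"          # zero followed by octal digits (covers plain 0)
    r"|(?P<decimal>[1-9][0-9]*)"    # non-zero digit followed by decimal digits
)


def parse_integer(token: str) -> int:
    """Convert an integer constant to its value, or raise ValueError."""
    match = INTEGER.fullmatch(token)
    if match is None:
        raise ValueError(f"invalid integer constant: {token!r}")
    if match.group("hex"):
        value = int(token[2:], 16)
    elif match.group("octal"):
        value = int(token, 8)
    else:
        value = int(token, 10)
    if value >= 2**64:
        # Assumption: constants that overflow unsigned64 are rejected.
        raise ValueError(f"integer constant too large: {token!r}")
    return value
```

Because the whole token must match, a trailing identifier character such as the "c" in 0x89cat is accepted as a hexadecimal digit but the following "a" and "t" leave an unmatched "t", so the token is rejected.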

Floating-point constants

Floating-point constants are a sequence of decimal digits ("0"-"9") with exactly one embedded decimal point (".") and optionally followed by an exponent in "E" format. "E" format is either the letter "e" or "E" followed by an optional sign, either "-" or "+", followed by no more than three decimal digits ("0"-"9"). Word processors may be able to represent scientific notation more naturally than "E" format, in which case the convert filter for the word processor will properly translate it for the compiler.
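
The floating-point form can be sketched as a regular expression. This Python sketch makes two assumptions that the text does not settle: at least one digit must precede the decimal point, and the digits after it are optional (so "10." counts as having a decimal point). The names are hypothetical.

```python
import re

FLOAT = re.compile(
    r"[0-9]+\.[0-9]*"            # digits, one decimal point, optional digits
    r"(?:[eE][+-]?[0-9]{1,3})?"  # optional "E" format suffix, at most three digits
)


def is_float_constant(token: str) -> bool:
    """Return True when `token` has the floating-point form sketched above."""
    return FLOAT.fullmatch(token) is not None
```

Under these assumptions, 3.14 and 1.0E+100 are accepted, while 10e-10 (no decimal point) and 10.e-1000 (four digits after the "e") are rejected.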

Some examples of valid floating-point constants are:

Some examples of invalid floating-point constants are:

    10e-10		# No decimal point
    10.e-1000		# More than three digits in exponent

String literals

String literals are enclosed in either single quotes ("'") or double quotes ("""). Word processors are permitted to use matching quotes to enhance readability (e.g. a `single-quoted string' and a "double-quoted string".) Double quoted strings are literal and are included in the application exactly as specified by the programmer (i.e. without any translation.) Single quoted strings are translated when the application is localized during internationalization.

Strings may only contain spaces and printing characters. In particular, neither tabs nor new-line characters may be directly embedded in string constants. The backslash (`\') character is used as a quoting character according to the following table:

Character	Pattern
New-line	\n
Tab	\t
Backspace	\b
Carriage-return	\r
Form feed	\f
Backslash	\\
Single quote	\'
Double quote	\"
Octal character	\ddd

The octal character specifies an octal number with up to three octal digits.
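
The quoting table can be read as a small decoding procedure. The Python sketch below decodes the text between the quotes of a string literal. The function name is hypothetical, and treating any backslash sequence not in the table as an error is an assumption. How octal values map onto wider code sets is left open here, so the sketch simply uses the numeric value as a code point.

```python
# Single-character escapes from the table above.
SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "b": "\b", "r": "\r", "f": "\f",
    "\\": "\\", "'": "'", '"': '"',
}
OCTAL_DIGITS = "01234567"


def decode_escapes(body: str) -> str:
    """Decode backslash sequences in the text between a literal's quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            raise ValueError("backslash at end of string")
        nxt = body[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif nxt in OCTAL_DIGITS:
            j = i
            # Consume up to three octal digits.
            while j < len(body) and j < i + 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(body[i:j], 8)))
            i = j
        else:
            raise ValueError(f"bad backslash character: \\{nxt}")
    return "".join(out)
```

Since at most three octal digits are consumed, a sequence such as \0333 decodes to the character with octal value 033 followed by the digit 3.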

{Talk about 16-bit and 32-bit code sets.}

Some examples of valid strings are:

    ""			# Empty string
    "Hello"		# Non-empty string
    "Hi!\tBye!\n"	# String with a tab and new-line
    "\33\0333"		# Two escape characters (`\033') followed by digit `3'
    `Error'		# Translated string

Some examples of invalid strings are:

    "			# No closing quote
    "\x"		# Bad backslash character

Operators and Punctuation

STIPPLE recognizes the following operators as lexical tokens:

!	-	++	--	??	.
+	:+=	::+=	&	:&=	::&=
|	:|=	::|=	^	:^=	::^=
/	:/=	::/=	<<	:<<=	::<<=
*	:*=	::*=	%	:%=	::%=
>>	:>>=	::>>=	-	:-=	::-=
:=	::=	&&	||	?	:
=	!=	>	>=	<	<=
**	:**=	::**=	~	,	@
(	)	[	]	::

All other combinations of punctuation characters are invalid in STIPPLE.
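
Because many operators share prefixes (for example ":", ":=", "::", and "::="), a scanner has to decide how much punctuation to consume. The Python sketch below uses the common longest-match convention. That choice is an assumption, since this page does not state the rule, and the operator list is only a representative subset of the table above.

```python
# Representative subset of the operator table (not the full list).
OPERATORS = [
    ":", "::", ":=", "::=", "+", "++", ":+=", "::+=", "-", "--",
    "*", "**", ":*=", "::*=", ":**=", "::**=", "=", "!", "!=",
    "<", "<=", ">", ">=", "<<", ">>", "&", "&&", "|", "||",
    ",", "(", ")", "[", "]",
]

# Try longer operators first so that "::=" is not split into "::" and "=".
BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def split_operators(text: str) -> list[str]:
    """Split a run of punctuation into operator tokens using longest match."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] in " \t":
            i += 1  # white-space separates tokens
            continue
        for op in BY_LENGTH:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"invalid punctuation at position {i}: {text[i]!r}")
    return tokens
```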

From here you can go to either the next chapter on notational issues or back to the table of contents.