Lexical elements

The lexical elements of the language are its tokens — the terminals from which all syntax is built (Notation). This chapter defines every token.

Token kinds

The token kinds are:

Token     Written      Description
word      bareword     a run of word characters; also \;, a lone \, and />
int       integer      an optional +/- sign and one or more digits
bool      true/false   the two reserved boolean literals
'         '...'        delimits a literal string
"         "..."        delimits an interpolated string
[ / ]     [...]        command substitution
( / )     (...)        list
{ / }     {...}        block
< / >     <...>        block parameter-list
$         $            variable dereference sigil
&         &            command dereference sigil
newline   \n, ;        statement separator
eof                    end of input

Integer literals

An integer literal is an optional + or - sign followed by one or more decimal digits. It is its own token (int); it takes precedence over barewords and consumes the longest run of digits. Leading zeros are allowed (007 is 7); integer values are base-10 and 64-bit signed.

int = [ "+" | "-" ] digit { digit } .

Because integer literals take precedence, a bareword can never begin with a digit. An integer literal must be followed by a token boundary: whitespace, end of input, a delimiter ([ ] ( ) { } < >), a separator (;), or a comment (#). Any other character immediately after the digits is a compile-time error — for example 3rd, 5x, 34$foo, 0_bar, and 5-3 are all illegal. A sign is part of an integer only when a digit follows it — a lone -/+, or a sign before a non-digit (-foo), is an ordinary bareword.
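The sign and boundary rules above can be sketched as a small scanner. This is an illustration in Python, not the reference lexer: the name lex_int, its return convention, and the use of SyntaxError for the compile-time error are assumptions made for the example, and the 64-bit range check is left out.

```python
DIGITS = "0123456789"
DELIMITERS = "[](){}<>"


def lex_int(src, pos=0):
    """Try to lex an integer literal starting at src[pos].

    Returns (value, end) on success. Returns None when the text does not
    begin an integer literal (a lone sign, or a sign before a non-digit),
    in which case a caller would go on to lex a bareword. Raises
    SyntaxError when the digits are not followed by a token boundary.
    """
    i = pos
    if i < len(src) and src[i] in "+-":
        i += 1  # optional sign
    first_digit = i
    while i < len(src) and src[i] in DIGITS:
        i += 1  # consume the longest run of decimal digits
    if i == first_digit:
        return None  # no digit after the optional sign: not an int
    if i < len(src):
        nxt = src[i]
        # Boundary: whitespace, a delimiter, the ; separator, or a # comment.
        if not (nxt.isspace() or nxt in DELIMITERS or nxt in ";#"):
            raise SyntaxError(f"unexpected {nxt!r} after integer literal")
    return int(src[pos:i]), i
```

With this sketch, lex_int("007") yields (7, 3), lex_int("-foo") yields None, and lex_int("5-3") raises, matching the examples in the text.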

Boolean literals

The barewords true and false are reserved: they lex as the bool token rather than word, and denote the two boolean constants. Being reserved at the lexical level, they cannot be used as identifiers (variable, parameter, or def/const names). Quoting escapes the reservation — 'true' and "true" are the ordinary string true (Constants).

Barewords

A bareword is a maximal run of word characters that does not begin an integer literal — integer literals take precedence, so a bareword never starts with a digit (and never with a sign immediately followed by one). A character is a word character iff it is an ASCII letter or digit or one of:

_  -  .  !  ?  *  +  /  %  =  |  ,  :

The following characters are not word characters; each begins another token or is otherwise special: < > & ( ) { } [ ] $ " ' ; #, backslash, and whitespace. The backtick ` is reserved; it is also not a word character.

WordChar = letter | digit | "_" | "-" | "." | "!" | "?" | "*" | "+"
         | "/" | "%" | "=" | "|" | "," | ":" .
bareword = WordChar { WordChar } .   # but does not begin an int literal
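The WordChar production can be read as a simple membership test. The following Python sketch is illustrative only: the names is_word_char and lex_bareword are hypothetical, the caller is assumed to have already tried an integer literal at the same position, and the special-case word tokens are not handled. It also shows the reservation of true and false from Boolean literals.

```python
import string

# ASCII letters and digits plus the punctuation listed in the grammar.
WORD_CHARS = set(string.ascii_letters + string.digits + "_-.!?*+/%=|,:")


def is_word_char(ch):
    """True if ch is a word character according to the WordChar production."""
    return ch in WORD_CHARS


def lex_bareword(src, pos=0):
    """Lex the maximal run of word characters starting at src[pos].

    Returns (kind, text, end), where kind is "bool" for the reserved
    words true/false and "word" otherwise, or None if src[pos] is not
    a word character.
    """
    i = pos
    while i < len(src) and is_word_char(src[i]):
        i += 1
    if i == pos:
        return None
    text = src[pos:i]
    kind = "bool" if text in ("true", "false") else "word"
    return kind, text, i
```

For example, lex_bareword("foo-bar? x") returns ("word", "foo-bar?", 8), while lex_bareword("true;") returns ("bool", "true", 4).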

The following forms are recognized as special-case word tokens: \;, a lone \, and /> (see Token kinds).

Statement separators

An unescaped newline and an unescaped semicolon are equivalent: both produce a newline token. They separate statements within a body.

do 'wave'; do 'smile'      # two statements on one line
do 'wave'
do 'smile'                 # the same, separated by newlines

A backslash immediately before a line terminator is a line continuation (both removed; see Source code representation).
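Because ; and a line break yield the same newline token, a later stage can split statements without knowing which one was written. The Python sketch below assumes a hypothetical token representation of (kind, text) pairs, with "string" as a made-up kind name for a quoted literal, and it drops empty statements, which is an assumption rather than something this chapter specifies.

```python
def split_statements(tokens):
    """Group a flat list of (kind, text) tokens into statements.

    A token of kind "newline" ends the current statement, whether it
    came from a ';' or from a line break. Empty statements are skipped
    (an assumption made for this sketch).
    """
    statements = []
    current = []
    for kind, text in tokens:
        if kind == "newline":
            if current:
                statements.append(current)
                current = []
        else:
            current.append((kind, text))
    if current:
        statements.append(current)
    return statements
```

Feeding it the tokens for do 'wave'; do 'smile' and for the two-line form gives the same two statements.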

Comments

# introduces a comment that runs to the end of the line (Source code representation — Comments).

String literals

There are two string forms. Double-quoted strings interpolate; single-quoted strings are literal.

Double-quoted (interpolated) strings

"..." performs substitution: the literal text between the quotes is interleaved with the variable and command substitutions described below. The recognized escapes are:

Escape   Meaning
\\       backslash
\"       double quote
\$       literal $ (suppresses variable substitution)
\[       literal [ (suppresses command substitution)
\n       line feed
\t       tab

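Any other \X is kept literally as backslash and X. Two substitution forms are recognized inside the double quotes: variable substitution (a VarRef, as in ${count}) and command substitution ([...], as in [name $actor]). For example: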

"say hello [name $actor]"
"you have ${count} coins"
"a literal \$ and \[ stay put"

A double-quoted string with no substitutions is equivalent to the plain '...' literal with the same characters (Expressions). An unterminated "..." is a compile-time error.

Single-quoted (literal) strings

'...' is a flat literal with no substitution. It produces a single string token. The recognized escape sequences are:

Escape   Meaning
\\       backslash
\'       single quote
\n       line feed
\t       tab

Any other \X sequence is kept literally as the two characters backslash and X — the backslash is not consumed. In particular, $ and [ inside '...' are ordinary characters; no escape is needed to suppress them. A raw newline inside the quotes is part of the string. An unterminated '...' is a compile-time error.

'a plain string'
'don\'t and a tab\t'
'C:\path'              # backslash kept literally: C:\path
'price is $5 [really]' # $ and [ are literal here
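The escape tables for the two string forms differ only in which characters they recognize, so both can be decoded by one table-driven routine. The Python sketch below is an illustration under assumptions: decode_escapes and the two table names are hypothetical, the input is the text between the quotes, and for double-quoted strings it covers only literal text (substitution is out of scope here).

```python
# Recognized escapes, keyed by the character that follows the backslash.
SINGLE_ESCAPES = {"\\": "\\", "'": "'", "n": "\n", "t": "\t"}
DOUBLE_ESCAPES = {"\\": "\\", '"': '"', "$": "$", "[": "[",
                  "n": "\n", "t": "\t"}


def decode_escapes(body, table):
    """Decode backslash escapes in the text between the quotes.

    A recognized \\X is replaced by its meaning from the table; any
    other \\X is kept as the two characters backslash and X.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(table.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)  # ordinary character, including a raw newline
            i += 1
    return "".join(out)
```

Applied with SINGLE_ESCAPES, the body of 'C:\path' keeps its backslash, while the body of 'don\'t and a tab\t' loses the backslash before the quote and ends in a real tab character.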

Variable and command sigils

VarRef     = "$" word | "$" "{" word "}" .
CommandRef = "&" word .

Brackets, braces, parens, angles

[ ] { } ( ) < > are delimiter tokens. A {...} region is ordinary source — not opaque text — and brace nesting is matched structurally.

Reserved literals; no reserved command words

The only tokens reserved at the lexical level are the literals: an integer literal and the boolean words true/false lex as int/bool rather than word, so they cannot be used as identifiers — variable, parameter, or def/const names. (Quoting escapes the reservation, but a quoted form is a string, not a name.)

Command and control-flow words are not reserved. Names such as def, const, if, each, before, or, elif, else are ordinary barewords; their meaning is determined by position in the source (see Declarations and scope and Expressions). Restrictions on which names may be used in def declarations are specified in Declarations and scope.