Lexical elements
The lexical elements of the language are its tokens — the terminals from which all syntax is built (Notation). This chapter defines every token.
Token kinds
The token kinds are:
| Token | Written | Description |
|---|---|---|
| word | bareword | a run of word characters; also \; (escaped semicolon), a lone \, and /> |
| int | integer | an optional +/- sign and one or more digits |
| bool | true/false | the two reserved boolean literals |
| ' | '...' | delimits a literal string |
| " | "..." | delimits an interpolated string |
| [ / ] | [...] | command substitution |
| ( / ) | (...) | list |
| { / } | {...} | block |
| < / > | <...> | block parameter-list |
| $ | $ | variable dereference sigil |
| & | & | command dereference sigil |
| newline | \n, ; | statement separator |
| eof | (none) | end of input |
Integer literals
An integer literal is an optional + or - sign followed by one or more
decimal digits. It is its own token (int); it takes precedence over
barewords and consumes the longest run of digits. Leading zeros are
allowed (007 is 7); integer values are base-10 and 64-bit signed.
int = [ "+" | "-" ] digit { digit } .
Because integer literals take precedence, a bareword can never begin with
a digit. An integer literal must be followed by a token boundary:
whitespace, end of input, a delimiter ([ ] ( ) { } < >),
a separator (;), or a comment (#). Any other character immediately
after the digits is a compile-time error — for example 3rd, 5x,
34$foo, 0_bar, and 5-3 are all illegal. A sign is part of an integer
only when a digit follows it — a lone -/+, or a sign before a non-digit
(-foo), is an ordinary bareword.
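The rules above can be sketched as a small recognizer. This is an illustrative Python sketch, not the reference lexer: the function name lex_int, the BOUNDARY set, and the choice of SyntaxError are assumptions, "whitespace" is assumed to mean space, tab, CR, and LF, and handling of values outside the 64-bit range is left out.

```python
# Characters that may legally follow an integer literal (token boundaries).
# Assumption: whitespace is space, tab, CR, LF; end of input is handled separately.
BOUNDARY = set(" \t\r\n[](){}<>;#")
DIGITS = "0123456789"


def lex_int(src, pos=0):
    """Try to lex an integer literal at src[pos].

    Returns (value, end) on success, or None when no literal starts here
    (the caller would then try a bareword). Raises SyntaxError when the
    digits are followed by a character that is not a token boundary.
    """
    i = pos
    if i < len(src) and src[i] in "+-":
        i += 1
    first_digit = i
    while i < len(src) and src[i] in DIGITS:
        i += 1  # consume the longest run of digits
    if i == first_digit:
        return None  # lone sign, or sign before a non-digit: a bareword
    if i < len(src) and src[i] not in BOUNDARY:
        raise SyntaxError(f"unexpected {src[i]!r} after integer literal")
    return int(src[pos:i]), i  # int() accepts the sign and leading zeros
```

For instance, lex_int("007") yields the value 7, lex_int("-foo") returns None so that -foo can be lexed as a bareword, and lex_int("5-3") raises.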
Boolean literals
The barewords true and false are reserved: they lex as the bool token
rather than word, and denote the two boolean constants. Being reserved at
the lexical level, they cannot be used as identifiers (variable, parameter,
or def/const names). Quoting escapes the reservation — 'true' and
"true" are the ordinary string true (Constants).
Barewords
A bareword is a maximal run of word characters that does not begin an integer literal — integer literals take precedence, so a bareword never starts with a digit (and never with a sign immediately followed by one). A character is a word character iff it is an ASCII letter or digit or one of:
_ - . ! ? * + / % = | , :
The following characters are not word characters; each begins another
token or is otherwise special: < > & ( ) { } [ ] $ " ' ; #, backslash,
and whitespace. The backtick ` is reserved; it is also not a word character.
WordChar = letter | digit | "_" | "-" | "." | "!" | "?" | "*" | "+"
| "/" | "%" | "=" | "|" | "," | ":" .
bareword = WordChar { WordChar } . # but does not begin an int literal
The following forms are recognized as special-case word tokens:
- \; is a word token whose text is a single semicolon, so a literal ; can appear where a separator would otherwise be seen.
- A backslash not acting as a string escape or line continuation is a word token containing the backslash; the following character is treated normally.
- /> is a single word token: the threading operator (Expressions).
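A minimal Python sketch of bareword recognition under these rules follows. The names WORD_CHARS, starts_int, and lex_word are hypothetical. The sketch assumes line continuations have already been removed, and it recognizes /> only at the start of a token, which is an assumption the text above does not settle.

```python
import string

# Word characters: ASCII letters, digits, and the punctuation listed above.
WORD_CHARS = set(string.ascii_letters + string.digits + "_-.!?*+/%=|,:")


def starts_int(src, pos):
    """True if an integer literal begins at pos (a digit, or a sign then a digit)."""
    if pos < len(src) and src[pos] in "+-":
        pos += 1
    return pos < len(src) and src[pos] in string.digits


def lex_word(src, pos=0):
    """Return (kind, text, end) for a word-like token at pos, or None."""
    if src.startswith("\\;", pos):
        return ("word", ";", pos + 2)    # escaped semicolon
    if src.startswith("\\", pos):
        return ("word", "\\", pos + 1)   # lone backslash; next char lexed normally
    if src.startswith("/>", pos):
        return ("word", "/>", pos + 2)   # threading operator
    if starts_int(src, pos):
        return None                      # integer literals take precedence
    end = pos
    while end < len(src) and src[end] in WORD_CHARS:
        end += 1                         # maximal run of word characters
    if end == pos:
        return None                      # some other token starts here
    text = src[pos:end]
    kind = "bool" if text in ("true", "false") else "word"
    return (kind, text, end)
```

Here lex_word("-foo") produces a word token, while lex_word("-7") returns None so the integer rule can take over.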
Statement separators
An unescaped newline and an unescaped semicolon are equivalent: both
produce a newline token. They separate statements within a body.
do 'wave'; do 'smile' # two statements on one line
do 'wave'
do 'smile' # the same, separated by newlines
A backslash immediately before a line terminator is a line continuation: the backslash and the line terminator are both removed (see Source code representation).
Comments
# introduces a comment that runs to the end of the line
(Source code representation — Comments).
String literals
There are two string forms. Double-quoted strings interpolate; single-quoted strings are literal.
Double-quoted (interpolated) strings
"..." performs substitution: the literal text between the quotes is
interleaved with the variable and command substitutions described below.
The recognized escapes are:
| Escape | Meaning |
|---|---|
| \\ | backslash |
| \" | double quote |
| \$ | literal $ (suppresses variable substitution) |
| \[ | literal [ (suppresses command substitution) |
| \n | line feed |
| \t | tab |
Any other \X is kept literally as backslash and X. Two substitution
forms are recognized inside the double quotes:
- Variable substitution. $name or ${name}. In the bare $name form the name continues only while characters are alphanumeric or _. This is narrower than a top-level bareword, so "$x-1" interpolates x and then appends the literal -1. The braced form ${name} allows any name up to the closing }.
- Command substitution. [cmd args]. The bracketed text is itself script source, recognized recursively, and spliced into the surrounding string (Expressions).
"say hello [name $actor]"
"you have ${count} coins"
"a literal \$ and \[ stay put"
A double-quoted string with no substitutions is equivalent to the plain
'...' literal with the same characters
(Expressions). An unterminated "..." is a
compile-time error.
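The interpolation rules for double-quoted strings can be illustrated with a rough Python sketch that splits the text between the quotes into literal, variable, and command parts. The function name parse_interpolated and the part tags are hypothetical. Several behaviours are assumptions rather than rules from this chapter: name characters are limited to ASCII, a $ not followed by a name is kept as a literal, and command substitutions are delimited by simple bracket counting instead of the full recursive lexing a real implementation would use.

```python
DQ_ESCAPES = {"\\": "\\", '"': '"', "$": "$", "[": "[", "n": "\n", "t": "\t"}


def _is_name_char(ch):
    # Assumption: "alphanumeric" means ASCII letters and digits.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def parse_interpolated(body):
    """Split the text between the double quotes into a list of parts:
    ("lit", text), ("var", name), or ("cmd", source)."""
    parts, lit, i, n = [], [], 0, len(body)

    def flush():
        if lit:
            parts.append(("lit", "".join(lit)))
            lit.clear()

    while i < n:
        c = body[i]
        if c == "\\" and i + 1 < n:
            nxt = body[i + 1]
            # Unknown escapes keep both the backslash and the character.
            lit.append(DQ_ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        elif c == "$" and body.startswith("{", i + 1):
            close = body.find("}", i + 2)
            if close == -1:
                raise SyntaxError("unterminated ${...}")
            flush()
            parts.append(("var", body[i + 2:close]))
            i = close + 1
        elif c == "$" and i + 1 < n and _is_name_char(body[i + 1]):
            j = i + 1
            while j < n and _is_name_char(body[j]):
                j += 1
            flush()
            parts.append(("var", body[i + 1:j]))
            i = j
        elif c == "[":
            depth, j = 1, i + 1
            while j < n and depth:
                depth += {"[": 1, "]": -1}.get(body[j], 0)
                j += 1
            if depth:
                raise SyntaxError("unterminated [...]")
            flush()
            parts.append(("cmd", body[i + 1:j - 1]))
            i = j
        else:
            lit.append(c)
            i += 1
    flush()
    return parts
```

Applied to the body $x-1, the sketch returns a variable part x followed by the literal -1, matching the narrower name rule for the bare form.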
Single-quoted (literal) strings
'...' is a flat literal with no substitution. It produces a single
string token. The recognized escape sequences are:
| Escape | Meaning |
|---|---|
| \\ | backslash |
| \' | single quote |
| \n | line feed |
| \t | tab |
Any other \X sequence is kept literally as the two characters backslash
and X — the backslash is not consumed. In particular, $ and [
inside '...' are ordinary characters; no escape is needed to suppress
them. A raw newline inside the quotes is part of the string. An
unterminated '...' is a compile-time error.
'a plain string'
'don\'t and a tab\t'
'C:\path' # backslash kept literally: C:\path
'price is $5 [really]' # $ and [ are literal here
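As an illustration of the single-quoted rules, the following Python sketch decodes one '...' literal. The name lex_single_quoted and the exception types are assumptions made for this example, not part of the language definition.

```python
SQ_ESCAPES = {"\\": "\\", "'": "'", "n": "\n", "t": "\t"}


def lex_single_quoted(src, pos=0):
    """Lex a '...' literal that starts at src[pos].

    Returns (value, end), where end is the index just past the closing quote.
    """
    if pos >= len(src) or src[pos] != "'":
        raise ValueError("expected an opening single quote")
    out, i = [], pos + 1
    while i < len(src):
        c = src[i]
        if c == "'":
            return "".join(out), i + 1
        if c == "\\" and i + 1 < len(src):
            nxt = src[i + 1]
            # Recognized escapes are decoded; any other \X keeps both characters.
            out.append(SQ_ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(c)  # includes raw newlines, $ and [
            i += 1
    raise SyntaxError("unterminated '...' string")
```

Because the escape table has no entries for $ or [, those characters pass through untouched, and an unknown sequence such as \p keeps its backslash.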
Variable and command sigils
VarRef = "$" word | "$" "{" word "}" .
CommandRef = "&" word .
- $ introduces a variable reference. The name that follows must be a word, so $5 and $true are not variable references (an integer or boolean literal cannot name a variable). In the ${...} form the braces enclose the name, which is taken as a single word, so a ${...} variable reference is never confused with a { block opener. A bare { not preceded by $ is always a block open.
- & introduces a command reference and must be followed by a name; & at end of input or before whitespace is a compile-time error. &name dereferences the command namespace (Expressions).
Brackets, braces, parens, angles
[ ] { } ( ) < > are delimiter tokens. A {...} region
is ordinary source — not opaque text — and brace nesting is matched
structurally.
Reserved literals; no reserved command words
The only tokens reserved at the lexical level are the literals: an
integer literal and the boolean words true/false lex as int/bool
rather than word, so they cannot be used as identifiers — variable,
parameter, or def/const names. (Quoting escapes the reservation, but a
quoted form is a string, not a name.)
Command and control-flow words are not reserved. Names such as def,
const, if, each, before, or, elif, else are ordinary
barewords; their meaning is determined by position in the source
(see Declarations and scope
and Expressions). Restrictions on which names may
be used in def declarations are specified in
Declarations and scope.
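The distinction between reserved literals and unreserved command words can be captured in a short Python check. This is a sketch only: is_reserved_literal is a hypothetical helper, and it tests lexical reservation alone, not the further naming restrictions given in Declarations and scope.

```python
import re

# An integer literal: optional sign, then one or more decimal digits.
INT_RE = re.compile(r"[+-]?[0-9]+")


def is_reserved_literal(text):
    """True if text lexes as int or bool and so cannot serve as an identifier."""
    return text in ("true", "false") or INT_RE.fullmatch(text) is not None
```

Under this check, true and 007 are reserved, while if, def, and else are not, since their meaning comes from position rather than from the lexer.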