Source code representation
Script source is text. The character set is ASCII: each byte is a character, with no Unicode or multibyte decoding. Source characters fall into a small number of classes — letters, digits, whitespace, and line terminators.
Characters
The character classes are:
- Whitespace: Space (' ') and tab ('\t') separate adjacent tokens and are otherwise insignificant. Newlines are not whitespace; they are statement separators (see below).
- Line terminators: A line feed ('\n'), a carriage return ('\r'), or a carriage return followed by a line feed ("\r\n") terminates a source line. Any of these three forms counts as a single newline, which acts as a statement separator (see below).
- Letters: A ... Z and a ... z.
- Digits: 0 ... 9
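The classes above can be sketched as a small classifier. This is only an illustration written in Python, not the actual scanner; the names classify and count_newlines are hypothetical, and the "other" class simply stands for any character not listed here (punctuation such as # or \).

```python
def classify(ch: str) -> str:
    """Return the source character class of a single ASCII character."""
    if ch in (" ", "\t"):
        return "whitespace"
    if ch in ("\n", "\r"):
        return "line-terminator"
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    # Anything else (punctuation, control characters) is outside these classes.
    return "other"


def count_newlines(source: str) -> int:
    """Count line terminators, treating CR LF as a single newline."""
    count = 0
    i = 0
    while i < len(source):
        if source[i] == "\r":
            # A carriage return directly followed by a line feed is one terminator.
            if i + 1 < len(source) and source[i + 1] == "\n":
                i += 1
            count += 1
        elif source[i] == "\n":
            count += 1
        i += 1
    return count
```

For example, count_newlines applied to a string that mixes all three terminator forms counts each form once, so a script saved with Windows line endings has the same number of statement separators as the same script saved with Unix line endings.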
Source layout
A script is free-form: statements are separated by newlines or semicolons, which are equivalent (see Lexical elements — Statement separators).
A backslash (\) immediately before a line terminator is a line
continuation: the backslash and the terminator are both removed,
joining the two physical lines into one logical line. This
is the only way to spread a single statement across multiple physical
lines outside of a delimited construct; inside [...], (...),
and {...}, newlines are already absorbed as ordinary separators or
whitespace.
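The joining step can be sketched as a single substitution. This is a minimal illustration in Python, assuming continuation is resolved on the raw text before tokenizing; the function name join_continuations is hypothetical, and the sketch deliberately ignores whether a backslash inside a string or comment is treated differently, which this section does not specify.

```python
import re

# A backslash immediately followed by any of the three terminator forms.
# "\r\n" is listed first so that it is consumed as one terminator.
_CONTINUATION = re.compile(r"\\(?:\r\n|\r|\n)")


def join_continuations(source: str) -> str:
    """Remove each backslash-plus-terminator pair, joining physical lines."""
    return _CONTINUATION.sub("", source)
```

Applied to the two physical lines `do \` and `"wave"`, this yields the single logical line `do "wave"`. A backslash followed by anything other than a line terminator is left untouched.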
Comments
A comment begins with # and runs to the end of the line. The # may
appear anywhere a token may begin, including after code on the same line.
The line terminator that ends a comment is retained and still acts as a
statement separator.
# a full-line comment
do "wave" # a trailing comment
# is not a bareword character, so it cannot be embedded in an
identifier; a # that is intended literally must appear inside a string
(Lexical elements — String literals).
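Comment removal for a single line can be sketched as follows. This is an illustration in Python, not the real lexer: the name strip_comment is hypothetical, and it assumes strings are delimited by double quotes with no escape sequences, which is a simplification (the String literals section is authoritative on the actual string syntax).

```python
def strip_comment(line: str) -> str:
    """Remove a # comment from one line, keeping the line terminator."""
    # Split off the terminator so it can be re-attached afterwards.
    body = line.rstrip("\r\n")
    terminator = line[len(body):]
    in_string = False
    for i, ch in enumerate(body):
        if ch == '"':
            # Assumption: a double quote always opens or closes a string.
            in_string = not in_string
        elif ch == "#" and not in_string:
            # Everything from # to the end of the line is the comment.
            return body[:i] + terminator
    return line
```

The terminator is re-attached on purpose: the line that held the comment still ends in a newline, so it still separates statements. A # inside the quoted text is left alone, matching the rule that a literal # must appear inside a string.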