REBOL 3 Concepts: Parsing: Dealing with Spaces

Pending Revision

This document was written for R2 and has yet to be revised for R3.

Normally, the parse function ignores any whitespace between the patterns it scans. For instance, the rule:

["a" "b" "c"]

matches strings such as:

a bc
ab c
a b c
a  b  c

and other similarly spaced combinations.
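A minimal sketch of this default behavior, assuming REBOL/Core 2.x (R2) semantics as described on this page. The name rule is only an illustrative variable, and parse returns true only when the rule accounts for the whole input:

```rebol
REBOL []
rule: ["a" "b" "c"]
print parse "abc" rule       ; no spaces at all
print parse "a bc" rule      ; a single space is skipped
print parse "a  b  c" rule   ; runs of spaces are skipped as well
```

Under R2, each of the three calls should print true.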

To enforce a specific spacing convention, use parse with the /all refinement. In the preceding example, this refinement causes parse to match only the unspaced string abc:

parse/all "abc" ["a" "b" "c"]

Specifying the /all refinement forces parse to deal with every character in the input stream, including the default delimiters such as space, tab, and newline.
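The contrast with the default mode can be seen by running the same rule on a spaced and an unspaced string. This is a sketch assuming R2 parse/all; rule is again just an illustrative name:

```rebol
REBOL []
rule: ["a" "b" "c"]
print parse/all "abc" rule     ; every character is matched by the rule
print parse/all "a b c" rule   ; the spaces are now input the rule must match
```

Under R2, the first call should print true and the second false, because nothing in the rule accounts for the space characters.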

To handle spaces in your rules, create a character set that specifies the valid space characters:

spacer: charset reduce [tab newline #" "]

To require exactly one spacing character between letters, write:

["a" spacer "b" spacer "c"]
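A short sketch of that rule in use, assuming R2 semantics. It is run with parse/all so that spacing characters reach spacer instead of being skipped; rule is an illustrative name:

```rebol
REBOL []
spacer: charset reduce [tab newline #" "]
rule: ["a" spacer "b" spacer "c"]
print parse/all "a b^-c" rule   ; one space, then one tab (^- is a tab)
print parse/all "a  b c" rule   ; two spaces after "a": rejected
print parse/all "abc" rule      ; no separators: rejected
```

Under R2, this should print true, false, false: spacer consumes exactly one character from the set, so a second space or a missing separator makes the match fail.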

To allow one or more spacing characters, write:

spaces: [some spacer]
["a" spaces "b" spaces "c"]
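A sketch of the relaxed rule, assuming R2 semantics; as before, rule is only an illustrative name:

```rebol
REBOL []
spacer: charset reduce [tab newline #" "]
spaces: [some spacer]
rule: ["a" spaces "b" spaces "c"]
print parse/all "a   b^/c" rule   ; three spaces, then a newline (^/)
print parse/all "abc" rule        ; some needs at least one spacer
```

Under R2, this should print true and then false. Because some requires at least one match, this rule still rejects letters with no separator at all; any spacer would accept that case too.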

For more sophisticated grammars, create a character set that lets you scan a string up to a space character.

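text: "split this text into words"  ; sample input (illustrative)
non-space: complement spacer         ; every character except those in spacer
to-space: [some non-space | end]
words: make block! 20                ; collects the copied words
parse/all text [
    some [copy word to-space (append words word) spacer]
]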

The preceding example builds a block of all the words in text. The complement function inverts a character set, so non-space contains every character except the spacing characters defined earlier in spacer. The to-space rule matches one or more non-space characters, stopping at the next space character or at the end of the input stream; its end alternative also succeeds when the input is already exhausted. The main rule expects to begin with a word: it copies that word up to a space, appends it to words, skips the space character, and then begins the next word.
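With the rule above, the last word has no spacer after it, so the final iteration of some fails after its paren has already appended the word: words ends up complete, but parse itself returns false. A run of two or more spacing characters also stops word collection early. The following variant is a sketch assuming R2 parse/all and the same spacer and non-space sets; it replaces the single spacer with any spacer so that leading, trailing, and repeated spacing characters are accepted. The names text and parse-ok are illustrative:

```rebol
REBOL []
spacer: charset reduce [tab newline #" "]
non-space: complement spacer
words: make block! 20
text: "  the quick^-brown^/fox "   ; sample input with irregular spacing
parse-ok: parse/all text [
    any spacer                     ; skip leading spacing characters
    some [
        copy word some non-space (append words word)
        any spacer                 ; skip a run of separators (or none at the end)
    ]
]
print parse-ok
probe words
```

Under R2, this should print true followed by ["the" "quick" "brown" "fox"]. Because each iteration must copy at least one non-space character, some can never succeed without consuming input, so the loop ends cleanly at the end of the string.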

