Updated: 6-Feb-2009

REBOL 3 Concepts: Parsing: Grammar Rules

Pending Revision

This document was written for R2 and has yet to be revised for R3.

The parse function accepts grammar rules written in a dialect of REBOL. Dialects are sub-languages of REBOL that use the same lexical form for all datatypes but allow a different ordering of the values within a block. Within the parse dialect, the grammar and vocabulary of REBOL are altered to resemble BNF (Backus-Naur Form), a notation commonly used to specify language grammars, network protocols, header formats, and so on.

To define rules, use a block to specify the expected sequence of inputs. The parse function returns true only if the rules match the entire input. For instance, to check that a string consists of exactly the characters "the phone", you can use this rule:

parse string ["the phone"]
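A minimal sketch of what the rule above does, assuming REBOL 3 semantics and literal strings in place of the word string. parse returns a logic value, and it returns true only when the rules consume the whole input:

```rebol
; parse succeeds only if the rules match the ENTIRE input
print parse "the phone" ["the phone"]       ; true
print parse "the phone call" ["the phone"]  ; false: " call" is left unmatched
; string matching ignores case unless parse/case is used
print parse "The Phone" ["the phone"]       ; true
```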

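When called without the /all refinement, the R2 parse function skips whitespace between matches. To allow any number of spaces, or no spaces at all, between the words, write the rule like this: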

parse string ["the" "phone"]
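Skipping whitespace this way is R2 behavior. In REBOL 3, block rules do not skip whitespace, so the same rule fails on "the phone". A sketch of one way to get a similar effect in R3, using a hypothetical helper rule named ws (not a built-in) that matches zero or more spaces; for brevity it ignores tabs and newlines:

```rebol
; hypothetical helper rule: zero or more space characters
ws: [any #" "]

print parse "the phone" ["the" "phone"]        ; false in R3: the space is not skipped
print parse "the phone" ["the" ws "phone"]     ; true
print parse "the    phone" ["the" ws "phone"]  ; true
print parse "thephone" ["the" ws "phone"]      ; true: ws can match nothing
```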

You can indicate alternate rules with a vertical bar (|). For example, the following rule accepts either "the phone" or "a radio":

["the" "phone" | "a" "radio"]
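A sketch that tries the alternate rule on a few inputs, assuming REBOL 3 semantics. Because R3 does not skip whitespace, the spaces are matched by a hypothetical helper rule ws; rule and str are illustrative names:

```rebol
; hypothetical helper rule: R3 does not skip whitespace, so match spaces explicitly
ws: [any #" "]
rule: ["the" ws "phone" | "a" ws "radio"]

foreach str ["the phone" "a radio" "the radio"] [
    print [mold str parse str rule]
]
; prints:
; "the phone" true
; "a radio" true
; "the radio" false
```

"the radio" fails because each alternative is a complete sequence: the first alternative stops at "radio", and the second fails at "the".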

A rule can contain blocks that are treated as sub-rules. The following rule accepts "a phone", "a radio", "the phone", and "the radio":

[ ["a" | "the"] ["phone" | "radio"] ]
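Under the same assumptions (REBOL 3 semantics, a hypothetical ws rule for spaces), this sketch checks the sub-rule example against the four accepted inputs and one rejected input:

```rebol
; hypothetical helper rule: R3 does not skip whitespace, so match spaces explicitly
ws: [any #" "]
rule: [["a" | "the"] ws ["phone" | "radio"]]

foreach str ["a phone" "a radio" "the phone" "the radio" "an phone"] [
    print [mold str parse str rule]
]
; the first four print true; "an phone" prints false
```

"an phone" fails because the first sub-rule matches "a", and the second sub-rule then finds "n phone" where "phone" or "radio" is expected.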

For increased readability, define each sub-rule as a separate block and give it a name that indicates its purpose:

article: ["a" | "the"]
device: ["phone" | "radio"]
parse string [article device]
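A sketch, assuming REBOL 3 semantics, of how named sub-rules can be combined into larger named rules. The names ws and phrase are hypothetical additions; ws matches spaces explicitly because R3 does not skip them:

```rebol
; hypothetical helper rule: zero or more spaces
ws: [any #" "]
article: ["a" | "the"]
device: ["phone" | "radio"]
; a named rule built from other named rules
phrase: [article ws device]

print parse "the radio" phrase                                   ; true
print parse "a phone and the radio" [phrase ws "and" ws phrase]  ; true
print parse "the car" phrase                                     ; false
```

Because a rule name simply refers to a block, the same definition can be reused anywhere in a larger grammar.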

In addition to matching a single instance of a string, you can provide a count or a range that repeats the match. The following rule uses a count and matches only "aaabb":

[3 "a" 2 "b"]
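A short sketch, assuming REBOL 3 semantics, that checks the count rule against matching and non-matching inputs (rule is an illustrative name):

```rebol
rule: [3 "a" 2 "b"]
print parse "aaabb" rule   ; true
print parse "aabb" rule    ; false: only two a's
print parse "aaaabb" rule  ; false: the fourth "a" appears where a "b" is expected
```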

The next rule uses a range of one to three and matches "ab", "aab", and "aaab":

[1 3 "a" "b"]
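A sketch of the range rule, assuming REBOL 3 semantics, run against inputs inside and just outside both ends of the range:

```rebol
rule: [1 3 "a" "b"]
foreach str ["b" "ab" "aab" "aaab" "aaaab"] [
    print [mold str parse str rule]
]
; "b" and "aaaab" print false; the other three print true
```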

The starting point of a range can be zero, which makes the match optional. This rule matches "b", "ab", "aab", and "aaab":

[0 3 "a" "b"]
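With a lower bound of zero, the input with no a's is also accepted. A sketch, again assuming REBOL 3 semantics:

```rebol
rule: [0 3 "a" "b"]
print parse "b" rule      ; true: zero a's is allowed
print parse "aaab" rule   ; true
print parse "aaaab" rule  ; false: at most three a's
```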

Use some to specify that a rule must match one or more times, and any to specify that it may match zero or more times. For example, the following rule uses some and matches "ab", "aab", "aaab", "aaaab", and so on:

[some "a" "b"]
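A sketch, assuming REBOL 3 semantics, showing that some requires at least one match but has no upper limit:

```rebol
rule: [some "a" "b"]
print parse "aaaaaaab" rule  ; true: no upper limit
print parse "b" rule         ; false: some needs at least one "a"
```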

The next rule uses any, so it matches "b", "ab", "aab", "aaab", "aaaab", and so on:

[any "a" "b"]
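A sketch, assuming REBOL 3 semantics. The last line illustrates that any and some are greedy: they match as many times as possible and do not give matches back to let a later rule succeed, unlike backtracking regular-expression quantifiers:

```rebol
rule: [any "a" "b"]
print parse "b" rule            ; true: any also accepts zero matches
print parse "aaaab" rule        ; true
; greedy: any consumes both a's, so the final "a" has nothing left to match
print parse "aa" [any "a" "a"]  ; false
```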

The words some and any can also be applied to blocks. For example, the rule

[some ["a" | "b"]]

accepts any non-empty string made up only of the characters a and b, in any order.
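A sketch, assuming REBOL 3 semantics, that tries the block rule on a few inputs:

```rebol
rule: [some ["a" | "b"]]
foreach str ["a" "abba" "" "abc"] [
    print [mold str parse str rule]
]
; "a" and "abba" print true; "" prints false (some needs one match)
; and "abc" prints false (the "c" is never matched)
```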

Another way to make a match optional is to add none as the last alternative. In a rule, none matches nothing and always succeeds:

["a" | "b" | none]

This rule accepts "a", "b", or an empty string.

The none alternative is useful for specifying optional patterns or for catching error cases when no other pattern matches.
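A sketch, assuming REBOL 3 handles none in a rule as the text describes (it matches nothing and succeeds). The last line shows opt, another parse keyword that expresses the same optional match:

```rebol
rule: ["a" | "b" | none]
foreach str ["a" "b" "" "c"] [
    print [mold str parse str rule]
]
; "a", "b", and "" print true; "c" prints false because nothing consumes it

print parse "" [opt ["a" | "b"]]  ; true
```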

