Updated: 9-Jun-2013

REBOL 3 Concepts: Parsing: Summary of Parse Operations

Input datatypes

Three types of input can be provided to the parse function:

any-string!   strings of any type (Unicode)
any-block!    blocks of any type
binary!       sequence of bytes (not characters)
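
As a minimal sketch of the three input kinds (the sample data and the result variable names are illustrative assumptions, not part of the original reference):

```rebol
; String input: rules match characters and substrings.
string-ok: parse "abc" ["a" "b" "c"]

; Block input: rules match values, often by datatype.
block-ok: parse [1 2 3] [some integer!]

; Binary input: rules match bytes, not characters.
binary-ok: parse #{010203} [#{01} #{0203}]
```

Each call is expected to yield true only when the rules consume the input all the way to its end.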

General parse rules

Rules consist of these main elements:

Item      Description
keyword   a special word of the dialect, listed in the table below
word      get or set a variable (see below); cannot be a keyword
path      get or set a variable via a path (see below)
value     match the input to a value (accepted datatypes depend on the input datatype)
"|"       backtrack and match the next alternate rule (or)
[block]   a block of sub-rules
(paren)   evaluate an expression (a production)
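
The elements above can be combined in a single rule. The following is a sketch only; digit and count are hypothetical names introduced for the illustration:

```rebol
digit: charset "0123456789"   ; hypothetical variable holding a bitset
count: 0                      ; hypothetical counter updated by the paren

ok: parse "a1b22" [
    some [                              ; keyword applied to a block of sub-rules
        some digit (count: count + 1)   ; variable look-up, then a production
        | "a"                           ; alternate: a literal value
        | "b"
    ]
]
```

Here count is incremented once per run of digits, so it ends up counting the digit groups rather than the individual digits.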

List of keywords

Within the parse dialect, these words are treated as keywords and cannot be used as variables.

Keyword                  Description
and rule                 match the rule, but do not advance the input (allows matching multiple rules against the same input)
any rule                 match the rule zero or more times; stop on failure or if the input does not change
break                    break out of a match loop (such as any, some, while), always indicating success
change rule only value   match the rule and, if it succeeds, change the matched input to the new value (which can be a different length)
copy word                set the word to a copy of the input for matched rules
do rule                  evaluate the input as code, then attempt to match to the rule
end                      match end of input
fail                     force the current rule to fail and backtrack
if (expr)                evaluate the expression (in a paren); if the result is false or none, fail and backtrack
insert only value        insert a value at the current input position (with optional only for blocks by reference); the input position is adjusted to just past the insert
into rule                match a series, then parse it with the given rule; the new series can be the same or a different datatype
opt rule                 match the rule once or not at all (zero or one times)
not rule                 invert the result of the next rule
quote arg                accept the next argument exactly as is (exception: paren)
reject                   similar to break: break out of a match loop (such as any, some, while), but indicate failure
remove rule              match the rule and, if it succeeds, remove the matched input
return rule              match the rule and, if it succeeds, immediately return the matched input as the result of the parse function
set word                 set the word to the value of the input for matched rules
skip                     skip input (by the count or range, if one is provided before it)
some rule                match the rule one or more times; stop on failure or if the input does not change
then                     regardless of failure or success of what follows, skip the next alternate rule (branch)
thru rule                scan forward in the input for matching rules, advance the input to the tail of the match
to rule                  scan forward in the input for matching rules, advance the input to the head of the match
while rule               like any, match the rule zero or more times; stop on failure; does not care whether the input changes
??                       debugging output: prints the next parse rule value and shows the current input position (e.g. where you are in the string)

In addition, none is a special value that can be used as a default match rule. It is often used at the end of alternate rules to catch all no-match cases.
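
A few of these keywords in use, as a hedged sketch (the inputs and the variable n are assumptions made up for the illustration):

```rebol
; opt: zero or one occurrence of the rule.
opt-ok: parse "ac" ["a" opt "b" "c"]

; not: succeeds when the rule fails; the input does not advance.
not-ok: parse "b" [not "a" skip]

; into: descend into a nested series and parse it with its own rule.
into-ok: parse [point [10 20]] [word! into [2 integer!]]

; if: fail and backtrack unless the paren evaluates to a true value.
if-ok: parse [5] [set n integer! if (n > 3)]
```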

Words and paths as variables

If a word is not a keyword (listed above), and it is not a literal word or refinement word, then it is assumed to be a variable. This is also true for paths, such as object fields, used as variables.

Variables can hold:

Words are not allowed to hold keywords.

Within the parse dialect, word notation has a different meaning. Here is a summary:

Usage   Description
word    look up the value of a word
word:   set the variable to the current input position
:word   set the current input series position from the variable
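
The set-word and get-word forms can be sketched as follows; mark is a hypothetical variable name chosen for the example:

```rebol
ok: parse "abcdef" [
    "abc"
    mark:     ; remember the current input position (at "def")
    "def"
    :mark     ; move the input back to the saved position
    "def"     ; so the same text is matched a second time
]
```

After the parse, mark still refers to the input series at the saved position.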

Match rules

Input can be compared against various values using "match" rules:

Rule Type   Description
literal     Most literal values can be used as matches. The primary exceptions are integer! (used for repeat counts), words (used as variables, unless you make them literal words), blocks, and parens.
quote       Any value that follows the word quote is used literally, with the exception of parens (which are evaluated and their results are used for matching). For example, this is how you can match an integer value, even though integers are normally used for counters.
block       Holds collections of sub-rules for matching.
variable    A variable can be used to hold a literal and has the same effect as described above.
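
As an illustration of literal matching in block input (the sample blocks are assumptions for the sketch):

```rebol
; Without quote, 3 would be read as a repeat count for the following rule.
quoted-ok: parse [3 apples] [quote 3 'apples]

; A literal word matches the word itself instead of acting as a variable.
word-ok: parse [red green] ['red 'green]
```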

Alternate rules

An input is parsed by attempting different rules. If a rule fails, a backtrack happens and the next alternate rule is attempted.

Symbol   Description
"|"      alternate rule - if the rule is true up to this point, the "|" forces the end of the current rule block (no other alternatives are tested). Otherwise, the parser skips forward from the failure point to the next alternative (marked with a "|" bar).
then     branching rule - when reached (because the rule is true up to this point), then causes the next alternative rule to be skipped if the rest of the rule is false (if it is true, then the full rule is true, and no alternates are needed). This implements a branch, because the success or failure of the rule up to the then determines which alternate is tried next.
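
Backtracking between alternates can be sketched like this (the input is an assumption for the example). The first alternate matches "a" and "b" and then fails on "c", so the parser returns to where the alternates began and tries the second one:

```rebol
ok: parse "abd" [
      "a" "b" "c"    ; fails at "c"; the input position is restored
    | "a" "b" "d"    ; matches the whole input
]
```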

Skipping input

You can advance the input stream without an immediate match in these ways:

skip   skip a value (or multiple values if a repeat count is given)
to     scan for a value or datatype, advance to the head of the value
thru   scan for a value or datatype, advance to the tail of the value

For ease-of-use, the to and thru actions also allow simple alternate match values. (However, this capability is not intended as a substitute for defining proper parse rules for more complex parse cases.)
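
The difference between to, thru and skip can be sketched as follows; key, val and rest are hypothetical variable names, and copy is used only to show where the input position ends up:

```rebol
; to stops at the head of the match, so the "=" is still in the input.
ok-to: parse "key=value" [
    copy key to "="     ; input is now at "=value"
    skip                ; step over the "="
    copy val to end
]

; thru stops just past the match, so no extra skip is needed.
ok-thru: parse "one, two" [thru ", " copy rest to end]
```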

Optional or repeated rules

Rules can be optional or repeated in a number of ways:

Word or value   Description
opt rule        match rule or not (zero or one time)
some rule       repeat rule one or more times
any rule        repeat rule zero or more times
3 rule          repeat rule 3 times
0 3 rule        repeat rule 0 to 3 times
1 3 rule        repeat rule 1 to 3 times

Keywords that accept a repeat count are:

skip
to
thru
quote
into
end

See the section "Input position must change" below for how repeat loops terminate when a match does not advance the input.
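
A short sketch of repeat counts, using made-up inputs:

```rebol
three-ok: parse "aaa" [3 "a"]           ; exactly three times
range-ok: parse "aa" [1 3 "a"]          ; between one and three times
skip-ok:  parse "abcd" [2 skip "cd"]    ; a count also applies to skip
too-many: parse "aaaa" [1 3 "a"]        ; a fourth "a" is left unmatched
```

The last call is expected to return false, because the rule stops after three matches and the input is not at its end.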

Storing input

As input is parsed, you can store parts of it into variables for use later within your productions:

set    set the next value to a variable
copy   copy the next matched sequence to a variable
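
The two storage keywords differ in what they capture. In this sketch, name and scores are hypothetical variables and the input block is invented for the example:

```rebol
ok: parse [john 25 30 35] [
    set name word!               ; a single value: the word john
    copy scores some integer!    ; the whole matched sequence, as a block
]
```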

Modifying input

The input series can be modified during the parse using these commands:

remove   removes the matched input
insert   inserts into the series at current input (resumes past the insertion)
change   changes the matched input to a given value (resumes past the change)

All of these will handle the input pointer appropriately. However, for remove, it is possible that the input does not advance (because it was removed.) If you are using it with a some or any loop, read the next section.
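
A sketch of all three commands applied in sequence, assuming they behave as described in the table above (text is a hypothetical variable; the string is copied so the literal in the script is left alone):

```rebol
text: copy "a.b"

ok: parse text [
    "a"
    change "." "-"    ; replace the matched "." with "-"
    insert ">"        ; insert at the current position, resume after it
    remove "b"        ; delete the matched "b"
]
```

Under those assumptions the series should read "a->" afterwards, with the input position at the end.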

Input position must change

The parse function is about matching the input stream with given rules. In some cases, a rule may succeed without changing the input position. For example:

parse str [some [to "abc"]]
parse str [some ["a" | "b" | none]]

To avoid infinite looping, a special internal check ends the loop when a rule succeeds without changing the input position.

However, this safeguard causes a problem with a rule like this one:

parse str [some [to "a" remove thru "b"]]

Here the input did not appear to advance, but something useful happened. In such cases, the some word should not be used, and the while word is better:

parse str [while [to "a" remove thru "b"]]
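
A smaller sketch of the same idea, under the assumption that remove leaves the input position where it was (text is a hypothetical variable):

```rebol
text: copy "xaybz"

; Each successful remove leaves the position unchanged, so the loop must
; not depend on the input advancing. while keeps going until every
; alternate fails, which happens at the end of the input.
ok: parse text [while [remove "a" | remove "b" | skip]]
```

With any or some in place of while, the rule described above would be expected to end the loop at the first successful remove.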

Evaluating expressions (productions)

A paren! found anywhere within the flow of parsing will be evaluated. Normally this is done as part of processing the correct branch of the parse tree; however, it can also be done in preparation sections and elsewhere.

Note that for the quote command, a paren! must return a value that will be used for matching the input.
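
A minimal sketch of a production that accumulates a result while parsing; total and n are hypothetical variables:

```rebol
total: 0

ok: parse [1 2 3 4] [
    some [set n integer! (total: total + n)]   ; the paren runs after each match
]
```

The paren is evaluated only when the rule before it has matched, so total reflects the integers actually consumed.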

