Updated: 6-Feb-2009

REBOL 3 Concepts: Parsing: Match Types

Pending Revision

This document was written for R2 and has yet to be revised for R3.

When parsing strings, these datatypes and words can be used to match characters in the input string:

Match Type   Description
"abc"        match the entire string
#"c"         match a single character
tag          match a tag string
end          match to the end of the input
(bitset)     match any specified char in the set

To use all of these match types (except bitset, which is explained below) in a single rule, write:

[<B> ["excellent" | "incredible"] #"!" </B> end]

This rule matches either of the following input strings:

<B>excellent!</B>
<B>incredible!</B>

The end word specifies that nothing follows in the input stream, so the entire input has been parsed. It is optional, depending on whether the parse function's return value is to be checked. Refer to the Evaluation section (concepts/evaluation.txt) for more information.
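As a minimal sketch of how the rule above behaves, assuming the R2 behavior this document was written for, the lines below run it against both inputs. The word rule and the third input are illustrative and not part of the original text. The inputs contain no whitespace, so no refinements are needed.

```rebol
; The rule from above, stored in a word so it can be reused
rule: [<B> ["excellent" | "incredible"] #"!" </B> end]

print parse "<B>excellent!</B>" rule        ; true
print parse "<B>incredible!</B>" rule       ; true

; Trailing text remains after </B>, so end does not match
print parse "<B>excellent!</B>extra" rule   ; false
```

The last case fails because end requires that no input remains after the closing tag.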

The bitset! datatype deserves more explanation. Bitsets are used to specify collections of characters in an efficient manner. The charset function enables you to specify individual characters or ranges of characters. For example, the line:

digit: charset "0123456789"

defines a character set that contains digits. This allows rules like the following, which matches an input such as 707-467-8000:

[3 digit "-" 3 digit "-" 4 digit]
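The rule can be tried directly with parse. This is a small sketch, not a complete phone number validator. The word phone and the failing input are assumptions added for illustration.

```rebol
digit: charset "0123456789"

; Three digits, a dash, three digits, a dash, four digits
phone: [3 digit "-" 3 digit "-" 4 digit]

print parse "707-467-8000" phone   ; true
print parse "707-467-800" phone    ; false, the last group has only three digits
```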

To accept any number of digits, it is common to write the rule:

digits: [some digit]
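A short sketch of the digits rule in use follows. The input strings are made up for illustration.

```rebol
digit: charset "0123456789"
digits: [some digit]

print parse "2009" digits   ; true
print parse "20a9" digits   ; false, "a" is not in the digit set
```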

A character set can also specify ranges of characters. For instance, the digit character set could have been written as:

digit: charset [#"0" - #"9"]

Alternatively, you can combine specific characters and ranges of characters:

the-set: charset ["+-." #"0" - #"9"]
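To see which characters such a mixed set accepts, it can be used in a rule with some. This is only a sketch with invented inputs. Note that the set accepts any sequence of these characters, so it does not check that the input is a well-formed number.

```rebol
; Signs, the decimal point, and the digits 0 through 9
the-set: charset ["+-." #"0" - #"9"]

print parse "-3.14" [some the-set]   ; true
print parse "3,14" [some the-set]    ; false, the comma is not in the set
```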

To expand on this, here is the alphanumeric set of characters:

alphanum: charset [#"0" - #"9" #"A" - #"Z" #"a" - #"z"]

Character sets can also be modified with the insert and remove functions, and sets can be combined with the union and intersect functions. This line copies the digit character set and adds a dot to it:

digit-dot: insert copy digit "."
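Since union is also mentioned above, the same set could presumably be built by combining digit with a one-character set instead of modifying a copy. The following is a sketch under that assumption. It is not taken from the original document, and the inputs are illustrative.

```rebol
digit: charset "0123456789"

; Alternative construction: combine two sets instead of inserting into a copy
digit-dot: union digit charset "."

print parse "3.14" [some digit-dot]   ; true
print parse "3,14" [some digit-dot]   ; false, the comma is not in the set
```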

The following lines define useful character sets for parsing:

digit: charset [#"0" - #"9"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
alphanum: union alpha digit
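As one possible use of these sets, the sketch below matches a name that starts with a letter and continues with letters or digits. The identifier rule and the inputs are hypothetical examples, not part of the original text.

```rebol
digit: charset [#"0" - #"9"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
alphanum: union alpha digit

; Hypothetical rule: one letter, then any number of letters or digits
identifier: [alpha any alphanum]

print parse "item42" identifier   ; true
print parse "42item" identifier   ; false, the first character is not a letter
```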

