Updated: 17-Feb-2009
Normally, you parse a string to produce some result. You want to do more than just verify that the string is valid; you want to act on it as it is parsed. For instance, you may want to pick out substrings from various parts of the string, create blocks of related values, or compute a value.
The examples in previous chapters parsed strings without producing any results. Such a parse only verifies that a string has the specified grammar; the value returned from parse indicates whether it succeeded. The following examples show this:
    probe parse "a b c" ["a" "b" "c"]
    true

    probe parse "a b" ["a" "c"]
    false
The parse function returns true only if it reaches the end of the input string. An unsuccessful match stops the parse of the series. If parse runs out of rules to match before reaching the end of the series, it stops without traversing the rest of the series and returns false:
    probe parse "a b c d" ["a" "b" "c"]
    false

    probe parse "a b c d" [to "b" thru "d"]
    true

    probe parse "a b c d" [to "b" to end]
    true
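The return-value behavior described above can be sketched with a few illustrative inputs. The strings here are made up for the example and deliberately contain no spaces, so whitespace handling plays no part in the result:

```rebol
; The rule consumes all of the input, so parse reaches the end.
probe parse "abc" ["a" "b" "c"]          ; true

; The rule is exhausted while "d" is still unread.
probe parse "abcd" ["a" "b" "c"]         ; false

; Adding TO END moves to the tail of the input after the match.
probe parse "abcd" ["a" "b" "c" to end]  ; true
```

Ending a rule with to end is a simple way to say "the rest of the input does not matter".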
Within a rule, you can include a REBOL expression to be evaluated when parse reaches that point in the rule. Parentheses are used to indicate such expressions:
    string: "there is a phone in this sentence"
    probe parse string [
        to "a"
        to "phone" (print "found phone")
        to end
    ]
    found phone
    true

The example above moves through the string to a and then to phone, and prints the message found phone after the match is complete. If the strings a or phone are missing and the parse cannot be done, the expression is not evaluated.
Expressions can appear anywhere within a rule, and multiple expressions can occur in different parts of a rule. For instance, the following code prints different strings depending on what inputs were found:
    parse string [
        "a" | "the"
        to "phone" (print "answer") |
        to "radio" (print "listen") |
        to "tv"    (print "watch")
    ]
    answer

    string: "there is the radio on the shelf"
    parse string [
        "a" | "the"
        to "phone" (print "answer") |
        to "radio" (print "listen") |
        to "tv"    (print "watch")
    ]
    listen
Here is an example that counts the number of times the HTML pre-format tag appears in a text string:
    count: 0
    page: read http://www.rebol.com/docs/dictionary.html
    parse page [any [thru <pre> (count: count + 1)]]
    print count
    777
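The same counting technique can be tried without a network connection. The following is a minimal sketch that assumes a small made-up HTML fragment in place of the downloaded page:

```rebol
count: 0

; Hypothetical stand-in for a page read from a URL.
page: "<p>intro</p><pre>one</pre><p>text</p><pre>two</pre>"

; Each successful THRU moves past one <pre> tag and bumps the counter.
; ANY stops as soon as THRU can no longer find another tag.
parse page [any [thru <pre> (count: count + 1)]]

print count   ; 2
```

The value returned by parse is not used here; only the side effect on count matters.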
The most common action done with parse is to pick up parts of the string being parsed. This is done with copy, and it is followed by the name of a variable to which you want to copy the string. The following example parses the title of a web page:
    parse page [thru <title> copy text to </title>]
    print text
    REBOL/Core Dictionary

The example works by skipping over text until it finds the <title> tag. That's where it starts making a copy of the input stream and setting a variable called text to hold it. The copy operation continues until the closing </title> tag is found.
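As a self-contained variation, the sketch below applies the same rule to a literal string. The page content is invented for illustration:

```rebol
; Hypothetical page content; any string with a title element would do.
page: "<html><title>Sample Page</title></html>"

; Skip past the opening tag, then copy everything up to the closing tag.
parse page [thru <title> copy text to </title>]

print text   ; Sample Page
```

The parse stops before the closing tag, so it does not reach the end of the input, but text has already been set by that point.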
The copy action also can be used with entire rule blocks. For instance, for the rule:
    [copy heading ["H" ["1" | "2" | "3"]]]
the heading string contains the entire H1, H2, or H3 string. This also works for large multi-block rules.
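A short sketch of that rule in use, assuming an input that consists of just the heading name:

```rebol
heading: none

; COPY captures everything matched by the block that follows it.
parse "H2" [copy heading ["H" ["1" | "2" | "3"]]]

print heading   ; H2
```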
The copy action makes a copy of the substring that it finds, but that is not always desirable. In some cases, it is better to save the current position of the input stream in a variable.
In the following example, the begin variable holds a reference to the page input string just after <title>. The ending variable refers to the page string just before </title>. These variables can be used in the same way as they would be used with any other series.

    parse page [
        thru <title> begin:
        to </title> ending:
        (change/part begin "Word Reference Guide" ending)
    ]
You can see the above parse expression actually changed the contents of the title:
    parse page [thru <title> copy text to </title>]
    print text
    Word Reference Guide
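The position-marking technique can also be tried on a literal string. This sketch uses an invented page and a replacement title of the same length as the old one, so the positions held by parse stay valid after the change:

```rebol
; Hypothetical page content, copied so the literal is not modified.
page: copy "<title>Old Title</title>"

parse page [
    thru <title> begin:      ; position just after the opening tag
    to </title> ending:      ; position just before the closing tag
    (change/part begin "New Title" ending)
]

print page   ; <title>New Title</title>
```

Because begin and ending refer to the original series rather than a copy, the change is made directly in page.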
Here is another example that marks the position of every table tag in an HTML file:
    page: read http://www.rebol.com/index.html
    tables: make block! 20
    parse page [
        any [to "<table" mark: thru ">"
            (append tables index? mark)
        ]
    ]

The tables block now contains the position of every table tag:

    foreach table tables [
        print ["table found at index:" table]
    ]
    table found at index: 836
    table found at index: 2076
    table found at index: 3747
    table found at index: 3815
    table found at index: 4027
    table found at index: 4415
    table found at index: 6050
    table found at index: 6556
    table found at index: 7229
    table found at index: 8268
NOTE: The current position in the input string can also be modified. The next section explains how this is done.
Now that you know how to obtain the position of the input series, you also can use other series functions on it, including insert, remove, and change. To write a script that replaces all question marks (?) with exclamation marks (!), write:
    str: "Where is the turkey? Have you seen the turkey?"
    parse str [some [to "?" mark: (change mark "!") skip]]
    print str
    Where is the turkey! Have you seen the turkey!
The skip at the tail advances the input over the new character, which is not necessary in this case, but it is a good practice.
As another example, to insert the current time everywhere the word time appears in some text, write:

    str: "at this time, I'd like to see the time change"
    parse str [
        some [to "time"
            mark:
            (remove/part mark 4  mark: insert mark now/time)
            :mark
        ]
    ]
    print str
    at this 14:42:12, I'd like to see the 14:42:12 change

Notice the :mark word used above. It sets the input to a new position. The insert function returns the new position just past the insert of the current time. The get-word :mark is used to set the input to that position.
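Because the current time changes on every run, a deterministic variation may be easier to experiment with. The sketch below is an illustration only: it replaces an invented word with a longer one and uses change/part, which returns the position just past the replacement, in place of the remove and insert pair:

```rebol
; Hypothetical input, copied so the literal is not modified.
str: copy "a cat saw a cat"

parse str [
    some [
        to "cat" mark:
        ; Replace 3 characters with 5 and remember where the new text ends.
        (mark: change/part mark "tiger" 3)
        ; Resume parsing from just past the replacement.
        :mark
    ]
]

print str   ; a tiger saw a tiger
```

Resetting the input with :mark matters here because the replacement is longer than the text it replaces.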
When parsing a large grammar from a set of rules, variables are used to make the grammar more readable. However, the variables are global and may be confused with other variables that have the same name somewhere else in the program.
The solution to this problem is to use an object to make all the rule words local to a context. For instance:
    tag-parser: make object! [
        tags: make block! 100
        text: make string! 8000
        tag: txt: none  ; declared here so the rule variables are local to the object
        html-code: [
            copy tag ["<" thru ">"] (append tags tag) |
            copy txt to "<" (append text txt)
        ]
        parse-tags: func [site [url!]] [
            clear tags clear text
            parse read site [to "<" some html-code]
            foreach tag tags [print tag]
            print text
        ]
    ]
    tag-parser/parse-tags http://www.rebol.com
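The effect of keeping rule words inside an object can be checked with a smaller sketch that needs no network access. The object name word-counter, its fields, and the sample text are all hypothetical:

```rebol
word: "global value"   ; a global that must not be disturbed

word-counter: make object! [
    count: 0
    word: none         ; declared in the object, so the rule uses this one
    letters: charset [#"a" - #"z" #"A" - #"Z"]
    ; Count each run of letters; skip any other single character.
    rule: [any [copy word some letters (count: count + 1) | skip]]
    count-words: func [text [string!]] [
        count: 0
        parse text rule
        count
    ]
]

print word-counter/count-words "one two three"   ; 3
print word                                       ; global value
```

Only words that appear as set-words at the top level of the object block become local to the object, which is why word is declared there explicitly.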
As rules are written, there are times debugging is needed. Specifically, you may want to know how far you got in the parsing of a rule.
The trace function can be used to watch the parse operation progress, but this can output thousands of lines that are difficult to review.
A better way is to insert debugging expressions into the parse rules. As an example, to debug the rule:
    [to "<IMG" "SRC" "=" filename ">"]

insert the print function after key sections to monitor your progress through the rule:

    [to "<IMG" (print 1) "SRC" "=" (print 2) filename (print 3) ">"]
This example prints 1, 2, and 3 as the rule is processed.
Another approach is to print out part of the input string as the parse happens:
    [
        to "<IMG" here: (print here)
        "SRC" "=" here: (print here)
        filename here: (print here) ">"
    ]
If this is done often, you can create a rule for it:
    here: [where: (print where)]
    [
        to "<IMG" here
        "SRC" "=" here
        filename here ">"
    ]
The copy function can also be used to indicate what substrings were parsed as the rule was handled.
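As a rough illustration of that last point, the sketch below copies each parsed portion into a variable and prints it. The input string and the variable name seen are assumptions made for the example:

```rebol
parse "key=value" [
    ; Show what was consumed up to and including the separator.
    copy seen thru "=" (print ["parsed so far:" seen])
    ; Show whatever input is left.
    copy seen to end (print ["remaining:" seen])
]
; Output:
; parsed so far: key=
; remaining: value
```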