REBOL Document

Parse-xml - Function Summary


Summary:

Parses XML code and returns a tree of blocks.

Usage:

parse-xml code

Arguments:

code - XML code to parse (must be: string)

Description:

A limited XML parser is provided with every REBOL/Core and REBOL/View. It converts simple XML into nested REBOL blocks that are easier to process within REBOL. Whitespace between tags is kept as string data. For example:


    xml: {
        <PERSON>
            <NAME>Fred</NAME>
            <AGE>24</AGE>
            <ADDRESS>
                <STREET>123 Main Street</STREET>
                <CITY>Ukiah</CITY>
                <STATE>CA</STATE>
            </ADDRESS>
        </PERSON>
    }
    data: parse-xml xml
    probe data
    [document none [["PERSON" none ["^/            " ["NAME" none ["Fred"]]
    "^/            " ["AGE" none ["24"]] "^/            " ["ADDRESS" none
    ["^/                " ["STREET" none ["123 Main Street"]]
    "^/                " ["CITY" none ["Ukiah"]]
    "^/                " ["STATE" none ["CA"]] "^/            "]] "^/        "]]]]
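
Each node in this result is a three-item block: the tag name, the attributes (NONE here, because the sample XML has none), and the content. As a small illustrative sketch, the element names under PERSON can be listed by walking the DATA block from the example above. The word person is a hypothetical name introduced only for this sketch:


    person: first third data    ; the ["PERSON" none [...]] node
    print first person          ; prints PERSON
    foreach item third person [
        ; child elements are blocks; the text between tags is a string
        if block? item [print first item]    ; prints NAME, AGE, ADDRESS
    ]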

Conceptually, the XML above corresponds to this REBOL block:


    [
        PERSON [
            NAME "Fred"
            AGE 24
            ADDRESS [
                STREET "123 Main Street"
                CITY "Ukiah"
                STATE "CA"
            ]
        ]
    ]

Here is a small REBOL function that converts the parsed XML into a similar REBOL block. Each element becomes its own sub-block, and values such as the age remain strings:


    to-rebol-data: func [block /local out] [
        out: copy []
        foreach [tag attr body] block [
            append out to-word tag
            foreach item body [
                either block? item [
                    append/only out to-rebol-data item
                ][
                    if not empty? trim item [append out item]
                ]
            ]
        ]
        out
    ]
    probe to-rebol-data data
    [document [PERSON [NAME "Fred"] [AGE "24"] [ADDRESS
    [STREET "123 Main Street"] [CITY "Ukiah"] [STATE "CA"]]]]

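Note that the function discards whitespace-only strings and strips leading and trailing whitespace from the remaining text, using the TRIM function. Because TRIM modifies a string in place, the strings inside the original parsed data are trimmed as well.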
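
If the parsed data must stay untouched, or the XML may contain empty elements, a variant along the following lines could be used. This is a minimal sketch and not part of REBOL: the name to-rebol-data-safe and the local word text are hypothetical. It trims a copy of each string, and it treats a NONE body as an empty block, since the parser source listed further down creates every node with NONE in its content slot until a child is added:


    to-rebol-data-safe: func [block /local out text] [
        out: copy []
        foreach [tag attr body] block [
            append out to-word tag
            ; an empty element has NONE as its body, so use [] in that case
            foreach item any [body []] [
                either block? item [
                    append/only out to-rebol-data-safe item
                ][
                    ; trim a copy so the parsed data is left unchanged
                    text: trim copy item
                    if not empty? text [append out text]
                ]
            ]
        ]
        out
    ]
    probe to-rebol-data-safe parse-xml {<A><B/>text</A>}

With this input, the empty B element should come back as a block holding only the word B, followed by the string "text".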

If you wish to modify or expand the XML parser for your own purposes, you can obtain its source code with these lines:


    source parse-xml
    parse-xml: func [
        "Parses XML code and returns a tree of blocks." 
        code [string!] "XML code to parse"][
        xml-language/parse-xml code]


    probe xml-language
    
    make object! [
        verbose: false
        joinset: func [cset chars][insert copy cset chars]
        diffset: func [cset chars][remove/part copy cset chars]
        error: func [msg arg][print [msg arg] halt]
        space: make bitset! #{
    0026000001000000000000000000000000000000000000000000000000000000
    }
        char: make bitset! #{
    00260000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    }
        letter: make bitset! #{
    0100000000000000FEFFFF07FEFFFF070000000000000000FFFF7FFFFFFF7F01
    }
        digit: make bitset! #{
    000000000000FF03000000000000000000000000000000000000000000000000
    }
        alpha-num: make bitset! #{
    010000000000FF03FEFFFF07FEFFFF070000000000000000FFFF7FFFFFFF7F01
    }
        name-first: make bitset! #{
    0100000000000004FEFFFF87FEFFFF070000000000000000FFFF7FFFFFFF7F01
    }
        name-chars: make bitset! #{
    010000000060FF07FEFFFF87FEFFFF070000000000000000FFFF7FFFFFFF7F01
    }
        data-chars: make bitset! #{
    00260000FFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    }
        qt1: "'"
        qt2: {"}
        data-chars-qt1: make bitset! #{
    002600007FFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    }
        data-chars-qt2: make bitset! #{
    00260000FBFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    }
        name: [name-first any name-chars]
        sp: [some space]
        sp?: [any space]
        parents: []
        new-node: func [name][
            if verbose [print ["New tag:" name]] 
            insert/only tail parents parent 
            parent: add-kid copy reduce [name none none]]
        end-node: func [name][
            while [name <> first parent] [
                if verbose [print ["unterminated tag:" first parent]] 
                if empty? parents [error "End tag error:" name] 
                pop-parent] 
            pop-parent]
        pop-parent: func [][
            parent: last parents 
            remove back tail parents]
        add-kid: func [kid][
            if none? third parent [parent/3: make block! 1] 
            insert/only tail third parent kid 
            kid]
        add-attr: func [name value][
            if none? second parent [parent/2: make block! 2] 
            insert insert tail second parent name value]
        check-version: func [version][print ["XML Version:" version]]
        document: [prolog sp? content to end]
        prolog: [sp? xml-decl? any [sp? doc-type-decls]]
        xml-decl?: ["<?xml" version-info thru "?>" | none]
        version-info: [sp "version" eq [qt1 version-num qt1 | qt2 version-num qt2]]
        version-num: [copy temp some name-chars (check-version temp)]
        doc-type-decls: [cmt | "<!" thru ">" | "<?" thru "?>"]
        element: [cmt | s-tag ["/>" (pop-parent) | #">" any content e-tag]]
        s-tag: [#"<" tag (node: new-node tag-name) any [sp attribute] sp?]
        e-tag: ["</" tag (end-node tag-name) sp? #">"]
        tag: [copy tag-name name]
        content: [element | copy data some data-chars (add-kid data)]
        attribute: [copy attr-name name eq attr-value (add-attr attr-name attr-data)]
        eq: [sp? #"=" sp?]
        attr-value: [
            [qt1 copy attr-data any data-chars-qt1 qt1] | 
            [qt2 copy attr-data any data-chars-qt2 qt2]
        ]
        cmt: ["<!--" thru "-->"]
        parse-xml: func [str][
            paroot: parent: copy reduce ['document none none] 
            parse/case/all str document 
            paroot]
    ]
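
The VERBOSE flag in the object above can help when adapting the parser. A short sketch, assuming the XML-LANGUAGE object is accessible exactly as probed above: with the flag set, NEW-NODE prints each tag as it is opened.


    xml-language/verbose: true
    parse-xml {<A><B>text</B></A>}    ; NEW-NODE prints "New tag: A", then "New tag: B"
    xml-language/verbose: false       ; restore the default shown in the listing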

Related:

build-tag - Generates a tag from a composed block.
parse - Parses a series according to rules.



Copyright 2004 REBOL Technologies