Associative Arrays in REBOL - Hash!

Developers sometimes wonder the best way of doing associative arrays in REBOL. My answer is, use the HASH! datatype. It's not well documented, and it was somewhat buggy in earlier versions of REBOL, but that's been fixed in recent versions.

As a quick review for readers who are not programmers, an associative array (or sometimes called a map) lets you index into an array by using a value, rather than an integer index.

For instance, you might have a contact database as a block with 10000 names that have the general form:

db: [
    "Bob" [name "Bob Soe" street "...." city "..." phone "..."]
    "Sam" [name "Sam Doe" street "...." city "..." phone "..."]
    "Eli" [name "Eli Noe" street "...." city "..." phone "..."]
]

and you want to find the data for "Sam" quickly:

data: select db "sam"

This is not a tutorial, that's far beyond what I want to write in a blog, but suffice it to say, the key (the index value) can be any REBOL datatype, including file names, URLs, email addresses, etc.

If you're only going to search the above contact database a few times, you might as well leave it as a block. Why? Because to convert it into an association does have a price (the price is all the internal hashing that must be computed). This is an important fact to remember, and this factor is often ignored. So, if you are using the data in something like CGI for a single lookup, don't hash it.

However, if you are going to search the data many times, for performance you will want to convert it to a hash:

dbh: to-hash db

For those of you who are into DB ideas, you can also generate other hashes of the same records (other "keys"), perhaps you want fast lookup of phone numbers. Just generate another DB block that contains the key values, or you can even use the same HASH with multiple keys, as long as they turn out to be uniquely mapped. Instead of using SELECT, you would use FIND to get their records.

What's the difference in performance? This example will show that. It generates a block that contains 10000 records with string keys like "user1", "user2", etc. and random record contents.

db: make block! 10000
new-record: func [n] [
    reduce [
        join "user" n reduce [
            random 100000 random "abcdefghijklmn" random 24:00
        ]
    ]
]
repeat n 10000 [append db new-record n]

No, let's say you want to access it 1000 times. We'll just use "user9999" because that's almost the worst case:

t1: now/precise
loop 1000 [select db "user9999"]
t2: now/precise
print difference t2 t1

But, if you convert it to a hash first:

hash: to-hash db
t1: now/precise
loop 1000 [select hash "user9999"]
t2: now/precise
print difference t2 t1

On my computer, the second method runs about 650 times faster than the first. So now you see the advantage.

It also turns out that HASH! series also obey most of the other series operations that you use in REBOL, such as INSERT, CHANGE, REMOVE, APPEND, FIND, etc. You can also traverse them with NEXT, BACK, SKIP, and much more.

Post Comments

Updated 6-Mar-2024 - Copyright Carl Sassenrath - WWW.REBOL.COM - Edit - Blogger Source Code