An Interesting REBOL Non-bytecode Virtual Machine (VM)
For me, vacation is about more than just relaxing in the mountains or along the beach. Since I was a youngster, the real fun of vacation was that it offered long days to relax and dream about new ideas and inventions. (I know that may sound pretty nerdy, but I think anyone who enjoys moments of non-pressured creativity knows what I mean.)
Back when I was a boy of 10, vacations were more fun if I could get creative and design interesting electronic circuits, usually for radio related applications. These days, vacation gives me a chance to think freely outside the box on subjects like the REBOL language. When I say "outside the box" I really mean "outside the priorities". After all, vacation is really more of a time to ignore priorities and have some fun.
This year during our trip to the mountains of western Canada, I found myself dreaming about bytecode interpreters and virtual machines (VMs) and wondering if they could be applied to REBOL in some beneficial manner. In fact, I've thought about it off and on since I wrote the first REBOL prototype.
Use a Bytecode Interpreter?
Bytecode interpreters (BCI) are an old love of mine. Over the last 25 years I've implemented at least half a dozen BCI designs. Long before REBOL, and even long before Java made them more popular, BCIs were used for languages like Lisp, Smalltalk, and Pascal, and I've built interpreters for all of those.
So, just for fun, I decided to design a little BCI VM for REBOL. I came up with a small register-based VM (because they tend to be faster running than stack-based VM's like Java). For source translation, rather than considering the nearly-impossible task of REBOL compilation, it made more sense to create a dialected sub-language of REBOL. Such an approach allows the symbols of the language to have a fixed meaning within a specific context, making it much easier to generate code. You might think of it like REBOL assembly language.
When I turned to the task of creating the byte-codes, it occurred to me that bytecode is really an archaic idea. It was a useful concept back in the 1980's when processors were byte-oriented and memory was at a premium. But, these days, the advantage of bytecode is lost by the fact that most processors are optimized for words, and you might as well just encode the entire word with the attribute and argument fields you need. Of course, the speed of both approaches (byte or word) boils down to the speed of the CPU's bit shift operations and how well the CPU handles implied clearing or sign extending the results of those operations (to reduce the need to mask the results before indexing).
So, it did not take long to begin to question the BCI approach for REBOL, and since this was vacation dream-time, did it really even make sense to translate to a byte-coded format? Bytecode and word-code are nice because they take less memory than native REBOL loaded code, but how much is that really a problem? It's not usually the code that takes memory, it's the data and objects created by the code. And, when it comes to transferring program code over the net, compressed REBOL source is about as efficient as you can get.
Some of you may see where I'm going with this, so I'm going to skip a few steps and cut to the main question: Why not directly execute the dialected REBOL-code rather than go the bytecode route?
So now it gets interesting. The idea comes down to this: create a new datatype called rebcode! that uses a highly optimized evaluation process. Why not? After all, REBOL is all about creating dialects and every one of those uses an interpreter of some form or another. (Even though you may not notice it!)
Of course, the cost of rebcode is that you can no longer write complex expressions. It is more like writing assembly code. But, you could always write a little compiler (using parse) that produces rebcode output for expressing more complex operations.
To determine the speed benefit of rebcode, I decided to write a partial implementation of the VM and try some rough performance benchmarks. (I won't go into any details, because this article is too long already.)
For a very simple benchmark I wrote a simple loop. Here's how it's written in the new rebcode dialect:
; Simple Rebcode loop: count: 5000000 num: 0 num: add num 1 loop count -6
This code looks a lot like regular functional REBOL, but it is not. The code is evaluated with techniques similar to bytecode interpretation. The functions in the code are not functions at all, they are more like processor instructions. Here the loop operation is implemented as an increment and branch if greater than zero. (Please ignore the fact that I used branch offsets. I did not want to take time adding label load-time computations. For my benchmark, it's equivalent anyway.)
For the equivalent REBOL code, you might write:
count: 5000000 num: 0 until [ num: add num 1 zero? count: count - 1 ]
But, that's not really identical because until requires a block and a extra comparison test. You might be more fair to write it like this (because this is actually how you would normally write it):
num: 0 loop 5000000 [ num: add num 1 ]
For a more realistic benchmark I decided to fill a 256 character string with a unique value for each character. This is more like code that might be used for generating a graphic image. Here the string gets set to [0 1 2 3 4 5 6...] up to 255. That's the kind of operation that could be expensive in normal REBOL code because there is no function that does that directly.
The rebcode below fills the string 10000 times:
; A String Fill x 10000: count: 10000 num: 256 val: 0 t: head text ; Fill the string with the counter: change t val t: next t val: add val 1 loop num -17 loop count -27
Here is the similar code written in REBOL:
count: 0 loop 10000 [ count: add count 1 val: 0 t: head text num: 0 loop 256 [ num: add num 1 change t val t: next t val: add val 1 ] ]
Of course, again, the code is not identical. In fact, it's not easy to measure apples-to-apples due to the fact that REBOL uses functional loop operations. By the way, I did not use the obvious repeat function due to its block binding feature added to recent versions of REBOL (needed for proper recursive usage). That would have slowed down the inner REBOL loop even more.
So, how did rebcode do? Here are the results on a 2.8GHz Pentium:
The second fill string adds a bounds check to the change operation. This is the more likely case for a true implementation, because we would not want to allow rebcode to be able to read and write anywhere in memory. That would present a security problem for downloaded scripts.
The benchmarks above show that rebcode ran significantly faster than the similar REBOL code. For some types of algorithms, rebcode might be useful. I have to admit that I was a bit disappointed the speedup was not larger (like 100 times), but I suppose that's also a good thing, because I've always found REBOL to be quite fast for most of the programs I develop.
The main question is whether we should consider adding rebcode to REBOL as a regular feature? For number crunching and image processing it might be useful to be able to optimize performance sensitive parts of your code. (And, it may also open the door to possible just-in-time (JIT) machine code translation.) The down side of rebcode is the extra time it takes to write code when compared with normal REBOL. Just like when writing in C or assembly code, it's easy to make mistakes that create infinite loops or CPU exception crashes.
So, a rebcode virtual machine might be something to consider for future releases of REBOL. I wanted to tell you what I was thinking and show you the benchmarks. Please let me know what you think. It was a lot of fun to test the idea. After all, vacation is more than just hiking around the mountains or relaxing along the beach. Right?