RebCode Virtual Machine Developer Guide

By Carl Sassenrath
Revised: 1-Feb-2010
Original: 23-Oct-2005

Rebcode is used to write high-performance lower-level functions.

Overview

Rebcode is a new REBOL dialect that implements a virtual machine (VM) allowing programmers to create high performance lower-level functions in a manner that is consistent with the design principles of REBOL. Rebcode functions allow you to write special purpose, highly optimized functions that are capable of running on average 10 times faster than normal REBOL.

The Problem

For most types of programs and especially scripts, REBOL's normal execution methods provide excellent performance, and all of the code can be written with higher level functions. However, there are some cases, such as looping mathematical computations and large series manipulation (e.g. generating images), that require greater performance for very specific types of operations. Those cases can benefit from a higher performance execution method, even if the code may be more difficult to write and maintain than normal REBOL.

The Solution

For cases where high performance computation is necessary, REBOL provides a different method of evaluation based on the concept of a virtual machine (VM). It is called rebcode.

Rebcode is a dialect of REBOL (block of words and values) that is executed by a virtual machine using a concept similar to that of bytecode used in languages like Java, and long before that, Pascal and Lisp. The result is that specific functions can be written in an optimized manner to execute ten times faster on average and up to thirty times faster in special cases. Cases where rebcode is useful include special graphics routines, math operations, unique search methods, and more.

The concept of rebcode is consistent with the design principles of REBOL. Rebcode is expressed in block format and is encapsulated by a function interface. Rebcode allows access to normal REBOL variables, including those bound to other contexts such as objects and globals. In addition, rebcode is machine independent and will run identically on all processors.

Feature Summary

High speed execution, on average ten times faster, for specific integer, decimal, logic, series, and looping algorithms.
Well integrated with REBOL as a new functional datatype (rebcode!).
Access to normal REBOL variables with proper scoping (binding as locals, functions, objects, globals).
Built from normal REBOL blocks, allowing loading and molding, as well as dynamic construction of rebcode dialect code using parse, compose, reduce, and other techniques.
Supports embedded function comments similar to other REBOL functions and compatible with the help function.
Opcodes for executing integer, decimal (floating point), and series (strings, blocks, images, etc.) datatypes.
Direct execution of standard math functions such as sine, cosine, tangent, log, and others.
Able to evaluate any arbitrary REBOL expression within a block (using the do opcode).
Provides block-based control flow for conditional (if, either) and looping (while, until, loop, repeat) functions.
Assembler fixup of branch labels and support for special rewriting rules.

Special Notes

Rebcode is designed for experts and is not intended for beginners. Compared to normal REBOL, it is easy to make mistakes when writing rebcode.
Rebcode has been added to REBOL for the primary reason of providing greater performance for algorithms that require it. Because of that, rebcode opcodes are lower-level, and they remove most of the special type checks and datatype polymorphism found in normal REBOL.
Because rebcode is much less readable than normal REBOL, programmers should not create rebcode functions until they know for certain that the optimization is needed and that it cannot be achieved with normal REBOL functions, even with the special use of refinements. For example, if you decide to write a string search or parser using rebcode, you should first exhaust the wide range of solutions provided by the parse function and its dialect.
High level code, such as the outer blocks and functions of your program should never be written in rebcode. Doing so provides no advantage. Use rebcode only for optimizing specific well defined lower-level functions.
In some cases, it is better to generate rebcode using higher level expressions that are compiled. You can use the internal rebcode rewriting assembler or create your own special dialects that are compiled with the parse function.

Creating and Using Rebcode

What is Rebcode?

Rebcode is a dialect of REBOL. It is a sequence of words and values that is interpreted in an extremely efficient manner, similar to how instructions are executed on an actual processor. (This is the reason why rebcode is said to be processed by a "virtual machine".)

Rebcode is written in a block, just like normal REBOL code. However, unlike REBOL code that gets evaluated as a series of functions, rebcode consists of a sequence of opcodes. Each opcode performs a small, low-level action on a fixed set of arguments. For example, an opcode may do nothing more than add two integers. If you are familiar with the concept of assembly code, rebcode is very similar.

Here is an example rebcode block:

[
    div val 2
    add val num
    mul val 100
    div val 50
]

The words div, add, and mul are opcodes; they are low level functions that perform integer math operations. Each opcode is followed by two arguments that are used in the operation, and the result is stored back into the first argument (val). (Thus, rebcode is implemented in what is known as the "two-address" model of computation. This technique permits optimal performance within the rebcode virtual machine.)

There are many different opcodes supported by the rebcode virtual machine. A summary list is provided below, and a complete description of each opcode is provided in the RebCode Reference documentation.

Rebcode Functions

All rebcode must be written within a special type of function, called a rebcode function. These functions are created in a similar way to REBOL user-defined functions.

Here is a simple example that makes a rebcode function:

md32: rebcode [
    "Returns two integers multiplied then divided by 32."
    a [integer!]
    b [integer!]
][
    mul a b
    div a 32
    return a
]

The rebcode function works just like the func function. It accepts an interface specification and a function body block and returns a rebcode! datatype. Like func, the rebcode function is shorthand for:

make rebcode! args body

The args block is the interface specification. It can contain embedded comments, formal argument words, and optional datatype restrictions.

The body block holds opcodes and their arguments. They must be written in a specific order as defined by the rebcode dialect. In addition, before the rebcode block can be executed, it must be processed by an assembler that performs actions such as relative branch fixups. See below for details.

Once defined, a rebcode function can be evaluated like all other REBOL functions. You can evaluate and set its result to a variable:

result: md32 5 30

or pass the result to other functions:

print md32 3 60

or use it along with other functions:

print md32 random 10 now/month

Just keep in mind that rebcode is not a normal function. Evaluating it requires a special interpreter that is very fast, but also very strict about the format of the rebcode block.

Rebcode Format

The general form of rebcode is:

opcode argument1 argument2

A rebcode block contains one or more such opcodes:

add val 10
mul val 100
randz val

As with all REBOL dialects, line breaks are not relevant to the meaning of the code.

Some opcodes may take only one argument, and others may require more. Nearly all opcodes require that the first argument is a REBOL variable. It can be a function local, global, or object variable. Most opcodes allow the second argument to be a variable or a literal (integer, decimal, string, file, etc.).

There are three general types of rebcode opcodes:

Compute	performs an operation between the arguments and stores the result into the first argument.
Compare	performs a comparison operation between the arguments, and sets or clears the T flag.
Control	conditionally performs an operation depending on the state of the T flag.

The compute opcodes perform an operation and store the result into the variable that is provided by the first argument. Here are examples:

set count 0
add count 1
sqrt num
length? len string

The count, num, and len variables are all modified by these compute-type opcodes to hold the result.

The compare opcodes perform an operation but do not store the result into a variable. Instead they affect an internal flag called the T flag. If the comparison is true the T flag is set true, otherwise it is false. The T flag is then tested by a number of other opcodes that can act on it. Here are examples of the opcodes:

eq count 100
gteqd num 5.8
head? ages

To be useful these opcodes must be followed by a control opcode that checks the T flag. Here are examples that show the typical combinations of compare and control opcodes:

gteq count 100
ift [seti count 0]

gteqd num 5.8
brat reset-num

until [
    pick num ages 1
    add total num
    next ages
    tail? ages
]

Note that other opcodes can appear between the comparison and the control opcode, as long as they do not affect the T flag. For example:

gteq count 100
add count 1
ift [seti count 0]
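The interplay of compare, compute, and control opcodes can be pictured with a toy model. This Python sketch is not how the VM is implemented; the ToyVM class and its attribute names are hypothetical, and only four opcodes (gt, add, seti, ift) are modeled. It shows a compare opcode setting the T flag, a compute opcode leaving it untouched, and ift reading it afterwards.

```python
# Toy model of the T flag: compare opcodes set it, compute opcodes leave it
# alone, and control opcodes read it. All names here are hypothetical.

class ToyVM:
    def __init__(self, **variables):
        self.vars = dict(variables)
        self.t = False  # the T flag

    def _value(self, arg):
        # An argument is either a variable name or a literal.
        return self.vars[arg] if isinstance(arg, str) else arg

    def run(self, code):
        for op, *args in code:
            if op == "gt":        # compare: only sets the T flag
                self.t = self.vars[args[0]] > self._value(args[1])
            elif op == "add":     # compute: T flag untouched
                self.vars[args[0]] += self._value(args[1])
            elif op == "seti":    # compute: T flag untouched
                self.vars[args[0]] = self._value(args[1])
            elif op == "ift":     # control: run nested block if T is true
                if self.t:
                    self.run(args[0])
            else:
                raise ValueError(f"unknown opcode: {op}")
        return self

code = [
    ("gt", "count", 100),
    ("add", "count", 1),              # sits between compare and control
    ("ift", [("seti", "count", 0)]),
]

print(ToyVM(count=101).run(code).vars["count"])  # 0: flag was set, block ran
print(ToyVM(count=5).run(code).vars["count"])    # 6: flag was clear
```

The add in the middle changes count but not the flag, so ift still acts on the result of the earlier comparison.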

Note that an accurate list of rebcode opcodes and their interface specifications can be obtained directly from REBOL with the following line:

print system/internal/rebcodes

Variable Usage

Part of the power of rebcode is that normal REBOL variables can be used directly. Variables obey the same binding (scoping) rules as they do throughout REBOL. The variable can be local to the function, part of another function or object, or global. However, because rebcode is highly optimized, there are a few important rules about variables that you should know.

Rule 1: Always initialize your local variables. When you define a local variable, it is set to none by default. You must set it to the correct datatype before using it.

In this code:

code: rebcode [arg /local sum] [
    set sum 0
    add sum arg
    ...
]

the sum variable is set to an integer value before it is used with the add opcode - an integer operation. If you forget this step, then the variable will become a corrupt datatype. It may act like an integer in rebcode, but it will print as none.

Rule 2: Force variables to be of the correct datatype in the function interface. The code above is better written like this:

code: rebcode [arg [integer!] /local sum] [
    set sum 0
    add sum arg
    ...
]

Now the arg variable is guaranteed to be an integer when it is used with the add opcode.

Rule 3: Beware of opcodes that may modify the datatype of a variable. Some opcodes will set a variable's datatype as well as its value. The general rule is this: if an opcode provides an argument that is only for holding the result (not for passing values to the opcode), then its datatype will be set according to the results of the opcode.

Here is an example:

block: [123 "name" 1.2]

code: rebcode [arg series /local sum] [
    set sum 0
    add sum arg
    ...
    pick arg block 2
]

The pick opcode will modify the arg variable to make it a string datatype. It is no longer an integer, and should not be used as such. This type of reuse of variables is common for larger functions because it reduces the number of local variables that are needed within the function.

Rule 4: For high performance code, use opcodes that are datatype-specific. For example:

code: rebcode [arg1 [integer!] arg2 [decimal!]] [
    ...
    seti arg1 2
    add arg1 count
    ...
    setd arg2 1.0
    addd arg2 value
]

Here the seti, add, setd, and addd opcodes only modify the values of their variables. They do not set the datatype. You should only use these opcodes when you know that the datatype is correct. This rule applies to many of the opcodes within rebcode.
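One way to picture Rules 1 and 4 together is a value cell that carries a datatype tag next to its raw value. The Python sketch below is a guess at a plausible mental model, not a description of REBOL internals: the Cell class, its field names, and the zero starting payload are assumptions. It shows why an integer add on an uninitialized local can yield something that computes like an integer yet still prints as none.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    type: str = "none"   # datatype tag; locals start as none
    payload: int = 0     # raw value (assumed to start at zero)

def op_set(cell, value):
    """set: writes both the datatype tag and the value."""
    cell.type = "integer" if isinstance(value, int) else "decimal"
    cell.payload = value

def op_seti(cell, value):
    """seti: writes the integer payload only; the tag is not touched."""
    cell.payload = value

def op_add(cell, value):
    """add: integer add on the payload only; the tag is not touched."""
    cell.payload += value

def show(cell):
    """What normal REBOL would print: it trusts the tag."""
    return "none" if cell.type == "none" else str(cell.payload)

good = Cell()
op_set(good, 0)       # Rule 1 followed: initialized with set
op_add(good, 5)
print(show(good))     # 5

bad = Cell()          # Rule 1 broken: never initialized
op_add(bad, 5)
print(bad.payload, show(bad))  # 5 none
```

In this model the type-specific opcodes are fast precisely because they skip the tag, which is also why they are only safe once the tag is known to be right.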

Returning Results

Rebcode functions do not return a result by default. This behavior is different from normal REBOL functions. In rebcode a value can only be returned with the return opcode.

Here is an example of a normal REBOL function. The result is returned automatically:

f1: func [a b] [
    max a b
]

However, written in rebcode, the result must be returned explicitly:

f2: rebcode [a b] [
    max a b
    return a
]

To exit a function without returning a result, use the exit opcode. This is the same as normal REBOL functions.

Nested Rebcode Blocks

One way that rebcode differs from most other virtual machine designs is that it often uses nested blocks to implement flow-of-control opcodes.

For example, the ift and iff opcodes conditionally execute a block of rebcode depending on the state of the T flag (described earlier).

add a 10
gteq a 100
ift [set a 0]

Other opcodes such as while and until use the same method:

set a 0
while [lt a 100] [
    pick n vals 1
    add a n
]

Note that the binding of rebcode remains the same in these types of control blocks. The opcodes are bound to the VM context. There is currently only one exception to this rule, the do opcode. See below for more.

Note: Branches Always Relative

All of the branch opcodes (bra, brat, etc.) expect their target labels to be within the same block of code. The branches are always relative to the current block. You cannot use a branch opcode to a target label outside the same block. Doing so will produce erroneous results.

Using APPLY to Call Functions

Rebcode provides the apply opcode to call other functions. These include rebcode, natives, and user-defined functions. Note that you cannot evaluate action or op functions at this time.

The format of the apply opcode is:

apply result function [args]

The result holds the value returned from the function. The function is the name of the function, and the args block holds the values that are passed as arguments to the function.

Note that the args block must be fully reduced to a block of values and/or variable words. It must not contain opcode expressions.

Here is an example of rebcode that calls the checksum native:

apply num checksum [string]
mul num 10
...

Rebcode functions can be called in the same way. For instance, if you define the function:

add-mul: rebcode [a b c] [
    add a b
    mul a c
    return a
]

It can be called from rebcode with a line such as:

apply num add-mul [n m 10]

If a function allows refinements, they can also be specified in the argument block. The position of the refinement arguments is that specified by the function interface. For example, if you ask for help on checksum, you see:

>> ? checksum
USAGE:
    CHECKSUM data /tcp /secure /hash size /method word /key key-value

To invoke the checksum function with the /secure refinement, you would write:

apply num checksum [string none true]

This specifies the /secure refinement as being enabled, but not the /tcp refinement.

Note that all unsupplied arguments are set to none. In the examples above, the /hash, size, /method, word, and all other arguments will be set to none when the checksum function is called.

Note: Beware Refinement Order

In normal REBOL code, you are allowed to change the position of refinements within the interface specification of functions. In normal REBOL, this will have no effect on the functions when they are called.

For instance, this function:

bub: func [val /normal /only] [...]

can be changed to:

bub: func [val /only /normal] [...]

without affecting normal REBOL code. However, it will have an effect on rebcode that calls it, because the order of refinements in rebcode is specified by position, not by name.
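The positional matching described for apply can be sketched as follows. This is a Python illustration under stated assumptions, not REBOL code: bind_args is a hypothetical helper, the spec list is copied from the CHECKSUM usage line shown earlier, and Python's None and True stand in for REBOL's none and true.

```python
# Sketch: how a positional argument block lines up with a function spec.
# bind_args is a hypothetical helper, not part of REBOL.

CHECKSUM_SPEC = ["data", "/tcp", "/secure", "/hash", "size",
                 "/method", "word", "/key", "key-value"]

def bind_args(spec, args):
    """Pair each spec word with the value in the same position.
    Unsupplied trailing positions are filled with None (REBOL none)."""
    if len(args) > len(spec):
        raise ValueError("more values than spec words")
    padded = list(args) + [None] * (len(spec) - len(args))
    return dict(zip(spec, padded))

# Mirrors: apply num checksum [string none true]
bound = bind_args(CHECKSUM_SPEC, ["some text", None, True])
print(bound["/secure"])  # True
print(bound["/tcp"])     # None
print(bound["/hash"])    # None (unsupplied)

# Reordering refinements in a spec changes what the same block means.
old = bind_args(["val", "/normal", "/only"], [1, True])
new = bind_args(["val", "/only", "/normal"], [1, True])
print(old["/normal"], new["/normal"])  # True None
```

The last two calls pass the same argument block, yet /normal is enabled only under the first spec, which is the hazard the note on refinement order warns about.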

Branches and Labels

Although the higher level control opcodes like ift, either, while, until, loop, repeat and others are easier to write, normally more readable, and often faster, there may be times when code can be optimized with the use of branch opcodes.

Four branch opcodes are supported:

bra	branch, unconditional branch
brat	branch if T flag is true
braf	branch if T flag is false
braw	branch using a word's value as the offset

The argument to the branch opcodes is an integer number. Branches are always relative to the location of the branch opcode, not absolute. Positive offsets branch forward and negative offsets branch backward. The branch must always fall within the current code block.

To make it easier to write branch offsets, labels are allowed. A label is created with the label opcode. If the bra, brat, or braf opcodes are followed by a word, the word is assumed to be a label, and the assembler will compute the correct relative offset.

Here is an example of branches and labels:

label top
add n 1
gt n 100
brat done
...
bra top
label done

Notice that the label word is also an opcode; however, it performs no operation. The label is kept in the code block to allow accurate reflection (molding) of the block.
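The label fixup performed by the assembler can be approximated with a two-pass scan. The Python sketch below is not the real assembler; the fixup function and the tuple representation of instructions are hypothetical. It also assumes that an offset is counted in cells from the position just after the branch opcode and its argument, and that a branch lands on the label opcode itself; the actual VM may use a different origin.

```python
# Sketch of label fixup. Not the real assembler.
# Assumptions: each instruction occupies one cell per word or value, an
# offset is counted from the cell just after the branch and its argument,
# and a branch lands on the label opcode itself (which does nothing).

BRANCHES = {"bra", "brat", "braf"}

def fixup(instructions):
    """Return a copy with label words after bra/brat/braf replaced by
    relative integer offsets."""
    # Pass 1: record the starting cell of every instruction and label.
    starts, targets, pos = [], {}, 0
    for ins in instructions:
        starts.append(pos)
        if ins[0] == "label":
            targets[ins[1]] = pos
        pos += len(ins)
    # Pass 2: rewrite symbolic branch targets as relative offsets.
    out = []
    for ins, start in zip(instructions, starts):
        if ins[0] in BRANCHES and isinstance(ins[1], str):
            after = start + len(ins)
            out.append((ins[0], targets[ins[1]] - after))
        else:
            out.append(ins)   # integer offsets and other opcodes pass through
    return out

code = [
    ("label", "top"),
    ("add", "n", 1),
    ("gt", "n", 100),
    ("brat", "done"),
    ("bra", "top"),
    ("label", "done"),
]

for ins in fixup(code):
    print(ins)
# brat becomes ("brat", 2) and bra becomes ("bra", -12) under these assumptions
```

The label entries are left in the output, matching the statement above that labels stay in the code block so it can still be molded accurately.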

The braw opcode allows computed branch offsets to be used. The argument to the opcode is assumed to be a word (variable) whose value is an integer offset.

In this example:

mul n 2
braw n
bra lab1
bra lab2
bra lab3

a branch is made to a branch table.
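The multiplication by two in this example presumably reflects that each table entry, a bra opcode plus its argument, takes up two cells. The Python sketch below models that idea only; the flat TABLE list, the dispatch helper, the zero-based selector, and counting the offset from the cell after braw n are all assumptions rather than documented VM behavior.

```python
# Toy branch table. Assumptions: each "bra labN" entry occupies two cells
# (opcode plus argument), n is zero-based, and the offset is counted from
# the cell that follows "braw n".

ENTRY_CELLS = 2
TABLE = ["bra", "lab1", "bra", "lab2", "bra", "lab3"]

def dispatch(n):
    """Return the label reached for selector n."""
    offset = n * ENTRY_CELLS                 # mul n 2
    if not 0 <= offset < len(TABLE):
        raise IndexError("selector falls outside the branch table")
    opcode, target = TABLE[offset], TABLE[offset + 1]   # braw n lands here
    assert opcode == "bra"
    return target

for n in range(3):
    print(n, dispatch(n))   # 0 lab1, 1 lab2, 2 lab3
```

Rebcode itself performs very little checking, so a range test like the one in dispatch would have to be written explicitly with compare and control opcodes before the braw.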

The DO Opcode

The do opcode is used to invoke the normal REBOL interpreter. This allows your rebcode to evaluate any REBOL expression from within your rebcode function. This is a useful "escape" when your code needs to perform more complicated actions or access functions or objects that are not easy to use directly in rebcode.

do plat [reduce [system/version/4 system/version/3]]

Note that the do opcode does not bind the contents of its block to the VM context. This allows you to use normal REBOL code within the block.

The Rebcode! Datatype

As described above, rebcode functions are created with the rebcode function, such as:

md32: rebcode [
    "Returns two integers multiplied then divided by 100."
    a [integer!]
    b [integer!]
][
    mul a b
    div a 100
    return a
]

The value of variable md32 is of the rebcode! datatype. This is a functional datatype, the same as function!, native!, action!, and others.

The normal datatype functions apply to rebcode. For example:

print type? :md32
rebcode!

To check if a value is rebcode, you can write:

if rebcode? :md32 [...]

Rebcode also satisfies the general function check:

if any-function? :md32 [...]

Like other functions, help can provide usage information for rebcode functions:

help md32
USAGE:
    md32 a b
DESCRIPTION:
     Returns two integers multiplied then divided by 100.
     md32 is a rebcode value.
ARGUMENTS:
     a -- (Type: integer)
     b -- (Type: integer)

To obtain the context words for a rebcode function:

first :md32
[a b]

To get the body block of a rebcode function:

second :md32
[
    mul a b
    div a 100
    return a
]

Note that the body may be different than that used for the creation of the function. The changes are the result of the rebcode assembly process.

To get the function interface specification:

third :md32
[
    {Returns two integers multiplied then divided by 100.}
    a [integer!]
    b [integer!]
]

Note: Currently the mold and source functions do not properly handle rebcode.

Embedded Comments

The comment opcode lets you embed comments into your code. They differ from normal ";" comments because they remain within the body of the code and will appear if the code is printed, molded, or saved.

For example, to add a string comment to your code:

comment "This is a comment"

You can also use a comment to temporarily remove sections of code by putting it within a block:

comment [
    add n 1
    eq n 10
    ift [set n 0]
]

Debugging Rebcode

It is more difficult to write rebcode than regular REBOL. Invalid expressions will crash the process.

To help you write and test code, a few debugging opcodes have been provided. These opcodes are similar to their related functions in REBOL and work the same way. You can insert them into parts of your code to view values during debugging.

?? variable	Print a variable name followed by its value.
probe	Print a molded value or a molded block of values.
print	Print a value or block of values.
escape?	Check if escape key has been pressed.

Note that calling any of these instructions also lets you use the escape key on your keyboard to stop rebcode evaluation.

Rebcode Assembler

The rebcode assembler is invoked each time a rebcode function is made (normally by calling the rebcode function). The main purpose of the assembler is to bind the rebcode opcodes to the proper VM context, and to fix up branch offsets to their target labels.

The assembler may also include other features in the future. The current format and operation of these features is subject to change and may be modified in future test releases.

The source code for the assembler can be viewed with:

probe get in system/internal 'assemble

More information about the rebcode assembler will be added in future updates to this documentation.

Errors in Rebcode

There are three types of errors that can occur in rebcode:

syntax	These errors occur at load time, the same way they do with all REBOL expressions. They are normally the result of improperly written REBOL values.
assembly	When you create a new rebcode function, it is parsed by the rebcode assembler (mentioned above). The opcodes and their datatypes will be verified during this operation, and if invalid, an error message will be generated.
runtime	For performance reasons very little checking is done within rebcode opcodes. This is different from most REBOL function code. It is possible for errors to exist in your rebcode that can crash the REBOL process. Such errors are permitted, and must be eliminated by the programmer. It is also possible for errors to go unnoticed and still produce invalid results.

Note: Programmers must check their work carefully to be sure that no errors exist in their code. When in doubt, check your code again. Unlike normal REBOL code, errors in rebcode can crash the process.

Rebcode Examples

find example: with FIND, with PARSE, with REBCODE
(makes a point - use rebcode only when needed)

Rebcode Opcode Summary

This section provides a summary of all rebcode opcodes. For more information about specific opcodes, see the Rebcode Reference document.

Compute Opcodes

Integer Math

abs	Changes the operand to its absolute value.
add	Integer add; adds operand and value; result goes in operand.
div	Divides operand by value; the integral result goes in operand.
max	Sets the operand to the greater of the two values.
min	Sets the operand to the lesser of the two values.
mul	Multiplies operand by value; result goes in operand.
neg	Change sign.
randz	Zero-based random number generator; sets operand to value from 0 to (value - 1).
rem	Remainder; divides operand by value; the integral remainder goes in the operand.
sub	Subtracts value from operand; result goes in operand.

Decimal Math

absd	Changes the operand to its absolute value.
acos	Arccosine.
addd	Decimal add; adds operand and value; result goes in operand.
asin	Arcsine.
atan	Arctangent.
cos	Cosine.
divd	Divides operand by value; result goes in operand.
exp	Exponential.
log-10	Log base 10.
log-e	Natural log.
maxd	Sets the operand to the greater of the two values.
mind	Sets the operand to the lesser of the two values.
muld	Multiplies operand by value; result goes in operand.
negd	Change sign
sin	Sine.
sqrt	Square root.
subd	Subtracts value from operand; result goes in operand.
tan	Tangent.

Integer Logic

and	Bitwise AND of two integers; result goes in operand.
compl	Bitwise complement
lsl	Logical Shift left; same as multiplying by two for each bit shifted.
lsr	Logical Shift right; same as dividing by two for each bit shifted.
not	Logic datatype complement
or	Bitwise OR of two integers; result goes in operand.
xor	Bitwise XOR of two integers; result goes in operand.

Series Traversal

back	Moves the current position of the series backward by one.
head	Changes the current position of the series to its head.
next	Moves the current position of the series forward by one.
skip	Changes the current position of the series forward or backward.
tail	Changes the current position of the series to its tail.

Series Modification

change	Changes the value at the specified position in a series.
clear	Removes all values from the current index to the tail. Leaves reference at tail.
copy	Copies items from series. Operand modified.
insert	Inserts one series into another and returns the series at the insert point.
pick	Sets the operand to refer to the value at the specified position in a series.
pickz	Zero-based pick. Operand modified.
poke	Changes the value at the specified position in a series.
pokez	Zero-based poke.
remove	Remove count items from the series, at the current position.

Series Checks

head?	Sets the TRUE flag if a series is at its head.
index?	Returns the index number of the current position in the series.
length?	Sets the operand to the length of the series from the current position.
past?	Sets the TRUE flag if the series is past its end.
tail?	Sets the TRUE flag if a series is at its tail.

Other Opcodes

apply	Apply a function to arg block. Set result.
comment	Includes a comment in the code
do	Escape to normal evaluation. Result modified.
get	Get the value of a word (indirect). Result modified.
gett	Get the TRUE flag and store in a variable. Operand modified.
set	Set a variable to any value. Operand modified.
setd	Set decimal variable only. Operand modified.
seti	Set integer variable only. Operand modified.
sett	Set the TRUE flag from contents of a variable
to-dec	Convert integer to decimal. Operand modified.
to-int	Convert decimal to integer. Operand modified.
type?	Sets the operand to the value's datatype.
value?	Set TRUE flag if the variable has a value.

Debugging Functions

??	Works like ?? in REBOL.
escape?	Check if user pressed escape key. If so, halt to console.
print	Works like print in REBOL.
probe	Works like probe in REBOL.

Compare Opcodes

Integer Comparisons

eq	Sets the TRUE flag if the values are equal.
glt	Sets the TRUE flag if: value1 < operand < value2.
glte	Sets the TRUE flag if: value1 <= operand <= value2.
gt	Sets the TRUE flag if the first value is greater than the second value.
gteq	Sets the TRUE flag if the first value is greater than or equal to the second value.
lt	Sets the TRUE flag if the first value is less than the second value.
lteq	Sets the TRUE flag if the first value is less than or equal to the second value.
neq	Sets the TRUE flag if the values are not equal.

Decimal Comparisons

eqd	Sets the TRUE flag if the values are equal.
gltd	Sets the TRUE flag if: value1 < operand < value2.
glted	Sets the TRUE flag if: value1 <= operand <= value2.
gtd	Sets the TRUE flag if the first value is greater than the second value.
gteqd	Sets the TRUE flag if the first value is greater than or equal to the second value.
ltd	Sets the TRUE flag if the first value is less than the second value.
lteqd	Sets the TRUE flag if the first value is less than or equal to the second value.
neqd	Sets the TRUE flag if the values are not equal.

Control Opcodes

Conditional

iff	If the TRUE flag is not set, evaluate the block.
ift	If the TRUE flag is set, evaluate the block.
either	If the TRUE flag is set, evaluate the first block; otherwise evaluate the second block.

Looping

break	Exit from the currently executing block
loop	Evaluate the block a specified number of times.
repeat	Evaluates a block a number of times or over a series.
repeatz	Zero-based repeat (0 to n-1)
until	Evaluate a block until the TRUE flag is set
while	While the condition block sets the TRUE flag, evaluate the body block.

Branching

bra	Unconditional branch
braf	Branch to target if the TRUE flag is not set.
brat	Branch to target if the TRUE flag is set.
braw	Branch via variable
label	Define target label

Function Return

exit	Exits a rebcode function, returning no value.
return	Return value from rebcode function.