RebCode Virtual Machine Developer Guide

By Carl Sassenrath
Revised: 1-Feb-2010
Original: 23-Oct-2005

Rebcode is used to write high-performance, lower-level functions.

Contents

Overview
The Problem
The Solution
Feature Summary
Special Notes
Creating and Using Rebcode
What is Rebcode?
Rebcode Functions
Rebcode Format
Variable Usage
Returning Results
Nested Rebcode Blocks
Using APPLY to Call Functions
Branches and Labels
The DO Opcode
The Rebcode! Datatype
Embedded Comments
Debugging Rebcode
Rebcode Assembler
Errors in Rebcode
Rebcode Examples
Rebcode Opcode Summary
Compute Opcodes
Integer Math
Decimal Math
Integer Logic
Series Traversal
Series Modification
Series Checks
Other Opcodes
Debugging Functions
Compare Opcodes
Integer Comparisons
Decimal Comparisons
Control Opcodes
Conditional
Looping
Branching
Function Return

Overview

Rebcode is a new REBOL dialect that implements a virtual machine (VM) allowing programmers to create high performance lower-level functions in a manner that is consistent with the design principles of REBOL. Rebcode functions allow you to write special purpose, highly optimized functions that are capable of running on average 10 times faster than normal REBOL.

The Problem

For most types of programs and especially scripts, REBOL's normal execution methods provide excellent performance, and all of the code can be written with higher level functions. However, there are some cases, such as looping mathematical computations and large series manipulation (e.g. generating images), that require greater performance for very specific types of operations. Those cases can benefit from a higher performance execution method, even if the code may be more difficult to write and maintain than normal REBOL.

The Solution

For cases where high performance computation is necessary, REBOL provides a different method of evaluation based on the concept of a virtual machine (VM). It is called rebcode.

Rebcode is a dialect of REBOL (a block of words and values) that is executed by a virtual machine using a concept similar to that of bytecode used in languages like Java, and long before that, Pascal and Lisp. The result is that specific functions can be written in an optimized manner to execute ten times faster on average and up to thirty times faster in special cases. Cases where rebcode is useful include special graphics routines, math operations, unique search methods, and more.

The concept of rebcode is consistent with the design principles of REBOL. Rebcode is expressed in block format and is encapsulated by a function interface. Rebcode allows access to normal REBOL variables, including those bound to other contexts such as objects and globals. In addition, rebcode is machine independent and will run identically on all processors.

Feature Summary

Special Notes

Creating and Using Rebcode

What is Rebcode?

Rebcode is a dialect of REBOL. It is a sequence of words and values that is interpreted in an extremely efficient manner, similar to how instructions are executed on an actual processor. (This is the reason why rebcode is said to be processed by a "virtual machine".)

Rebcode is written in a block, just like normal REBOL code. However, unlike REBOL code that gets evaluated as a series of functions, rebcode consists of a sequence of opcodes. Each opcode performs a small, low-level action on a fixed set of arguments. For example, an opcode may do nothing more than add two integers. If you are familiar with the concept of assembly code, rebcode is very similar.

Here is an example rebcode block:

[
    div val 2
    add val num
    mul val 100
    div val 50
]

The words div, add, and mul are opcodes; they are low-level functions that perform integer math operations. Each opcode is followed by two arguments that are used in the operation, and the result is stored back into the first argument (val). (Thus, rebcode is implemented in what is known as the "two-address" model of computation. This technique permits optimal performance within the rebcode virtual machine.)
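To make the two-address model concrete, here is a toy Python model of the example block. It is only a sketch under stated assumptions, not how the REBOL VM is implemented: the run helper, the OPS table, the env dictionary, and the starting values are all hypothetical, and integer division is modeled with Python's // on positive numbers only.

```python
# Toy model of two-address execution: each opcode reads its first
# argument, combines it with the second, and stores the result back
# into the first. The names run, OPS, and env are illustrative only.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a // b,  # assumption: positive integers only
}

def run(code, env):
    for opcode, dest, src in code:
        # The second argument may be a variable name or a literal.
        value = env[src] if isinstance(src, str) else src
        env[dest] = OPS[opcode](env[dest], value)
    return env

env = run(
    [("div", "val", 2), ("add", "val", "num"),
     ("mul", "val", 100), ("div", "val", 50)],
    {"val": 10, "num": 3},
)
print(env["val"])  # 10 -> 5 -> 8 -> 800 -> 16
```

Only val changes; num is read but never written, which is the point of the two-address form.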

There are many different opcodes supported by the rebcode virtual machine. A summary list is provided below, and a complete description of each opcode is provided in the RebCode Reference documentation.

Rebcode Functions

All rebcode must be written within a special type of function, called a rebcode function. These functions are created in a similar way to REBOL user-defined functions.

Here is a simple example that makes a rebcode function:

md32: rebcode [
    "Returns two integers multiplied then divided by 32."
    a [integer!]
    b [integer!]
][
    mul a b
    div a 32
    return a
]

The rebcode function works just like the func function. It accepts an interface specification and a function body block and returns a rebcode! datatype. Like func, the rebcode function is shorthand for:

make rebcode! args body

The args block is the interface specification. It can contain embedded comments, formal argument words, and optional datatype restrictions.

The body block holds opcodes and their arguments. They must be written in a specific order as defined by the rebcode dialect. In addition, before the rebcode block can be executed, it must be processed by an assembler that performs actions such as relative branch fixups. See below for details.

Once defined, a rebcode function can be evaluated like all other REBOL functions. You can evaluate and set its result to a variable:

result: md32 5 30

or pass the result to other functions:

print md32 3 60

or use it along with other functions:

print md32 random 10 now/month

Just keep in mind that rebcode is not a normal function. Evaluating it requires a special interpreter that is very fast, but also very strict about the format of the rebcode block.

Rebcode Format

The general form of rebcode is:

opcode argument1 argument2

A rebcode block contains one or more such opcodes:

add val 10
mul val 100
randz val

As with all REBOL dialects, line breaks are not relevant to the meaning of the code.

Some opcodes may take only one argument, and others may require more. Nearly all opcodes require that the first argument is a REBOL variable. It can be a function local, global, or object variable. Most opcodes allow the second argument to be a variable or a literal (integer, decimal, string, file, etc.).

There are three general types of rebcode opcodes:

Compute    performs an operation between the arguments and stores the result into the first argument.
Compare    performs a comparison operation between the arguments, and sets or clears the T flag.
Control    conditionally performs an operation depending on the state of the T flag.

The compute opcodes perform an operation and store the result into the variable that is provided by the first argument. Here are examples:

set count 0
add count 1
sqrt num
length? len string

The count, num, and len variables are all modified by these compute-type opcodes to hold the result.

The compare opcodes perform an operation but do not store the result into a variable. Instead they affect an internal flag called the T flag. If the comparison is true the T flag is set true, otherwise it is false. The T flag is then tested by a number of other opcodes that can act on it. Here are examples of the opcodes:

eq count 100
gted num 5.8
head? ages

To be useful these opcodes must be followed by a control opcode that checks the T flag. Here are examples that show the typical combinations of compare and control opcodes:

geq count 100
ift [seti count 0]

gted num 5.8
brat reset-num

until [
    pick num ages 1
    add total num
    tail? ages
]

Note that other opcodes can appear between the comparison and the control opcode, as long as they do not affect the T flag. For example:

geq count 100
add count 1
ift [seti count 0]
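The interaction between compare, compute, and control opcodes can be sketched with a toy Python model. This is an illustration only: the ToyVM class and its method names (compare_ge, add, if_true) are hypothetical stand-ins for a greater-or-equal compare, an integer add, and the ift control opcode, and they say nothing about how the real VM stores its flag.

```python
# Toy model of the T flag: compare opcodes only set the flag, and a
# later control opcode reads it. All names here are hypothetical.
class ToyVM:
    def __init__(self, **variables):
        self.vars = dict(variables)
        self.t_flag = False

    def compare_ge(self, name, value):
        # Compare: changes the T flag, never the variable.
        self.t_flag = self.vars[name] >= value

    def add(self, name, value):
        # Compute: changes the variable, leaves the T flag alone.
        self.vars[name] += value

    def if_true(self, block):
        # Control: runs the block only if the T flag is set.
        if self.t_flag:
            block(self)

def reset_count(vm):
    vm.vars["count"] = 0

vm = ToyVM(count=100)
vm.compare_ge("count", 100)   # T flag becomes true
vm.add("count", 1)            # count is now 101, flag unchanged
vm.if_true(reset_count)       # flag is still true, so count is reset
print(vm.vars["count"], vm.t_flag)
```

Because add does not touch the flag, the control step still sees the result of the earlier comparison, even though count itself has changed in between.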

Note that an accurate list of rebcode opcodes and their interface specifications can be obtained directly from REBOL with the following line:

print system/internal/rebcodes

Variable Usage

Part of the power of rebcode is that normal REBOL variables can be used directly. Variables obey the same binding (scoping) rules as they do throughout REBOL. The variable can be local to the function, part of another function or object, or global. However, because rebcode is highly optimized, there are a few important rules about variables that you should know.

Rule 1: Always initialize your local variables. When you define a local variable, it is set to none by default. You must set it to the correct datatype before using it.

In this code:

code: rebcode [arg /local sum] [
    set sum 0
    add sum arg
    ...
]

the sum variable is set to an integer value before it is used with the add opcode - an integer operation. If you forget this step, then the variable will become a corrupt datatype. It may act like an integer in rebcode, but it will print as none.

Rule 2: Force variables to be of the correct datatype in the function interface. The code above is better written like this:

code: rebcode [arg [integer!] /local sum] [
    set sum 0
    add sum arg
    ...
]

Now the arg variable is guaranteed to be an integer when it is used with the add opcode.

Rule 3: Beware of opcodes that may modify the datatype of a variable. Some opcodes will set a variable's datatype as well as its value. The general rule is this: if an opcode provides an argument that is only for holding the result (not for passing values to the opcode), then its datatype will be set according to the results of the opcode.

Here is an example:

block: [123 "name" 1.2]

code: rebcode [arg series /local sum] [
    set sum 0
    add sum arg
    ...
    pick arg block 2
]

The pick opcode will modify the arg variable to make it a string datatype. It is no longer an integer, and should not be used as such. This type of reuse of variables is common for larger functions because it reduces the number of local variables that are needed within the function.

Rule 4: For high performance code, use opcodes that are datatype specific. For example:

code: rebcode [arg1 [integer!] arg2 [decimal!]] [
    ...
    seti arg1 2
    add arg1 count
    ...
    setd arg2 1.0
    addd arg2 value
]

Here the seti, add, setd, and addd opcodes only modify the values of their variables. They do not set the datatype. You should only use these opcodes when you know that the datatype is correct. This rule applies to many of the opcodes within rebcode.

Returning Results

Rebcode functions do not return a result by default. This behavior is different from normal REBOL functions. In rebcode a value can only be returned with the return opcode.

Here is an example of a normal REBOL function. The result is returned automatically:

f1: func [a b] [
    max a b
]

However, written in rebcode, the result must be returned explicitly:

f2: rebcode [a b] [
    max a b
    return a
]

To exit a function without returning a result, use the exit opcode. This is the same as normal REBOL functions.

Nested Rebcode Blocks

One way that rebcode differs from most other virtual machine designs is that it often uses nested blocks to implement flow-of-control opcodes.

For example, the ift and iff opcodes conditionally execute a block of rebcode depending on the state of the T flag (described earlier).

add a 10
gte a 100
ift [set a 0]

Other opcodes, such as while and until, use the same method:

set a 0
while [lt a 100] [
    pick n vals 1
    add a n
]
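The way a nested condition block drives a loop through the T flag can be sketched in Python. This is a toy model with hypothetical names (state, condition, body, run_while) and made-up data; it mirrors the while example above, including the fact that the series position is never advanced, so the same first value is picked each time.

```python
# Toy model of the while opcode: the first block is expected to set
# the T flag, and the second block runs for as long as the flag is
# true. All names and values here are hypothetical.
state = {"a": 0, "n": 0, "vals": [30, 5, 7], "t_flag": False}

def condition(s):
    s["t_flag"] = s["a"] < 100   # models: lt a 100

def body(s):
    s["n"] = s["vals"][0]        # models: pick n vals 1 (one-based)
    s["a"] += s["n"]             # models: add a n

def run_while(s, cond_block, body_block):
    cond_block(s)
    while s["t_flag"]:
        body_block(s)
        cond_block(s)

run_while(state, condition, body)
print(state["a"])  # 0 -> 30 -> 60 -> 90 -> 120
```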

Note that the binding of rebcode remains the same in these types of control blocks. The opcodes are bound to the VM context. There is currently only one exception to this rule, the do opcode. See below for more.

Note: Branches Always Relative

All of the branch opcodes (bra, brat, etc.) expect their target labels to be within the same block of code. The branches are always relative to the current block. You cannot use a branch opcode to reach a target label outside the same block. Doing so will produce erroneous results.

Using APPLY to Call Functions

Rebcode provides the apply opcode to call other functions. These include rebcode, natives, and user-defined functions. Note that you cannot evaluate action or op functions at this time.

The format of the apply opcode is:

apply result function [args]

The result holds the value returned from the function. The function is the name of the function, and the args block holds the values that are passed as arguments to the function.

Note that the args block must be fully reduced to a block of values and/or variable words. It must not contain opcode expressions.

Here is an example of rebcode that calls the checksum native:

apply num checksum [string]
mul num 10
...

Rebcode functions can be called in the same way. For instance, if you define the function:

add-mul: rebcode [a b c] [
    add a b
    mul a c
    return a
]

It can be called from rebcode with a line such as:

apply num add-mul [n m 10]

If a function allows refinements, they can also be specified in the argument block. The position of the refinement arguments is that specified by the function interface. For example, if you ask for help on checksum, you see:

>> ? checksum
USAGE:
    CHECKSUM data /tcp /secure /hash size /method word /key key-value

To invoke the checksum function with the /secure refinement, you would write:

apply num checksum [string none true]

This specifies the /secure refinement as being enabled, but not the /tcp refinement.

Note that all unsupplied arguments are set to none. In the examples above, the /hash, size, /method, word, and all other arguments will be set to none when the checksum function is called.

Note: Beware Refinement Order

In normal REBOL code, you are allowed to change the position of refinements within the interface specification of functions. In normal REBOL, this will have no effect on the functions when they are called.

For instance, this function:

bub: func [val /normal /only] [...]

can be changed to:

bub: func [val /only /normal] [...]

without affecting normal REBOL code. However, it will have an effect on rebcode that calls it, because the order of refinements in rebcode is specified by position, not by name.
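The positional matching described in this section can be modeled in a few lines of Python. This is a sketch only: bind_args is a hypothetical helper, the interface lists are copied from the checksum usage line and the bub example above, and Python's None stands in for REBOL's none.

```python
# Toy model of how a positional argument block lines up with a
# function's interface. bind_args is a hypothetical helper.
def bind_args(spec, args):
    # Pad with None so every unsupplied slot is "none".
    padded = list(args) + [None] * (len(spec) - len(args))
    return dict(zip(spec, padded))

# Interface order taken from the checksum usage line.
checksum_spec = ["data", "/tcp", "/secure", "/hash", "size",
                 "/method", "word", "/key", "key-value"]
call = bind_args(checksum_spec, ["some string", None, True])
print(call["/secure"], call["/tcp"], call["size"])

# Reordering refinements silently changes which one a slot enables.
before = bind_args(["val", "/normal", "/only"], [1, True])
after = bind_args(["val", "/only", "/normal"], [1, True])
print(before["/normal"], after["/normal"])
```

The same argument block [1 true] enables /normal against the first interface and /only against the second, which is why reordering refinements can break rebcode callers.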

Branches and Labels

Although the higher level control opcodes like ift, either, while, until, loop, repeat and others are easier to write, normally more readable, and often faster, there may be times when code can be optimized with the use of branch opcodes.

Four branch opcodes are supported:

bra     unconditional branch
brat    branch if T flag is true
braf    branch if T flag is false
braw    branch using a word's value as the offset

The argument to the branch opcodes is an integer number. Branches are always relative to the location of the branch opcode, not absolute. Positive offsets branch forward and negative offsets branch backward. The branch target must always fall within the current code block.

To make it easier to write branch offsets, labels are allowed. A label is created with the label opcode. If the bra, brat, or braf opcodes are followed by a word, the word is assumed to be a label, and the assembler will compute the correct relative offset.

Here is an example of branches and labels:

label top
add n 1
gt n 100
brat done
...
bra top
label done

Notice that the label word is also an opcode; however, it performs no operation. The label is kept in the code block to allow accurate reflection (molding) of the block.
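The label fixup step can be sketched as a small two-pass routine in Python. This is not the REBOL assembler: fix_up and BRANCHES are hypothetical names, and the sketch assumes that offsets are counted in whole instructions and measured from the branch instruction itself. The unit and origin used by the real VM may differ.

```python
# Toy assembler pass: replace label names in branch instructions with
# relative offsets. Assumptions: offsets are counted in whole
# instructions and measured from the branch instruction itself.
BRANCHES = {"bra", "brat", "braf"}

def fix_up(code):
    # First pass: record the position of every label.
    labels = {ins[1]: pos for pos, ins in enumerate(code)
              if ins[0] == "label"}
    fixed = []
    for pos, ins in enumerate(code):
        if ins[0] in BRANCHES and isinstance(ins[1], str):
            # Second pass: label name -> signed relative offset.
            ins = (ins[0], labels[ins[1]] - pos)
        fixed.append(ins)
    return fixed

code = [
    ("label", "top"),
    ("add", "n", 1),
    ("gt", "n", 100),
    ("brat", "done"),
    ("bra", "top"),
    ("label", "done"),
]
fixed = fix_up(code)
for ins in fixed:
    print(ins)
```

The label instructions stay in the output, matching the text: they do nothing at run time but keep the block readable when it is molded.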

The braw opcode allows computed branch offsets to be used. The argument to the opcode is assumed to be a word (variable) whose value is an integer offset.

In this example:

mul n 2
braw n
bra lab1
bra lab2
bra lab3

a branch is made to a branch table.

The DO Opcode

The do opcode is used to invoke the normal REBOL interpreter. This allows your rebcode to evaluate any REBOL expression from within your rebcode function. This is a useful "escape" when your code needs to perform more complicated actions or access functions or objects that are not easy to use directly in rebcode.

do plat [reduce [system/version/4 system/version/3]]

Note that the do opcode does not bind the contents of its block to the VM context. This allows you to use normal REBOL code within the block.

The Rebcode! Datatype

As described above, rebcode functions are created with the rebcode function, such as:

md32: rebcode [
    "Returns two integers multiplied then divided by 100."
    a [integer!]
    b [integer!]
][
    mul a b
    div a 100
    return a
]

The value of variable md32 is of the rebcode! datatype. This is a functional datatype, the same as function!, native!, action!, and others.

The normal datatype functions apply to rebcode. For example:

print type? :md32
rebcode!

To check if a value is rebcode, you can write:

if rebcode? :md32 [...]

Rebcode also satisfies the general function check:

if any-function? :md32 [...]

Like other functions, help can provide usage information for rebcode functions:

help md32
USAGE:
    md32 a b
DESCRIPTION:
     Returns two integers multiplied then divided by 100.
     md32 is a rebcode value.
ARGUMENTS:
     a -- (Type: integer)
     b -- (Type: integer)

To obtain the context words for a rebcode function:

first :md32
[a b]

To get the body block of a rebcode function:

second :md32
[
    mul a b
    div a 100
    return a
]

Note that the body may be different than that used for the creation of the function. The changes are the result of the rebcode assembly process.

To get the function interface specification:

third :md32
[
    {Returns two integers multiplied then divided by 100.}
    a [integer!]
    b [integer!]
]

Note: Currently the mold and source functions do not properly handle rebcode.

Embedded Comments

The comment opcode lets you embed comments into your code. They differ from normal ";" comments because they remain within the body of the code and will appear if the code is printed, molded, or saved.

For example, to add a string comment to your code:

comment "This is a comment"

You can also use a comment to temporarily remove sections of code by putting it within a block:

comment [
    add n 1
    eq n 10
    ift [set n 0]
]

Debugging Rebcode

It is more difficult to write rebcode than regular REBOL. Invalid expressions will crash the process.

To help you write and test code, a few debugging opcodes have been provided. These opcodes are similar to their related functions in REBOL and work the same way. You can insert them into parts of your code to view values during debugging.

?? variable    Print a variable name followed by its value.
probe          Print a molded value or a molded block of values.
print          Print a value or block of values.
escape         Check if escape key has been pressed.

Note that calling any of these instructions also lets you use the escape key on your keyboard to stop rebcode evaluation.

Rebcode Assembler

The rebcode assembler is invoked each time a rebcode function is made (normally by calling the rebcode function). The main purpose of the assembler is to bind the rebcode opcodes to the proper VM context, and to fix up branch offsets to their target labels.

The assembler may also include other features in the future. The current format and operation of these features is subject to change and may be modified in future test releases.

The source code for the assembler can be viewed with:

probe get in system/internal 'assemble

More information about the rebcode assembler will be added in future updates to this documentation.

Errors in Rebcode

There are three types of errors that can occur in rebcode:

syntax      These errors occur at load time, the same way they do with all REBOL expressions. They are normally the result of improperly written REBOL values.
assembly    When you create a new rebcode function, it is parsed by the rebcode assembler (mentioned above). The opcodes and their datatypes will be verified during this operation, and if invalid, an error message will be generated.
runtime     For performance reasons very little checking is done within rebcode opcodes. This is different from most REBOL function code. It is possible for errors to exist in your rebcode that can crash the REBOL process. Such errors are permitted, and must be eliminated by the programmer. It is also possible for errors to have no visible effect and yet produce invalid results.

Note: Programmers must check their work carefully to be sure that no errors exist in their code. When in doubt, check your code again. Unlike normal REBOL code, errors in rebcode can crash the process.

Rebcode Examples

find example: with FIND, with PARSE, with REBCODE
(makes a point - use rebcode only when needed)

Rebcode Opcode Summary

This section provides a summary of all rebcode opcodes. For more information about specific opcodes, see the Rebcode Reference document.

Compute Opcodes

Integer Math

abs      Changes the operand to its absolute value.
add      Integer add; adds operand and value; result goes in operand.
div      Divides operand by value; the integral result goes in operand.
max      Sets the operand to the greater of the two values.
min      Sets the operand to the lesser of the two values.
mul      Multiplies operand by value; result goes in operand.
neg      Change sign.
randz    Zero-based random number generator; sets operand to value from 0 to (value - 1).
rem      Remainder; divides operand by value; the integral remainder goes in the operand.
sub      Subtracts value from operand; result goes in operand.

Decimal Math

absd      Changes the operand to its absolute value.
acos      Arccosine.
addd      Decimal add; adds operand and value; result goes in operand.
asin      Arcsine.
atan      Arctangent.
cos       Cosine.
divd      Divides operand by value; result goes in operand.
exp       Exponential.
log-10    Log base 10.
log-e     Natural log.
maxd      Sets the operand to the greater of the two values.
mind      Sets the operand to the lesser of the two values.
muld      Multiplies operand by value; result goes in operand.
negd      Change sign.
sin       Sine.
sqrt      Square root.
subd      Subtracts value from operand; result goes in operand.
tan       Tangent.

Integer Logic

and      Bitwise AND of two integers; result goes in operand.
compl    Bitwise complement.
lsl      Logical shift left; same as multiplying by two for each bit shifted.
lsr      Logical shift right; same as dividing by two for each bit shifted.
not      Logic datatype complement.
or       Bitwise OR of two integers; result goes in operand.
xor      Bitwise XOR of two integers; result goes in operand.

Series Traversal

back    Moves the current position of the series backward by one.
head    Changes the current position of the series to its head.
next    Moves the current position of the series forward by one.
skip    Changes the current position of the series forward or backward.
tail    Changes the current position of the series to its tail.

Series Modification

change    Changes the value at the specified position in a series.
clear     Removes all values from the current index to the tail. Leaves reference at tail.
copy      Copies items from series. Operand modified.
insert    Inserts one series into another and returns the series at the insert point.
pick      Sets the operand to refer to the value at the specified position in a series.
pickz     Zero-based pick. Operand modified.
poke      Changes the value at the specified position in a series.
pokez     Zero-based poke.
remove    Remove count items from the series, at the current position.

Series Checks

head?      Sets the TRUE flag if a series is at its head.
index?     Returns the index number of the current position in the series.
length?    Sets the operand to the length of the series from the current position.
past?      Sets the TRUE flag if the series is past its end.
tail?      Sets the TRUE flag if a series is at its tail.

Other Opcodes

apply      Apply a function to arg block. Set result.
comment    Includes a comment in the code.
do         Escape to normal evaluation. Result modified.
get        Get the value of a word (indirect). Result modified.
gett       Get the TRUE flag and store in a variable. Operand modified.
set        Set a variable to any value. Operand modified.
setd       Set decimal variable only. Operand modified.
seti       Set integer variable only. Operand modified.
sett       Set the TRUE flag from contents of a variable.
to-dec     Convert integer to decimal. Operand modified.
to-int     Convert decimal to integer. Operand modified.
type?      Sets the operand to the value's datatype.
value?     Set TRUE flag if the variable has a value.

Debugging Functions

??         Works like ?? in REBOL.
escape?    Check if user pressed escape key. If so, halt to console.
print      Works like print in REBOL.
probe      Works like probe in REBOL.

Compare Opcodes

Integer Comparisons

eq      Sets the TRUE flag if the values are equal.
glt     Sets the TRUE flag if: value1 < operand < value2.
glte    Sets the TRUE flag if: value1 <= operand <= value2.
gt      Sets the TRUE flag if the first value is greater than the second value.
gteq    Sets the TRUE flag if the first value is greater than or equal to the second value.
lt      Sets the TRUE flag if the first value is less than the second value.
lteq    Sets the TRUE flag if the first value is less than or equal to the second value.
neq     Sets the TRUE flag if the values are not equal.

Decimal Comparisons

eqd      Sets the TRUE flag if the values are equal.
gltd     Sets the TRUE flag if: value1 < operand < value2.
glted    Sets the TRUE flag if: value1 <= operand <= value2.
gtd      Sets the TRUE flag if the first value is greater than the second value.
gteqd    Sets the TRUE flag if the first value is greater than or equal to the second value.
ltd      Sets the TRUE flag if the first value is less than the second value.
lteqd    Sets the TRUE flag if the first value is less than or equal to the second value.
neqd     Sets the TRUE flag if the values are not equal.

Control Opcodes

Conditional

iff       If the TRUE flag is not set, evaluate the block.
ift       If the TRUE flag is set, evaluate the block.
either    If the TRUE flag is set, evaluate the first block; otherwise evaluate the second block.

Looping

break      Exit from the currently executing block.
loop       Evaluate the block a specified number of times.
repeat     Evaluates a block a number of times or over a series.
repeatz    Zero-based repeat (0 to n-1).
until      Evaluate a block until the TRUE flag is set.
while      While the condition block sets the TRUE flag, evaluate the body block.

Branching

bra      Unconditional branch.
braf     Branch to target if the TRUE flag is not set.
brat     Branch to target if the TRUE flag is set.
braw     Branch via variable.
label    Define target label.

Function Return

exit      Exits a rebcode function, returning no value.
return    Return value from rebcode function.