Comments on: More About REBin and Decimals
REBOL Technologies

Carl Sassenrath, CTO
REBOL Technologies
11-Nov-2004

Article #0044

Thanks for some of your comments on Decimal representation for REBin. Let me clear up a few things...

First, REBOL was designed from the beginning to use IEEE floating point. I learned a lot from my Amiga experience of writing a custom floating-point function library: that's not the way to go. IEEE is what we use.

As you know, the Decimal! datatype has been part of REBOL for many years. What we are looking at here is the binary representation of floating-point (FP) values within the REBin format. I've not said much about REBin, so let me say a bit here to clarify.

REBin is similar to a byte-coded interpreter, but for data, not code (of course, in REBOL code is data, so you could argue that point). Its objective is the accurate representation of the REBOL lexical format (not the evaluative format) in binary form. You might ask, "why do this?" and that would be an excellent question. There are a few reasons:

  1. The main reason: to interface to REBOL plugin modules, REBOL needs to have a portable in-memory representation that can be passed to and returned from plugin functions. You would not want to parse REBOL source strings within your C code functions. Using REBin will be much simpler.

  2. RIF (index file) records will have the option of being stored in REBin format. Here the advantage is that the data of the record has been pre-validated and does not require re-validation to be restored into memory. REBin can also collapse word references that are found within records, eliminating the symbol table lookup step for each word (but still requiring it once for the localized symbol table). The net result is that saving to and loading from RIF records will be faster by some amount (not a huge amount, but every bit helps when you have a lot of records).

  3. REBOL encapsulation will use REBin format. You may not realize it, but every time REBOL boots, it "loads itself" and all of its components from a source code string representation. In other words, if there are 100000 values (words, numbers, strings) that are part of that bootstrap, each one must be separately parsed and validated. If the integer 1234 is part of the code, then every time you run REBOL it checks that 1234 is indeed a valid integer, even though it is static and will not change.

Now, REBOL does this process so quickly that you don't really notice it most of the time. But it does remain a factor, and that's why I often use REBOL/Base for my CGI scripts: there is much less to load at boot, so the resulting CGI page is produced faster.

This also extends to programs encapsulated with the SDK. Every such program (e.g. AltME) must itself be parsed and validated each time it starts. In the future, with REBin, that step will not be necessary. The program will be stored in REBin format, and the only remaining step is a minor translation to the in-memory format. No parsing or validation is needed.
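To make the difference concrete, here is a minimal Python sketch of the two loading paths. It is not the REBin format: the layout (a 4-byte count followed by fixed 8-byte little-endian integers) and the function names are assumptions chosen only for illustration.

```python
import struct

def load_from_text(source: str) -> list[int]:
    """Text path: every token is parsed and validated on every start."""
    values = []
    for token in source.split():
        # Validation step: reject anything that is not a plain integer.
        if not token.lstrip("-").isdigit():
            raise ValueError(f"invalid integer: {token!r}")
        values.append(int(token))
    return values

def save_binary(values: list[int]) -> bytes:
    """Done once, ahead of time (hypothetical layout: count, then 8-byte ints)."""
    return struct.pack(f"<I{len(values)}q", len(values), *values)

def load_from_binary(blob: bytes) -> list[int]:
    """Binary path: no tokenizing or validation, just read the values out."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}q", blob, 4))

values = load_from_text("1234 -5 42")   # validated here, once
blob = save_binary(values)              # stored in binary form
restored = load_from_binary(blob)       # later starts skip the validation
```

The text loader repeats the same checks on static data at every start, while the binary loader only translates bytes into in-memory values.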

For encapsulated programs, there is also the extra advantage that developers can selectively remove symbols from the REBin symbol table. This allows commercial developers to do a better job of hiding their code to protect their investment. This is a small thing, but it is an important factor to some developers.
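The idea of a symbol table whose entries can be removed can be sketched as follows. This is an illustration only, assuming a simple design where each distinct word is stored once and references are table indices; the article does not describe how REBin actually lays out its symbol table, and both function names are hypothetical.

```python
def build_symbol_table(words):
    """Store each distinct word once; return the table and index references."""
    table = []      # index -> word name
    index_of = {}   # word name -> index
    refs = []
    for word in words:
        if word not in index_of:
            index_of[word] = len(table)
            table.append(word)
        refs.append(index_of[word])
    return table, refs

def strip_symbols(table, private):
    """Blank out the names of chosen words; the index references stay valid."""
    return [None if name in private else name for name in table]

table, refs = build_symbol_table(["print", "total", "print", "total", "rate"])
stripped = strip_symbols(table, {"total", "rate"})
```

Because the data refers to words by index, the names of private words can be dropped from the stored table without changing the references themselves.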

When it comes to the representation of REBOL values in REBin format, most of them are obvious. For instance, strings remain as strings, and integers will be variable length (1-, 2-, 4-, or 8-byte formats). The datatype that is not obvious is Decimal. It can be represented in a few ways, and each has advantages and disadvantages.
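As a rough illustration of variable-length integers, the Python sketch below picks the narrowest of the four widths that can hold a value. The signed, little-endian encoding is an assumption, and how REBin records which width was used (for example, a type tag) is not described here, so the sketch simply infers it from the length of the data.

```python
import struct

# (byte width, struct format) for little-endian signed integers (assumed encoding)
_WIDTHS = [(1, "<b"), (2, "<h"), (4, "<i"), (8, "<q")]

def encode_int(value: int) -> bytes:
    """Pack value into the narrowest of 1, 2, 4 or 8 bytes that can hold it."""
    for width, fmt in _WIDTHS:
        limit = 1 << (8 * width - 1)
        if -limit <= value < limit:
            return struct.pack(fmt, value)
    raise OverflowError(f"{value} does not fit in 8 bytes")

def decode_int(data: bytes) -> int:
    """Unpack an integer whose width is given by the length of data."""
    fmt = {1: "<b", 2: "<h", 4: "<i", 8: "<q"}[len(data)]
    return struct.unpack(fmt, data)[0]

small = encode_int(100)      # fits in one byte
medium = encode_int(1234)    # needs two bytes
large = encode_int(2 ** 40)  # needs eight bytes
```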

However, it currently looks like we'll be using the IEEE format directly. The advantage is that it requires no translation from the in-memory format. Therefore it will be fast to load and will remain accurate (because there will be no conversion to and from a text representation). It is also a common standard. The disadvantage is that it requires eight bytes for every value, even for a value as simple as 1.0. But we don't think that will be a problem.
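The tradeoff can be seen in a few lines of Python, which uses IEEE 754 doubles for its float type. The function names and the little-endian byte order are assumptions for this sketch, not details of REBin.

```python
import struct

def encode_decimal(value: float) -> bytes:
    """Store the IEEE 754 double as-is: always eight bytes."""
    return struct.pack("<d", value)

def decode_decimal(data: bytes) -> float:
    """Read the eight bytes straight back into a double."""
    return struct.unpack("<d", data)[0]

x = 0.1 + 0.2
assert len(encode_decimal(1.0)) == 8            # even 1.0 costs eight bytes
assert decode_decimal(encode_decimal(x)) == x   # binary round trip is exact
assert float(f"{x:.15g}") != x                  # a 15-digit text form is not
```

The last line shows the kind of loss a text round trip can introduce when the printed form carries too few digits; storing the raw bytes avoids the question entirely.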

Updated 20-Nov-2017   -   Copyright Carl Sassenrath   -   WWW.REBOL.COM