REBOL Technologies

Add a DIFF Function?

Carl Sassenrath, CTO
REBOL Technologies
15-Feb-2005 20:37 GMT

Article #0126
It's time to consider adding a diff function as a built-in part of REBOL. It would really help with source and documentation archiving -- as will be provided in REBOL/Service applications. I would like to get some of your comments and ideas. Here are some of my initial thoughts on the subject...

A diff function would create a list of differences between two series values. The list is built so that merging it with the original data reproduces the final data. The benefit of diff is that it allows us to build code and documentation archival and backup systems that keep only incremental differences, rather than saving a complete copy each time.

Note that the diff function is not the same as the existing REBOL difference function. In REBOL the difference function treats its arguments as sets of individual elements, ignores their positions, and returns the elements that appear in only one of them:

>> difference "abcde" "abe"
== "cd"
>> difference ["abc" "def"] ["def"]
== ["abc"]

The implementation of a diff function can get very complicated, depending on the requirements. For text files, however, a simple diff is well understood: it computes differences using text lines as the unit of comparison. Differences between a source and a target are recorded as either added lines or deleted lines. Changed lines can be recorded as well, but that is not strictly required, because a change is just a delete followed by an add.

A possible example of a diff result as a block might be:

probe diff str1 str2
[
    d 10
    a 15 "This is the example."
    a 40 "No source is shown."
    d 43
]
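
To make the format concrete, here is a minimal sketch that produces such a block from two blocks of lines (for example, as returned by read/lines). It is not the proposed built-in. The name simple-diff is hypothetical, and the method is deliberately crude: it skips the lines that match at the start and at the end, then deletes everything in between from the source and adds everything in between from the target. A real diff would search for a longest common subsequence to get smaller results. The sketch also assumes that line numbers are 1-based and that entries are applied in order, each one seeing the effect of those before it. That question is still open.

```rebol
REBOL [Title: "Crude line diff (sketch)"]

; simple-diff is a hypothetical helper, not an existing REBOL function.
; Assumption: entries are applied in order, and each line number refers
; to the text as it stands when that entry is applied (1-based).
simple-diff: func [
    "Return a block of d/a entries that turns source into target."
    source [block!] "Original lines"
    target [block!] "New lines"
    /local edits s-len t-len head-len tail-len pos
][
    edits: copy []
    s-len: length? source
    t-len: length? target

    ; Count the lines that match at the head of both blocks.
    ; == is the strict (case-sensitive) comparison.
    head-len: 0
    while [all [
        head-len < s-len
        head-len < t-len
        (pick source head-len + 1) == (pick target head-len + 1)
    ]] [head-len: head-len + 1]

    ; Count the lines that match at the tail, without overlapping the head.
    tail-len: 0
    while [all [
        (head-len + tail-len) < s-len
        (head-len + tail-len) < t-len
        (pick source s-len - tail-len) == (pick target t-len - tail-len)
    ]] [tail-len: tail-len + 1]

    pos: head-len + 1

    ; Delete the unmatched middle of the source. The position stays the
    ; same because the following lines move up after each delete.
    loop s-len - head-len - tail-len [repend edits ['d pos]]

    ; Add the unmatched middle of the target, one line after another.
    repeat i t-len - head-len - tail-len [
        repend edits ['a pos + i - 1 pick target head-len + i]
    ]
    edits
]

probe simple-diff ["a" "b" "c" "d"] ["a" "x" "d"]
; prints: [d 2 d 2 a 2 "x"]
```

Two files that differ only by one line in the middle give a short result, while scattered edits give a much larger one than necessary. That is the price of skipping the common-subsequence search.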

It might also be possible to shorten this even further if we were to use positive integers for added lines and negative integers for deleted lines:

[
    -10
    15 "This is the example."
    40 "No source is shown."
    -43
]
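
The two notations carry the same information, so one can be derived from the other. As a rough sketch (compact-diff is a hypothetical name, not an existing function), a small parse rule can rewrite the d/a form into the signed-integer form:

```rebol
REBOL [Title: "Convert d/a entries to signed integers (sketch)"]

; compact-diff is a hypothetical helper, not an existing REBOL function.
; Assumption: the input holds only entries of the form
;     d <integer>    or    a <integer> <string>
compact-diff: func [
    "Rewrite d/a diff entries using negative and positive integers."
    edits [block!] "Diff entries in d/a form"
    /local out n text
][
    out: copy []
    parse edits [
        any [
            'd set n integer! (append out negate n)
            | 'a set n integer! set text string! (repend out [n text])
        ]
    ]
    out
]

probe compact-diff [
    d 10
    a 15 "This is the example."
    a 40 "No source is shown."
    d 43
]
; prints: [-10 15 "This is the example." 40 "No source is shown." -43]
```

One consequence of the signed form is that line numbers must start at 1, because zero has no sign to tell an add from a delete.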

We should think about that.

We can also consider whether we want diff to work at a lower level, such as on the individual bytes of a string. That can be much more complicated, but it would allow diffing of binary files. Perhaps we can leave that as an option for later implementation.

Of course, we should also discuss whether to expand this definition of diff to handle REBOL blocks:

b1: [test 123 "abc"]
b2: [test "abc" http://www.rebol.com]
probe diff b1 b2
[-2 3 http://www.rebol.com]
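
One way to read that result is "remove item 2, then insert the URL so that it becomes item 3". The sketch below applies a signed-integer diff under that reading. The name apply-compact is hypothetical, and the rule that each entry sees the effect of the entries before it is an assumption, chosen because it reproduces b2 from b1 in the example above.

```rebol
REBOL [Title: "Apply a signed-integer block diff (sketch)"]

; apply-compact is a hypothetical helper, not an existing REBOL function.
; Assumption: a negative integer -n removes item n; a positive integer n
; is followed by one value, which is inserted so that it becomes item n.
; Entries are applied in order to the evolving result.
apply-compact: func [
    "Apply a signed-integer diff to a block and return a new block."
    data [block!] "Original block"
    edits [block!] "Diff entries"
    /local result n
][
    result: copy data
    while [not tail? edits] [
        n: first edits
        either negative? n [
            remove at result negate n
            edits: next edits
        ][
            ; /only keeps a block value as a single item
            insert/only at result n second edits
            edits: skip edits 2
        ]
    ]
    result
]

b1: [test 123 "abc"]
probe apply-compact b1 [-2 3 http://www.rebol.com]
; prints: [test "abc" http://www.rebol.com]
```

Because a positive integer always consumes the value that follows it, an inserted value that is itself an integer is not mistaken for a position.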

There are a few issues here, such as how to handle blocks that are nested. Perhaps this feature should also be an option for later.

We should think about all of this and discuss it. I know some of you have already thought a lot about diff, and perhaps some of you have implemented the function (Volker Nitsch has one, for example).

In addition, we should consider whether implementing diff implies that we should also implement merge. That might make a lot of sense.
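
For the line-based d/a format shown earlier, a merge could be quite small. The following is only a sketch under stated assumptions: apply-diff is a hypothetical name, line numbers are taken to be 1-based, and each entry is applied to the text as left by the previous entry. A different numbering convention (for example, all numbers referring to the original source) would need a different loop.

```rebol
REBOL [Title: "Apply a line diff (sketch)"]

; apply-diff is a hypothetical helper, not an existing REBOL function.
; Assumption: entries are applied in order, and each line number is a
; 1-based position in the lines as they stand at that moment.
apply-diff: func [
    "Apply d/a diff entries to a block of lines and return a new block."
    lines [block!] "Source lines"
    edits [block!] "Entries of the form: d <line> or a <line> <string>"
    /local result n text
][
    result: copy lines
    parse edits [
        any [
            'd set n integer! (remove at result n)
            | 'a set n integer! set text string! (insert at result n text)
        ]
    ]
    result
]

probe apply-diff ["one" "two" "three"] [d 2 a 3 "four"]
; prints: ["one" "three" "four"]
```

The function copies its input, so the original lines stay available for comparison or for applying a different diff.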

Regardless of the possible options mentioned above, it would be quite useful to have some degree of diff implemented as a standard part of REBOL. Let me know your comments.

Updated 8-Mar-2024   -   Copyright Carl Sassenrath   -   WWW.REBOL.COM