REBOL Technologies

Copy and Checksum Large Files

Carl Sassenrath, CTO
REBOL Technologies
24-Jun-2006 4:07 GMT

Article #0281

Last week I wrote a short example of how to use checksum ports, and last year I gave an example of how to use the /seek refinement to deal with large files. The code below combines these two concepts in a function that copies a file, even if the file is larger than memory (e.g. MPG, MP3, WAV). When called with the /sum refinement, it also computes and returns the checksum of the file's data.

This is a robust "commercial quality" file copy function that you can use in your applications. If you find a bug, please let me know and I will correct it here.

REBOL [
    Title: "Copy File with Optional Checksum"
    Author: "Carl Sassenrath"
    License: 'MIT
]

copy-file: func [
    "Copy a file. Return WORD for failure or return optional checksum."
    from [file!]
    dest [file!]
    /sum "checksum the data"
    /local
    data
    path
    ff ; from file port
    tf ; to file port
][
    path: split-path dest

    foreach [block err-word] [
        [make-dir/deep path/1] dir-failed
        [ff: open/binary/read/seek from] read-failed
        [tf: open/binary/write dest] write-failed
        [if sum [sum: open [scheme: 'checksum]]] sum-failed
        [
            while [not tail? ff] [
                ;print index? ff
                data: copy/part ff 100000
                insert tail tf data
                if sum [insert sum data]
                ff: skip ff length? data
            ]
            ;print index? ff
        ] copy-failed
    ][
        if error? try block [
            if port? sum [close sum]
            if tf [close tf]
            if ff [close ff]
            return err-word
        ]
    ]

    data: none
    if sum [
        update sum
        data: copy sum
        close sum
    ]
    close tf
    close ff
    data ; checksum value or none
]

print copy-file/sum %movie.mpg %movie2.mpg
ask "done"
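
The function returns a word on failure and either a checksum or none on success, so a caller can tell the two cases apart with word?. The lines below are a minimal sketch of that check, assuming copy-file has already been defined as above; the file names are placeholders.

```rebol
; Assumes the copy-file function from this article is already loaded.
; %movie.mpg and %movie2.mpg are placeholder file names.
result: copy-file/sum %movie.mpg %movie2.mpg

either word? result [
    ; One of: dir-failed read-failed write-failed sum-failed copy-failed
    print ["Copy failed at step:" result]
][
    ; A binary checksum with /sum, or none without it
    print ["Copy succeeded. Checksum:" mold result]
]
```

Without the /sum refinement a successful copy returns none, which is not a word! value, so the same test still works.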

Notes:

  1. The code has only been tested on REBOL 2.6.2. It requires a version of REBOL that supports the /seek refinement (Core 2.6 or later).

  2. If you are new to REBOL, note the way the foreach is used to perform error checking for each step and return the appropriate error word for failures.

  3. The make-dir line is correct as written. If you do a source on make-dir you will see that it becomes a no-op if the dir exists. Adding an additional exists? check is not needed.

  4. The "from file" (ff) is opened with /read access. This is done to cause an error if the file cannot be opened. Without it, the file will open as an empty file, even if it does not exist.

  5. The checksum port defaults to the SHA1 (secure hash) algorithm.

  6. The code remembers to close the ports if an error occurs.

  7. File data are copied in chunks of 100000 bytes. This number is arbitrary, and you can set it to whatever buffer size you prefer. Smaller numbers may slow the transfer. Larger numbers will require more memory.

  8. Uncomment the print lines if you want to see it working. You could also modify those lines to show a progress bar.
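
The error-checking pattern from note 2 can be tried on its own. The sketch below pulls it into a small helper; run-steps and the step names are hypothetical and not part of the copy-file code, and they only illustrate how foreach pairs each block with the word to return if that block fails.

```rebol
REBOL [Title: "Step-by-step error checking sketch"]

; Hypothetical helper, for illustration only.
run-steps: func [
    "Evaluate each step. Return the step's error word on failure, else none."
    steps [block!] "Pairs of: step block, error word"
][
    foreach [block err-word] steps [
        ; try returns an error value if evaluating the block fails
        if error? try block [return err-word]
    ]
    none
]

print mold run-steps [
    [1 + 1]               add-failed
    [1 / 0]               divide-failed   ; division by zero raises an error
    [print "not reached"] print-failed
]
```

Here the second step fails, so the call returns the word divide-failed and the third step never runs. In copy-file the same structure also closes any open ports before returning the word.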


Updated 7-Mar-2024   -   Copyright Carl Sassenrath   -   WWW.REBOL.COM