REBOL Technologies

Was the Crash from ReiserFS or a Hard Disk Problem?

Carl Sassenrath, CTO
REBOL Technologies
28-Dec-2004 20:30 GMT

Article #0074

In analyzing the crash that took down REBOL.net last week, I find it very difficult to determine the exact cause. It could be either the hard disk or the filesystem. What we are seeing is a complete mix-up of file contents at a low level, but the directory structures are fine, with no errors there.

The mix-up is interesting in that unrelated files are being merged. For example, in AltME worlds, the users.set database has parts of the registry.set database within it, and vice versa. Initially, I would have guessed this pattern to be a REBOL problem, but I also find recent emails and web pages mixed in. Those files are not under REBOL's control, so REBOL is not the cause.

The primary crash patterns of interest are:

  1. The errors only happened on a single partition, /home.
  2. The errors are "merged files" rather than bad data.
  3. Directory structures are not affected at all.
  4. The errors are mainly for "high-frequency" files, some of which are updated or rewritten multiple times per second.
  5. Older, low-frequency files are not affected.
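One way to look for the "merged files" pattern described above is to hash each file in fixed-size blocks and report any block that shows up in more than one file. The following is only a rough sketch in Python, not the method used in this analysis. The 4096-byte block size, the function names, and the idea that the merging happens on block boundaries are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size; adjust to match the actual partition


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return a SHA-1 digest for each full, non-zero block of data."""
    digests = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Skip short trailing blocks and all-zero blocks, which would
        # otherwise match across unrelated files by coincidence.
        if len(block) < block_size or block.count(0) == block_size:
            continue
        digests.append(hashlib.sha1(block).hexdigest())
    return digests


def shared_blocks(files: dict[str, bytes],
                  block_size: int = BLOCK_SIZE) -> dict[str, set[str]]:
    """Map each block digest found in more than one file to those file names."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, data in files.items():
        for digest in block_hashes(data, block_size):
            owners[digest].add(name)
    return {d: names for d, names in owners.items() if len(names) > 1}
```

To use it, read the suspect files into memory (for example with open(path, "rb").read()) and pass them in as a dictionary keyed by file name. A shared block between two unrelated files is only a hint, not proof, since legitimate files can contain identical data.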

This is the second time we've experienced exactly the same type of filesystem crash. If I had to point a finger, I would say it looks a lot like ReiserFS 3.6 is at fault, but how can you really tell? A hard disk error within one of the primary sector maps or journals could perhaps produce the same errors.

But, to be on the safe side, we will roll back to the ext3 filesystem the first chance we get. There's really no good reason for this Linux server to run newer filesystems when the old ones will work just fine... and maybe better.


Updated 7-Mar-2024   -   Copyright Carl Sassenrath   -   WWW.REBOL.COM