Please Use ZFS with ECC Memory (2014) (louwrentius.com)
62 points by rdpintqogeogsaa on Aug 6, 2022 | 106 comments


Actually, everything should have ECC, but it's a consumer/professional market segmentation that has been maintained for years.


I didn't really appreciate this until I got an ECC workstation. It's noticeably more stable than my previous machines, even under abusive levels of load.

Noticeably more stable, as in, "never ever crashes" as opposed to "almost never crashes" without ECC. Thanks Linux! :D


Did you actually check the ECC error counters?

Memory errors in data centers tend to be concentrated in a small number of bad sticks of RAM rather than evenly distributed across all memory. If you have a machine crashing regularly due to memory errors, it’s likely a bad stick of RAM, not random errors due to lack of ECC.


Agreed, but ECC will prevent most of these errors from crashing the machine and tell you exactly which DIMM is causing the problem. So you can replace the DIMM (out of pocket or under warranty) instead of playing the replace-a-random-part-and-hope-it-improves game.


The ECC error counters are not really honest anymore, as repaired faults are often not reported, from what I heard.


This depends on the BIOS and the OS. Correctable (and corrected) errors are typically logged into the baseboard controller (the BMC) and the OS "should" periodically dump the logs from the controller to maintain a record of those errors longer term. That said, eventually they just "fall off" the list of errors the BMC is holding because it stores them round robin.

Uncorrectable errors will cause a machine check, unless the BIOS disables it. Which some do.
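
On Linux, a rough sketch of how you might pull these counters yourself, assuming the EDAC driver is loaded and the usual tooling (edac-utils, rasdaemon, ipmitool) is installed:

  # per-controller corrected/uncorrected error counts from the EDAC sysfs tree
  grep . /sys/devices/system/edac/mc/mc*/ce_count \
         /sys/devices/system/edac/mc/mc*/ue_count

  # summarized view via edac-utils
  edac-util -v

  # corrected-error history as recorded by rasdaemon
  ras-mc-ctl --error-count

  # the BMC's own event log, over IPMI
  ipmitool sel list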


I'm telling you that the reporting of correctable errors is not always honest. On some systems you can see uncorrectable errors before you see any correctable errors, which statistically can't happen. That it should be honest is beside the point.


There is a question of agency here.

You are absolutely correct that reporting of errors may not be "honest". I prefer the word "correct" rather than "honest", because honesty implies an intention to lie, and the computer logic is generally designed to be correct.

The market reality is that reporting correctable errors can generate unnecessary service tickets from people who are not technically sophisticated. Those users may open a support ticket wondering why their system is telling them it saw an error but then corrected it. Dealing with support tickets costs money, money that is taken away from margin, and so this reporting is sometimes "optimized out": some manager makes their numbers better by ordering the software team to "hide" correctable errors which were corrected.

It is absolutely fair to describe that management choice as "dishonest."

It can sometimes be difficult to find out where the dishonesty was applied, however. It has been my experience that, between the BIOS and the kernel, if the chipset supports ECC and the BIOS recognizes that you have ECC memory installed, there is a means for extracting all ECC events reliably. It isn't always well documented and sometimes requires several levels of escalation in support to get the information you need, but when ECC is important to you it can be worth it. It can also inform your choice of vendors in the future. :-)


On hard drives and SSDs, yes, but has this been happening to RAM too?


I guess you could say it has begun with DDR5 on-die ECC. I don't think you can query those counters.


Also, there's probably some perception-related bias involved here.


Perhaps a high quality machine contains devices with high quality drivers.


I have several non-ecc machines running 24/7 for years without any crashes.


Sure, not particularly surprising. However, do you track when systemd restarts a daemon that died? See any strange dmesg errors? Had fsck report filesystem corruptions?

Many errors won't cause a crash, but said errors can accumulate in processes, filesystems, and files and you won't know why. 10 years from now you may find a photo, music file, or binary that's somehow corrupt and have no idea how it happened.
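
For anyone who wants to actually look, a few starting points on a typical systemd distro (a sketch, not exhaustive):

  systemctl --failed            # units systemd has seen die
  journalctl -p err -b          # everything logged at error priority this boot
  dmesg --level=err,warn        # kernel warnings and errors
  journalctl -b | grep -i fsck  # any filesystem-check chatter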


The only machine I've had consistent crashes with due to memory issues IS my ECC memory workstation. Everything else has NEVER crashed due to a memory issue.


But... how do you know that? Without ECC, the only way would be that none of your other machines have ever crashed at all, right?


Yes, that's basically it. Except it's more like, "ugh my ECC laptop blue screened again!" vs. me not able to even remember the last crash like that on my non-ECC machines.

My ECC workstation had lots of memory issues where it blue-screened on the Windows side. For some reason that wouldn't happen on the Linux side.


I've actually had the opposite experience. My ECC workstation laptop is the only device I've ever had consistent issues with crashes due to memory issues.

Interestingly, this never happened when I was running Linux and abusing the RAM with photogrammetry workloads. It would only happen with Windows, when I wasn't using anything beyond routine levels of RAM.


DDR5 includes built-in on-die ECC. However, corrected errors are not reported, and it provides slightly less redundancy than true ECC, because some of the redundancy budget (compared to DDR4 ECC) is eaten up by less reliable cells. In total, though, I expect improved reliability over non-ECC DDR4.


To my understanding, DDR5's on-die ECC exists because of the high frequencies the individual dies run at. You are still vulnerable to anything between the RAM and the CPU.


That's correct. Still, the likelihood of errors in data at rest is much reduced. Memory controllers might be able to detect a lot of errors between the RAM and the CPU, because those errors occur in the analog domain, while errors in memory are in the digital domain (the signal gets digitised and then sent over an analog channel).


Yes, but you should still use ZFS if you care about your data, even if you don't have ECC.


I hate this headline and wince every time I see it. Even the article quotes:

> There's nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem. If you use UFS, EXT, NTFS, btrfs, etc without ECC RAM, you are just as much at risk as if you used ZFS without ECC RAM. I would simply say: if you love your data, use ECC RAM. Additionally, use a filesystem that checksums your data, such as ZFS.

And goes on to say

> I have nothing to substantiate this, but my thinking is that since ZFS is a way more advanced and complex file system, it may be more susceptible to the adverse effects of bad memory, compared to legacy file systems.


I’m actually fascinated by how much this post became accepted canon over the years.

The truth is that everyone should want to use ECC wherever data integrity is a priority, regardless of whether or not they’re using ZFS. The conjecture about ZFS being uniquely vulnerable has been debunked over and over again. Maybe there was something different back in 2014 when this was written, but even if that was the case it’s not a good idea to use such old blog posts full of admitted conjecture to anchor your technology understandings.


There was a concerted effort to push the claim everywhere in FreeNAS-adjacent spaces, to the point that another piece of "community canon" was that FreeNAS was behind it, since they made money selling you "certified NAS boxes with ECC memory".

The only things that really happened were that a) ZFS exposes how failure-prone hardware is, due to its pervasive checksumming, and b) ZFS came from a culture (and project goals) that prioritized data safety; as such, it was possibly the first contact many people had with strong recommendations about ECC memory.


I remember using FreeNAS many years ago and would regularly check out the forums.

The amount of people rehashing this "you must use ECC with ZFS" was insane.

One guy seemed to have made it his life's mission to publish this same story over and over again.


People love clever contrarian takes. A thing that sounds good is actually bad!


> I’m actually fascinated by how much this post became accepted canon over the years.

I think the misunderstanding goes something like this. I'm pretty sure all three of the below statements are true:

• People who prioritize data integrity are more likely to be using ZFS.

• People who prioritize data integrity should use ECC memory.

• Ergo, if you are using ZFS, you should probably be using ECC memory.

However, it doesn't naturally follow that because you are using ZFS, you should be using ECC memory, nor that people without ECC memory should not use ZFS. This is unintuitive.


I have an Oracle Linux PC, running everything in btrfs.

I use a USB-attached ZFS RAID-1 for my important storage. I have to run "zfs-fuse" before I mount it.

What I know about ecc comes from this:

https://arstechnica.com/civis/viewtopic.php?f=2&t=1235679&p=...

I have never scrubbed my zfs vault, but I have scrubbed my btrfs drives.

I should get something with ecc. Wow, do I have a lot of stuff.


Why on earth are you running Oracle Linux or zfs-fuse?

I have no idea why one would use a 2006 Google Summer of Code project instead of the official OpenZFS project, which provides an actual kernel module that would be several times faster and likely safer.


Actually, while we're here, I would love to know more about the ZFS_DEBUG_MODIFY flag at that link. I've come across that post before and have always been wondering if I should be using the flag, since I can't practically use ECC. But the flag has debug in the name and I've never been able to find more information about it.
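
The most I've found is that it appears to be one of the debug bits in the zfs_flags module parameter on ZFS on Linux; the exact bit value seems to vary by release, so treat this as an assumption to check against the module-parameters man page for your version:

  # show the current debug-flag bitmask (ZFS on Linux)
  cat /sys/module/zfs/parameters/zfs_flags
  # enabling a flag means echoing a new bitmask back, e.g.
  # echo <mask with the ZFS_DEBUG_MODIFY bit set> > /sys/module/zfs/parameters/zfs_flags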


I would like to add that running a ZFS RAID array without ever scrubbing is asking for data loss. Ask Linus from Linus Tech Tips how he lost data repeatedly.
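
If your distro doesn't already scrub on a schedule, it's a one-liner to do it yourself (pool name "tank" is just an example):

  zpool scrub tank          # start a scrub
  zpool status -v tank      # watch progress and see any errors it found

  # e.g. a monthly cron entry:
  # 0 3 1 * * /sbin/zpool scrub tank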


Another myth that still lives on is that you need 1 GB of RAM for every 1 TB of disk on ZFS.


ZFS dedup made that a thing, but it's almost never used by consumers.
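
Easy enough to confirm, since dedup is off by default (pool name is an example):

  zfs get dedup tank                  # "off" unless someone enabled it
  zpool list -o name,dedupratio tank  # 1.00x when dedup was never used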


> Maybe there was something different back in 2014 when this was written

The problem was the same even in 2014: ZFS lets you know how shitty your hardware is.

And, just like 2014, people would rather shoot the messenger because they don't have any power to force companies to make better hardware.


Eh, the chance that data I care about will get corrupted in a way that matters is very small. ECC also does not eliminate errors, it just reduces them by some constant proportion. Something quite rare becoming, say, 10 times less likely doesn't really affect my decisions much. If you're using ECC you can make the exact same argument that you need another bit of parity, with the same justification of making something rare rarer. Of the things that can go wrong with my data, small corruptions from memory bit flips are far down on the list of risks to worry about.

If you're Google (you're not Google) then the scale of errors becomes a choice of optimization.


ECC does not reduce errors only 10 times, but by several orders of magnitude.

The frequency of DRAM errors is proportional to the amount of memory installed in a computer. For smaller quantities of memory, e.g. 16 GB or 32 GB, the frequency might be one error every few months. With ECC, the mean time between errors is likely to become longer than the lifetime of the computer.

Intel, who initiated this absurdity of convincing naive customers that it is OK to buy computers that may make mistakes from time to time, has succeeded in getting away with it only due to the huge number of software bugs, which have made computer users believe that whenever a computer crashes, or some corrupted data is discovered, the cause is more likely to have been a software bug than a hardware defect.

The integrated circuits made with modern CMOS processes, with very small devices, have a non-negligible aging rate.

Because of that, memory modules that have been used for many years start at some point to have much more frequent errors than when new.

When you have ECC, such memory modules are immediately detected, and they can be replaced before causing some irremediable data corruption. This has helped me a lot a few times.


I've seen many ECC log lines on machines before; my wifi routers have ECC memory and each has seen at least one fault. It isn't a one-in-a-million sort of risk; it's just that most systems don't run with ECC, and maybe that bit flip just causes a crash or instability that's chalked up to software bugs. Especially as memory densities continue to skyrocket, I would expect errors to become extremely commonplace.

It's some sort of variation of the Toupee Fallacy: you aren't aware of the frequency of bit errors because you have absolutely no way of knowing when they happen, or which "random" errors were caused by them.


It's not that errors don't happen; it's that they are rare, and the chance of them affecting something important in an important way is incredibly small.

I don't care about triggering an occasional bug or a bit flip in a video stream; these things are inconsequential. If a bit gets flipped in a photo, it'll be a very slightly different photo. A piece of text will be slightly different... if either even had the luck to be in memory, ready to be written, when hit, out of dozens of gigabytes of memory.

The most likely consequence of a bit flip is nothing at all happening because it hit unallocated memory. The next most likely consequence is nothing as it hit a piece of program memory that has no effect on operation and gets overwritten in moments.

ECC zealots are really excited about preventing something that just doesn’t happen all that often: actual data corruption with permanent significant effects.


My point was that you don't know when corruption has had an effect, because you're eschewing a method of detecting it.


> my wifi routers have ECC memory

This is really weird. How come even routers have ECC memory while our actual computers don't?

Why can't all memory just be ECC memory? I don't even see the point of non-ECC memory. Everything has error correction built-in: hard drives, SSDs, even optical media. Yet RAM doesn't?


Using memory without ECC in the so-called "consumer" devices, in opposition to the "professional" devices, a.k.a. servers and workstations, was caused by Intel, between 1994, when they introduced the second generation of Pentium CPUs, and 1995, when they introduced the Pentium Pro.

(The 1st generation of Pentium, the 60/66 MHz models, had been introduced in 1993, while the 2nd generation, the 90/100 MHz models, introduced in 1994, used a different socket and different motherboards.)

Unlike with the previous Intel CPUs, Intel had the idea to introduce a market segmentation between Pentium and Pentium Pro (for the successors of Pentium Pro, Intel created the Xeon brand, a couple of years later).

Therefore, Intel retained the ability to detect and/or correct memory errors only in the Pentium Pro chipsets (at that time the memory controllers were in the so-called northbridge, a part of the chipset, and not in the CPU), while removing this feature from the Triton chipset intended for the Pentium.

While Intel initiated this market segmentation policy between "amateurs" and "professionals" (unlike IBM, which since the first IBM PC had always included memory error detection via parity), both the memory module vendors and the makers of other CPUs have been happy to follow it, because it allows them to extract much more money from knowledgeable users than the cost of adding ECC, while also enjoying larger profit margins when selling to naive users, because the elimination of ECC/parity has never caused any price reduction.

When this feature was removed, all the new motherboards and modules without parity/ECC had the same prices as the previous models with error detection. (For many years, memory errors were detected via parity, but then the memory controllers were improved so that, using the same number of extra bits, i.e. the same memory modules and motherboards as for parity, single errors could be corrected, not only detected.)


I specifically bought hardware (APU2) for my wifi routers that has ECC; it's just not something that's commonly on hardware you can buy off the shelf. Like the other comment said, it's a stupid product-differentiation thing that Intel did. This hardware, for what it's worth, uses a very old AMD Geode processor.


I'm also looking at building a router with ECC RAM. Do you have any hardware recommendations?


The APU2 is cheap, has lots of hardware support, but is getting a little long in the tooth. If you can deal with maxing out at gigabit speeds, it’s an amazing choice and can run just about any software because it’s just a standard x86 board.


As someone who doesn't use ECC memory for my NAS (I'm not Google), I have experienced many bitflips/corruptions in some sets of photos, especially when using a NAS with what turned out to be bad RAM sticks. This was only recognized after switching to ZFS and RAID 10; I don't know if this saved any of my data, but the setup at least made me realize that I had a problem.


This has been discussed on HN several times before. User xornot looked at the ZFS source code and debunked it; for more details see

https://news.ycombinator.com/item?id=14207520


"I have nothing to substantiate this" is another way of saying "I'm making this up, but somehow think my intuition is worth more than the actual knowledge of the experts who actually did the work."

It's as suspect as "I have nothing to substantiate it, but my kid caught autism after that shot."

It's so badly reasoned that it calls into question the author's judgment on any technical topic, because it shows that his basic tools for reasoning are lacking.

99.9% of data loss occurs because of well understood but difficult to fully mitigate problems.

There is no reason to believe ZFS is more susceptible to bad memory, and it's not exactly new, having begun development 21 years ago.


Actually, I would have expected that it was more important to use ECC with a filesystem that lacks data checksums, since those should at least catch corruption.


ECC is not about disk corruption, while the checksums are about disk corruption.

ECC is about corruption of data in memory: caches of data, data in the process of being compressed or encrypted, some management data that is always cached in memory, etc.

In the end, not having ECC can affect your system in such abstruse ways that IMHO it probably doesn't make any relevant difference whether you use ZFS or e.g. ext4.

Tbh it's kind of absurd that ECC is not the standard for any non-"cheap" PC/laptop.

EDIT: Checksums can still potentially help in the unlikely case that the RAM error is a bitflip in some loaded offset/pointer referring "into" the HDD/SSD storage (e.g. the position of some file on disk).


The good thing about filesystems which have no error checking is that they generally won't go back and corrupt the old, idle, already-written data on disk when they have a memory issue. If you run a scrub or similar check of old data's validity and your system has a new memory issue, it could destroy everything very quickly, even without new file writes. With a less featureful filesystem your data corruption could still be quite bad, but it would have a better chance of recovery: even if the filesystem is severely damaged, the file data would be (mostly) untouched. ECC helps prevent both of these cases; it's really too bad it's so hard/expensive to use outside of server hardware.


> If you run a scrub/whatever to check validity of old data and your system has a new memory issue, it could destroy everything very quickly even without new file writes.

No, it won't.

https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-y...

> Let’s assume that we have RAM that not only isn’t working 100% properly, but is actively goddamn evil and trying its naive but enthusiastic best to specifically kill your data during a scrub. First, you read a block. This block is good. It is perfectly good data written to a perfectly good disk with a perfectly matching checksum. But that block is read into evil RAM, and the evil RAM flips some bits. Perhaps those bits are in the data itself, or perhaps those bits are in the checksum. Either way, your perfectly good block now does not appear to match its checksum, and since we’re scrubbing, ZFS will attempt to actually repair the “bad” block on disk. Uh-oh! What now?

> Next, you read [a copy of the same block from another disk]. Now, if your evil RAM leaves this block alone, ZFS will see that the second copy matches its checksum, and so it will overwrite the first block with the same data it had originally – no data was lost here, just a few wasted disk cycles. OK. But what if your evil RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk.


I did not mention ZFS specifically. If ZFS has better handling of this kind of thing, that's great, but if you can't trust your memory to be correct, you can't trust the data in buffers, the data being hashed, or the data being read from or written out to disk. Additionally, you can't trust the filesystem to behave in the ways that it should. There are many kinds of memory errors; some may, for example, impact certain data sequences in a fairly deterministic way. Some are completely random; some can be triggered by users or attackers.


Unless the filesystem is behaving in a way that is overwhelmingly stupid, the basic logic should still apply. I don't understand how error checking could ever cause data corruption. It might let you know about data corruption which would otherwise have gone unnoticed, but that's not the same thing.

If there is a filesystem that is dumb enough to cause corruption during the checksumming process, please let me know which one, so I can be sure to never ever ever go anywhere near it. :)


A lot of things in computing are overwhelmingly stupid or assume everything will work as expected. I have experienced several data corruption events related to parity data being read incorrectly, not in ZFS, but with hardware and software RAID controllers. In one case the hardware RAID controller even had ECC memory, but its memory was overheating and thus introducing bad data into calculations when multi-bit errors were not correctable. A similarly horrific error condition saw a controller confuse disk IDs in memory and start mirroring one drive to every other drive in the system.


Those are not instances of error checking causing data corruption. As I said, "I don't understand how error checking could ever cause data corruption."

Error checking will only ever help you, not hurt you. It doesn't matter how bad your memory or disk or RAID controller is. Error checking won't necessarily save you from those things, but it can in some cases, and it'll never make things worse.


But they are, though: the corrupted parity calculations in that first example caused data corruption during a scheduled array check while the system was under unusually heavy load. Error checking is good, and when things are working right it can only help. That is true, but it can't always be counted on if the hardware, software, etc. is untrustworthy for whatever reason.


Okay, well I am totally and utterly confused as to how that could ever be possible, regardless of the hardware. You're confident that if not for the data validation the problem wouldn't have occurred?


I'm afraid you don't know what you're talking about. The Matt Ahrens quote in the article actually specifically debunks this claim:

"There's nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem. If you use UFS, EXT, NTFS, btrfs, etc without ECC RAM, you are just as much at risk as if you used ZFS without ECC RAM. I would simply say: if you love your data, use ECC RAM. Additionally, use a filesystem that checksums your data, such as ZFS."


I did not specifically mention ZFS anywhere in my comment, bad memory corrupting data being actively changed remains a problem for any filesystem. If the filesystem is actively changing data it can not rely on anything it is writing to disk being correct if the in memory buffers and other data structures are themselves corrupted.


> I did not specifically mention ZFS anywhere in my comment

Wow. If you weren't talking about ZFS in a comment under an article concerning ZFS, an article concerning the same issue you are parroting, an issue that supposedly affects ZFS, what filesystem were you talking about?

Let me guess, a hypothetical filesystem.

Sure, whatever.

> bad memory corrupting data being actively changed remains a problem for any filesystem. If the filesystem is actively changing data it can not rely on anything it is writing to disk being correct if the in memory buffers and other data structures are themselves corrupted.

And yet, that's not the argument you just made. This is what makes me think this is a bad-faith pivot to something, anything, respectable. It's not like I'm going to forget that you said right above (it's still there!) that a scrub could destroy all your data. This claim has been repeatedly debunked. You may have been taken in. Fine. That happens. Just, now that you know, don't keep spreading this FUD.


I thought I was replying to a general comment about the behavior of filesystems in general, and was attempting to clarify that. We obviously disagree about how safe and reliable certain things are, but that is fine.


It's not up for debate or interpretation that ZFS relies on RAM more than most other filesystems. They all use RAM, because everything uses RAM, and so that much is a baseline that cancels out between anything and anything else.

And ZFS uses more of it, and relies on it more, than others, above that baseline.


Everything that uses the Linux pagecache as its interface (i.e. everything except ZFS on Linux) is way, way more interleaved with memory usage than ZFS, with some extra gotchas due to how the pagecache works by mapping disk and memory 1:1. (A great simplification, but unless you go out of your way, the FS implementation of write()/read() is never used; instead the same operations as with mmap() are called, and write/read calls copy from memory.)


I have to disagree with this guy.

Sure, the corruption can occur, but if the memory is not faulty, ZFS will detect data corruption in situ during a scrub. It's just that memory errors can introduce their own, separate corruption, during write operations only.

I believe that if you have no choice but to use non-ECC memory (e.g. existing hardware that is limited by Intel's stupid design choice of "ECC is only for servers"), ZFS is still much better than ext4. It still protects against a different class of errors, namely HDD degradation. When used in RAID-Z it can even recover from them.

For perfect protection, ECC is necessary. But that is not always financially possible. I think it's a bit of a wild statement to say that if you can't afford a server with ECC, you should forget about ZFS entirely.
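
For illustration, the setup described above is only a couple of commands (device names are examples):

  # raidz1 survives one failed disk and self-heals checksum errors found on scrub
  zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc
  zpool scrub tank
  zpool status -v tank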


The article tells you that if you won't use ECC memory, bitrot as a risk isn't that important to you; it is clear that you don't want to pay for that kind of safety. So if bitrot isn't that important, you can go with any other filesystem, or with ZFS, but don't pretend you're covered.

I would rather run XFS or EXT4 with ECC memory than ZFS without ECC, because silent bitrot is extremely rare: drives and protocols are full of error-correction machinery.

Memory is the real weak spot from a bitrot perspective.

ECC first. ZFS second.


Sounds like the article should have been titled, "Why you shouldn't use ZFS".

   ZFS does not have such [recovery] tools, if the pool is corrupt, all data must 
   be considered lost, there is no option for recovery.
In other words, this marvelous piece of over-engineered technology is fragile. When it works, it's great. When it fails, it's a complete disaster.

Everything fails eventually. How do you prefer your failure served? In small manageable increments or one spectacular, complete, overwhelming, unrecoverable helping.


I think it's important to note this wasn't written by an expert on ZFS. And I'm pretty certain this author is actually wrong about this particular point, as ZFS does allow a number of extraordinary measures to recover a pool from a corrupted state, such as:

  zpool import -FX mypool, where:
  1) -F Attempt rewind if necessary.
  2) -X Turn on extreme rewind.
  3) -T Specify a starting txg to use for import.
There are a few other points you might consider:

First, what kind of failure might cause an unrecoverable corrupted pool, and how useful are other filesystems' data recovery tools, actually, if corrupted in a similar fashion? Is there an apples-to-apples comparison that you can share? I believe the author and you are sharing speculation.

Second, how do you know you have corruption with another filesystem until you read back the data? And do you even know you have corruption then?


Sure, those options are nice, but they use the same code paths as regular reading/mounting does. If you've got some corruption that causes a crash, *all zfs tools are useless*. You must whip out some idiotic 13yo Python script to destroy uberblocks and you might make it work.

Usually: bug => corruption => module crashes => none of ZFS's own tools work.


First, the author is wrong by your own terms. If you think a 13yo Python script is the state of the art for ZFS recovery, that still doesn't comport with what the author said ("all data must be considered lost, there is no option for recovery"). The author is facially wrong. If you had said "I once had a corrupt pool I couldn't recover", that's a useful data point. But instead you tied your wagon to someone who obviously doesn't know much about what they are talking about.

Second, you managed to ignore my two other original points: 1) Is the state of other filesystems any better re: recovery in similar circumstances (show your work: when XFS is afflicted with the exact same type of corruption, how is it better)? And 2) are you perhaps better positioned to deal with corruption with ZFS (because it's usually not silent)?

I'm not saying ZFS is better for you. You are obviously not predisposed. What I am saying is -- your argument has to be better supported for it to make any sense.


It's very difficult for me or anyone to gather statistics about ZFS crash recovery. I do not think that script is the state of the art in ZFS recovery, but it's the most common suggestion. The GitHub issue about a replacement is still open; there seems to be no functional alternative.

Maybe the absolute statement the author made is somewhat incorrect, but it's not entirely incorrect. Empirically so, according to the issue tracker. If people didn't request these things every year, I'd agree that there are no recovery issues.

I do think that other filesystems tend to have better 1st and 2nd party tools for repair and recovery. It is also true that some file systems are less complex and that might help. But isn't that an argument for providing even stronger 1st party tooling and documentation? We want people to not lose any data, no?

So to clarify the discussion a bit, what is it that you're claiming? Are you claiming ZFS has no recovery issues or that it's okay because some others are equally terrible to recover?


*sigh* It still checksums everything it touches and can read damaged filesystems in most cases. More importantly, it can tell you which parts have been corrupted and which haven't. Because all live data on disk is checksummed, and the checksum is part of the reference (except for the ring buffers of superblocks, which carry their own checksums), you can detect data corruption and recover to a sane state. If you detect and guesstimate a fix for corruption on a traditional filesystem, by your own logic you would have to consider all filesystem data structures, as well as all file content, suspect, with no recovery path short of wiping the filesystem and restoring the data to a fresh one. Oh wait, your backup was made from an untrustworthy source...

It's not as black and white as you or the blog post make it out to be. I've had to recover damaged ZFS pools while traveling (there are no affordable, travel-compatible laptops with ECC RAM support). A ZFS scrub told me which 4 files had been corrupted, and overwriting them with good copies from another machine solved the problem. Without ZFS (or a similar filesystem) I wouldn't have noticed the data corruption as quickly and wouldn't have known what to restore. Also, ZFS is no one-trick pony and has more to offer than "just" end-to-end checksumming: pooled storage for multiple filesystems and sparse block devices, fast consistent snapshots (good enough to back up a running RDBMS), incremental backups and replication (no more dreaded full backups), ease of administration, easy-to-grow capacity (as long as you do it in large enough increments), transparent compression, and, if you really need it, block-level deduplication (you probably don't want online block-level dedup).
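
For the curious, that recovery workflow was roughly the following (pool name is an example):

  zpool scrub tank
  zpool status -v tank   # names the individual files with permanent errors
  # overwrite those files with known-good copies, then:
  zpool clear tank       # reset the error counters
  zpool scrub tank       # confirm the pool is clean again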


It's just not true that if ZFS metadata is corrupted, then the entire pool is gone. ZFS is a transactional file system, so the pool will be put in a faulted state, and it gives you the option to roll back to before the faulty transaction.

I really do not see how anyone could possibly think that is worse than your data being silently corrupted on disk, which then propagates to your backups, corrupting them too.

People seem to think ZFS is worse because it tells you when your data is corrupted, rather than not telling you? It doesn't make any sense.


> It's just not true that if ZFS metadata is corrupted, then the entire pool is gone. ZFS is a transactional file system, so the pool will be put in a faulted state, and it gives you the option to roll back to before the faulty transaction.

It is true, because ZFS on Linux crashes very quickly when metadata is corrupt. It can't roll back transactions, because it'll crash almost instantly just looking at your pool, not to mention when scrubbing it.


If you checksum garbled data from memory, that checksum is useless. Garbage in, garbage out.


Yes, the checksum is useless, because the checksummed file is corrupt. By contrast, without checksumming... the file would still be corrupt!

In the best case, when good data was written to disk, checksumming preserves the good data. In the worst case, when bad data was written to disk, checksumming does no harm.

Checksumming is good.


No one is saying that you shouldn't use ECC, obviously you should if at all possible. But even if you don't have it, that is no reason not to use ZFS.


If you're at the point of using NTFS data recovery tools, you've already lost. With an enormous amount of time, effort, and/or money, you may be able to recover something, although without checksums you'll never be able to validate whatever-it-is you pull out. I wouldn't call it a "manageable" failure.

The goal is to never reach this point, via a combination of redundancy and backups. ZFS helps enormously with the first part.


Author here: I think ZFS is totally fine, but it's about understanding risk. And running ZFS on your laptop without ECC, that's ok.

Running ZFS on a home server as a NAS but without ECC memory, you are addressing one risk, but you are forgetting another (IMHO bigger) risk: faulty memory causing bitrot.

It's as if you lock your back door and leave the front door unlocked.


Understandable, but by telling people to just forget ZFS and use EXT4 instead, you're getting them to leave both doors open.

Also, bitrot is not like burglars. Burglars try one door, and if they don't succeed they will look for further vulnerabilities. Memory and disk corruption is random; it's not trying to attack you.

In many cases ECC is just not possible without buying expensive new hardware (with possibly higher power consumption), due to Intel being so precious with ECC as a "server feature only". Data protection is not absolute; protecting against one class of corruption is better than none. E.g. one of my NASes is a low-power NUC specifically chosen for its power consumption.

Of course both is even better but not everyone has the financial resources for perfection.


I would never tell people to forget ZFS; it's understanding that ZFS alone just doesn't reduce all bitrot risk. Greetings!

In the end people worry too much; no consumer NAS gear has any bitrot protection... (to put things in perspective).


Actually, this is a good thing. If you have a failure, you want to fix the faulty device and restore from backup. The biggest problem is having a failure and then overwriting the good backup with failed data. I have had this happen to me, and have since switched to ZFS for storage I want to keep, with multiple backups.


with multiple backups

Multiple rotating backups would have solved your problem before ZFS.


Unless you can somehow make backups every few seconds, restoring from a backup still means losing data. It's just highly preferable to losing all data.

Also, on a non-checksummed filesystem, how will you know whether a file needs to be restored from the backup?


> Also, on a non-checksummed filesystem, how will you know whether a file needs to be restored from the backup?

Has the same edit epoch? If yes, skip. If no, compute checksums of both and compare; copy over in case they differ.

I am fully aware of the flaws of this.
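
For what it's worth, rsync approximates this: by default it skips files whose size and mtime match, and --checksum forces a full content comparison (at the cost of reading every file):

  rsync -a /data/ /backup/data/             # skip by size+mtime, copy the rest
  rsync -a --checksum /data/ /backup/data/  # compare checksums even when timestamps agree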


You should have an offsite backup. If you don’t, you are asking for data loss, regardless of what filesystem you use.

I personally am glad that ZFS doesn’t have “easy to partially recover a corrupted FS” on their list of priorities. Designing for that might take resources away from making the system as a whole more robust, and at best it’s a half-measure that will never be a substitute for an offsite backup solution.


> If you don’t, you are asking for data loss, regardless of what filesystem you use.

Backups have a delay. A filesystem that fails catastrophically is significantly worse than one that doesn't.


Suppose your FS has a 50% chance of partially recoverable failure in a given time range. Is that always better than the catastrophic alternative, even if the odds of catastrophic failure are 0.01%?


That risk assessment depends on your use case. Those percentages also aren't very realistic. Assuming that new files are being constantly created and they're not of vital long-term integrity, the chance of preventing silent bitrot (which ZFS is supposed to protect against) is worth fuck all if it's much more likely that a bug takes your pool irrecoverably, along with hours of data.


Your file system is not a backup technique. All file systems will fail.

The only time I've had ZFS fail was when both drives in a mirror pool died within 8 minutes of one another. No filesystem would save you there.

Since ZFS has a replication protocol (zfs send), recovery was basically a few zfs recv calls into a new pool.
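
For anyone unfamiliar, the replication flow is roughly this (pool, dataset, and host names are examples):

  zfs snapshot tank/data@monday
  zfs send tank/data@monday | ssh backupbox zfs recv backup/data

  # later runs only send the increment since the last snapshot:
  zfs snapshot tank/data@tuesday
  zfs send -i @monday tank/data@tuesday | ssh backupbox zfs recv backup/data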


Assuming that you bought two of the same device?

https://news.ycombinator.com/item?id=32048148


Seriously, outside of very specific use cases I think there is major bandwagoning with ZFS. Using SnapRAID for larger unchanging data, or normal RAID without striping for VMs or DBs, allows your recovery story to be more flexible/robust.


> I think there is a major bandwagoning with zfs

And I think there is a kooky caucus for half-assed solutions. The Linux community has chosen to beclown itself by pretending ZFS isn't absolutely amazing (and free!), and the result is the strangest collection of system-level kludges anywhere.

You forgot Stratis and Unraid!


Java is free but Oracle still sued. No one has a problem with ZFS, they have a problem with its owners.


So what are some use cases you use ZFS for?


> there is a major bandwagoning with zfs

Well, there is a huge number of reasons for that.

Just recently ZFS got RAID-Z expansion capabilities. Before that, I would have had to make plans for a storage server and buy all the storage devices upfront. Now I can expand the storage pool as needed, one device at a time.

The ability to do this is the only reason I even bothered with btrfs. Now that ZFS has flexible storage pool expansion, it is essentially perfect as far as I'm concerned.
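
If I understand the new feature correctly (OpenZFS 2.3+), expanding is a single attach per added disk (pool, vdev, and device names are examples):

  zpool attach tank raidz1-0 /dev/sdf   # grow the existing raidz1 vdev by one disk
  zpool status tank                     # reports the expansion progress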


I have 10 SATA HDDs (14-18TB each). I would like to build a TrueNAS box with ECC memory on a budget of less than $700 (used hardware is perfectly fine).

Any recommendations for what to look on ebay?


I'd recommend keeping an eye on Craigslist; unixsurplus.com is also great, but they usually stock newer gear out of that price range.

My NAS is a 2U Dell R510 (dual E5649, 32 GB ECC memory); I think it cost me about $250 on Craigslist, and I've had zero issues. It will likely be difficult to find a machine that can take 10 drives; mine can take 8.

If a machine has 3.5" SAS bays, it can likely take SATA drives as well, but be sure to do your own research. Also, in a lot of these older machines the PCIe card that connects the SAS backplane is hardware-RAID only, which means you will have to find a new card, and possibly flash it with new firmware to enable JBOD mode, so that your disks are passed directly to the host OS. I had to do this for my NAS, but it wasn't difficult.

One other tip I highly recommend, you can buy PCIe cards that fit one or two 2.5" SATA SSDs in. I run my TrueNAS off of a mirror of two small SATA SSDs this way, which frees up space for more hard drives in the front.


You could use an older AMD Zen CPU with a consumer motherboard that has ECC support. Gigabyte and ASRock are manufacturers I know of that list ECC support on their product pages. The ECC memory is going to eat up most of the budget, I would say. Make sure you don't buy registered ECC memory under this plan.
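
Once built, it's worth verifying that ECC is actually active rather than just tolerated; a sketch for Linux:

  # should report "Multi-bit ECC" rather than "None"
  dmidecode -t memory | grep -i 'error correction'

  # the EDAC driver should register a memory controller
  ls /sys/devices/system/edac/mc/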


Please always use ECC, because your filesystem cache (for example with XFS) is the same as the ARC on ZFS... it's really the same.

But you can at least activate ARC checksumming.

That's a better article:

https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-y...


The LTT channel recently lost 1 PB of videos on ZFS, on a super expensive server with ECC RAM.


That was mostly misuse, but ZFS does have its fair share of bugs.

My favourite aspect of ZFS is that the 13yo Python script for reverting transactions by destroying uberblocks is still the go-to even now.


DDR5 has some form of built-in ECC, finally bringing it by default to the masses.

Edit: sorry, this is misleading. See below, or ignore. The DDR5 "ECC" I am referring to is not the same as what one would normally think of when saying ECC.


This isn't accurate. It has on-die ECC to mitigate the fact that we're pushing the limits of the hardware to eke out more performance from DDR5, but there isn't ECC between the RAM and the CPU, by default anyway.

So this is one more thing that will confuse the consumer.


The masses had parity RAM back in the day; then there was fake parity RAM, then there was no parity. Internal correction codes have been normal for disk storage for a long time, but they're not really enough unless there's reporting and monitoring.

What does consumer DDR5 do when there's a correctable error, and what does it do when there's an uncorrectable error?


It's not clear whether the resulting error rate is any better than DDR4, and I assume you can't monitor on-die ECC either.



