Demystifying ECC Memory

Posted on March 15, 2022

ECC memory, or Error-Correcting Code memory, has been a buzzword in the enterprise space for years. Experts recommend this memory type for most industrial and commercial applications, and many business owners simply follow that advice.

But what exactly is different about ECC memory? When is it actually needed? And are there any disadvantages to going down this route? Here’s an overview.

A Crash Course in Random Access Memory

Unlike Hard Disk Drives (HDDs), which store data as magnetized regions on a platter, Random Access Memory (RAM) only needs to hold its data temporarily. The most common type, dynamic RAM (DRAM), stores each bit in a memory cell made up of a tiny capacitor paired with a transistor.

What this means is that RAM holds its data only as long as power is supplied to it, and loses its contents when the system shuts down. It also means that the memory cells, each holding only a very small electrical charge, are rather susceptible to interference, at least at the bit level.

Even well-shielded memory modules occasionally develop tiny errors during operation, with a single bit flipping from 0 to 1 or vice versa. As computers store all information as ones and zeroes (binary), a single flipped bit can change the meaning of the value it belongs to.
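To make this concrete, here is a minimal Python sketch of what one flipped bit does to a stored value. The helper name flip_bit and the sample values are made up for illustration; real bit-flips happen in hardware, not in code like this.

```python
def flip_bit(value: int, position: int) -> int:
    """Return value with the bit at the given position inverted."""
    return value ^ (1 << position)

# A hypothetical account balance stored as an integer.
balance = 1000                        # binary 01111101000
corrupted = flip_bit(balance, 10)     # bit 10 is worth 1024
print(balance, "->", corrupted)       # 1000 -> 2024

# The same thing happens to text: 'A' is stored as the number 65.
letter = "A"
print(letter, "->", chr(flip_bit(ord(letter), 1)))   # A -> C
```

The higher the position of the flipped bit, the larger the change, which is why the same kind of error can be harmless in one place and serious in another.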

Cosmic Errors

Early on, these flipped bits were largely traced to residual radiation (alpha particles) released by contaminants in the packaging material of the chip. As packaging materials became cleaner, that source shrank. Today, bit-flips are attributed mainly to cosmic radiation, or more specifically, the neutrons it produces in the atmosphere.

As such, this problem is virtually impossible to shield against, and it worsens with altitude, becoming most severe in outer space. This means that critical electronics onboard aircraft and satellites need a different way of dealing with these frequent memory issues.

Do Flipped Bits Even Matter?

For the average home user, mostly no. A flipped bit might cause some odd behavior, a small glitch, or the occasional crash, but usually nothing that cannot be fixed with a restart. As long as the corrupted value is not saved to disk, long-term data integrity is not affected.

The problem arises when we are dealing with sensitive applications that call for absolute accuracy. In computers handling financial transactions, for example, a flipped bit is going to have much larger real-world implications than just a visual glitch.

The same goes for critical electronics powering machinery or vehicles (including planes), where the slightest deviation cannot be tolerated. As embedded computers are mostly deployed in such scenarios, it becomes crucial for these systems to be able to deal with the issue.

The Solution: ECC Memory

Since we cannot exactly prevent the errors from happening, we can only detect and correct them. This is what ECC memory is all about.

A typical non-ECC memory module comes with eight memory chips, while its ECC counterpart comes with nine. The extra chip stores check bits calculated from the data held in the other eight, which lets the memory controller verify each word as it is read back.
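Why one extra chip for every eight? The article does not say which code is used, but a common arrangement is a SECDED (single-error-correcting, double-error-detecting) Hamming code over a 64-bit data word. Under that assumption, the number of check bits can be worked out with a few lines of Python. The function name secded_check_bits is made up for this sketch.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed to correct 1-bit and detect 2-bit errors (assumed Hamming SECDED)."""
    r = 0
    # Hamming condition: r check bits must be able to point at every bit
    # position (data plus check bits) and also signal "no error".
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One additional overall parity bit adds double-error detection.
    return r + 1

for width in (8, 16, 32, 64):
    print(width, "data bits need", secded_check_bits(width), "check bits")
```

For a 64-bit word this gives 8 check bits, or 72 bits in total. If each chip contributes 8 bits per transfer, that is exactly eight data chips plus one extra, matching the nine-chip layout described above.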

As a result, systems with ECC RAM crash far less often because of memory errors, as flipped bits are detected and corrected in real time before they can cascade into major faults. This makes ECC memory virtually immune to single-bit errors, greatly improving its reliability.
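The detect-and-correct idea can be illustrated with the classic Hamming(7,4) code, which protects 4 data bits with 3 check bits. This is a toy sketch only: real ECC memory does the equivalent work in hardware over much wider words, and the function names here are invented for the example.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1 to 7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def correct(word):
    """Return (repaired codeword, error position); position 0 means no error found."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s4   # the syndrome spells out the bad position
    if position:
        w[position - 1] ^= 1          # flip the faulty bit back
    return w, position

def decode(word):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [word[2], word[4], word[5], word[6]]

stored = encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1                       # simulate a bit-flip at position 5
repaired, position = correct(damaged)
print(position, decode(repaired))     # 5 [1, 0, 1, 1]
```

Note that this small code can only repair one flipped bit per word. Two simultaneous flips would be misread as a different single-bit error, which is why practical schemes usually add an extra parity bit to at least detect that case.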

Drawbacks

As you might expect, all these extra checks and balances come at a cost. ECC memory only works with processors and motherboards that support it, and the check-bit calculations required on each memory access make it somewhat slower than traditional RAM, with the penalty commonly put at a few percent.

On top of that, ECC memory is expensive. For server racks and large-scale installations requiring dozens of embedded systems, the costs can add up.

When Should You Use ECC Memory?

ECC memory isn’t a clear-cut upgrade from regular RAM. It is costlier and somewhat slower because of the error-checking process. This makes it an unnecessary expense for many use cases, where normal RAM does the job perfectly well.

So when do you need to use ECC? Generally speaking, error checking is required when the application cannot safely ignore minor errors. This means computing systems working in niches where safety – whether physical or financial – is on the line.

Thus, computers used in vehicles, scientific or medical applications, and banking are the biggest users of this technology. Some enterprises might benefit from the reduced failure rate as well (PCs operating industrial machinery, for example), and might want to use ECC memory.
