Feedback to Forget Me Not! - Murphy's Law, June 2001
Errata
The article refers to Ross Williams's "A Painless Guide to CRC
Error Detection", but gives no citation for it.
This document is an on-line guide and is available, in a number
of slightly different formats, on the Internet at any one of:
ftp://ftp.rocksoft.com/papers/crc_v3.txt
http://www.repairfaq.org/filipg/LINK/F_crc_v3.html
http://www.acte.no/teknisk/html/crcguide.htm
Feedback
A number of readers mentioned various approaches to handling
EEPROMs that are limited in the number of write cycles that the
chip can handle. I neglected this topic in my original column,
since I was trying to stick to the higher-level software issues,
and avoid the hardware and technology-specific stuff, but it is
an important issue, and probably should have gotten a mention.
Some of the emails follow:
Dear Niall Murphy,
I can relate to your article, Forget Me Not, in ESP June 2001.
I would like to add that you cannot assume only a single byte
will be corrupted when power fails during an EEPROM write. We
have experienced multiple byte EEPROM corruption in the Atmel
ATmega103 microcontroller. For the revision of the ATmega103 we
used, the address registers EEARH and EEARL could change randomly
as power drops out, corrupting random multiple locations. Our
solution for this was to store factory calibration data in triplicate
and implement a voting algorithm to repair corruptions.
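
The letter does not say how the voting was implemented. As a minimal sketch only, assuming three in-RAM images of a small calibration block (the names `CAL_SIZE`, `vote_byte`, and `cal_vote_and_repair` are hypothetical), a bitwise two-of-three majority vote could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of one calibration block; not from the letter. */
#define CAL_SIZE 8u

/* Bitwise 2-of-3 majority: a result bit is set when it is set in at
 * least two of the three inputs. */
static uint8_t vote_byte(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((a & b) | (a & c) | (b & c));
}

/* Votes across the three copies, stores the winning value in out, and
 * overwrites any copy byte that disagrees with the winner.
 * Returns the number of copy bytes that had to be rewritten. */
size_t cal_vote_and_repair(uint8_t copies[3][CAL_SIZE], uint8_t out[CAL_SIZE])
{
    assert(copies != NULL && out != NULL);

    size_t rewritten = 0;
    for (size_t i = 0; i < CAL_SIZE; i++) {
        uint8_t v = vote_byte(copies[0][i], copies[1][i], copies[2][i]);
        out[i] = v;
        for (int k = 0; k < 3; k++) {
            if (copies[k][i] != v) {
                copies[k][i] = v;   /* on real hardware: an EEPROM write */
                rewritten++;
            }
        }
    }
    return rewritten;
}
```

When two copies agree, the result is their common value. If all three copies of a byte differ, the bitwise vote still yields a value but it may match none of them, so a real design would probably pair this with a per-copy checksum.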
As an interesting side note, Atmel recommends "Avoid using [EEPROM]
address 0 for storage, unless you can guarantee that you will
not get a reset during EEPROM write." This is because an unexpected
reset during an EEPROM write will zero the EEPROM address registers.
Karl Knauf
Datex-Ohmeda
Some boxes I have helped design have made use of non-volatile
memory for storage of fault information. At first, the technology
employed was NVRAM. More recently, we have used EEPROM. As you
noted in your article, the time available after a power-down indication
is often a factor in the design of the non-volatile storage software.
These days, one can get EEPROM having write cycles measured in
microseconds, and lifetimes of more than a million writes. It
has not always been thus. In our early use of EEPROM, we had to
cope with its limited write-cycle capability, and lengthy write
times.
Since our interest is primarily in fault data, we generally write
it only when a fault is detected. Sometimes, there are many of
these computed in a single computation "frame". More often, the
box goes for days, weeks, or longer between faults. In any event,
the required 30+ year service life of the box dictates some provision
be made for limiting the number of writes to a given EEPROM address
when a fault is recorded. This means, for example, that storage
area pointers, or flags such as your "best" flag, must be distributed
over several different addresses. Likewise, operating hour information,
power-up counts, and other frequently-written data are arranged
to occupy several bytes.
As you suggest, each fault record has its own checksum. In fact,
it has several--one for each of the fields. Some fields actually
use "checksums" that permit bit-error correction, though the error
correction capability has never been used as far as I know. (These
Hamming codes are easier to compute than CRCs and you get the
error correction capability for free.) Our "current record" identification
and power-up count fields use a "traveling indicator" scheme.
Our fault memory stores information about the most recent 64
"equipment cycles." A corresponding block of 64 words is set aside
for pointers to the "current equipment cycle" data. As each equipment
cycle pointer is written, a "current record" indicator in the
pointer word is "toggled".
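
The letter gives no further detail on the "traveling indicator" scheme, so the following is only one plausible reading, not the author's implementation. It assumes 16-bit pointer words with the top bit used as the indicator, and an array standing in for the EEPROM block; `INDICATOR_BIT`, `ti_find_current`, and `ti_write_next` are hypothetical names. Each write goes to the next word in the ring and toggles that word's indicator, so no single "current record" location is written on every cycle:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS      64u       /* the 64 pointer words from the letter */
#define INDICATOR_BIT  0x8000u   /* assumed: top bit of each 16-bit word */

/* Returns the index of the most recently written slot. During a pass,
 * slots 0..k carry the new indicator polarity and k+1..63 the old one,
 * so the current slot is the last one whose indicator matches slot 0. */
unsigned ti_find_current(const uint16_t slots[NUM_SLOTS])
{
    uint16_t first = (uint16_t)(slots[0] & INDICATOR_BIT);
    for (unsigned i = 1; i < NUM_SLOTS; i++) {
        if ((slots[i] & INDICATOR_BIT) != first) {
            return i - 1u;
        }
    }
    return NUM_SLOTS - 1u;   /* all equal: a full pass has just completed */
}

/* Stores a 15-bit value in the slot after the current one, toggling
 * that slot's indicator bit. Returns the index of the slot written. */
unsigned ti_write_next(uint16_t slots[NUM_SLOTS], uint16_t value)
{
    assert((value & INDICATOR_BIT) == 0u);

    unsigned next = (ti_find_current(slots) + 1u) % NUM_SLOTS;
    uint16_t toggled = (uint16_t)((slots[next] ^ INDICATOR_BIT) & INDICATOR_BIT);
    slots[next] = (uint16_t)(toggled | value);
    return next;
}
```

With this layout each word is written once per 64 equipment cycles, and the current record is found at power-up by scanning for the point where the indicator changes. A block that has never been written reads as "pass just completed", so the first write lands in slot 0.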
Larry D. Morris
There is a fourth consideration in planning when to store non-volatile
data, and that is the lifetime write ability of the hardware.
There are some EEPROM technologies that have unlimited read capabilities
but a limited number of writes, such as 100,000 or 1,000,000. Writing
once per minute will render the EEPROM useless in about 2 years for
a part with 1,000,000 writes, assuming round-the-clock operation.
If your hardware uses such a limited write cycle device, some
checking of the expected usage and lifetime of the device and
a little arithmetic is in order when designing the non-volatile
scheme.
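
The "little arithmetic" can be captured in a few lines. This is a sketch under the same assumptions as the letter (a fixed write rate and continuous operation); the function name `eeprom_lifetime_years` is hypothetical:

```c
#include <assert.h>

/* Estimated years until one location reaches its rated endurance,
 * assuming a constant write rate and round-the-clock operation. */
double eeprom_lifetime_years(double rated_writes, double writes_per_day)
{
    assert(writes_per_day > 0.0);
    return rated_writes / writes_per_day / 365.25;
}
```

Writing once per minute is 1,440 writes per day, so `eeprom_lifetime_years(1000000.0, 1440.0)` comes to roughly 1.9 years, in line with the figure above. A 100,000-write part at the same rate lasts about 69 days.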
Thanks for some good columns. You have a knack for summing up
all the issues. Your columns could have saved me some grief if
I'd read them at the beginning of some of my designs.
Nancy Goering
Clarity Visual Systems
Mr. Murphy,
I just read your June column. I think you did a nice job of summarizing
many of the issues and methods involved with storing data in a
non-volatile manner. I have faced many of these myself.
One thing I did not see mentioned is the issue of wearing out
memory by writing to it too often. Most EEPROM and Flash devices
are limited to 10,000, 100,000, or 1,000,000 write cycles. This
must be considered in the context of the intended life of a product
to determine the maximum allowable update rate. At my company,
we create products that are intended to last for 30 years, so
this is a significant consideration. We've come up with a handful
of techniques for dealing with data that needs to be stored more
often than the write-cycle limit would allow, primarily by spreading
the storage of such data over multiple memory locations.
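
The letter does not describe the specific techniques used. As an illustrative sketch only, one common way to spread a frequently updated value over several locations is a ring of slots tagged with a sequence number, sized from the product life and the part's endurance. The names `slot_t`, `slots_needed`, `ring_newest`, and `ring_store` are hypothetical, a sequence number of 0 is assumed to mean "never written", and sequence wraparound is ignored:

```c
#include <assert.h>
#include <stdint.h>

/* One stored copy of the value, tagged with a rising sequence number. */
typedef struct {
    uint32_t seq;     /* 0 = never written (assumption of this sketch) */
    uint32_t value;
} slot_t;

/* Locations needed so that no location exceeds its rated endurance. */
uint32_t slots_needed(uint64_t total_writes, uint32_t endurance)
{
    assert(endurance > 0u);
    return (uint32_t)((total_writes + endurance - 1u) / endurance); /* round up */
}

/* Index of the slot holding the highest sequence number. */
unsigned ring_newest(const slot_t *ring, unsigned n)
{
    unsigned best = 0;
    for (unsigned i = 1; i < n; i++) {
        if (ring[i].seq > ring[best].seq) {
            best = i;
        }
    }
    return best;
}

/* Writes value to the slot after the newest one, so successive updates
 * rotate through all n locations instead of wearing out a single one. */
void ring_store(slot_t *ring, unsigned n, uint32_t value)
{
    assert(n > 0u);
    unsigned newest = ring_newest(ring, n);
    uint32_t next_seq = ring[newest].seq + 1u;
    unsigned next = (ring[newest].seq == 0u) ? 0u : (newest + 1u) % n;
    ring[next].value = value;
    ring[next].seq = next_seq;
}
```

As a sizing example, a value updated once per minute over 30 years needs about 15.8 million writes, so `slots_needed(15778800u, 100000u)` gives 158 locations for a 100,000-cycle part. Each slot would normally carry its own checksum as well, which is omitted here for brevity.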
I look forward to your next column regarding maintaining non-volatile
data through firmware upgrades. This is another issue I have dealt
with, but have not been completely satisfied with the methods
we used. I hope I will find some new ideas in your article.
Dave Wood
Schweitzer Engineering Laboratories, Inc.