Feedback to Murphy's Law, Mar 2001
This column prompted a number of readers to respond to the way
that I handled the problem of inserting octal values into strings
that also contain digits as characters. Todd Litwin, John Langenbach
and Daron Smith all made the same point, so I will show just
Todd's mail here, since the others are equivalent, and follow it
with my response, which appeared in the May 2001 column.
In your March 2001 "Murphy's Law" you wrote:
If the character following à happened to be the digit 5, the
compiler would see the sequence \045 and interpret that as a single
octal value, generating a single byte.
You then go on to propose a somewhat ugly workaround to deal
with this. But there is a much easier way. If you had encoded
the a-accent-grave in three octal digits (\004) rather than in
two, then you wouldn't have had any problem. The C language specifies
that any such sequence have only 1-3 octal digits. So \0045 would
be interpreted as two bytes, not one.
By the way, I am enjoying your column. Keep up the good work.
Todd Litwin
Response extracted from the May 2001 column
A number of readers responded to my March 2001 column, which
discussed representation of characters from non-ASCII character
sets. In that issue I proposed closing and reopening strings in
order to ensure that octal escape sequences in strings did not
accidentally merge with a following digit. For example, we might
intend "\415" to be stored as the same two bytes as "!5", since
octal 41 is the exclamation mark in the ASCII table. However, the
C compiler will not interpret this as \41 followed by the character
5. Instead it will try to interpret the octal value 415 (decimal
269, too large for an 8-bit char). The solution I proposed was to
break the string by closing the double quotes and reopening them.
The compiler will interpret "\41" "5" as "!5".
I am using the exclamation mark for the example, but in a real
application we would be dealing with unprintable characters, which
the programmer could not type into an editor.
It was pointed out to me independently by Daron Smith, Todd
Litwin and John Langenbach that by always ensuring that there
were three octal digits, I could get the desired result and it
would not look as untidy as breaking the string in two. By using
leading zeros I can increase the number of digits for the first
character to three. So we could use "\0415" to get "!5". This
is a far better solution. The simplest rule is to ensure that
your octal character constants are always exactly three digits
long.
It is worth pointing out the origin of my mistake. While an
octal escape sequence is at most three digits long, a hexadecimal
escape sequence will continue until a non-hexadecimal character
is found. So if you are using hexadecimal values in your string
you will have to use the method described above of closing and
reopening the string. This is a rather devious difference between
the way that you are allowed to use hexadecimal character constants
and octal character constants, so do not let yourself get caught
out like I did.
And in response to my response:
Just a quick comment about hex escape sequences mentioned in
your Murphy's Law:
While the C99 standard is clear that hex-escape to character
translation occurs before the adjacent string tokens are concatenated
(I'm not sure about previous C standards), at least one compiler
I've used (Hitachi SHC) concatenates before converting the escape
sequence. Consequently, "\x12" "3" wasn't good enough and I had
to define my string byte by byte: const char foo[] = { '\x12',
'3' }; ick! It didn't occur to me to use \0223 instead (I generally
avoid octal as hex is so much more familiar to me). Next time,
I'll use the octal. Thanks for passing on this suggestion from
your readers.
Paul