www.panelsoft.com

PanelSoft

User Interfaces and Usability for Embedded Systems


Feedback to '' - Murphy's Law, March 2001

This column prompted several readers to respond to the way I handled the problem of inserting octal values into strings that also contain digit characters. Todd Litwin, John Langenbach and Daron Smith all made the same point, so I will show only Todd's mail here, since the others are equivalent, and follow it with my response, which appeared in the May 2001 column.

In your March 2001 "Murphy's Law" you wrote:

If the character following à happened to be the digit 5, the compiler would see the sequence \045 and interpret that as a single octal value, generating a single byte.

You then go on to propose a somewhat ugly workaround to deal with this. But there is a much easier way. If you had encoded the a-accent-grave in three octal digits (\004) rather than in two, then you wouldn't have had any problem. The C language specifies that any such sequence have only 1-3 octal digits. So \0045 would be interpreted as two bytes, not one.

By the way, I am enjoying your column. Keep up the good work.

Todd Litwin
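Todd's point can be checked with a short sketch. It assumes, as in the letter, that the a-accent-grave sits at code 4 in the target character set, and that the host uses ASCII for printable characters. The names `two_digit`, `three_digit` and `check_octal_lengths` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Two octal digits followed by the digit 5: the compiler reads \045 as one
 * escape sequence, which is a single byte ('%' in ASCII). */
const char two_digit[] = "\045";

/* Three octal digits followed by the digit 5: the escape stops after \004,
 * so the 5 is stored as a separate character. */
const char three_digit[] = "\0045";

/* Hypothetical self-check of the lengths and contents described above. */
void check_octal_lengths(void)
{
    assert(strlen(two_digit) == 1);
    assert(two_digit[0] == '%');
    assert(strlen(three_digit) == 2);
    assert(three_digit[0] == 4);
    assert(three_digit[1] == '5');
}
```

Calling `check_octal_lengths()` from a test harness should pass silently on an ASCII host.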

Response extracted from the May 2001 column

A number of readers responded to my March 2001 column, which discussed the representation of characters from non-ASCII character sets. In that issue I proposed closing and reopening strings to ensure that an octal escape sequence in a string did not accidentally merge with a following digit. For example, we might intend "\415" to be stored as the same two bytes as "!5", since octal 41 is the exclamation mark in the ASCII table. However, the C compiler will not interpret this as \41 followed by the character 5. Instead it will try to interpret the octal value 415 as a single character. The solution I proposed was to break the string by closing the double quotes and reopening them: the compiler will interpret "\41" "5" as "!5". I am using the exclamation mark for the example, but in a real application we would be dealing with unprintable characters, which the programmer's editor could not enter.
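A minimal sketch of the split-string approach, assuming an ASCII host so that octal 41 is the exclamation mark. The names `split_literal` and `check_split_literal` are hypothetical. The unsplit form "\415" is deliberately left out of the code, because octal 415 does not fit in an 8-bit char and a compiler would be expected to complain about it.

```c
#include <assert.h>
#include <string.h>

/* The first literal ends right after the escape, so \41 cannot absorb the
 * following digit. The compiler then joins the adjacent literals into one
 * string. */
const char split_literal[] = "\41" "5";

/* Hypothetical self-check: the array holds '!', '5' and the terminating NUL. */
void check_split_literal(void)
{
    assert(sizeof split_literal == 3);
    assert(strcmp(split_literal, "!5") == 0);
}
```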

It was pointed out to me independently by Daron Smith, Todd Litwin and John Langenbach that by always using three octal digits I could get the desired result, and it would not look as untidy as breaking the string in two. Leading zeros pad the first character's escape sequence to three digits, so we can write "\0415" to get "!5". This is a far better solution. The simplest rule is to ensure that your octal escape sequences are always exactly three digits long.
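The three-digit rule is easy to enforce mechanically if string tables are generated by a tool. The sketch below shows the padded literal from the text, plus a hypothetical helper, `octal_escape`, that formats any byte as a backslash followed by exactly three octal digits. The helper is an illustration rather than part of the original column, and it assumes 8-bit bytes and a C99 library that provides `snprintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Padded to three digits: \041 is '!' in ASCII and the 5 stays separate. */
const char padded[] = "\0415";

/* Hypothetical helper: writes a backslash, three octal digits and a NUL
 * into out, which must have room for 5 chars. The %03o conversion supplies
 * the leading zeros, so a following digit can never merge with the escape. */
void octal_escape(unsigned char byte, char out[5])
{
    snprintf(out, 5, "\\%03o", (unsigned)byte);
}
```

A generator would emit the text produced by `octal_escape` for each unprintable byte and copy printable characters through unchanged.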

It is worth pointing out the origin of my mistake. While an octal escape sequence is at most three digits long, a hexadecimal escape sequence continues until a non-hexadecimal character is found. So if you are using hexadecimal values in your string, you will have to use the method described above of closing and reopening the string. This is a rather devious difference between the way hexadecimal and octal escape sequences work, so do not let yourself get caught out like I did.
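The greedy behaviour of hexadecimal escapes can be sketched as follows, assuming an ASCII host and a compiler that converts escape sequences before joining adjacent literals, as the C standard requires. The identifiers are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* One escape sequence: both hex digits are consumed, giving the single
 * byte 0x41 ('A' in ASCII). */
const char hex_merged[] = "\x41";

/* Splitting the literal ends the escape after \x4, so the 1 is stored as
 * the character '1'. */
const char hex_kept_apart[] = "\x4" "1";

/* The same technique keeps a trailing decimal digit out of the escape. */
const char hex_then_digit[] = "\x41" "5";

/* Hypothetical self-check of the three cases above. */
void check_hex_escapes(void)
{
    assert(sizeof hex_merged == 2);
    assert(sizeof hex_kept_apart == 3);
    assert(hex_kept_apart[0] == 4 && hex_kept_apart[1] == '1');
    assert(strcmp(hex_then_digit, "A5") == 0);
}
```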


And in response to my response:

Just a quick comment about hex escape sequences mentioned in your Murphy's Law:

While the C99 standard is clear that hex-escape to character translation occurs before the adjacent string tokens are concatenated (I'm not sure about previous C standards), at least one compiler I've used (Hitachi SHC) concatenates before converting the escape sequence. Consequently, "\x12" "3" wasn't good enough and I had to define my string byte by byte: const char foo[] = { '\x12', '3' }; ick! It didn't occur to me to use \0223 instead (I generally avoid octal as hex is so much more familiar to me). Next time, I'll use the octal. Thanks for passing on this suggestion from your readers.
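Paul's two options can be compared in a short sketch. The array `foo` is taken from his letter; `foo_terminated`, `foo_octal` and `check_foo` are hypothetical names added for the comparison. One difference worth noticing is that the byte-by-byte initialiser carries no terminating NUL unless one is added explicitly.

```c
#include <assert.h>
#include <string.h>

/* From the letter: defined byte by byte, so nothing is appended. */
const char foo[] = { '\x12', '3' };

/* The same bytes with an explicit terminator, usable as a C string. */
const char foo_terminated[] = { '\x12', '3', '\0' };

/* The octal alternative: \022 is 0x12, and the three-digit limit leaves
 * the 3 as a separate character. The literal adds the NUL itself. */
const char foo_octal[] = "\0223";

/* Hypothetical self-check that all three forms hold the same data bytes. */
void check_foo(void)
{
    assert(sizeof foo == 2);
    assert(sizeof foo_octal == 3);
    assert(memcmp(foo, foo_octal, sizeof foo) == 0);
    assert(memcmp(foo_terminated, foo_octal, sizeof foo_octal) == 0);
}
```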

Paul

