Do C and C++ guarantee the ASCII values of [a-f] and [A-F] characters?

Question

I'm looking at the following code, which tests for a hexadecimal digit and converts it to an integer. The code is kind of clever in that it takes advantage of the fact that the difference between capital and lowercase letters is 32, which is bit 5. So the code performs one extra OR, but saves one JMP and two CMPs.

static const int BIT_FIVE = (1 << 5);
static const char str[] = "0123456789ABCDEFabcdef";

for (unsigned int i = 0; i < COUNTOF(str); i++)
{
    int digit, ch = str[i];

    if (ch >= '0' && ch <= '9')
        digit = ch - '0';
    else if ((ch |= BIT_FIVE) >= 'a' && ch <= 'f')
        digit = ch - 'a' + 10;
    ...
}
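
The fragment above can be fleshed out into a small function for experimentation. This is only a sketch that assumes an ASCII-compatible execution character set; the names `hex_digit_value` and `check_hex_digit_value` are hypothetical and do not come from the original code.

```c
#include <assert.h>

/* Bit 5 (value 32) is the bit that separates 'A'..'Z' from 'a'..'z' in ASCII. */
enum { BIT_FIVE = 1 << 5 };

/* Returns 0..15 for a hexadecimal digit, or -1 for any other character.
   ASSUMPTION: the execution character set is ASCII-compatible. */
static int hex_digit_value(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';

    ch |= BIT_FIVE;                 /* folds 'A'..'F' onto 'a'..'f' */
    if (ch >= 'a' && ch <= 'f')
        return ch - 'a' + 10;

    return -1;
}

/* Hypothetical self-check covering both cases and a non-digit. */
static void check_hex_digit_value(void)
{
    assert(hex_digit_value('7') == 7);
    assert(hex_digit_value('C') == 12);
    assert(hex_digit_value('c') == 12);
    assert(hex_digit_value('G') == -1);
}
```

Setting bit 5 before the range test is what lets a single comparison pair cover both the uppercase and the lowercase letters.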

Do C and C++ guarantee ASCII, or the values of the [a-f] and [A-F] characters? Here, guarantee means the upper and lower character sets will always differ by a constant value that can be represented by a single bit (for the trick above). If not, what does the standard say about them?

(Sorry for tagging both C and C++. I'm interested in both languages' positions on the subject.)

Recommended answer

There are no guarantees about the particular values, but you shouldn't care, because your software will probably never encounter a system that is not compatible with ASCII in this way. Assume that space is always 32 and that A is always 65; this works fine in the modern world.
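
If you do rely on that assumption, one option is to state it in the code so that a build for an incompatible character set fails instead of silently misbehaving. The following is a minimal sketch using C11 `static_assert`; the constant name `CASE_BIT` is hypothetical.

```c
#include <assert.h>   /* provides the static_assert macro in C11 */

/* ASSUMPTION being checked: space and 'A' sit at their ASCII positions. */
static_assert(' ' == 32 && 'A' == 65,
              "execution character set is not ASCII-compatible");

/* The bit pattern that differs between 'a' and 'A' (32 under ASCII). */
enum { CASE_BIT = 'a' ^ 'A' };

/* The OR trick needs a single bit that is clear in uppercase letters. */
static_assert((CASE_BIT & (CASE_BIT - 1)) == 0,
              "upper and lower case differ in more than one bit");
static_assert(('A' | CASE_BIT) == 'a' && ('F' | CASE_BIT) == 'f',
              "setting the case bit does not map A-F onto a-f");

/* The range tests additionally need A-F and a-f to be contiguous. */
static_assert('F' - 'A' == 5 && 'f' - 'a' == 5,
              "hex letters are not contiguous");
```

These checks cost nothing at run time, since character constants are evaluated at compile time using the execution character set's values.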

The C standard only guarantees that letters A-Z and a-z exist and that they fit within a single byte.

It does guarantee that 0-9 are sequential.

In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
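
Code that wants to depend only on these guarantees can use subtraction for the digits and spell out the letters explicitly. The sketch below is one way to do that; the function name `hex_digit_value_portable` is hypothetical, and the `switch` form is just one of several possible choices.

```c
#include <assert.h>

/* Guaranteed by the standard: the ten decimal digits are consecutive. */
static_assert('9' - '0' == 9, "decimal digits must be consecutive");

/* Returns 0..15 for a hexadecimal digit, or -1 for any other character.
   Makes no assumption about where the letters sit in the character set. */
static int hex_digit_value_portable(int ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0';

    switch (ch) {
    case 'a': case 'A': return 10;
    case 'b': case 'B': return 11;
    case 'c': case 'C': return 12;
    case 'd': case 'D': return 13;
    case 'e': case 'E': return 14;
    case 'f': case 'F': return 15;
    default:            return -1;
    }
}
```

The trade-off is a slightly longer function in exchange for not depending on the letter values or their ordering.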

Justification

There are a lot of character encodings out in the world. If you care about portability, you can either make your program portable to different character sets, or you can choose one character set to use everywhere (e.g. Unicode). I'll go ahead and loosely categorize most existing character encodings for you:

  1. Single byte character encodings compatible with ISO/IEC 646. Digits 0-9 and letters A-Z and a-z always occupy the same positions.

  2. Multibyte character encodings (Big5, Shift JIS, ISO 2022-based). In these encodings, your program is probably already broken, and you'll need to spend time fixing it if you care. However, parsing numbers will still work as expected.

  3. Unicode encodings. Digits 0-9 and letters A-Z, a-z always occupy the same positions. You can work with either code points or code units freely and you will get the same result, as long as you are working with code points below 128 (which you are). (Are you working with UTF-7? No, you should only use that for email.)

  4. EBCDIC. Digits and letters are assigned different values than in ASCII; however, 0-9, A-F, and a-f are each still contiguous. Even then, the chance that your code will run on an EBCDIC system is essentially zero.

So the question here is: Do you think that a hypothetical fifth option will be invented in the future, somehow less compatible / more difficult to use than Unicode?

Do you care about EBCDIC?

We could dream up bizarre systems all day... suppose CHAR_BIT is 11, or sizeof(long) = 100, or suppose we use one's complement arithmetic, or malloc() always returns NULL, or suppose the pixels on your monitor are arranged in a hexagonal grid. Suppose your floating-point numbers aren't IEEE 754, suppose all of your data pointers are different sizes. At the end of the day, this does not get us closer to our goals of writing working software on actual modern systems (with the occasional exception).
