Why is the behaviour of subtracting characters implementation specific?

Question

This statement:

if('z' - 'a' == 25)

is not guaranteed to evaluate in the same way. It is compiler dependent. Also, it is not guaranteed to be evaluated in the same way as this:

#if 'z' - 'a' == 25

even if both the preprocessor and compiler are run on the same machine. Why is that?

Recommended answer

The OP is asking about a direct quote from the standard — N1570 §6.10.1p3,4 + footnote 168:


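... evaluated according to the rules of 6.6. ... This includes interpreting character constants, which may involve converting escape sequences into execution character set members. Whether the numeric value for these character constants matches the value obtained when an identical character constant occurs in an expression (other than within a #if or #elif directive) is implementation-defined. (168)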

[footnote 168] Thus, the constant expression in the following #if directive and if statement is not guaranteed to evaluate to the same value in these two contexts.

#if 'z' - 'a' == 25
if ('z' - 'a' == 25)


So, yes, it really isn't guaranteed.
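One way to look at the two contexts side by side is to record the preprocessor's verdict in a macro and compare it with what the compiler proper computes. The following is only a minimal sketch; the names PP_Z_MINUS_A_IS_25, compiler_z_minus_a_is_25 and contexts_agree are invented for this illustration and do not come from the standard.

```c
#include <stdbool.h>

/* Evaluated by the preprocessor while it handles the #if directive. */
#if 'z' - 'a' == 25
#define PP_Z_MINUS_A_IS_25 1
#else
#define PP_Z_MINUS_A_IS_25 0
#endif

/* Evaluated by the compiler proper as an ordinary expression. */
bool compiler_z_minus_a_is_25(void)
{
    return 'z' - 'a' == 25;
}

/* True when both contexts gave the same answer for this expression. */
bool contexts_agree(void)
{
    return compiler_z_minus_a_is_25() == (PP_Z_MINUS_A_IS_25 != 0);
}
```

On a typical ASCII-based toolchain both contexts should report true, so contexts_agree() would be expected to return true; the standard simply does not promise it.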

To understand why it isn't guaranteed, first you need to know that the C standard doesn't require the character constants 'a' and 'z' to have the numeric values assigned to those characters by ASCII. Most C implementations nowadays use ASCII or a superset, but there is another encoding called EBCDIC that is still widely used (only on IBM mainframes, but there are still a lot of those out there). In EBCDIC, not only do 'a' and 'z' have different values from ASCII, the alphabet isn't a contiguous sequence! That's why the expression 'z' - 'a' == 25 might not evaluate true in the first place.
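To make the gap concrete, the code points can be written out as plain integers, so the result does not depend on the character set of whatever machine compiles the snippet. This is a sketch that assumes the common EBCDIC layout (as in code page 037), where a-i, j-r and s-z sit in three separate blocks; the enumerator and function names are made up for illustration.

```c
/* Letter code points as plain integers (not character constants). */
enum {
    ASCII_A  = 0x61, ASCII_Z  = 0x7A,
    /* Assumed EBCDIC layout: a-i, j-r and s-z are three separate runs. */
    EBCDIC_A = 0x81, EBCDIC_I = 0x89,
    EBCDIC_J = 0x91, EBCDIC_R = 0x99,
    EBCDIC_S = 0xA2, EBCDIC_Z = 0xA9
};

/* Distance from 'a' to 'z' under ASCII: the letters are contiguous. */
int ascii_z_minus_a(void)
{
    return ASCII_Z - ASCII_A;
}

/* Distance from 'a' to 'z' under the assumed EBCDIC layout. */
int ebcdic_z_minus_a(void)
{
    return EBCDIC_Z - EBCDIC_A;
}
```

Under these assumptions ascii_z_minus_a() gives 25 while ebcdic_z_minus_a() gives 40, because of the unused code points between 'i' and 'j' and between 'r' and 's'.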

You also need to know that the C standard tries to maintain a distinction between the text encoding used for source code (the "source character set") and the text encoding that the program will use at runtime (the "execution character set"). This is so you can, at least in principle, take a program whose source is encoded in ASCII text and run it unmodified on a computer that uses EBCDIC, just by cross-compiling appropriately; you don't have to convert the source text to EBCDIC first.
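Because character constants take their values from the execution character set, a program can probe which family it was compiled for. The sketch below is a rough heuristic, not a complete detection scheme; the function name guess_execution_charset is hypothetical, and the EBCDIC values assume the usual layout in which 'a' is 0x81 and 'z' is 0xA9.

```c
/* Guess the execution character set family from the values the compiler
   assigned to 'a' and 'z'. The source file itself may be stored in any
   encoding the compiler accepts. */
const char *guess_execution_charset(void)
{
    if ('a' == 0x61 && 'z' == 0x7A)
        return "ASCII-compatible";
    if ('a' == 0x81 && 'z' == 0xA9)
        return "EBCDIC-like"; /* assumed common EBCDIC layout */
    return "unknown";
}
```

The same source text, cross-compiled for an EBCDIC target, would be expected to take the second branch without any change to the file.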

Now, the compiler has to understand both character sets if they're different, but historically, the C preprocessor (translation phases 1 through 4) and the "compiler proper" (phases 5 through 7) were two separate programs, and #if expressions are the only place where the preprocessor would have to know about the execution character set. So, by making it implementation-defined whether the "execution character set" used by the preprocessor matches that used by the compiler proper, the standard licenses the preprocessor to do all its work in the source character set, making life a little bit easier back in 1989.
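If code really does depend on the preprocessor and the compiler proper agreeing, the assumption can be checked at compile time. This is a sketch assuming a C11 compiler (for _Static_assert); the macro PP_SEES_ASCII_A and the function pp_sees_ascii_a are invented names.

```c
/* The preprocessor's view of 'a', captured as 0 or 1. */
#if 'a' == 97
#define PP_SEES_ASCII_A 1
#else
#define PP_SEES_ASCII_A 0
#endif

/* The compiler proper's view of 'a' must match, or translation fails. */
_Static_assert(('a' == 97) == PP_SEES_ASCII_A,
               "preprocessor and compiler disagree about the value of 'a'");

/* Expose the preprocessor's result so callers can inspect it. */
int pp_sees_ascii_a(void)
{
    return PP_SEES_ASCII_A;
}
```

The check covers only the single constant 'a'; it illustrates the idea and does not prove that the two character sets match everywhere.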

Having said all that, I would be very surprised to find a modern compiler that didn't make both expressions evaluate to the same value, even when the execution and source character sets are grossly incompatible. Modern compilers tend to have integrated preprocessors -- phases 1 through 7 are all carried out by the same program -- and even if they don't, the engineering burden of specializing the preprocessor to match its execution character set to the compiler proper is trivial nowadays.
