Why don't the C or C++ standards explicitly define char as signed or unsigned?

Problem description

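```c
#include <stdbool.h>  /* for bool in C; harmless in C++ */

int main(void)
{
    char c = 0xff;
    bool b = 0xff == c;
    /* Where plain char is signed by default (common on x86 compilers), b is false! */
    return 0;
}
```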

Neither the C nor the C++ standard specifies whether char is signed or unsigned; it is implementation-defined.

Why don't the C and C++ standards explicitly define char as signed or unsigned, to prevent dangerous misuses like the code above?

Recommended answer

Historical reasons, mostly.

Expressions of type char are promoted to int in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain char signed.
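
To see the promotion at work, here is a minimal C11 sketch, assuming int can represent every value of char (true whenever int is wider than char). The macros HAS_TYPE_INT and HAS_TYPE_CHAR and the three functions are hypothetical names for illustration; _Generic reports the type of its operand without evaluating it.

```c
/* Hypothetical helper macros: yield 1 if expression x has the named
   type, 0 otherwise. _Generic inspects only the type of x. */
#define HAS_TYPE_INT(x)  _Generic((x), int: 1, default: 0)
#define HAS_TYPE_CHAR(x) _Generic((x), char: 1, default: 0)

/* A char operand on its own keeps type char... */
int operand_is_char(char c)
{
    return HAS_TYPE_CHAR(c);
}

/* ...but as an operand of binary +, each char is promoted to int
   before the addition, so the result has type int, not char. */
int sum_is_int(char a, char b)
{
    return HAS_TYPE_INT(a + b);
}

/* Unary plus applies the integer promotions as well. */
int unary_plus_is_int(char c)
{
    return HAS_TYPE_INT(+c);
}
```

All three functions return 1: c by itself has type char, while a + b and +c have type int.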

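On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); the digits '0' through '9', for example, are 0xF0 through 0xF9. Since C guarantees that members of the basic execution character set stored in a char have non-negative values, on EBCDIC platforms char pretty much has to be unsigned.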

The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 says:

Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.

Going back even further, an early version of the C Reference Manual from 1975 says:

A char object may be used anywhere an int may be. In all cases the char is converted to an int by propagating its sign through the upper 8 bits of the resultant integer. This is consistent with the two’s complement representation used for both characters and integers. (However, the sign-propagation feature disappears in other implementations.)

This description is more implementation-specific than what we see in later documents, but it does acknowledge that char may be either signed or unsigned. On the "other implementations" on which "the sign-propagation disappears", the promotion of a char object to int would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the signed or unsigned keyword.)
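
The two behaviors the 1975 manual describes correspond to promoting a signed char and an unsigned char to int. A minimal sketch, assuming CHAR_BIT == 8 and two's complement; promote_signed and promote_unsigned are hypothetical names:

```c
/* Converting signed char to int preserves the value. On two's complement
   hardware that amounts to copying the sign bit into the upper bits
   (sign extension): the byte 0xff (-1) becomes int -1. */
int promote_signed(signed char sc)
{
    return sc;
}

/* Converting unsigned char to int also preserves the value, which means
   filling the upper bits with zeros (zero extension): the byte 0xff
   becomes int 255. */
int promote_unsigned(unsigned char uc)
{
    return uc;
}
```

promote_signed((signed char)-1) returns -1 and promote_unsigned(0xff) returns 255, even though both arguments hold the same bit pattern, 0xff.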

C's immediate predecessor was a language called B. B was a typeless language, so the question of char being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page.

As for what's happening in your code (applying modern C rules):

```c
char c = 0xff;
bool b = 0xff == c;
```

If plain char is unsigned, then the initialization of c sets it to (char)0xff, which compares equal to 0xff in the second line. But if plain char is signed, then 0xff (an expression of type int) is converted to char, and since 0xff exceeds CHAR_MAX (assuming CHAR_BIT == 8), the result is implementation-defined (C also permits an implementation-defined signal to be raised instead). In most implementations, the result is -1. In the comparison 0xff == c, c is promoted to int (0xff already has type int), making it equivalent to 0xff == -1, or 255 == -1, which is of course false.
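
Below is a sketch of the question's comparison next to a portable alternative; naive_is_ff and byte_is_ff are hypothetical names. Converting c to unsigned char before comparing yields the byte's value in the range 0 to UCHAR_MAX, whichever way plain char is implemented.

```c
#include <stdbool.h>

/* The comparison from the question: if plain char is signed, c typically
   holds -1, which is promoted to int -1, and -1 != 255. */
bool naive_is_ff(char c)
{
    return 0xff == c;
}

/* Portable version: conversion to unsigned char is well-defined
   (modulo UCHAR_MAX + 1), so a byte with all bits set compares equal
   to 0xff regardless of plain char's signedness. */
bool byte_is_ff(char c)
{
    return (unsigned char)c == 0xff;
}
```

The same cast is the usual idiom when passing a char to the <ctype.h> functions such as isalpha, whose argument must be EOF or a value representable as unsigned char.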

Another important thing to note is that unsigned char, signed char, and (plain) char are three distinct types. char has the same representation as either unsigned char or signed char; it's implementation-defined which one it is. (On the other hand, signed int and int are two names for the same type; unsigned int is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain int is signed or unsigned.))
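
C11's _Generic makes the three-type distinction visible, because a single generic selection may not list two compatible types. A minimal sketch; CHAR_KIND and the functions below are hypothetical names, and CHAR_MIN from <limits.h> reports which representation plain char shares:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: maps an expression's type to a small code. All
   three character types may appear together because they are distinct
   types. Listing both int and signed int would be a constraint
   violation, since those are two names for the same type. */
#define CHAR_KIND(x) _Generic((x), \
    char: 0,                        \
    signed char: 1,                 \
    unsigned char: 2,               \
    default: -1)

int kind_of_plain(char c)             { return CHAR_KIND(c); }
int kind_of_signed(signed char c)     { return CHAR_KIND(c); }
int kind_of_unsigned(unsigned char c) { return CHAR_KIND(c); }

/* CHAR_MIN is 0 if plain char is unsigned and SCHAR_MIN (negative)
   if it is signed, so this reports the implementation's choice. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

kind_of_plain, kind_of_signed, and kind_of_unsigned return 0, 1, and 2 on every conforming implementation; only plain_char_is_signed varies. GCC and Clang also accept -fsigned-char and -funsigned-char to override the target's default, which is a convenient way to test code under both choices.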

Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code, and to a lesser extent existing implementations.
