Is wint_t always at least as large as wchar_t? And how can unsigned short satisfy the requirements of wint_t?

Question

It seems everyone assumes wint_t is at least as large as wchar_t. However, the C standard allows the range of wchar_t to include values that do not directly correspond to any character in the extended character set:

The values WCHAR_MIN and WCHAR_MAX do not necessarily correspond to members of the extended character set.

and:

wchar_t, which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales; the null character shall have the code value zero.

and wint_t is only required to be able to hold the values of members of the extended character set, plus at least one additional value for WEOF:

wint_t, which is an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set (see WEOF below);

The requirement that wint_t be unchanged by default argument promotions also does not imply that wint_t is larger than wchar_t, since wchar_t may itself be large enough to be unchanged by default argument promotions.

So in some imaginary implementation, wchar_t might be defined large enough to hold many unnecessary values that are not in the extended character set, and large enough to be unaffected by default argument promotions. That implementation may then choose not to include those values in wint_t. This would allow wchar_t to be larger than wint_t.

According to the standard, wchar_t must be at least 1 byte and wint_t at least 2 bytes (assuming 8-bit bytes).

Also, in Microsoft Visual Studio wint_t is typedef'd to unsigned short. How does this satisfy the requirement of being unchanged by default argument promotions? I thought C allows a 2-byte wint_t because int may be 2 bytes in some implementations.

Recommended answer

wint_t is to wchar_t what int is to char, so an implementation where sizeof(wchar_t) == sizeof(wint_t) is completely legal, just as implementations where sizeof(int) == sizeof(char) are allowed. In fact the char case is even worse, because getc, fgetc and the like cannot return a different type, whereas wint_t can simply be typedef'd as a wider type if necessary. The standard even explicitly permits it:

Footnote 327) wchar_t and wint_t can be the same integer type.

http://www.iso-9899.info/n1570.html#7.29.1

The standard also says that "The values WCHAR_MIN and WCHAR_MAX do not necessarily correspond to members of the extended character set", and there is nothing wrong with that. The range of the extended character set can be smaller than the range of wchar_t, just as happens with char. For example, if the basic character set is ASCII, it uses only half of the available range (or much less if CHAR_BIT > 8). wint_t is

... an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set (see WEOF below);

http://www.iso-9899.info/n1570.html#6.3.1.3

so presumably it may even be smaller than wchar_t if the extended character set is much smaller than the range of wchar_t. Since 0xFFFF is guaranteed not to be a Unicode character at all, using it for WEOF is completely valid, although it is a little weird IMHO, and I don't know why MS did that.
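To see how a particular implementation defines these types, a small probe along the following lines may help. This is only a sketch: it assumes a C11 compiler (for _Generic), and UNCHANGED_BY_PROMOTION, wint_survives_promotion and wint_not_wider_than_wchar are hypothetical names made up for illustration. For integer types the default argument promotions are the integer promotions, which unary + also applies, so the macro checks whether a type keeps its identity after promotion.

```c
#include <assert.h>
#include <wchar.h>   /* wchar_t, wint_t */

/* 1 if a value of integer type T still has type T after the integer
   promotions (applied here by unary +), 0 otherwise. */
#define UNCHANGED_BY_PROMOTION(T) _Generic(+(T)0, T: 1, default: 0)

/* Hypothetical helper: does this implementation's wint_t keep its type
   under the default argument promotions? */
int wint_survives_promotion(void)
{
    /* Sanity checks on the macro itself: int is never promoted,
       unsigned short always is (to int or to unsigned int). */
    assert(UNCHANGED_BY_PROMOTION(int));
    assert(!UNCHANGED_BY_PROMOTION(unsigned short));
    return UNCHANGED_BY_PROMOTION(wint_t);
}

/* Hypothetical helper: is wint_t no wider than wchar_t here? */
int wint_not_wider_than_wchar(void)
{
    return sizeof(wint_t) <= sizeof(wchar_t);
}
```

Calling wint_survives_promotion() from main and printing the result would return 0 on any implementation where wint_t is unsigned short, since unsigned short never survives the integer promotions.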

If sizeof(wchar_t) == sizeof(wint_t) or sizeof(int) == sizeof(char), and char/wchar_t is unsigned, then there are also values that char and wchar_t can represent but int and wint_t cannot. In that case the conversion between them is implementation-defined. That is not an issue when working with text files, but it causes problems when reading binary files. Either way, for portability you need to test explicitly for end of file and errors yourself:

int c;
while((c = /* fgetwc(in) */ fgetc(in)) != EOF || (!feof(in) && !ferror(in)))
    fputc(c, out);

This matches TI's advice:

On targets where sizeof(char)==sizeof(int) (C2700, C2800, C5400, C5500), you still can't reliably use the return value of getc() to check for end of file, because 0xffff will be mistaken for the end of file. Use feof() instead.
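The collision described here can be imitated on an ordinary host with fixed-width types. The following is only a simulation under stated assumptions: sim_uchar, sim_int, SIM_EOF, sim_getc_result and collides_with_eof are illustrative names, not part of any TI or standard API, and the sketch assumes the 16-bit character's bit pattern is handed back unchanged in a 16-bit int (the real conversion is implementation-defined).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for a target whose char and int are both 16 bits wide. */
typedef uint16_t sim_uchar;   /* plays the role of unsigned char */
typedef int16_t  sim_int;     /* plays the role of int */
#define SIM_EOF ((sim_int)-1)

/* Value a getc()-like function would return for a character that was
   read successfully: the same 16 bits, viewed as the signed type. */
sim_int sim_getc_result(sim_uchar ch)
{
    sim_int r;
    assert(sizeof r == sizeof ch);
    memcpy(&r, &ch, sizeof r);  /* int16_t is two's complement, no padding */
    return r;
}

/* 1 if a valid character would be mistaken for end of file. */
int collides_with_eof(sim_uchar ch)
{
    return sim_getc_result(ch) == SIM_EOF;
}
```

With these definitions, collides_with_eof(0xFFFF) is 1 while collides_with_eof(0x0041) is 0, which is why comparing against EOF alone is not enough on such targets.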

CMU FIO34-C. Distinguish between characters read from a file and EOF or WEOF:

Because EOF is negative, it should not match any unsigned character value. However, this is only true for implementations where the int type is wider than char. On an implementation where int and char have the same width, a character-reading function can read and return a valid character that has the same bit-pattern as EOF. This could occur, for example, if an attacker inserted a value that looked like EOF into the file or data stream to alter the behavior of the program.

The C Standard requires only that the int type be able to represent a maximum value of +32767 and that a char type be no larger than an int. Although uncommon, this situation can result in the integer constant expression EOF being indistinguishable from a valid character; that is, (int)(unsigned char)65535 == -1. Consequently, failing to use feof() and ferror() to detect end-of-file and file errors can result in incorrectly identifying the EOF character on rare implementations where sizeof(int) == sizeof(char).

This problem is much more common when reading wide characters. The fgetwc(), getwc(), and getwchar() functions return a value of type wint_t. This value can represent the next wide character read, or it can represent WEOF, which indicates end-of-file for wide character streams. On most implementations, the wchar_t type has the same width as wint_t, and these functions can return a character indistinguishable from WEOF.

In the UTF-16 character set, 0xFFFF is guaranteed not to be a character, which allows WEOF to be represented as the value -1. Similarly, all UTF-32 characters are positive when viewed as a signed 32-bit integer. All widely used character sets are designed with at least one value that does not represent a character. Consequently, it would require a custom character set designed without consideration of the C programming language for this problem to occur with wide characters or with ordinary characters that are as wide as int.
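The same defensive pattern can be applied to wide streams. Below is a minimal sketch, assuming a hosted implementation; copy_wide is an illustrative name rather than a standard function. It treats a WEOF return as end of input only when feof(), ferror() or an EILSEQ encoding error confirms it, so a character that happens to compare equal to WEOF would still be copied.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Copy wide characters from in to out.
   Returns 0 at a genuine end of file, -1 on a read, write or
   encoding error. */
int copy_wide(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    for (;;) {
        errno = 0;
        wint_t c = fgetwc(in);
        if (c == WEOF) {
            if (feof(in))
                return 0;               /* real end of file */
            if (ferror(in) || errno == EILSEQ)
                return -1;              /* read or encoding error */
            /* otherwise: a valid character that equals WEOF */
        }
        errno = 0;
        /* fputwc also returns WEOF on failure, so confirm it the same way */
        if (fputwc((wchar_t)c, out) == WEOF &&
            (ferror(out) || errno == EILSEQ))
            return -1;
    }
}
```

A caller would open both streams, call copy_wide(in, out), and treat a nonzero result as a failed copy.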

See also

  • Might an unsigned char be equal to EOF?
  • Can sizeof(int) ever be 1 on a hosted implementation?
  • Can an implementation that has sizeof (int) == 1 "fully conform"?
  • ctype.h and sizeof(int) == sizeof(char)
