Is wint_t always at least as large as wchar_t? And how can unsigned short satisfy the requirements of wint_t?
Question
It seems everyone assumes wint_t is at least as large as wchar_t. However, the C standard allows the wchar_t range to contain values that do not directly correspond to any character in the extended character set:
The values WCHAR_MIN and WCHAR_MAX do not necessarily correspond to members of the extended character set.
And:
wchar_t, which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales; the null character shall have the code value zero.
and wint_t is only required to hold the values of members of the extended character set plus at least one additional value for WEOF:
wint_t, which is an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set (see WEOF below);
The requirement that wint_t be unchanged by default argument promotions does not imply that wint_t is larger than wchar_t either, since wchar_t may itself be large enough to be unchanged by default argument promotions.
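Why the promotion rule matters in practice: wint_t values are passed through variadic functions (printf's %lc conversion, for instance, takes a wint_t argument), and va_arg must name the promoted type. The sketch below uses a hypothetical sum_promoted_shorts() and assumes int is wider than unsigned short, as on mainstream desktop targets; under that assumption each unsigned short argument arrives as an int.

```c
#include <stdarg.h>

/* Hypothetical helper: sums n arguments that the caller passes as
   unsigned short. Assumes int is wider than unsigned short, so the
   default argument promotions turn each argument into an int. */
long sum_promoted_shorts(int n, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, n);
    for (int i = 0; i < n; i++) {
        /* Must name the promoted type: va_arg(ap, unsigned short)
           would be undefined behavior here. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

A wint_t typedefed to a type narrower than int would put every va_arg(ap, wint_t) in the same position as va_arg(ap, unsigned short) in this sketch.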
So in some imaginary implementation, wchar_t might be defined large enough to hold many unnecessary values that are not in the extended character set, and also large enough to be unaffected by default argument promotions. That implementation could then choose not to include those extra values in wint_t, which would allow wchar_t to be larger than wint_t.
According to the standard, wchar_t must be at least 1 byte and wint_t at least 2 bytes (assuming 8-bit bytes).
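A quick way to see what a particular implementation actually chose, and to check the minimum ranges behind those byte counts, is to print the limit macros from <stdint.h>. The helper names print_wide_limits() and wide_limits_meet_minimums() are hypothetical; the ranges encoded in the second function are the minimums stated in C11 7.20.3.

```c
#include <stdint.h>   /* WCHAR_MIN, WCHAR_MAX, WINT_MIN, WINT_MAX, intmax_t */
#include <stdio.h>
#include <wchar.h>

/* Print the actual sizes and ranges of wchar_t and wint_t. Casting to
   intmax_t/uintmax_t avoids mixing signed and unsigned macro types. */
void print_wide_limits(FILE *out)
{
    fprintf(out, "sizeof(wchar_t) = %zu, range [%jd, %ju]\n",
            sizeof(wchar_t), (intmax_t)WCHAR_MIN, (uintmax_t)WCHAR_MAX);
    fprintf(out, "sizeof(wint_t)  = %zu, range [%jd, %ju]\n",
            sizeof(wint_t), (intmax_t)WINT_MIN, (uintmax_t)WINT_MAX);
}

/* Check the C11 7.20.3 minimums: wchar_t needs at least [-127, 127] if
   signed or [0, 255] if unsigned; wint_t needs at least [-32767, 32767]
   if signed or [0, 65535] if unsigned. */
int wide_limits_meet_minimums(void)
{
    int wchar_ok = ((intmax_t)WCHAR_MIN < 0)
        ? ((intmax_t)WCHAR_MIN <= -127 && (uintmax_t)WCHAR_MAX >= 127u)
        : ((uintmax_t)WCHAR_MAX >= 255u);
    int wint_ok = ((intmax_t)WINT_MIN < 0)
        ? ((intmax_t)WINT_MIN <= -32767 && (uintmax_t)WINT_MAX >= 32767u)
        : ((uintmax_t)WINT_MAX >= 65535u);
    return wchar_ok && wint_ok;
}
```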
Also, in Microsoft Visual Studio, wint_t is typedefed to unsigned short. How does this satisfy the requirement of being unchanged by default argument promotions? I thought C allows a 2-byte wint_t because int may be 2 bytes in some implementations.
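With a 32-bit int, an unsigned short is promoted to int, so an unsigned short wint_t is not, strictly speaking, unchanged by the default argument promotions. One way to check a given compiler is a C11 _Generic selection on a unary-plus expression, since unary + performs the integer promotions. PROMOTION_UNCHANGED and wint_t_is_promotion_safe() are hypothetical names, and the sketch assumes a C11 compiler.

```c
#include <wchar.h>

/* Evaluates to 1 if integer type T is left unchanged by the integer
   promotions, 0 otherwise. For integer types the integer promotions are
   exactly the default argument promotions (this trick does not model the
   float -> double promotion). Requires C11 _Generic. */
#define PROMOTION_UNCHANGED(T) _Generic(+(T)0, T: 1, default: 0)

/* 1 where wint_t survives promotion unchanged; 0 where it is narrower
   than int, e.g. an unsigned short wint_t alongside a 32-bit int. */
int wint_t_is_promotion_safe(void)
{
    return PROMOTION_UNCHANGED(wint_t);
}
```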
Answer
The relationship of wint_t to wchar_t is the same as that of int to char, so an implementation where sizeof(wchar_t) == sizeof(wint_t) is completely legal, just as implementations where sizeof(int) == sizeof(char) are allowed. In fact, the char case is even worse, because you can't return a different type from getc, fgetc..., whereas for wint_t you can simply typedef it as a wider type if necessary. You can also see that the standard explicitly permits it:
Footnote 327) wchar_t and wint_t can be the same integer type.
http://www.iso-9899.info/n1570.html#7.29.1
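Whether a given implementation actually uses the same type for both can be tested with a C11 _Generic selection; wchar_and_wint_same_type() is a hypothetical helper. The answer varies between platforms: on glibc for x86-64, for instance, wchar_t is int while wint_t is unsigned int, so the result there is 0.

```c
#include <wchar.h>

/* 1 if wchar_t and wint_t are the same (compatible) type, as the
   standard's footnote permits, 0 otherwise. Requires C11 _Generic. */
int wchar_and_wint_same_type(void)
{
    return _Generic((wchar_t)0, wint_t: 1, default: 0);
}
```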
The standard also says that "The values WCHAR_MIN and WCHAR_MAX do not necessarily correspond to members of the extended character set", and there's nothing wrong with that. The extended character set can occupy a smaller range than wchar_t, because the same happens with char: if the basic character set is ASCII, for example, it uses only half of the available range (or much less if CHAR_BIT > 8). wint_t is
... an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set (see WEOF below);
http://www.iso-9899.info/n1570.html#6.3.1.3
so presumably its size may even be smaller than that of wchar_t if the extended character set is much smaller than the wchar_t range. Since 0xFFFF is guaranteed not to be a Unicode character at all, using it for WEOF is completely valid, although it's a little weird IMHO and I don't know why MS did that.
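The "0xFFFF is guaranteed not to be a character" argument rests on Unicode's noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane, which are permanently reserved and never assigned. A sketch of a classifier, with is_unicode_noncharacter() as a hypothetical name. Because no code point exceeds 0x10FFFF, every code point is also positive as a signed 32-bit value, which is why a 32-bit WEOF of -1 is likewise safe.

```c
#include <stdint.h>

/* Returns 1 if cp is a Unicode noncharacter: U+FDD0..U+FDEF, or the last
   two code points of any plane (U+xxFFFE and U+xxFFFF). Because these are
   never assigned, a 16-bit WEOF of 0xFFFF cannot collide with a real
   character. */
int is_unicode_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return 0; /* outside the Unicode code space entirely */
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}
```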
If sizeof(wchar_t) == sizeof(wint_t) or sizeof(int) == sizeof(char), then there are also values that char and wchar_t can represent but int and wint_t can't, in case char/wchar_t is unsigned while int/wint_t is signed. In that case the conversion between them is implementation-defined. That won't be an issue if you're working with text files, but it will cause problems if you're reading binary files. Either way, for portability you need to explicitly distinguish end-of-file and errors yourself with feof() and ferror():
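int c;
/* For a wide stream, use wint_t, fgetwc(), WEOF and fputwc() instead. */
/* c == EOF means end-of-file or error only if feof() or ferror() confirms it. */
while ((c = fgetc(in)) != EOF || (!feof(in) && !ferror(in)))
    fputc(c, out);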
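The loop above has a direct wide-character counterpart using wint_t, fgetwc(), WEOF and fputwc(). A sketch, with copy_wide_stream() as a hypothetical name; the write side also checks ferror(), because fputwc() returns the character written, which on such implementations can compare equal to WEOF.

```c
#include <stdio.h>
#include <wchar.h>

/* Copies a wide-oriented stream. Returns 0 at end-of-file, -1 on a read
   or write error. A WEOF from fgetwc() ends the loop only if feof() or
   ferror() confirms it; otherwise it is a genuine character. */
int copy_wide_stream(FILE *in, FILE *out)
{
    wint_t c;

    for (;;) {
        c = fgetwc(in);
        if (c == WEOF && (feof(in) || ferror(in)))
            break;
        if (fputwc((wchar_t)c, out) == WEOF && ferror(out))
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```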
This matches TI's advice:
On targets where sizeof(char)==sizeof(int) (C2700, C2800, C5400, C5500), you still can't reliably use the return value of getc() to check for end of file, because 0xffff will be mistaken for the end of file. Use feof() instead.
and CMU's FIO34-C. Distinguish between characters read from a file and EOF or WEOF:
Because EOF is negative, it should not match any unsigned character value. However, this is only true for implementations where the int type is wider than char. On an implementation where int and char have the same width, a character-reading function can read and return a valid character that has the same bit-pattern as EOF. This could occur, for example, if an attacker inserted a value that looked like EOF into the file or data stream to alter the behavior of the program.
The C Standard requires only that the int type be able to represent a maximum value of +32767 and that a char type be no larger than an int. Although uncommon, this situation can result in the integer constant expression EOF being indistinguishable from a valid character; that is, (int)(unsigned char)65535 == -1. Consequently, failing to use feof() and ferror() to detect end-of-file and file errors can result in incorrectly identifying the EOF character on rare implementations where sizeof(int) == sizeof(char).
This problem is much more common when reading wide characters. The fgetwc(), getwc(), and getwchar() functions return a value of type wint_t. This value can represent the next wide character read, or it can represent WEOF, which indicates end-of-file for wide character streams. On most implementations, the wchar_t type has the same width as wint_t, and these functions can return a character indistinguishable from WEOF.
In the UTF-16 character set, 0xFFFF is guaranteed not to be a character, which allows WEOF to be represented as the value -1. Similarly, all UTF-32 characters are positive when viewed as a signed 32-bit integer. All widely used character sets are designed with at least one value that does not represent a character. Consequently, it would require a custom character set designed without consideration of the C programming language for this problem to occur with wide characters or with ordinary characters that are as wide as int.
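On ordinary hosted targets, where int is wider than char, the collision described above cannot happen for narrow streams: fgetc() returns each byte as an unsigned char converted to int, so even an all-ones byte comes back as a non-negative value distinct from EOF. A quick check, assuming tmpfile() is usable in your environment (byte_round_trips() is a hypothetical name):

```c
#include <stdio.h>

/* Writes byte b to a temporary file and reads it back with fgetc().
   Returns 1 if the value read is b itself (and therefore distinct from
   EOF), 0 if it is not, and -1 if the temporary file cannot be used. */
int byte_round_trips(unsigned char b)
{
    FILE *f = tmpfile();
    int c;

    if (f == NULL)
        return -1;
    if (fputc(b, f) == EOF || fseek(f, 0L, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    c = fgetc(f);
    fclose(f);
    return c == (int)b && c != EOF;
}
```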
See also
- Might an unsigned char be equal to EOF?
- Can sizeof(int) ever be 1 on a hosted implementation?
- Can an implementation that has sizeof (int) == 1 "fully conform"?
- ctype.h and sizeof(int) == sizeof(char)