Do I need to cast to unsigned char before calling toupper?
Question
A while ago, someone with high reputation here on StackOverflow wrote in a comment that it is necessary to cast a `char` argument to `unsigned char` before calling `std::toupper` (and similar functions).

On the other hand, Bjarne Stroustrup does not mention the need to do so in The C++ Programming Language. He just uses `toupper` like this:

```cpp
string name = "Niels Stroustrup";

void m3()
{
    string s = name.substr(6,10);    // s = "Stroustrup"
    name.replace(0,5,"nicholas");    // name becomes "nicholas Stroustrup"
    name[0] = toupper(name[0]);      // name becomes "Nicholas Stroustrup"
}
```

(Quoted from said book, 4th edition.)
The reference says that the input needs to be representable as `unsigned char`. To me this sounds like it holds for every `char`, since `char` and `unsigned char` have the same size.

So is this cast unnecessary, or was Stroustrup careless?
Edit: The libstdc++ manual mentions that the input character must be from the basic source character set, but does not cast. I guess this is covered by @Keith Thompson's reply: they all have a positive representation as both `signed char` and `unsigned char`?

Answer

Yes, the argument to `toupper` needs to be converted to `unsigned char` to avoid the risk of undefined behavior.

The types `char`, `signed char`, and `unsigned char` are three distinct types. `char` has the same range and representation as either `signed char` or `unsigned char`. (Plain `char` is very commonly signed and able to represent values in the range -128..+127.)

The `toupper` function takes an `int` argument and returns an `int` result. Quoting the C standard, section 7.4 paragraph 1:

> In all cases the argument is an `int`, the value of which shall be representable as an `unsigned char` or shall equal the value of the macro `EOF`. If the argument has any other value, the behavior is undefined.

(C++ incorporates most of the C standard library, and defers its definition to the C standard.)
The `[]` indexing operator on `std::string` returns a reference to `char`. If plain `char` is a signed type, and if the value returned by `name[0]` happens to be negative, then the expression `toupper(name[0])` has undefined behavior.
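Applied to the line from the question, the fix is a cast inside the call. Below is a minimal sketch; `capitalize_first` and `to_upper_copy` are hypothetical names chosen for illustration, and the `std::transform` variant is a common idiom rather than something from the book. Declaring the lambda parameter as `unsigned char` performs the conversion before `std::toupper` sees the value, and because `std::toupper` returns `int`, the result is narrowed back to `char` explicitly.

```cpp
#include <algorithm>  // std::transform
#include <cctype>     // std::toupper
#include <string>

// Hypothetical helper: safe version of name[0] = toupper(name[0]).
void capitalize_first(std::string& s)
{
    if (!s.empty())
        s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
}

// Hypothetical helper: returns an uppercased copy of s.
// The lambda takes unsigned char, so each char is converted before toupper sees it.
std::string to_upper_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}
```

For example, after `std::string name = "nicholas Stroustrup"; capitalize_first(name);`, `name` holds `"Nicholas Stroustrup"`. Which characters get mapped depends on the current C locale; at program start that is the `"C"` locale.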
The language guarantees that, even if plain `char` is signed, all members of the basic character set have non-negative values, so given the initialization `string name = "Niels Stroustrup";` the program doesn't risk undefined behavior. But yes, in general a `char` value passed to `toupper` (or to any of the functions declared in `<cctype>`/`<ctype.h>`) needs to be converted to `unsigned char`, so that the implicit conversion to `int` won't yield a negative value and cause undefined behavior.

The `<ctype.h>` functions are commonly implemented using a lookup table, so code like this:

```cpp
// assume plain char is signed
char c = -2;
c = toupper(c); // undefined behavior
```

may index outside the bounds of that table.
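To make the lookup-table point concrete, here is a deliberately simplified, hypothetical model of such a table; it is not taken from any real C library. It assumes `EOF == -1` and contiguous ASCII letters, and it reserves one extra slot at the front of the table for `EOF`. Any argument below `-1`, such as a negative `char` value of `-2`, would compute a negative index.

```cpp
#include <array>    // std::array
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical, simplified model of a table-driven toupper; not taken from
// any real C library. Assumes EOF == -1 and contiguous ASCII letters 'a'..'z'.
static_assert(EOF == -1, "this toy model assumes EOF is -1");

constexpr int table_size = UCHAR_MAX + 2;  // one slot per unsigned char value, plus EOF

// Slot 0 holds the result for EOF; slot c + 1 holds the result for c.
constexpr std::array<int, table_size> make_upper_table()
{
    std::array<int, table_size> table{};
    for (int c = -1; c <= UCHAR_MAX; ++c)
        table[c + 1] = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
    return table;
}

constexpr std::array<int, table_size> upper_table = make_upper_table();

// Valid arguments (-1 .. UCHAR_MAX) map to indices 0 .. UCHAR_MAX + 1.
// A negative char such as -2 would read upper_table[-1]: out of bounds,
// which is the undefined behavior described above.
constexpr int toy_toupper(int c)
{
    return upper_table[c + 1];
}
```

So `toy_toupper('a')` is `'A'` and `toy_toupper(EOF)` is `EOF`, while `toy_toupper(-2)` would read before the start of the array. A real implementation is free to offset its table so that negative `char` values are covered too, but nothing requires it.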
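Note that converting to `unsigned`:

```cpp
// assume plain char is signed
char c = -2;
c = toupper((unsigned)c); // undefined behavior
```

doesn't avoid the problem. If `int` is 32 bits, converting the `char` value `-2` to `unsigned` yields `4294967294`. This is then implicitly converted to `int` (the parameter type), which in practice yields `-2` again (the result of that conversion is implementation-defined in C and in C++ before C++20; C++20 defines it to wrap, giving `-2`).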
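The conversions in that paragraph can be checked at compile time. A minimal sketch, using `signed char` as a stand-in for a signed plain `char` so the results do not depend on the platform; the variable names are hypothetical. The final conversion back to `int` is only described in a comment, because its result is implementation-defined before C++20.

```cpp
#include <climits>  // UCHAR_MAX, UINT_MAX

// signed char stands in for a signed plain char, so these results do not
// depend on whether plain char is signed on the platform used to run this.
constexpr signed char negative = -2;

// What toupper would receive with each conversion:
// 1. Through unsigned char: reduced modulo UCHAR_MAX + 1, a valid argument.
constexpr int via_unsigned_char = static_cast<unsigned char>(negative);  // 254 with 8-bit char

// 2. Through unsigned: reduced modulo UINT_MAX + 1, far outside unsigned char.
constexpr unsigned via_unsigned = static_cast<unsigned>(negative);  // 4294967294 with 32-bit unsigned

// 3. That unsigned value converted to toupper's int parameter.
//    Implementation-defined before C++20 (C++20 defines it): -2 on common compilers.
constexpr int passed_as_int = static_cast<int>(via_unsigned);

static_assert(via_unsigned_char == UCHAR_MAX - 1, "in range for toupper");
static_assert(via_unsigned == UINT_MAX - 1, "wrapped modulo UINT_MAX + 1");
```

Only the first form lands in the range `toupper` is required to accept; the second round-trips to the same negative value the cast was meant to avoid.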
`toupper` can be implemented so it behaves sensibly for negative values (accepting all values from `CHAR_MIN` to `UCHAR_MAX`), but it's not required to do so. Furthermore, the functions in `<ctype.h>` are required to accept an argument with the value `EOF`, which is typically `-1`.

The C++ standard makes adjustments to some C standard library functions. For example, `strchr` and several other functions are replaced by overloaded versions that enforce `const` correctness. There are no such adjustments for the functions declared in `<cctype>`.
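Because `<cctype>` keeps the C interface, `std::toupper` still takes and returns `int`, and `EOF` passes through unchanged (it is not a lowercase letter, so it is returned as-is). A small sketch, assuming the default `"C"` locale; `upper_char` is a hypothetical wrapper name, not a standard function.

```cpp
#include <cctype>       // std::toupper
#include <cstdio>       // EOF
#include <type_traits>  // std::is_same

// <cctype> keeps the C signature: int toupper(int).
static_assert(std::is_same<decltype(std::toupper(0)), int>::value,
              "std::toupper from <cctype> takes and returns int");

// Hypothetical wrapper: convert to unsigned char on the way in,
// and narrow the int result back to char on the way out.
inline char upper_char(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```

Wrapping the conversion once like this keeps the casts out of calling code; values read with `std::getchar` or `std::fgetc` are already in the accepted range (or equal to `EOF`) and can be passed to `std::toupper` directly.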