Do I need to cast to unsigned char before calling toupper?


Question


A while ago, someone with high reputation here on Stack Overflow wrote in a comment that it is necessary to cast a char argument to unsigned char before calling std::toupper (and similar functions).

On the other hand, Bjarne Stroustrup does not mention the need to do so in The C++ Programming Language. He just uses toupper like this:

string name = "Niels Stroustrup";

void m3() {
  string s = name.substr(6,10);  // s = "Stroustrup"
  name.replace(0,5,"nicholas");  // name becomes "nicholas Stroustrup"
  name[0] = toupper(name[0]);   // name becomes "Nicholas Stroustrup"
} 

(Quoted from said book, 4th edition.)

The reference says that the input needs to be representable as unsigned char. To me, this sounds like it holds for every char, since char and unsigned char have the same size.

So is this cast unnecessary or was Stroustrup careless?

Edit: The libstdc++ manual mentions that the input character must be from the basic source character set, but does not cast. I guess this is covered by @Keith Thompson's reply: do they all have a non-negative representation as both signed char and unsigned char?

Solution

Yes, the argument to toupper needs to be converted to unsigned char to avoid the risk of undefined behavior.

The types char, signed char, and unsigned char are three distinct types. char has the same range and representation as either signed char or unsigned char. (Plain char is very commonly signed and able to represent values in the range -128..+127.)
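
Whether plain char is signed is implementation-defined, so it can help to check it on the platform at hand. The following is a minimal sketch, not part of the original answer; the helper name plain_char_is_signed is hypothetical.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// char, signed char, and unsigned char are three distinct types,
// even though char shares its range with one of the other two.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// Hypothetical helper: reports whether plain char is signed
// on the implementation compiling this code.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// When plain char is signed, CHAR_MIN is negative; otherwise it is 0.
static_assert(plain_char_is_signed() == (CHAR_MIN < 0),
              "consistent with <climits>");
```

If plain_char_is_signed() is true, any char holding a value outside the basic character set may be negative, which is the case the rest of this answer is concerned with.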

The toupper function takes an int argument and returns an int result. Quoting the C standard, section 7.4 paragraph 1:

In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

(C++ incorporates most of the C standard library, and defers its definition to the C standard.)

The [] indexing operator on std::string returns a char value. If plain char is a signed type, and if the value returned by name[0] happens to be negative, then the expression

toupper(name[0])

has undefined behavior.

The language guarantees that, even if plain char is signed, all members of the basic character set have non-negative values, so given the initialization

string name = "Niels Stroustrup";

the program doesn't risk undefined behavior. But yes, in general a char value passed to toupper (or to any of the functions declared in <cctype> / <ctype.h>) needs to be converted to unsigned char, so that the implicit conversion to int won't yield a negative value and cause undefined behavior.
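
One way to apply that conversion consistently is to wrap it in a small helper. This is only a sketch; the names to_upper_char and to_upper_string are hypothetical and do not come from the standard library or from Stroustrup's book.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: converts to unsigned char first, so the value
// passed to std::toupper is always in the range 0..UCHAR_MAX.
char to_upper_char(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Hypothetical helper: returns an uppercased copy of s.
std::string to_upper_string(std::string s) {
    for (char& c : s) {
        c = to_upper_char(c);
    }
    return s;
}
```

With such a helper, the line from the book could be written as name[0] = to_upper_char(name[0]); and would stay well-defined even if name held characters with negative char values.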

The <ctype.h> functions are commonly implemented using a lookup table. Something like:

// assume plain char is signed
char c = -2;
c = toupper(c); // undefined behavior

may index outside the bounds of that table.
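
To make the table argument concrete, here is a toy model of a table-based toupper. It is an illustration only and does not reproduce any particular library. It assumes 8-bit bytes, an ASCII-compatible character set, and EOF equal to -1; all names (kTableSize, make_toy_table, toy_index, toy_index_in_bounds, toy_toupper) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>   // EOF

// One slot per unsigned char value, plus one slot for EOF (assumed -1).
constexpr int kTableSize = 257;

std::array<int, kTableSize> make_toy_table() {
    std::array<int, kTableSize> table{};
    table[0] = EOF;  // slot used when the argument is EOF
    for (int v = 0; v < 256; ++v) {
        // Assumes 'a'..'z' are contiguous, as in ASCII.
        table[static_cast<std::size_t>(v + 1)] =
            (v >= 'a' && v <= 'z') ? v - 'a' + 'A' : v;
    }
    return table;
}

// Index that this toy table-based implementation computes for its argument.
int toy_index(int arg) { return arg + 1; }

bool toy_index_in_bounds(int arg) {
    const int i = toy_index(arg);
    return i >= 0 && i < kTableSize;
}

int toy_toupper(int arg) {
    static const std::array<int, kTableSize> table = make_toy_table();
    // An implementation is not required to perform this check; without it,
    // an argument such as -2 would read outside the table.
    return toy_index_in_bounds(arg)
               ? table[static_cast<std::size_t>(toy_index(arg))]
               : arg;
}
```

In this model the argument -2 maps to index -1, which lies before the start of the table, while 254 (the same bit pattern viewed as unsigned char) maps to a valid slot.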

Note that converting to unsigned:

char c = -2;
c = toupper((unsigned)c); // undefined behavior

doesn't avoid the problem. If int is 32 bits, converting the char value -2 to unsigned yields 4294967294. This is then implicitly converted to int (the parameter type), which probably yields -2.
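
The difference between the two casts can be seen by looking at the values they produce. This sketch uses signed char explicitly so that it does not depend on whether plain char is signed; the function names via_unsigned_char and via_unsigned_int are hypothetical.

```cpp
#include <climits>

// Value that reaches toupper's int parameter when the character is
// first converted to unsigned char: always in 0..UCHAR_MAX.
int via_unsigned_char(signed char c) {
    return static_cast<unsigned char>(c);
}

// Value produced by converting to unsigned (that is, unsigned int).
// For a negative input this is a huge number, far outside 0..UCHAR_MAX.
unsigned via_unsigned_int(signed char c) {
    return static_cast<unsigned>(c);
}
```

For the input -2, via_unsigned_char yields UCHAR_MAX - 1 (254 with 8-bit bytes), which toupper must accept, whereas via_unsigned_int yields UINT_MAX - 1, which is not representable as unsigned char.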

toupper can be implemented so it behaves sensibly for negative values (accepting all values from CHAR_MIN to UCHAR_MAX), but it's not required to do so. Furthermore, the functions in <ctype.h> are required to accept an argument with the value EOF, which is typically -1.

The C++ standard makes adjustments to some C standard library functions. For example, strchr and several other functions are replaced by overloaded versions that enforce const correctness. There are no such adjustments for the functions declared in <cctype>.
