Do I need to cast to unsigned char before calling toupper(), tolower(), et al.?

Question

A while ago, someone with high reputation here on Stack Overflow wrote in a comment that it is necessary to cast a char-argument to unsigned char before calling std::toupper and std::tolower (and similar functions).

On the other hand, Bjarne Stroustrup does not mention the need to do so in The C++ Programming Language. He simply uses toupper like this:

```cpp
string name = "Niels Stroustrup";

void m3() {
  string s = name.substr(6,10);  // s = "Stroustrup"
  name.replace(0,5,"nicholas");  // name becomes "nicholas Stroustrup"
  name[0] = toupper(name[0]);    // name becomes "Nicholas Stroustrup"
}
```

(Quoted from said book, 4th edition.)

The reference says that the input needs to be representable as unsigned char. To me, this sounds like it holds for every char, since char and unsigned char have the same size.

So is this cast unnecessary or was Stroustrup careless?

The libstdc++ manual mentions that the input character must be from the basic source character set, but does not cast. I guess this is covered by @Keith Thompson's reply: all of those characters have a non-negative representation as both signed char and unsigned char?

Accepted answer

Yes, the argument to toupper needs to be converted to unsigned char to avoid the risk of undefined behavior.

The types char, signed char, and unsigned char are three distinct types. char has the same range and representation as either signed char or unsigned char. (Plain char is very commonly signed and able to represent values in the range -128..+127.)
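The paragraph above can be checked at compile time. The sketch below (the constant name plain_char_is_signed is our own, not from the answer) confirms that the three character types are distinct and that plain char takes its range from exactly one of signed char or unsigned char.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// char, signed char and unsigned char are three distinct types...
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// ...but plain char has the same range as exactly one of the other two.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

static_assert(plain_char_is_signed
                  ? (CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX)
                  : (CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX),
              "char shares its range with signed char or unsigned char");
```

On typical x86 and x86-64 toolchains plain_char_is_signed is true; on many ARM targets it is false, which is why code cannot assume either.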

The toupper function takes an int argument and returns an int result. Quoting the C standard, section 7.4 paragraph 1:

In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
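The quoted rule can be restated as a predicate over int values. This is a sketch for illustration; is_valid_ctype_arg is a hypothetical helper name, not part of any standard library.

```cpp
#include <climits>
#include <cstdio>  // EOF

// Hypothetical helper: true if v is an allowed argument to the <ctype.h>
// functions per C 7.4p1, i.e. EOF or a value in 0..UCHAR_MAX.
constexpr bool is_valid_ctype_arg(int v) {
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

For example, 'A', 0, UCHAR_MAX and EOF are all accepted, while UCHAR_MAX + 1 and (on implementations where EOF is -1) -2 are not.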

(C++ incorporates most of the C standard library, and defers its definition to the C standard.)

The [] indexing operator on std::string returns a char value. If plain char is a signed type, and if the value returned by name[0] happens to be negative, then the expression

toupper(name[0])

has undefined behavior.

The language guarantees that, even if plain char is signed, all members of the basic character set have non-negative values, so given the initialization

string name = "Niels Stroustrup";

the program doesn't risk undefined behavior. But yes, in general a char value passed to toupper (or to any of the functions declared in <cctype> / <ctype.h>) needs to be converted to unsigned char, so that the implicit conversion to int won't yield a negative value and cause undefined behavior.
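A minimal sketch of the conversion recommended above, packaged as two helpers (safe_toupper and upper_copy are our own illustrative names). The lambda passed to std::transform takes unsigned char, so each char element is converted before std::toupper sees it.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// char -> unsigned char -> int, so the argument is always in 0..UCHAR_MAX.
char safe_toupper(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Upper-cases a copy of s; the unsigned char parameter performs the conversion.
std::string upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}
```

Applied to the book's example, the same idea gives name[0] = toupper(static_cast<unsigned char>(name[0]));. In the default "C" locale, bytes above 0x7F are simply returned unchanged rather than triggering undefined behavior.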

The <ctype.h> functions are commonly implemented using a lookup table indexed by the argument value, so code like this:

```cpp
// assume plain char is signed
char c = -2;
c = toupper(c); // undefined behavior
```

may index outside the bounds of that table.
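To make the table argument concrete, here is a deliberately simplified, hypothetical table-driven toupper. It is not the code of any real C library; it assumes EOF == -1 and ASCII letters, and shifts every argument by one so that EOF lands at index 0.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <cstdio>  // EOF

// Hypothetical sketch only. ASSUMES EOF == -1 and ASCII letters.
static_assert(EOF == -1, "this sketch assumes EOF == -1");

// One slot for EOF plus one per unsigned char value: indices 0..UCHAR_MAX+1.
constexpr std::size_t kTableSize = static_cast<std::size_t>(UCHAR_MAX) + 2;

std::array<int, kTableSize> make_upper_table() {
    std::array<int, kTableSize> table{};
    for (int c = EOF; c <= UCHAR_MAX; ++c) {
        // "C" locale rule (ASCII assumed): only 'a'..'z' have uppercase forms.
        int up = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
        table[static_cast<std::size_t>(c + 1)] = up;  // shift so EOF maps to 0
    }
    return table;
}

// The index this toy implementation computes for argument c.
long table_index(int c) { return static_cast<long>(c) + 1; }

bool index_in_bounds(int c) {
    long i = table_index(c);
    return i >= 0 && i < static_cast<long>(kTableSize);
}

// Like the real functions, no range check: an argument outside
// EOF..UCHAR_MAX would read outside the table (undefined behavior).
int toy_toupper(int c) {
    static const std::array<int, kTableSize> table = make_upper_table();
    return table[static_cast<std::size_t>(table_index(c))];
}
```

Here index_in_bounds(-2) is false: a char value of -2 maps to index -1, one slot before the start of the table.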

Note that converting to unsigned instead:

```cpp
char c = -2;
c = toupper((unsigned)c); // undefined behavior
```

doesn't avoid the problem. If int is 32 bits, converting the char value -2 to unsigned yields 4294967294. This is then implicitly converted to int (the parameter type), which probably yields -2.
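The arithmetic can be spelled out in code. This sketch uses signed char as a stand-in for a signed plain char so that it behaves the same on every implementation; the constant names are our own.

```cpp
#include <climits>

// Stand-in for a plain char that happens to be signed.
constexpr signed char kNeg = -2;

// Conversion to unsigned wraps modulo 2^N: UINT_MAX - 1,
// i.e. 4294967294 when unsigned is 32 bits.
constexpr unsigned via_unsigned = static_cast<unsigned>(kNeg);

// Conversion to unsigned char wraps modulo 2^CHAR_BIT: UCHAR_MAX - 1,
// i.e. 254 with 8-bit bytes. Promotion to int keeps it non-negative.
constexpr int via_unsigned_char = static_cast<unsigned char>(kNeg);

static_assert(via_unsigned == UINT_MAX - 1, "modular conversion to unsigned");
static_assert(via_unsigned_char == UCHAR_MAX - 1, "modular conversion to unsigned char");

// Passing via_unsigned to toupper(int) converts it back to int. Before C++20
// this is implementation-defined (modular since C++20); common compilers yield -2.
inline int back_to_int() { return static_cast<int>(via_unsigned); }
```

Only the unsigned char route produces a value that toupper is required to accept.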

toupper can be implemented so it behaves sensibly for negative values (accepting all values from CHAR_MIN to UCHAR_MAX), but it's not required to do so. Furthermore, the functions in <ctype.h> are required to accept an argument with the value EOF, which is typically -1.
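Values that already come from a character-reading function are a common case where no cast is needed. In the mainstream standard libraries, std::istream::get() returns either EOF or the character converted through unsigned char, so the result can go straight to std::toupper; EOF itself is a valid argument and is returned unchanged. A sketch (upcase_stream is a hypothetical name):

```cpp
#include <cctype>
#include <cstdio>  // EOF
#include <istream>
#include <sstream>
#include <string>

// Reads until end of stream. Each ch is either EOF or a value in
// 0..UCHAR_MAX, so std::toupper(ch) needs no extra conversion.
std::string upcase_stream(std::istream& in) {
    std::string out;
    for (int ch = in.get(); ch != EOF; ch = in.get()) {
        out.push_back(static_cast<char>(std::toupper(ch)));
    }
    return out;
}
```

For example, reading from std::istringstream in("Nicholas Stroustrup") yields "NICHOLAS STROUSTRUP".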

The C++ standard makes adjustments to some C standard library functions. For example, strchr and several other functions are replaced by overloaded versions that enforce const correctness. There are no such adjustments for the functions declared in <cctype>.
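The difference can be observed with compile-time checks. This sketch relies only on the signatures the C++ standard specifies for <cctype> and <cstring>: std::toupper keeps its C signature int(int), while std::strchr has const-correct overloads.

```cpp
#include <cctype>
#include <cstring>
#include <type_traits>

// <cctype>: no C++-specific overloads. A char argument is promoted to int
// and the result is int, exactly as in C.
static_assert(std::is_same<decltype(std::toupper('a')), int>::value,
              "std::toupper takes and returns int");

// <cstring>: C's single strchr is replaced by const-correct overloads.
static_assert(std::is_same<decltype(std::strchr(static_cast<const char*>(nullptr), 'a')),
                           const char*>::value,
              "strchr on const char* returns const char*");
static_assert(std::is_same<decltype(std::strchr(static_cast<char*>(nullptr), 'a')),
                           char*>::value,
              "strchr on char* returns char*");
```

Because the decltype operands are unevaluated, the null pointers are never dereferenced.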