isdigit(c) - a char or int type?
Problem description
I have written the following code to test whether the given input is a digit.
    #include <iostream>
    #include <ctype.h>
    #include <stdio.h>
    using namespace std;

    int main()
    {
        char c;
        cout << "Please enter a digit: ";
        cin >> c;
        if (isdigit(c)) // int isdigit(int c) or char isdigit(char c)?
        {
            cout << "You entered a digit" << endl;
        }
        else
        {
            cout << "You entered a non-digit value" << endl;
        }
    }
My question is: what should the type of the input variable be, `char` or `int`?
Recommended answer
Unfortunately, the situation is a bit more complex than the other answers suggest.
First of all: the first part of your code is correct (disregarding multi-byte encodings); if you want to read a single `char` with `cin`, you have to use a `char` variable with the `>>` operator.
Now, about `isdigit`: why does it take an `int` instead of a `char`?
It all comes from C: `isdigit` and its companions were designed to be used along with functions like `getchar()`, which read a character from a stream and return an `int`. This in turn was done so that a single return value can carry either the character or an error code: `getchar()` can return `EOF` (defined as an implementation-defined negative constant) to signal that the input stream has ended.
So, the basic idea is: negative = error code; non-negative = actual character code.
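The convention can be sketched as follows. This is only an illustration: it uses `std::istream::get()` as a stand-in for `getchar()` (it follows the same "character as a non-negative `int`, or `EOF`" convention for `char` streams), and `count_digits` is a hypothetical name, not something from the question.

```cpp
#include <cctype>
#include <cstdio>   // EOF
#include <istream>
#include <sstream>

// Counts decimal digits in a stream, reading one character at a time.
// The read result is kept in an int so that end-of-input (EOF) stays
// distinguishable from every real character value.
int count_digits(std::istream& in)
{
    int digits = 0;
    int ch;  // int, not char: must hold every unsigned char value *and* EOF
    while ((ch = in.get()) != EOF) {
        // ch is already in the range the <cctype> functions accept.
        if (std::isdigit(ch)) {
            ++digits;
        }
    }
    return digits;
}
```

Called on a `std::istringstream`, the function walks the whole buffer and stops only when `get()` reports `EOF`, even if the buffer contains bytes above 127.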
Unfortunately, this poses interoperability problems with "regular" `char`s.
Short digression: `char` is ultimately just an integral type with a very small range, but a particularly stupid one. On most occasions, when working with bytes or character codes, you'd want it to be `unsigned` by default; OTOH, for consistency with the other integral types (`int`, `short`, `long`, ...), you may say that the right thing would be for plain `char` to be `signed`. The Standard chose the most stupid way: plain `char` is either `signed` or `unsigned`, depending on whatever the implementor of the compiler decides [1].
So, you have to be prepared for `char` being either `signed` or `unsigned`; in many implementations it is `signed` by default, which poses a problem with the `getchar()` arrangement above.
If `char` is used to read bytes and is `signed`, all bytes with the high bit set (that is, bytes that would be > 127 if read with an `unsigned` 8-bit type) turn out as negative values. This is obviously incompatible with `getchar()` using negative values for `EOF`: actual "negative" characters and `EOF` could overlap.
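A small sketch of the overlap, assuming 8-bit bytes and a two's complement platform. The helper names are made up for illustration, and `signed char` is used explicitly so that the result does not depend on how the compiler treats plain `char`.

```cpp
#include <cstdio>   // EOF

// Widens a byte the way a signed char is widened: on the usual two's
// complement platforms, bytes with the high bit set become negative ints.
int widen_as_signed(unsigned char byte)
{
    return static_cast<int>(static_cast<signed char>(byte));
}

// Widens a byte the way getchar() reports it: always non-negative.
int widen_as_unsigned(unsigned char byte)
{
    return static_cast<int>(byte);
}

// True when the widened value cannot be told apart from end-of-input.
bool looks_like_eof(int value)
{
    return value == EOF;
}
```

With byte `0xFF`, `widen_as_signed` yields a negative value, which on typical platforms where `EOF` is -1 is exactly `EOF`; `widen_as_unsigned` yields 255, which can never be mistaken for it.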
So, when C functions talk about receiving or returning characters in `int` variables, the contract is always that the character is a `char` that has been converted to `unsigned char` (so that it is always non-negative, with negative values wrapping around into the top half of its range) and then stored in an `int`. Which brings us back to the `isdigit` function, which, along with its companion functions, has this contract as well:
> The header `<ctype.h>` declares several functions useful for classifying and mapping characters. In all cases the argument is an `int`, the value of which shall be representable as an `unsigned char` or shall equal the value of the macro `EOF`. If the argument has any other value, the behavior is undefined.

(C99, §7.4, ¶1)
Long story short: your `if` should at least be:

    if (isdigit((unsigned char)c))
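If the check is needed in more than one place, the conversion can be wrapped once. The following is a minimal sketch; `is_digit_char` is a hypothetical helper name, and it uses the C++ spelling of the same cast.

```cpp
#include <cctype>

// Hypothetical helper: classifies a plain char safely by converting it to
// unsigned char before handing it to std::isdigit, as the contract requires.
bool is_digit_char(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

This keeps call sites free of casts and makes it harder to forget the conversion when `char` happens to be signed.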
The problem is not just theoretical: several widespread C library implementations use the provided value directly as an index into a lookup table, so negative values can read outside the table and may crash your program with a segfault.
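To see why a table-driven classifier is sensitive to its input range, here is a toy model. It is not the code of any real C library: the table layout, the names, and the end-of-input value `kToyEof` are all assumptions made for the illustration, and the range check is added precisely because an unchecked lookup with a negative index would be out of bounds.

```cpp
#include <array>
#include <climits>
#include <cstddef>

static_assert(UCHAR_MAX == 255, "this toy assumes 8-bit bytes");

// Toy end-of-input value (real libraries use the EOF macro).
constexpr int kToyEof = -1;

// Slot 0 is reserved for kToyEof; slots 1..256 hold the unsigned char values.
struct ToyTable {
    std::array<bool, 257> is_digit{};  // value-initialized: all false
    ToyTable()
    {
        for (int ch = '0'; ch <= '9'; ++ch) {
            is_digit[static_cast<std::size_t>(ch + 1)] = true;
        }
    }
};

// True when c is a value the table can legally be indexed with.
bool in_domain(int c)
{
    return c == kToyEof || (c >= 0 && c <= UCHAR_MAX);
}

// Checked lookup. Without the in_domain test, a value such as -42 would
// index before the start of the table.
bool toy_isdigit(int c)
{
    static const ToyTable table;
    if (!in_domain(c)) {
        return false;
    }
    return table.is_digit[static_cast<std::size_t>(c + 1)];
}
```

The only inputs the table can serve are `kToyEof` and 0..255, which mirrors the domain the standard specifies for the `<ctype.h>` functions.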
Also, you are not taking into account that the stream may be closed, in which case `>>` returns without touching your variable (which is then left uninitialized); to handle this, check that the stream is still in a valid state before working on `c`.
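Both fixes together might look like the sketch below. The enum and function names are invented for the example; taking a `std::istream&` instead of using `cin` directly is only there to make the logic easy to exercise with a string stream.

```cpp
#include <cctype>
#include <istream>
#include <sstream>

enum class InputKind { Digit, NonDigit, NoInput };

// Reads one non-whitespace character and classifies it.
// c is only inspected when the extraction actually succeeded.
InputKind classify_next(std::istream& in)
{
    char c;
    if (!(in >> c)) {
        return InputKind::NoInput;  // stream ended or failed; c was never set
    }
    return std::isdigit(static_cast<unsigned char>(c)) ? InputKind::Digit
                                                       : InputKind::NonDigit;
}
```

In the original program this would be called as `classify_next(std::cin)`, printing one of three messages depending on the result.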
[1] Of course, this is a bit of an unfair rant; as @Pete Becker noted in the comments, it's not that they were all morons, but that the standard mostly tried to stay compatible with existing implementations, which were probably about evenly split between unsigned and signed `char`. Traces of this split can be found in most modern compilers, which can generally change the signedness of `char` through command-line options (`-fsigned-char`/`-funsigned-char` for gcc/clang, `/J` in VC++).
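To find out which choice a given compiler (and set of flags) has made, the standard library exposes it directly. A minimal sketch, with hypothetical function names:

```cpp
#include <climits>
#include <limits>

// Reports whether plain char is a signed type in this build.
bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

// Smallest value of plain char: 0 when it is unsigned, negative otherwise.
int plain_char_min()
{
    return CHAR_MIN;
}
```

Building the same file with and without one of the options mentioned above should flip the result of `plain_char_is_signed()`.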