What's the deal with the "toupper" family?


Question




The "toupper" function takes an int as an argument. That's not too
irrational given that a character literal is of type "int" in C.
(Although why it isn't of type "char" escapes me...)

The "toupper" function imposes a further constraint in that the value
passed to it must be representable as an unsigned char. (If C does not
require all character values to be positive, then again, this constraint
too escapes me...)

Let's say we have the following hypothetical system:

char is signed.

UCHAR_MAX == 255
SCHAR_MAX == 127
CHAR_MAX == 127

INT_MAX == 65535

We are able to represent all the characters of ASCII using positive
numbers, but anything beyond that would require negative numbers on this
system.

So what's the deal with using toupper on these extraneous characters
whose numeric value is negative?

Let's say we have a German sharp S, or a Spanish N with a curly thing on
top of it, and that its numeric value is negative. How do we go about
passing their value to toupper? Should we do the following?

toupper( (unsigned char)c );

(One more thing. If you have a signed integer value, and you cast it to
its corresponding unsigned integer type, and then back to the signed
type, are you guaranteed to have the same value? i.e.:

signed char s = -5;

unsigned char us = s;

s = us;

assert( -5 == s ); /* Is this guaranteed? */
--

Frederick Gotham

Recommended answers

Frederick Gotham wrote:

> The "toupper" function takes an int as an argument. That's not too
> irrational given that a character literal is of type "int" in C.
> (Although why it isn't of type "char" escapes me...)
>
> The "toupper" function imposes a further constraint in that the value
> passed to it must be representable as an unsigned char. (If C does not
> require all character values to be positive, then again, this constraint
> too escapes me...)




Back in the Dawn of C (well, the Early Morning), the
<ctype.h> functions were defined to operate on all the values
returned by getchar(), getc(), and fgetc(). These functions
need to be able to return any legitimate character code plus
a code unlike all characters to indicate an input failure.
The scheme adopted for the input functions was that they would
return a non-negative int to represent an actual character code
or a negative int to represent input failure. The <ctype.h>
functions thus inherited their oddities from the I/O functions'
practice of returning "special values" in place of "real data."

If one were designing the C library today, I doubt these
decisions would be made in the same way. getchar() et al. are
already in trouble on systems where sizeof(int)==1, because there
is no "space" for a distinguished non-character EOF value. If
getchar() returns EOF, it could actually be "real data": you
cannot tell from the returned value alone, but must consult the
feof() and ferror() functions.
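
For illustration, here is a minimal sketch of a read loop that follows this advice. The function name count_chars is made up for this example, and the extra feof()/ferror() check only matters on the unusual systems described above; on ordinary hosted implementations the comparison with EOF alone would do.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters remaining on a stream.
 * Returns -1 if a read error occurred. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c; /* int, not char, so EOF stays distinct from character codes */

    /* Keep going while we got a character, or while a value equal to EOF
     * arrived without the end-of-file or error indicator being set. */
    while ((c = getc(fp)) != EOF || (!feof(fp) && !ferror(fp))) {
        n++;
    }
    return ferror(fp) ? -1 : n;
}
```

The result of getc() is kept in an int rather than a char so that no character code can be mistaken for EOF in the usual case.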

Even if the "in-band" signalling by the I/O functions were
retained, I doubt that newly-designed <ctype.h> functions would
be defined on the entire range of values getchar() can return.
Rather, they would be defined for all possible char values and
would make no special provision for EOF. Then we'd need none
of this silly casting when applying the <ctype.h> functions to
characters taken from a string.

However, that particular horse left the barn long ago.


> Let's say we have the following hypothetical system:
>
> char is signed.
>
> UCHAR_MAX == 255
> SCHAR_MAX == 127
> CHAR_MAX == 127
>
> INT_MAX == 65535
>
> We are able to represent all the characters of ASCII using positive
> numbers, but anything beyond that would require negative numbers on this
> system.




Character codes 128 through 255 would not be representable
as char, but they would be representable as unsigned char or as
int.


> So what's the deal with using toupper on these extraneous characters
> whose numeric value is negative?




As above: The argument to a <ctype.h> function must be either
the negative value EOF or else a character code represented as
an unsigned char value. A <ctype.h> function should never see a
negative character code; if it does, the caller is at fault.


> Let's say we have a German sharp S, or a Spanish N with a curly thing on
> top of it, and that its numeric value is negative. How do we go about
> passing their value to toupper? Should we do the following?
>
> toupper( (unsigned char)c );




Yes.
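
As a small sketch of that idiom applied to a whole string (the function name upcase_in_place is invented for this example, not something from the thread):

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase a NUL-terminated string in place.
 * Each char is converted to unsigned char first, so toupper() never
 * receives a negative character code even where plain char is signed. */
void upcase_in_place(char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Storing the int result back into a char is the usual practice;
         * for results above CHAR_MAX that conversion is
         * implementation-defined. */
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}
```

Which characters outside the basic set actually change depends on the current locale; in the default "C" locale only a through z are mapped.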


> (One more thing. If you have a signed integer value, and you cast it to
> its corresponding unsigned integer type, and then back to the signed
> type, are you guaranteed to have the same value? i.e.:
>
> signed char s = -5;
> unsigned char us = s;




No problem yet: us has the value UCHAR_MAX-4 (251, for
an eight-bit character).


> s = us;




Trouble in River City. The value of us is out of range
for a signed char, so you get either (1) an implementation-
defined result stored in s, or (2) an implementation-defined
signal is raised. (This is not undefined behavior, technically
speaking, but it might as well be. If a signal is raised, there
is no way to handle that signal and continue without invoking
undefined behavior. The distinction is somewhat like observing
that you will not be harmed by a fall from a hundred-story tower
but only by the sudden stop at the end.)

On most implementations nowadays, alternative (1) is taken
and the implementation-defined result happens to be equal to the
value s had before conversion to unsigned char. This is not an
outcome guaranteed by the language itself, though.
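
If the round trip is needed without leaning on the implementation-defined conversion, one possible approach is to do the arithmetic in int so that every converted value is already in range. The sketch below is only an illustration: the name uchar_to_schar is invented here, and it assumes a two's-complement signed char (so SCHAR_MIN == -SCHAR_MAX - 1) and an int wide enough to hold UCHAR_MAX.

```c
#include <limits.h>

/* Hypothetical helper: map an unsigned char back to the signed char it
 * came from, using only conversions of representable values.
 * Assumes two's complement and UCHAR_MAX <= INT_MAX. */
signed char uchar_to_schar(unsigned char u)
{
    if (u <= SCHAR_MAX) {
        return (signed char)u; /* representable, so the value is unchanged */
    }
    /* u lies in (SCHAR_MAX, UCHAR_MAX]; undo the wrap-around in int. */
    return (signed char)((int)u - UCHAR_MAX - 1);
}
```

With eight-bit characters, the value 251 obtained from -5 maps back to 251 - 256, which is -5.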

--
Eric Sosman
es*****@acm-dot-org.invalid


Frederick Gotham <fg*******@SPAM.com> wrote:

# We are able to represent all the characters of ASCII using positive
# numbers, but anything beyond that would require negative numbers on this
# system.

Beyond ASCII, there are many different encodings.

# Let's say we have a German sharp S, or a Spanish N with a curly thing on
# top of it, and that its numeric value is negative. How do we go about
# passing their value to toupper? Should we do the following?

Don't depend on the encoding of non-ASCII characters. Instead you can
use wide characters (wchar_t) and functions like towupper.
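
A rough sketch of what that might look like follows. The function name wide_upcase_in_place is invented for illustration, and the mapping of non-ASCII characters depends on the LC_CTYPE locale the program has selected with setlocale(), so results for those characters are not shown.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercase a wide string in place with towupper().
 * Characters with no single-character uppercase form in the current
 * locale are returned unchanged by towupper(). */
void wide_upcase_in_place(wchar_t *ws)
{
    for (; *ws != L'\0'; ws++) {
        *ws = (wchar_t)towupper((wint_t)*ws);
    }
}
```

No unsigned char cast is involved because towupper() takes a wint_t, which can hold every wchar_t value as well as WEOF.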

--
SM Ryan http://www.rawbw.com/~wyrmwif/
We found a loophole; they can't keep us out anymore.


Frederick Gotham <fg*******@SPAM.com> writes:

> Let's say we have a German sharp S, or a Spanish N with a curly thing on
> top of it, and that its numeric value is negative. How do we go about
> passing their value to toupper? Should we do the following?
>
> toupper( (unsigned char)c );




Yes. That's the usual thing to do.


> (One more thing. If you have a signed integer value, and you cast it to
> its corresponding unsigned integer type, and then back to the signed
> type, are you guaranteed to have the same value?




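No. The conversion back to the signed type is implementation-defined
rather than strictly undefined, but for portable code that is nearly
as bad: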

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer
type other than _Bool, if the value can be represented by
the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted
by repeatedly adding or subtracting one more than the
maximum value that can be represented in the new type until
the value is in the range of the new type.49)

3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is
implementation-defined or an implementation-defined signal
is raised.
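
Paragraph 2 can be spelled out in code. The following sketch emulates the rule for a target type of unsigned char; the function name to_uchar_by_rule is invented for this illustration, and it assumes that UCHAR_MAX + 1 fits in a long.

```c
#include <limits.h>

/* Hypothetical helper: apply 6.3.1.3 paragraph 2 by hand for a target
 * type of unsigned char, repeatedly adding or subtracting one more than
 * the maximum value until the value is in range.
 * Assumes UCHAR_MAX + 1 is representable as a long. */
unsigned char to_uchar_by_rule(long v)
{
    const long modulus = (long)UCHAR_MAX + 1;

    while (v < 0) {
        v += modulus;
    }
    while (v > UCHAR_MAX) {
        v -= modulus;
    }
    return (unsigned char)v; /* now in range, so the value is unchanged */
}
```

For -5 this yields UCHAR_MAX - 4, the same value a direct conversion to unsigned char produces.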

--
"This is a wonderful answer.
It's off-topic, it's incorrect, and it doesn't answer the question."
--Richard Heathfield

