Clarification on character handling


Question

7.4#1 states:
The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the
value of which shall be representable as an unsigned char or shall
equal the value of the macro EOF. If the
argument has any other value, the behavior is undefined.

Why should something such as:
tolower(-10); invoke undefined behavior?

It obviously has something to do with how tolower can be implemented,
but I can't think of anything concrete.

--
aegis

Recommended answer

aegis wrote:
> Why should something such as:
> tolower(-10); invoke undefined behavior?

More to the point, what should it be if _not_ UB?

> It obviously has something to do with how tolower can be implemented,
> but I can't think of anything concrete.




Consider a simple lookup table (and the fact that EOF is quite
often and deliberately set to -1). The toxxxx() macros and functions
are often implemented in this way...

unsigned char _flags[257] = { 0, .... };

#define tolower(x) (_flags[(x) + 1] & _lower_case_flag)

If you try tolower(-10), then the element referenced is not within
the specified array. It's no different from tolower(32767) on an 8-bit
char system. Why would you _expect_ some defined behaviour?

--
Peter
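
The index arithmetic in the post above can be made concrete with a small sketch. Everything named sketch_* or SKETCH_* here is hypothetical and exists only for illustration, and the code assumes EOF is -1, as the post does. The macro quoted above tests a flag bit, which is the shape a classification function such as islower would take, so that is what the sketch models; a mapping function could index a parallel table of converted values in the same way. Unlike a real library, the lookup here checks the index first, so the out-of-range case can be observed without actually reading outside the array.

```c
#include <limits.h>

/* Hypothetical flag bit, named only for this sketch. */
#define SKETCH_LOWER 0x01u

/* One slot for EOF (assumed to be -1) plus one per unsigned char value. */
static unsigned char sketch_flags[UCHAR_MAX + 2];

/* Mark the lower-case letters of the basic character set. */
static void sketch_init(void)
{
    const char *p;
    for (p = "abcdefghijklmnopqrstuvwxyz"; *p != '\0'; ++p)
        sketch_flags[(unsigned char)*p + 1] |= SKETCH_LOWER;
}

/* The index an unchecked table lookup would use for argument c. */
static long sketch_index(int c)
{
    return (long)c + 1;
}

/* Nonzero if that index falls inside sketch_flags. */
static int sketch_index_ok(int c)
{
    long i = sketch_index(c);
    return i >= 0 && i <= UCHAR_MAX + 1;
}

/* Checked lookup: returns -1 rather than reading out of bounds.
 * A real <ctype.h> implementation is not required to make this check. */
static int sketch_islower(int c)
{
    if (!sketch_index_ok(c))
        return -1;
    return (sketch_flags[sketch_index(c)] & SKETCH_LOWER) != 0;
}
```

With this layout, an argument of -10 yields index -9, which lies before the start of the array. An unchecked macro would simply read whatever happens to be stored there, which is why the standard leaves the result undefined.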


Hi aegis,

The expected argument to tolower(c) is given in the specification.
The behavior is undefined if an unexpected argument is passed. What
actually happens is left to the implementation, so it depends on the
compiler and system.

It's the programmer's responsibility to avoid these kinds of scenarios.
These C functions do not return an error code. This is very common
in the C standard.

Regards,
Raju
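
One common way for the programmer to meet that responsibility is to convert a char to unsigned char before passing it to a <ctype.h> function. A value such as -10 usually arises this way: on an implementation where plain char is signed and 8 bits wide, the byte 0xF6 stored in a char typically has the value -10. The helper names below (in_ctype_domain, checked_tolower, lower_in_place) are hypothetical and are only a sketch of the idea, not a standard interface.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Nonzero if c is a value the <ctype.h> functions accept:
 * EOF, or anything representable as an unsigned char. */
static int in_ctype_domain(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}

/* Debug aid: trap an out-of-domain argument instead of passing it on. */
static int checked_tolower(int c)
{
    assert(in_ctype_domain(c));
    return tolower(c);
}

/* Lower-case a string in place. The cast to unsigned char keeps every
 * argument in the permitted range even when plain char is signed. */
static void lower_in_place(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)checked_tolower((unsigned char)*s);
}
```

The cast is the important part; the assert only makes a mistake visible during testing and disappears when NDEBUG is defined.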













Many systems have an array of bits with masks, such that the array
can be indexed by the value of the character + 1. If the value of
EOF is -1, this maps onto a normal 0-based array; if EOF is
something else, appropriate code can correct for it. The bits
indicate whether the character is upper case, lower case,
printable, numeric, etc. A single index and mask can return the
appropriate characteristic.

Negative (-ve) input values other than EOF foul this up, and result
in illegal memory accesses.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
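
The "single index and mask" scheme might look roughly like the following. This is only a sketch under stated assumptions: the ct_* and CT_* names are made up for illustration, EOF is assumed to be -1, and only three properties of the basic character set are recorded. A real library table would cover more properties and may be laid out differently.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical property bits; several can be set for one character. */
enum { CT_UPPER = 1, CT_LOWER = 2, CT_DIGIT = 4 };

/* Slot 0 is for EOF (assumed -1); slot c + 1 is for character c. */
static unsigned char ct_bits[UCHAR_MAX + 2];

/* Set one property bit for every character in the given string. */
static void ct_mark(const char *set, unsigned bit)
{
    for (; *set != '\0'; ++set)
        ct_bits[(unsigned char)*set + 1] |= (unsigned char)bit;
}

static void ct_init(void)
{
    ct_mark("ABCDEFGHIJKLMNOPQRSTUVWXYZ", CT_UPPER);
    ct_mark("abcdefghijklmnopqrstuvwxyz", CT_LOWER);
    ct_mark("0123456789", CT_DIGIT);
}

/* One index and one mask answer any combination of property tests.
 * The assert documents the precondition; a production macro would
 * typically omit it and index the table directly. */
static int ct_test(int c, unsigned mask)
{
    assert(c >= -1 && c <= UCHAR_MAX);
    return (ct_bits[c + 1] & mask) != 0;
}
```

For example, after ct_init(), ct_test('q', CT_UPPER | CT_LOWER) asks "is it a letter?" with a single lookup, and ct_test(-1, mask) is zero for every mask because the EOF slot has no bits set.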

