C isupper() function


Problem description

I'm currently reading "The C Programming Language 2nd edition" and I'm not clear about this exercise:

Functions like isupper can be implemented to save space or to save time. Explore both possibilities.


  • How can I implement this function?

  • How should I write the two versions, one that saves time and one that saves space (some pseudocode would be nice)?

I would appreciate some advice about this.

Recommended answer

Original answer

One version uses an array initialized with appropriate values, one byte per character in the code set (plus 1 to allow for EOF, which may also be passed to the classification functions):

static const char bits[257] = { ...initialization... };

int isupper(int ch)
{
    assert(ch == EOF || (ch >= 0 && ch <= 255));
    return((bits+1)[ch] & UPPER_MASK);
}

Note that the 'bits' can be used by all the various functions like isupper(), islower(), isalpha(), etc with appropriate values for the mask. And if you make the 'bits' array changeable at runtime, you can adapt to different (single-byte) code sets.
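
As a rough illustration of the table-driven approach, the sketch below fills the table at run time instead of using a static initializer. The names init_bits(), my_isupper() and my_islower() and the two bit values are hypothetical, and the code assumes that EOF is -1 and that the letters are contiguous, as in ASCII:

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

enum { UPPER_MASK = 0x01, LOWER_MASK = 0x02 };   /* hypothetical bit values */

/* One byte per character code, plus one leading slot for EOF (assumed to be -1). */
static unsigned char bits[257];

/* Fill the table at run time. This sketch marks ASCII letters only and
   assumes 'A'..'Z' and 'a'..'z' are contiguous ranges. */
static void init_bits(void)
{
    for (int ch = 'A'; ch <= 'Z'; ch++)
        (bits + 1)[ch] |= UPPER_MASK;
    for (int ch = 'a'; ch <= 'z'; ch++)
        (bits + 1)[ch] |= LOWER_MASK;
}

/* Both functions share the same table and differ only in the mask. */
static int my_isupper(int ch)
{
    assert(ch == EOF || (ch >= 0 && ch <= 255));
    return (bits + 1)[ch] & UPPER_MASK;
}

static int my_islower(int ch)
{
    assert(ch == EOF || (ch >= 0 && ch <= 255));
    return (bits + 1)[ch] & LOWER_MASK;
}
```

Call init_bits() once before the first lookup. Supporting another single-byte code set would only mean filling the table differently; the lookup functions stay the same.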

This takes space for the array.

The other version makes assumptions about the contiguousness of upper-case characters, and also about the limited set of valid upper-case characters (fine for ASCII, not so good for ISO 8859-1 or its relatives):

int isupper(int ch)
{
    return (ch >= 'A' && ch <= 'Z');  // ASCII only - not a good implementation!
}

This can (almost) be implemented in a macro; it is hard to avoid evaluating the character twice, which is not actually permitted in the standard. Using non-standard (GNU) extensions, it can be implemented as a macro that evaluates the character argument just once. To extend this to ISO 8859-1 would require a second condition, along the lines of:
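
The double-evaluation problem can be made visible with a small sketch. All names here (ISUPPER_MACRO, isupper_inline, ISUPPER_GNU, next_char, calls_made_by_macro) are made up for illustration; the GNU variant relies on the statement-expression extension and is therefore guarded by __GNUC__:

```c
#include <assert.h>

/* Range-test macro: the argument text appears twice, so it can be evaluated twice. */
#define ISUPPER_MACRO(c) ((c) >= 'A' && (c) <= 'Z')

/* Standard C99 alternative: an inline function evaluates its argument once. */
static inline int isupper_inline(int c)
{
    return c >= 'A' && c <= 'Z';
}

#ifdef __GNUC__
/* GNU statement expression (non-standard): copy the argument once, then test the copy. */
#define ISUPPER_GNU(c) __extension__ ({ int c_ = (c); c_ >= 'A' && c_ <= 'Z'; })
#endif

static int calls;             /* counts how often next_char() runs */

static int next_char(void)    /* hypothetical helper with a side effect */
{
    calls++;
    return 'Q';
}

/* Returns the number of next_char() calls caused by a single macro use. */
static int calls_made_by_macro(void)
{
    calls = 0;
    int result = ISUPPER_MACRO(next_char());
    assert(result);           /* 'Q' is upper case */
    return calls;
}
```

With the plain macro, one use runs next_char() twice; the inline function and the GNU macro each run it once.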

int isupper(int ch)
{
    /* Approximate range: 0xD7 (the multiplication sign) is not a letter. */
    return ((ch >= 'A' && ch <= 'Z') || (ch >= 0xC0 && ch <= 0xDD));
}

Repeat that as a macro very often and the 'space saving' rapidly becomes a cost as the bit masking has a fixed size.

Given the requirements of modern code sets, the mapping version is almost invariably used in practice; it can adapt at run-time to the current code set, etc, which the range-based versions cannot.

I still can't figure out how UPPER_MASK works. Can you explain it more specifically?

Ignoring issues of namespaces for symbols in headers, you have a series of twelve classification macros:


  • isalpha()
  • isupper()
  • islower()
  • isalnum()
  • isgraph()
  • isprint()
  • iscntrl()
  • isdigit()
  • isblank()
  • isspace()
  • ispunct()
  • isxdigit()

The difference between isspace() and isblank() is:

  • isspace(): space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v').
  • isblank(): space (' ') and horizontal tab ('\t').
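
The distinction can be checked against the standard <ctype.h> functions. The helper space_kind() below is a hypothetical name, and the sketch assumes the default C locale and a C99 library (isblank() was added in C99):

```c
#include <assert.h>
#include <ctype.h>

/* Classify a character as 'B' (blank), 'S' (other white space), or '-' (neither). */
static char space_kind(int ch)
{
    assert(ch >= 0 && ch <= 255);   /* <ctype.h> expects an unsigned char value */
    if (isblank(ch))
        return 'B';
    if (isspace(ch))
        return 'S';
    return '-';
}
```

In the C locale this reports 'B' for space and horizontal tab, 'S' for the other four white-space characters, and '-' for everything else.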

There are definitions for these sets of characters in the C standard, and guidelines for the C locale.

For example, in the C locale, either islower() or isupper() is true if isalpha() is true.
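
A quick way to confirm this rule on a given implementation is to scan every unsigned char value. The function names below are hypothetical, and the check only holds while the program is still in the default C locale (no setlocale() call):

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Count characters for which isalpha() holds but neither case test does. */
static int count_caseless_alpha(void)
{
    int count = 0;
    for (int ch = 0; ch <= UCHAR_MAX; ch++) {
        if (isalpha(ch) && !isupper(ch) && !islower(ch))
            count++;
    }
    return count;
}

/* In the C locale every alphabetic character is upper-case or lower-case. */
static void check_c_locale(void)
{
    assert(count_caseless_alpha() == 0);
}
```

In other locales the count may be nonzero, which is why the table needs a separate bit for alphabetic characters.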

I think the necessary bits are:

  • DIGIT_MASK
  • XDIGT_MASK
  • ALPHA_MASK
  • LOWER_MASK
  • UPPER_MASK
  • PUNCT_MASK
  • SPACE_MASK
  • PRINT_MASK
  • CNTRL_MASK
  • BLANK_MASK

From these ten masks, you can create the other two:


  • ALNUM_MASK = ALPHA_MASK | DIGIT_MASK

  • GRAPH_MASK = ALNUM_MASK | PUNCT_MASK

Superficially, you can also use ALPHA_MASK = UPPER_MASK | LOWER_MASK, but in some locales, there are alphabetic characters that are neither upper-case nor lower-case.

So, we can define masks as follows:

enum CTYPE_MASK {
    DIGIT_MASK = 0x0001,
    XDIGT_MASK = 0x0002,
    LOWER_MASK = 0x0004,
    UPPER_MASK = 0x0008,
    ALPHA_MASK = 0x0010,
    PUNCT_MASK = 0x0020,
    SPACE_MASK = 0x0040,
    PRINT_MASK = 0x0080,
    CNTRL_MASK = 0x0100,
    BLANK_MASK = 0x0200,

    ALNUM_MASK = ALPHA_MASK | DIGIT_MASK,
    GRAPH_MASK = ALNUM_MASK | PUNCT_MASK
};

extern unsigned short ctype_bits[];
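
To see how a single mask test works, it may help to follow one table entry by hand. The sketch below repeats the enumeration so that it compiles on its own; flags_A and has_any() are hypothetical names used only for this illustration:

```c
#include <assert.h>

enum CTYPE_MASK {
    DIGIT_MASK = 0x0001,
    XDIGT_MASK = 0x0002,
    LOWER_MASK = 0x0004,
    UPPER_MASK = 0x0008,
    ALPHA_MASK = 0x0010,
    PUNCT_MASK = 0x0020,
    SPACE_MASK = 0x0040,
    PRINT_MASK = 0x0080,
    CNTRL_MASK = 0x0100,
    BLANK_MASK = 0x0200,

    ALNUM_MASK = ALPHA_MASK | DIGIT_MASK,
    GRAPH_MASK = ALNUM_MASK | PUNCT_MASK
};

/* Flags for 'A': 0x0010 | 0x0008 | 0x0080 | 0x0002 = 0x009A. */
static const unsigned short flags_A =
    ALPHA_MASK | UPPER_MASK | PRINT_MASK | XDIGT_MASK;

/* Nonzero when at least one bit selected by mask is set in flags. */
static int has_any(unsigned short flags, unsigned mask)
{
    assert(mask != 0);   /* an empty mask could never match */
    return (flags & mask) != 0;
}
```

Here flags_A & UPPER_MASK is 0x009A & 0x0008 = 0x0008, which is nonzero, so 'A' counts as upper-case; flags_A & LOWER_MASK is 0, so it is not lower-case. A combined mask such as ALNUM_MASK matches when any one of its bits is set.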

Here is the data for the character set. The data shown is for the first half of ISO 8859-1, which is the same for the first half of all the 8859-x code sets. I'm using C99 designated initializers as a documentary aid, even though the entries are all in order:

unsigned short ctype_bits[] =
{
    [EOF   +1] = 0,
    ['\0'  +1] = CNTRL_MASK,
    ['\1'  +1] = CNTRL_MASK,
    ['\2'  +1] = CNTRL_MASK,
    ['\3'  +1] = CNTRL_MASK,
    ['\4'  +1] = CNTRL_MASK,
    ['\5'  +1] = CNTRL_MASK,
    ['\6'  +1] = CNTRL_MASK,
    ['\a'  +1] = CNTRL_MASK,
    ['\b'  +1] = CNTRL_MASK,
    ['\t'  +1] = CNTRL_MASK|SPACE_MASK|BLANK_MASK,
    ['\n'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\v'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\f'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\r'  +1] = CNTRL_MASK|SPACE_MASK,
    ['\x0E'+1] = CNTRL_MASK,
    ['\x0F'+1] = CNTRL_MASK,
    ['\x10'+1] = CNTRL_MASK,
    ['\x11'+1] = CNTRL_MASK,
    ['\x12'+1] = CNTRL_MASK,
    ['\x13'+1] = CNTRL_MASK,
    ['\x14'+1] = CNTRL_MASK,
    ['\x15'+1] = CNTRL_MASK,
    ['\x16'+1] = CNTRL_MASK,
    ['\x17'+1] = CNTRL_MASK,
    ['\x18'+1] = CNTRL_MASK,
    ['\x19'+1] = CNTRL_MASK,
    ['\x1A'+1] = CNTRL_MASK,
    ['\x1B'+1] = CNTRL_MASK,
    ['\x1C'+1] = CNTRL_MASK,
    ['\x1D'+1] = CNTRL_MASK,
    ['\x1E'+1] = CNTRL_MASK,
    ['\x1F'+1] = CNTRL_MASK,

    [' '   +1] = SPACE_MASK|PRINT_MASK|BLANK_MASK,

    ['!'   +1] = PUNCT_MASK|PRINT_MASK,
    ['"'   +1] = PUNCT_MASK|PRINT_MASK,
    ['#'   +1] = PUNCT_MASK|PRINT_MASK,
    ['$'   +1] = PUNCT_MASK|PRINT_MASK,
    ['%'   +1] = PUNCT_MASK|PRINT_MASK,
    ['&'   +1] = PUNCT_MASK|PRINT_MASK,
    ['\''  +1] = PUNCT_MASK|PRINT_MASK,
    ['('   +1] = PUNCT_MASK|PRINT_MASK,
    [')'   +1] = PUNCT_MASK|PRINT_MASK,
    ['*'   +1] = PUNCT_MASK|PRINT_MASK,
    ['+'   +1] = PUNCT_MASK|PRINT_MASK,
    [','   +1] = PUNCT_MASK|PRINT_MASK,
    ['-'   +1] = PUNCT_MASK|PRINT_MASK,
    ['.'   +1] = PUNCT_MASK|PRINT_MASK,
    ['/'   +1] = PUNCT_MASK|PRINT_MASK,

    ['0'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['1'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['2'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['3'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['4'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['5'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['6'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['7'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['8'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,
    ['9'   +1] = DIGIT_MASK|PRINT_MASK|XDIGT_MASK,

    [':'   +1] = PUNCT_MASK|PRINT_MASK,
    [';'   +1] = PUNCT_MASK|PRINT_MASK,
    ['<'   +1] = PUNCT_MASK|PRINT_MASK,
    ['='   +1] = PUNCT_MASK|PRINT_MASK,
    ['>'   +1] = PUNCT_MASK|PRINT_MASK,
    ['?'   +1] = PUNCT_MASK|PRINT_MASK,
    ['@'   +1] = PUNCT_MASK|PRINT_MASK,

    ['A'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['B'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['C'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['D'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['E'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['F'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK|XDIGT_MASK,
    ['G'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['H'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['I'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['J'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['K'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['L'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['M'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['N'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['O'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['P'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['Q'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['R'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['S'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['T'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['U'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['V'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['W'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['X'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['Y'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,
    ['Z'   +1] = ALPHA_MASK|UPPER_MASK|PRINT_MASK,

    ['['   +1] = PUNCT_MASK|PRINT_MASK,
    ['\\'  +1] = PUNCT_MASK|PRINT_MASK,
    [']'   +1] = PUNCT_MASK|PRINT_MASK,
    ['^'   +1] = PUNCT_MASK|PRINT_MASK,
    ['_'   +1] = PUNCT_MASK|PRINT_MASK,
    ['`'   +1] = PUNCT_MASK|PRINT_MASK,

    ['a'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['b'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['c'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['d'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['e'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['f'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK|XDIGT_MASK,
    ['g'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['h'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['i'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['j'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['k'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['l'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['m'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['n'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['o'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['p'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['q'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['r'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['s'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['t'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['u'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['v'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['w'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['x'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['y'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,
    ['z'   +1] = ALPHA_MASK|LOWER_MASK|PRINT_MASK,

    ['{'   +1] = PUNCT_MASK|PRINT_MASK,
    ['|'   +1] = PUNCT_MASK|PRINT_MASK,
    ['}'   +1] = PUNCT_MASK|PRINT_MASK,
    ['~'   +1] = PUNCT_MASK|PRINT_MASK,
    ['\x7F'+1] = CNTRL_MASK,

    ...continue for second half of 8859-x character set...
};

#define isalpha(c)  ((ctype_bits+1)[c] & ALPHA_MASK)
#define isupper(c)  ((ctype_bits+1)[c] & UPPER_MASK)
#define islower(c)  ((ctype_bits+1)[c] & LOWER_MASK)
#define isalnum(c)  ((ctype_bits+1)[c] & ALNUM_MASK)
#define isgraph(c)  ((ctype_bits+1)[c] & GRAPH_MASK)
#define isprint(c)  ((ctype_bits+1)[c] & PRINT_MASK)
#define iscntrl(c)  ((ctype_bits+1)[c] & CNTRL_MASK)
#define isdigit(c)  ((ctype_bits+1)[c] & DIGIT_MASK)
#define isblank(c)  ((ctype_bits+1)[c] & BLANK_MASK)
#define isspace(c)  ((ctype_bits+1)[c] & SPACE_MASK)
#define ispunct(c)  ((ctype_bits+1)[c] & PUNCT_MASK)
#define isxdigit(c) ((ctype_bits+1)[c] & XDIGT_MASK)
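
One practical point when using macros like these: the index must be EOF or an unsigned char value, so a plain char (which may be signed) should be converted first. The sketch below uses a deliberately partial table with only three letters filled in; the my_ and MY_ names are hypothetical, chosen to avoid clashing with <ctype.h>, and the table layout assumes EOF is -1:

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

enum { MY_LOWER = 0x0004, MY_UPPER = 0x0008 };

/* Partial table for illustration: entries not listed default to 0. */
static const unsigned short my_ctype_bits[257] = {
    [EOF  + 1] = 0,
    ['A'  + 1] = MY_UPPER,
    ['a'  + 1] = MY_LOWER,
    [0xC9 + 1] = MY_UPPER,   /* capital E with acute accent in ISO 8859-1 */
};

#define MY_ISUPPER(c) ((my_ctype_bits + 1)[c] & MY_UPPER)

/* Wrapper for plain char: convert to unsigned char before indexing,
   so that codes above 0x7F do not become negative indexes. */
static int char_is_upper(char c)
{
    int ch = (unsigned char)c;
    assert(ch >= 0 && ch <= 255);   /* assumes 8-bit char */
    return MY_ISUPPER(ch) != 0;
}
```

Without the conversion, a signed char holding 0xC9 would index before the start of the table.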

As already noted, the names here are actually in the namespace reserved for users, so if you looked in a <ctype.h> header you'd find more cryptic names and they'd probably all start with one or two underscores.
