Is there a simple, portable way to determine the ordering of two characters in C?

Question

According to the standard:

The values of the members of the execution character set are implementation-defined.
(ISO/IEC 9899:1999 5.2.1/1)

The standard further states:

...the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
(ISO/IEC 9899:1999 5.2.1/3)
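
As a small illustration of what this guarantee buys, the following sketch converts a digit character to its numeric value by subtracting '0'. The helper name `digit_value` is made up for this example. The subtraction is portable only because of the clause quoted above; no such clause exists for letters.

```c
#include <ctype.h>

/* Hypothetical helper: returns 0..9 for a decimal digit character,
 * or -1 if c is not a decimal digit. */
int digit_value(char c) {
    /* Cast to unsigned char: passing a negative value to isdigit is undefined. */
    if (!isdigit((unsigned char)c)) {
        return -1;
    }
    /* Portable: '0'..'9' are guaranteed to be consecutive and increasing. */
    return c - '0';
}
```

The analogous expression `c - 'a'` for letters carries no such guarantee, which is the gap the question is about.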

It appears that the standard requires that the execution character set includes the 26 uppercase and 26 lowercase letters of the Latin alphabet, but I see no requirement that these characters be ordered in any way. I only see an order stipulation for the decimal digits.

This would seem to imply that, strictly speaking, there is no guarantee that 'a' < 'b'. Now, the letters of the alphabet are in order in each of ASCII, UTF-8, and EBCDIC. But for ASCII and UTF-8 we have 'A' < 'a', while for EBCDIC we have 'a' < 'A'.

It might be nice to have a function in ctype.h that compares alphabetic characters portably. Short of this or something similar, it seems to me that one must look in the locale to find the value of CODESET and proceed accordingly, but this doesn't seem simple.

My gut tells me that this is almost never an issue; for most cases alphabetical characters can be handled by converting to lowercase, because for the most commonly used character sets the letters are in order.
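
For concreteness, the lowercase-then-compare approach described here might look like the sketch below. The function name `precedes_ignoring_case` is invented for illustration. Note that it still relies on the very assumption in question, namely that the character set stores the lowercase letters in increasing alphabetical order.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if c1 comes before c2 alphabetically,
 * ignoring case. ASSUMPTION: the execution character set orders
 * 'a'..'z' by increasing value (true for ASCII, UTF-8 and EBCDIC,
 * but not guaranteed by the C standard). */
bool precedes_ignoring_case(char c1, char c2) {
    int l1 = tolower((unsigned char)c1);
    int l2 = tolower((unsigned char)c2);
    return l1 < l2;
}
```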

Question: given two chars

char c1;
char c2;

is there a simple, portable way to determine if c1 precedes c2 alphabetically? Or do we assume that the lowercase and uppercase characters always occur in sequence, even though this does not appear to be guaranteed by the standard?

To clarify any confusion, I am really just interested in the 52 letters of the Latin alphabet that are guaranteed by the standard to be in the execution character set. I realize that other sets of letters are important, but it seems that we can't even know about the ordering of this small subset of letters.

I think that I need to clarify a bit more. The issue, as I see it, is that we commonly think of the 26 lowercase letters of the Latin alphabet as being ordered. I would like to be able to assert that 'a' comes before 'b', and we have a convenient way of expressing this in code as 'a' < 'b', when we give 'a' and 'b' integral values. But the standard gives no assurances that the above code will evaluate as expected. Why not? The standard does guarantee this behavior for the digits 0-9, and this seems sensible. If I want to determine if one letter-char precedes another, say for sorting purposes, and if I want this code to be truly portable, it seems like the standard offers no help. Now I have to rely on the convention that ASCII, UTF-8, EBCDIC, etc. have adopted that 'a' < 'b' should be true. But this isn't really portable unless the only character sets used rely on this convention; this may be true.

This question originated for me in another question thread: Check if a letter is before or after another letter in C. Here, a few people suggested that you could determine the order of two letters stored in chars using inequalities. But one commenter pointed out that this behavior is not guaranteed by the standard.

Recommended answer

For A-Z,a-z in a case-insensitive manner (and using compound literals):

char ch = foo();
// Base 36 assigns 'A'/'a' the value 10 through 'Z'/'z' the value 35.
long az_rank = strtol((char []){ch, 0}, NULL, 36);
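
To turn the snippet above into a comparison, one possible sketch wraps the base-36 conversion in a helper and subtracts the two ranks. The names `letter_rank36` and `compare_letters36` are made up for this example, and the handling of non-letters (returning -1) is an added assumption, not part of the answer. It also assumes the "C" locale, since other locales are allowed to accept additional input forms in `strtol`.

```c
#include <stdlib.h>

/* Hypothetical helper: rank 0..25 for a letter of either case,
 * or -1 if ch is not one of A-Z / a-z. */
int letter_rank36(char ch) {
    char buf[2] = { ch, '\0' };
    char *end;
    /* In base 36, digits map to 0..9 and letters to 10..35. */
    long v = strtol(buf, &end, 36);
    if (end == buf || v < 10) {
        return -1;  /* nothing converted, or a decimal digit */
    }
    return (int)(v - 10);
}

/* Negative if c1 sorts before c2, zero if they are the same letter
 * (ignoring case), positive otherwise. Both arguments should be letters. */
int compare_letters36(char c1, char c2) {
    return letter_rank36(c1) - letter_rank36(c2);
}
```

Because the letter-to-value mapping is specified by the standard for `strtol`, this does not depend on how the character set happens to lay out the alphabet.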

For two chars that are known to be in A-Z or a-z, but whose encoding may be ASCII or EBCDIC:

int compare2alpha(char c1, char c2) {
  int mask = 'A' ^ 'a';  // Only 1 bit is different between upper/lower
  return (c1 | mask) - (c2 | mask);
}
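
As a usage sketch, the mask-based comparison can serve as a `qsort` comparator for an array of letters. The wrapper names `cmp_alpha` and `sort_letters` are invented here, and `compare2alpha` is repeated so the block stands alone. The same precondition applies: every element must be one of A-Z or a-z, and the encoding must be one where upper and lower case differ in a single bit, as in ASCII and EBCDIC.

```c
#include <stdlib.h>

/* Repeated from the answer so that this sketch is self-contained. */
static int compare2alpha(char c1, char c2) {
    int mask = 'A' ^ 'a';  /* the single bit that distinguishes the cases */
    return (c1 | mask) - (c2 | mask);
}

/* qsort comparator adapter (hypothetical name). */
static int cmp_alpha(const void *a, const void *b) {
    return compare2alpha(*(const char *)a, *(const char *)b);
}

/* Sorts n letters in place, ignoring case.
 * ASSUMPTION: every element of s is in A-Z or a-z. */
void sort_letters(char *s, size_t n) {
    qsort(s, n, sizeof s[0], cmp_alpha);
}
```

Letters that differ only in case compare equal, so their relative order after sorting is unspecified.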

Alternatively, if char is limited to 256 distinct values, you could use a look-up table that maps each char to its rank. Of course, the table is platform dependent.
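
One way to sidestep a hand-written, platform-specific table is to fill it at run time from string literals, since the compiler encodes the literals in the execution character set. This is a variant of the idea above, not the answer's own code; the names `rank_table`, `init_rank_table` and `letter_rank` are made up for illustration.

```c
#include <limits.h>

/* Rank per character value: 0..25 for letters (either case), -1 otherwise. */
static signed char rank_table[UCHAR_MAX + 1];

/* Must be called once before letter_rank is used. */
void init_rank_table(void) {
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    for (int i = 0; i <= UCHAR_MAX; i++) {
        rank_table[i] = -1;
    }
    /* The literals fix the alphabetical order; the character set only
     * decides which table slots the ranks land in. */
    for (int i = 0; i < 26; i++) {
        rank_table[(unsigned char)lower[i]] = (signed char)i;
        rank_table[(unsigned char)upper[i]] = (signed char)i;
    }
}

/* Returns 0..25 for a letter, -1 for anything else. */
int letter_rank(char c) {
    return rank_table[(unsigned char)c];
}
```

With the table filled, `letter_rank(c1) < letter_rank(c2)` answers whether c1 precedes c2 alphabetically without assuming anything about the numeric values of the letters.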
