Why does string.Compare seem to handle accented characters inconsistently?


Question

If I execute the following statement:

string.Compare("mun", "mün", true, CultureInfo.InvariantCulture)

The result is -1, indicating that 'mun' sorts before 'mün'.

However, if I execute this statement:

string.Compare("Muntelier, Schweiz", "München, Deutschland", true, CultureInfo.InvariantCulture)

I get 1, indicating that 'Muntelier, Schweiz' should sort after 'München, Deutschland'.

Is this a bug in the comparison? Or, more likely, is there a rule I should be taking into account when sorting strings containing accented characters?


This matters because I'm sorting a list and then applying a manual binary filter that is meant to return every string beginning with 'xxx'.

Previously I was using the LINQ 'Where' method, but now I have to use this custom function written by another person, because he says it performs better.

But the custom function doesn't seem to take into account whatever Unicode rules .NET applies. So if I tell it to filter by 'mün', it doesn't find any items, even though there are items in the list beginning with 'mun'.

This seems to be because accented characters are ordered inconsistently, depending on which characters follow the accented character.


OK, I think I've fixed the problem.

Before the filter, I sort based on the first n letters of each string, where n is the length of the search string.
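
One way to keep all prefix matches next to each other is to sort and search with the same ordinal comparison. The sketch below is neither the custom function from the question nor the exact first-n-letters sort described above: it swaps in StringComparison.OrdinalIgnoreCase, and PrefixFilter, StartingWith and ComparePrefix are hypothetical names introduced for illustration. It assumes the list was already sorted with StringComparer.OrdinalIgnoreCase.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixFilter
{
    // Hypothetical helper. Assumes sortedItems was sorted with
    // StringComparer.OrdinalIgnoreCase, so that strings sharing a prefix
    // form one contiguous run in the list.
    public static List<string> StartingWith(List<string> sortedItems, string prefix)
    {
        // Binary search for the first item whose leading characters
        // are not less than the prefix (a lower bound).
        int lo = 0, hi = sortedItems.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (ComparePrefix(sortedItems[mid], prefix) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }

        // Collect the contiguous run of items that start with the prefix.
        var result = new List<string>();
        for (int i = lo; i < sortedItems.Count && ComparePrefix(sortedItems[i], prefix) == 0; i++)
            result.Add(sortedItems[i]);
        return result;
    }

    // Compares at most prefix.Length leading characters of item with prefix,
    // using the same ordinal, case-insensitive rule as the sort.
    private static int ComparePrefix(string item, string prefix) =>
        string.Compare(item, 0, prefix, 0, prefix.Length, StringComparison.OrdinalIgnoreCase);
}
```

Usage would be along the lines of calling items.Sort(StringComparer.OrdinalIgnoreCase) once and then PrefixFilter.StartingWith(items, "mün"). With ordinal rules 'u' and 'ü' are simply different characters, so filtering by 'mün' does not return items beginning with 'mun'; whether that is acceptable depends on the application.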

Answer

There is a tie-breaking algorithm at work; see http://unicode.org/reports/tr10/

To address the complexities of language-sensitive sorting, a multilevel comparison algorithm is employed. In comparing two words, for example, the most important feature is the base character: such as the difference between an A and a B. Accent differences are typically ignored, if there are any differences in the base letters. Case differences (uppercase versus lowercase), are typically ignored, if there are any differences in the base or accents. Punctuation is variable. In some situations a punctuation character is treated like a base character. In other situations, it should be ignored if there are any base, accent, or case differences. There may also be a final, tie-breaking level, whereby if there are no other differences at all in the string, the (normalized) code point order is used.

So "Munt..." and "Münc..." differ in their base letters, and they sort based on the "t" and the "c".

Whereas "mun" and "mün" have the same base letters ("u" is equivalent to "ü" at that level in most languages), so the tie is broken at a lower level: by the accent difference and, ultimately, the character codes.
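
The levels described above can be observed from .NET by varying CompareOptions. The following is a minimal sketch, assuming a runtime with full culture data (not invariant-globalization mode, where culture comparisons fall back to ordinal rules). CollationLevelsDemo and CompareSign are hypothetical names introduced for illustration.

```csharp
using System;
using System.Globalization;

public static class CollationLevelsDemo
{
    // Hypothetical helper: compares a and b with the invariant culture and
    // the given options, returning only the sign of the result (-1, 0 or 1).
    public static int CompareSign(string a, string b, CompareOptions options)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        return Math.Sign(compareInfo.Compare(a, b, options));
    }

    public static void Demo()
    {
        // Same base letters: the accent breaks the tie, so "mun" comes first.
        Console.WriteLine(CompareSign("mun", "mün", CompareOptions.IgnoreCase));

        // Different base letters ("t" vs "c"): the accent is never consulted.
        Console.WriteLine(CompareSign("Muntelier, Schweiz", "München, Deutschland",
                                      CompareOptions.IgnoreCase));

        // Ignoring accents (non-spacing marks) as well makes the pair compare equal.
        Console.WriteLine(CompareSign("mun", "mün",
                                      CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace));

        // Ordinal comparison looks only at character codes, where 'u' < 'ü',
        // so here "Muntelier" sorts before "München".
        Console.WriteLine(CompareSign("Muntelier, Schweiz", "München, Deutschland",
                                      CompareOptions.Ordinal));
    }
}
```

Calling CollationLevelsDemo.Demo() prints one sign per comparison. The last case shows why code that mixes a culture-sensitive sort with an ordinal (character-by-character) search can disagree about where a string belongs.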
