Why does the default string comparer fail to maintain transitive consistency?

Problem description

I know this issue has been noted before (http://stackoverflow.com/questions/9354966/string-sorting-issue-in-c-sharp/9355086#9355086), more or less concisely, but I am still creating this new thread because I ran into the issue again when writing a unit test.

The default string comparison (that is, the culture-dependent, case-sensitive comparison that we get with string.CompareTo(string), Comparer<string>.Default, StringComparer.CurrentCulture, string.Compare(string, string) and others) violates transitivity when the strings contain hyphens (or minus signs; I am talking about plain U+002D characters).

Here is a simple repro:

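using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
  // The values in the comments are the outputs observed on the .NET 4.5.1 runtime.
  static void Main()
  {
    const string a = "fk-";
    const string b = "-fk";
    const string c = "Fk";

    // Pairwise comparisons: a < b and b < c, yet a > c.
    Console.WriteLine(a.CompareTo(b));  // "-1"
    Console.WriteLine(b.CompareTo(c));  // "-1"
    Console.WriteLine(a.CompareTo(c));  // "1"

    // The same three strings in three different initial orders.
    var listX = new List<string> { a, b, c, };
    var listY = new List<string> { c, a, b, };
    var listZ = new List<string> { b, c, a, };
    listX.Sort();
    listY.Sort();
    listZ.Sort();
    Console.WriteLine(listX.SequenceEqual(listY));  // "False"
    Console.WriteLine(listY.SequenceEqual(listZ));  // "False"
    Console.WriteLine(listX.SequenceEqual(listZ));  // "False"
  }
}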

The first part of the code shows how transitivity fails: a is less than b, and b is less than c, yet a fails to be less than c.

This goes against the documented behavior of Unicode collation, which states that:

... for any strings A, B, and C, if A < B and B < C, then A < C.

Now, sorting a list containing a, b and c is exactly like trying to rank the hands "Rock", "Paper" and "Scissors" in the well-known intransitive game: an impossible task.

The last part of my code sample above shows that the result of sorting depends on the initial order of the elements (and no two elements in the list compare "equal" (0)).

LINQ's listX.OrderBy(x => x) is also affected, of course. This should be a stable sort, but you get strange results when ordering a collection containing a, b and c together with other strings.

I tried this with all the CultureInfos on my machine (since this is a culture-dependent sort), including the "invariant culture", and every one of them has the same problem. I tried this with the .NET 4.5.1 runtime, but I believe older versions have the same bug.

Conclusion: When sorting strings in .NET with the default comparer, results are unpredictable if some strings contain hyphens.

What changes introduced in .NET 4.0 caused this behavior?

It has already been observed that this behavior is inconsistent across versions of the platform: in .NET 3.5, strings with hyphens can be reliably sorted. In all versions of the framework, calling System.Globalization.CultureInfo.CurrentCulture.CompareInfo.GetSortKey provides unique KeyData for these strings, so why aren't they sorted correctly?

Recommended answer

Microsoft Connect discussion: https://connect.microsoft.com/VisualStudio/feedback/details/785931/string-compare-with-accents-on-framework-4-5-windows-8

Here is some code to work around the problem:

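using System.Globalization;

// Compares two strings through their invariant-culture sort keys
// instead of calling the string comparison directly.
static int CompareStringUsingSortKey(string s1, string s2)
{
    SortKey sk1 = CultureInfo.InvariantCulture.CompareInfo.GetSortKey(s1);
    SortKey sk2 = CultureInfo.InvariantCulture.CompareInfo.GetSortKey(s2);
    return SortKey.Compare(sk1, sk2);
}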
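To use the same idea with List<T>.Sort or OrderBy, the comparison can be wrapped in an IComparer<string>. The sketch below is one possible shape, not code from the answer: SortKeyComparer is a hypothetical class name, and placing nulls first is an assumption made here because GetSortKey does not accept null.

```csharp
using System.Collections.Generic;
using System.Globalization;

public sealed class SortKeyComparer : IComparer<string>
{
    private readonly CompareInfo compareInfo;
    private readonly CompareOptions options;

    public SortKeyComparer(CultureInfo culture, CompareOptions options = CompareOptions.None)
    {
        this.compareInfo = culture.CompareInfo;
        this.options = options;
    }

    public int Compare(string x, string y)
    {
        // Assumption: nulls sort before every non-null string.
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        // Sort keys are byte sequences compared lexicographically,
        // so this comparison is transitive by construction.
        SortKey kx = compareInfo.GetSortKey(x, options);
        SortKey ky = compareInfo.GetSortKey(y, options);
        return SortKey.Compare(kx, ky);
    }
}
```

With this, listX.Sort(new SortKeyComparer(CultureInfo.InvariantCulture)) or listX.OrderBy(s => s, new SortKeyComparer(CultureInfo.InvariantCulture)) should give the same order regardless of the initial arrangement, as long as the strings have distinct sort keys. Building two sort keys per comparison is slower than a direct comparison, so for large lists it may be worth computing each key once and sorting by the cached keys.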
