c#SortedList< string,TValue&gt ;.成功添加键的包含键返回false [英] c# SortedList<string, TValue>.ContainsKey for successfully added key returns false

查看:36
本文介绍了c#SortedList< string,TValue&gt ;.成功添加键的包含键返回false的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

检查下面的更新3 我发现我遇到的问题与.Net 4.0、4.0客户端和4.5的C#字符串比较器的一个已知严重问题有关,这将导致字符串列表的排序顺序不一致(导致输出依赖于其中的顺序).输入和使用的排序算法).该问题已于2012年12月报告给Microsoft,并以无法解决"的方式关闭.可以使用一种解决方法,但是它的速度慢得多,以至于对于大型馆藏来说几乎是不实际的.

CHECK UPDATE 3 below I found out the issue I ran into is related to a known serious problem with c# string comparers for .Net 4.0, 4.0 client and 4.5, that will lead to inconsistent sort order of lists of strings (causing the output to depend on the order in the input and the sort algorithm used). The problem was reported to Microsoft in Dec 2012 and closed as "won't be fixed". A work around is available, but is so much slower that it is hardly practical for large collections.

在实现不变的PatriciaTrie时,我想将其性能与System.Collections.Generic.SortedList进行比较.我使用了以下文件https://github.com/rkapsi/patricia-trie/blob/master/src/test/resources/org/ardverk/collection/hamlet.txt 创建用于测试的输入单词表.

While implementing an immutable PatriciaTrie, I wanted to compare its performance with a System.Collections.Generic.SortedList. I used the following file https://github.com/rkapsi/patricia-trie/blob/master/src/test/resources/org/ardverk/collection/hamlet.txt to create an input wordlist for testing.

在将每个单词插入c#SortedList中时,使用 Comparer< string> .Default StringComparer.InvariantCulture 作为键比较器,需要输入多个条目使用普通的搜索方法无法检索成功插入的密钥(例如, ContainsKey 返回false),但是该密钥存在于列表中(通过迭代该列表可以看到).

When inserting each of the words in the c# SortedList, using either Comparer<string>.Default or StringComparer.InvariantCulture as the key comparer, a number of the entries that are successfully inserted cannot be retrieved using the normal search methods (e.g. ContainsKey returns false) but the key is present in the list as is observed by iterating the list.

更奇怪的是,比较器将从排序列表中检索的键与使用 ContainsKey 找不到的搜索键进行比较时,返回值'0'.

Even more curious, the comparer returns the value '0' when comparing the key retrieved from the sorted list with the search key that could not be found using ContainsKey.

下面的完整示例在我的系统上演示了此问题.

The complete example below demonstrates this issue on my system.

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        // the problem is possibly related to comparison.
        var fail = true;
        var comparer = fail ? StringComparer.InvariantCulture : StringComparer.Ordinal;

        // read hamlet (contains duplicate words)
        var words = File
            .ReadAllLines("hamlet.txt")
            .SelectMany(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(w => w.Trim())
            .Where(w => !string.IsNullOrEmpty(w))
            .Distinct(comparer)
            .ToArray();

        // insert hamlet's words in the sorted list.
        var list = new SortedList<string, int>(comparer);
        var ndx = 0;
        foreach (var word in words)
            list[word] = ndx++;

        // search for each of the added words.
        foreach (var keyToSearch in words)
        {
            if (!list.ContainsKey(keyToSearch))
            {
                // was inserted, but cannot be retrieved.
                Console.WriteLine("Error - Key not found: \"{0}\"", keyToSearch);
                // however when we iterate over the list, we see that the entry is present
                var prefix = keyToSearch.Substring(0, Math.Min(keyToSearch.Length, 3));
                foreach (var wordCloseToSearchKey in list.Keys.Where(s => s.StartsWith(prefix)))
                {
                    // and using the SortedList's supplied comparison returns 0, signaling equality
                    var comparisonResult = list.Comparer.Compare(wordCloseToSearchKey, keyToSearch);
                    Console.WriteLine("{0} - comparison result = {1}", wordCloseToSearchKey, comparisonResult);
                }
            }
        }

        // Check that sort order of List.Keys is correct 
        var keys = list.Keys.ToArray();
        BinarySearchAll("list.Keys", keys, list.Comparer);
        CheckCorrectSortOrder("list.Keys", keys, list.Comparer);

        // Check that sort order of Array.Sort(List.Keys) is correct 
        var arraySortedKeys = CopySortSearchAndCheck("Array.Sort(List.Keys)", keys, list.Comparer);

        // Check that sort order of the Array.Sort(input words) is correct 
        var sortedInput = CopySortSearchAndCheck("Array.Sort(input words)", words, list.Comparer);

        Console.ReadLine();
    }

    static string[] CopySortSearchAndCheck(string arrayDesc, string[] input, IComparer<string> comparer)
    {
        // copy input
        var sortedInput = new string[input.Length];
        Array.Copy(input, sortedInput, sortedInput.Length);

        // sort it
        Array.Sort(sortedInput, comparer);

        // check that we can actually find the keys in the array using bin. search
        BinarySearchAll(arrayDesc, sortedInput, comparer);

        // check that sort order is correct
        CheckCorrectSortOrder(arrayDesc, sortedInput, comparer);

        return sortedInput;
    }

    static void BinarySearchAll(string arrayDesc, string[] sortedInput, IComparer<string> comparer)
    {
        // check that each key in the input can be found using bin. search
        foreach (var word in sortedInput)
        {
            var ix = Array.BinarySearch(sortedInput, word, comparer);
            if (ix < 0)
                // and it appears it cannot!
                Console.WriteLine("Error - {0} - Key not found: \"{1}\"", arrayDesc, word);
        }
    }

    static void CheckCorrectSortOrder(string arrayDesc, string[] sortedKeys, IComparer<string> comparer)
    {
        for (int n = 0; n < sortedKeys.Length; n++)
        {
            for (int up = n + 1; up < sortedKeys.Length; up++)
            {
                var cmp = comparer.Compare(sortedKeys[n], sortedKeys[up]);
                if (cmp >= 0)
                {
                    Console.WriteLine(
                        "{0}[{1}] = \"{2}\" not < than {0}[{3}] = \"{4}\"  - cmp = {5}",
                        arrayDesc, n, sortedKeys[n], up, sortedKeys[up], cmp);
                }
            }
            for (int down = n - 1; down > 0; down--)
            {
                var cmp = comparer.Compare(sortedKeys[n], sortedKeys[down]);
                if (cmp <= 0)
                {
                    Console.WriteLine(
                        "{0}[{1}] = \"{2}\" not > than {0}[{3}] = \"{4}\"  - cmp = {5}",
                        arrayDesc, n, sortedKeys[n], down, sortedKeys[down], cmp);
                }
            }
        }
    }
}

有人对此意外和奇怪的行为有解释吗?

Does anyone have an explanation for this unexpected and odd behaviour?

在将SortedList使用的比较器更改为 StringComparer.Ordinal 时(例如,在上面的示例中将 fail 更改为 false ),问题出了.消失,这似乎指向一个比较问题,但是我不太明白为什么.

When changing the comparer used by the SortedList to StringComparer.Ordinal (by e.g. changing fail to false in the above example) the problem disappears, which seems to point to a comparison issue, but I don't quite understand why.

已更新正如Sébastien指出的那样,.Net 3.5和3.5客户端配置文件中未显示此处描述的问题.它适用于.Net 4.0、4.0客户端和4.5.

UPDATED As noted by Sébastien the issue described here does not show up on .Net 3.5 and 3.5 client profiles. It does on .Net 4.0, 4.0 client and 4.5.

再进行一些挖掘之后,我注意到,如果我从列表中获取排序后的键,并对这些键运行 Array.BinarySearch ,它也会为相同的键返回负值(未找到)使用 SortedList.ContainsKey 找不到的内容.因此,这表明键的排序顺序不正确.

After some more digging, I noticed that if I take the sorted keys from the list and I run Array.BinarySearch on those keys, it also returns negative (not found) values for the same keys that are not found using SortedList.ContainsKey. Thus this would suggest that the sort order of the keys is not correct.

如果我从列表中获取已经排序的键,然后使用 Array.Sort 对其进行排序,则对于有问题的键,输出的排序顺序会有所不同.

If I take the already sorted keys from the list and sort them using Array.Sort the sort order of the output is different for the keys that were problematic.

因此,我添加了一个函数(使用列表的比较器)来检查给定数组的排序顺序是否正确(即,前一个键始终较小,而后一个键始终较大),并将输入限制为根据比较器不同.我将此功能应用于3个不同的输入(均使用相同的比较器):

So I added a function to check (using the list's comparer) if the sort order of a given array is correct (i.e. a preceding key is always smaller, a succeeding key is always larger) and restricted the input to words that are distinct according to the comparer. I applied this function on 3 different inputs (all using the same comparer):

  1. SortedList的键集合.
  2. 这些键上Array.Sort的输出.
  3. Array.Sort在文件输入上的输出.

(2)和(3)的输出与(1)相同,不同.然而,再次对(2)和(3)的Array.Sort输出执行Array.BinarySearch无法找到相同的键(返回< 0).此外,检查排序顺序是否正确的功能还表明,对于所有3种情况,所涉及的有问题的密钥周围的排序顺序都是不正确的.

The output of (2) and (3) is identical and different from (1). However performing Array.BinarySearch on the Array.Sort outputs of (2) and (3) again fails to find the same keys (returning < 0). Also the function that checks for correct sort order indicates that for all 3 cases, the sort order around the involved problematic keys is not correct.

在这一点上,我只是希望我所做的事情非常愚蠢,并且有一个简单的解释.希望有人可以向我指出.

At this point I am just hoping I did something incredibly stupid and there is a simple explanation. Hopefully someone can point that out to me.

示例代码已通过我的其他故障排除实验进行了更新,输出的屏幕快照可在此处找到 http://imgur.com/DU8SCsA .

The example code is updated with my additional troubleshooting experiments, a screenshot of the output can be found here http://imgur.com/DU8SCsA.

更新2 好的,我已经将问题范围缩小到了我看来是从.Net 4.0开始引入的C#字符串比较器所面临的一个非常严重的问题.

UPDATE 2 Ok, I have narrowed the problem down to what seems to me like a very serious problem with c# string comparers introduced as of .Net 4.0.

总而言之,假设我们有3个值:a1,a2和a3.为了使任何排序都能正常工作,我们希望 a1<a2 a2<为了使比较一致,因此以下内容也成立: a1< a3<a3 .

In summary, suppose we have 3 values: a1, a2 and a3. For any kind of sorting to work correctly, we would expect that if a1 < a2 and a2 < a3 that in order for comparison to be consistent, as a consequence the following also holds: a1 < a3.

然而,c#字符串比较器(至少对于 Comparer< string> .Default StringComparer.InvariantCulture )不是这种情况.

This is however not the case with the c# string comparers (at least for Comparer<string>.Default and StringComparer.InvariantCulture).

下面的小程序说明了这个确切的问题:

The little program below illustrates this exact problem:

class Program
{
    static void Main(string[] args)
    {
        var comparer = StringComparer.InvariantCulture;
        var a1 = "A";
        var a2 = "a\'";
        var a3 = "\'a";
        PrintComparison("a1", a1, "a2", a2, comparer);
        PrintComparison("a2", a2, "a3", a3, comparer);
        PrintComparison("a1", a1, "a3", a3, comparer);
        Console.ReadLine();
    }
    public static void PrintComparison(string firstSymbol, string first, string secondSymbol, string second, IComparer<string> comparer)
    {
        var cmp = comparer.Compare(first, second);
        var result = cmp == 0 ? "=" : cmp < 0 ? "<" : ">";
        Console.WriteLine("{0} {1} {2}   ({3} {1} {4})", firstSymbol, result, secondSymbol, first, second);
    }
}

这是它的输出:

a1 < a2   (A < a')
a2 < a3   (a' < 'a)
a1 > a3   (A > 'a)

结论似乎是,依靠c#字符串编译器确定的排序顺序并不安全,还是我遗漏了某些东西?

The conclusion would seem to be that it is not safe to rely on sort order determined using c# string comperators, or am I missing something?

更新3 该问题似乎已于2012年12月报告给MS,并以无法解决"状态关闭,这令人失望.请参阅下面评论中发布的链接(由于信誉有限,我无法在此处发布).这也列出了一种变通方法,我已实施该变通方法并用来验证它确实解决了标准比较器所观察到的问题.

UPDATE 3 This issue seems to have been reported to MS in December 2012, and is closed with status "won't be fixed", which is rather disappointing; refer to link posted in comments below (it appears I can't post in here due to my limited reputation points). This also lists a workaround, that I have implemented and used to verify that this indeed resolves the problems observed with the standard comparers.

public class WorkAroundStringComparer : StringComparer
{
    private static readonly Func<CompareInfo, string, CompareOptions, int> _getHashCodeOfString;
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _compareOptions;

    static WorkAroundStringComparer()
    {
        // Need this internal method to compute hashcode
        // as an IEqualityComparer implementation.
        _getHashCodeOfString = BuildGetHashCodeOfStringDelegate();
    }

    static Func<CompareInfo, string, CompareOptions, int> BuildGetHashCodeOfStringDelegate()
    {
        var compareInfoType = typeof(CompareInfo);
        var argTypes = new[] { typeof(string), typeof(CompareOptions) };
        var flags = BindingFlags.NonPublic | BindingFlags.Instance;
        var methods = compareInfoType.GetMethods(flags).ToArray(); ;
        var method = compareInfoType.GetMethod("GetHashCodeOfString", flags, null, argTypes, null);

        var instance = Expression.Parameter(compareInfoType, "instance");
        var stringArg = Expression.Parameter(typeof(string), "string");
        var optionsArg = Expression.Parameter(typeof(CompareOptions), "options");
        var methodCall = Expression.Call(instance, method, stringArg, optionsArg);
        var expr = Expression.Lambda<Func<CompareInfo, string, CompareOptions, int>>(methodCall, instance, stringArg, optionsArg);
        return expr.Compile();
    }

    public WorkAroundStringComparer()
        : this(CultureInfo.InvariantCulture)
    {
    }

    public WorkAroundStringComparer(CultureInfo cultureInfo, CompareOptions compareOptions = CompareOptions.None)
    {
        if (cultureInfo == null)
            throw new ArgumentNullException("cultureInfo");
        this._compareInfo = cultureInfo.CompareInfo;
        this._compareOptions = compareOptions;
    }

    public override int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y))
            return 0;
        if (ReferenceEquals(x, null))
            return -1;
        if (ReferenceEquals(y, null))
            return 1;

        var sortKeyFor_x = _compareInfo.GetSortKey(x, _compareOptions);
        var sortKeyFor_y = _compareInfo.GetSortKey(y, _compareOptions);
        return SortKey.Compare(sortKeyFor_x, sortKeyFor_y);
    }

    public override bool Equals(string x, string y)
    {
        return Compare(x, y) == 0;
    }

    public override int GetHashCode(string obj)
    {
        return _getHashCodeOfString(_compareInfo, obj, _compareOptions);
    }
}

此变通办法的问题在于,对于相当大的集合而言,它几乎是不实际的,因为它的速度比例如慢了一个数量级. StringComparer.InvariantCulture .

The problem with this workaround is that it is hardly practical for sizable collections, because it is an order of magnitude slower than e.g. StringComparer.InvariantCulture.

使用两个比较器对给定单词列表进行1000次排序所需的时间:

Time taken when sorting the given word list 1000 times using both comparers:

StringComparer.InvariantCulture : 00:00:15.3120013
WorkAroundStringComparer        : 00:01:35.8322409

因此,我仍然希望微软重新考虑或者有人知道可行的替代方案.否则,剩下的唯一选择就是使用 StringComparer.Ordinal .

So I am still hoping that either Microsoft reconsiders or someone knows a viable alternative. Otherwise the only option that remains is falling back on using StringComparer.Ordinal.

推荐答案

它是否与.Net Framework 4/4.5相关?我已经为.Net 3.5修改了您的示例,如下所示:

Could it be related to .Net Framework 4/4.5? I have adapted your example for .Net 3.5 like this:

var words = ReadFile("hamlet.txt");

//...

private static string[] ReadFile(string path)
{
    List<string> lines = new List<string>();
    using (StreamReader sr = new StreamReader(path))
    {
        string text = sr.ReadToEnd();
        lines.Add(text);
    }

    return lines.SelectMany(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries).Select(w => w.Trim()))
        .Where(w => !(w.ToCharArray().All(c => c == ' ')))
        .ToArray();
}

两个比较器都可以在使用.Net 3.5的XP上正常工作.

And both comparers work fine on XP using .Net 3.5.

这篇关于c#SortedList&lt; string,TValue&gt ;.成功添加键的包含键返回false的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆