LINQ does multiple, instead of one, iterations through a collection and then some

Question

I have this class:

public class SimHasher {
    int count = 0;

    //take each string and make an int[] out of it
    //should call Hash method lines.Count() times
    public IEnumerable<int[]> HashAll(IEnumerable<string> lines) {
        //return lines.Select(il => Hash(il));
        var linesCount = lines.Count();
        var hashes = new int[linesCount][];
        for (var i = 0; i < linesCount; ++i) {
            hashes[i] = Hash(lines.ElementAt(i));
        }
        return hashes;
    }

    public int[] Hash(string line) {
        Debug.WriteLine(++count);
        //stuff
    }
}

When I run a program that calls HashAll and passes it an IEnumerable<string> with 1000 elements, it acts as expected: loops 1000 times, writing numbers from 1 to 1000 in the debug console with the program finishing in under 1 second. However if I replace the code of the HashAll method with the LINQ statement, like so:

public IEnumerable<int[]> HashAll(IEnumerable<string> lines) {
    return lines.Select(il => Hash(il));
}

the behavior seems to depend on where HashAll gets called from.
If I call it from this test method

[Fact]
public void SprutSequentialIntegrationTest() {
    var inputContainer = new InputContainer(new string[] {
        @"D:\Solutions\SimHash\SimHashTests\R.in"
    });
    var simHasher = new SimHasher();
    var documentSimHashes = simHasher.HashAll(inputContainer.InputLines); //right here
    var queryRunner = new QueryRunner(documentSimHashes);
    var queryResults = queryRunner.RunAllQueries
        (inputContainer.Queries);

    var expectedQueryResults = System.IO.File.ReadAllLines(
        @"D:\Solutions\SimHash\SimHashTests\R.out")
        .Select(eqr => int.Parse(eqr));
    Assert.Equal(expectedQueryResults, queryResults);
}

the counter in the debug console reaches around 13,000, even though there are only 1000 input lines. It also takes around 6 seconds to finish, but still manages to produce the same results as the loop version.
If I run it from the Main method like so

static void Main(string[] args) {
    var inputContainer = new InputContainer(args);
    var simHasher = new SimHasher();
    var documentSimHashes = simHasher.HashAll(inputContainer.InputLines);
    var queryRunner = new QueryRunner(documentSimHashes);
    var queryResults = queryRunner.RunAllQueries
        (inputContainer.Queries);
    foreach (var queryResult in queryResults) {
        Console.WriteLine(queryResult);
    }
}

it starts writing to the output console right away, although very slowly, while the counter in the debug console goes into the tens of thousands. When I try to debug it line by line, it goes straight to the foreach loop and writes out the results one by one. After some Googling, I found out that this is due to LINQ queries being lazily evaluated. However, each time it lazily evaluates a result, the counter in the debug console increases by more than 1000, which is more than the number of input lines.
What is causing so many calls to the Hash method? Can it be deduced from these snippets?

Recommended answer

The reason why you get more iterations than you would expect is that there are LINQ calls that iterate the IEnumerable<T> multiple times.

When you call Count() on an IEnumerable<T>, LINQ tries to see if there is a Count or Length to avoid iterating, but when there is no shortcut, it iterates IEnumerable<T> all the way to the end.
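
To make the "no shortcut" case concrete, here is a minimal sketch. The CountDemo class and its LazyLines iterator are hypothetical names invented for this illustration; an iterator method is just one example of a source that exposes neither Count nor Length, so Count() has to walk it to the end.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Number of elements the iterator below has produced so far.
    public static int Yielded;

    // Hypothetical lazy source: an iterator method exposes no Count or
    // Length, so LINQ has no shortcut and must enumerate it.
    public static IEnumerable<string> LazyLines(int n)
    {
        for (var i = 0; i < n; i++)
        {
            Yielded++;
            yield return "line " + i;
        }
    }

    // Returns how many elements were produced just to answer Count().
    public static int ElementsProducedByCount(int n)
    {
        Yielded = 0;
        LazyLines(n).Count();
        return Yielded;
    }
}
```

Under these assumptions, CountDemo.ElementsProducedByCount(1000) returns 1000: the whole sequence is produced once just to learn its length.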

Similarly, when you call ElementAt(i), LINQ tries to see if there is an indexer, but generally it iterates the collection up to point i. This makes your loop O(n²).
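
The quadratic cost can be measured with a small sketch. ElementAtDemo and LazyNumbers are hypothetical names for this illustration, and the source is assumed to be a plain iterator with no indexer, which may or may not match what InputLines actually returns.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ElementAtDemo
{
    // Number of elements the iterator below has produced so far.
    public static long Yielded;

    // Hypothetical lazy source without an indexer.
    public static IEnumerable<int> LazyNumbers(int n)
    {
        for (var i = 0; i < n; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    // Mirrors the indexed loop from the question: every ElementAt(i)
    // restarts the enumeration and walks elements 0..i again.
    public static long ElementsProducedByIndexLoop(int n)
    {
        Yielded = 0;
        var source = LazyNumbers(n);
        for (var i = 0; i < n; i++)
        {
            source.ElementAt(i);
        }
        return Yielded;
    }
}
```

For such a source, fetching index i produces i + 1 elements, so the loop produces n(n + 1)/2 elements in total: 10 for n = 4 and 500500 for n = 1000.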

You can easily fix your problem by storing your IEnumerable<T> in a list or an array by calling ToList() or ToArray(). This would iterate through IEnumerable<T> once, and then use Count and indexes to avoid further iterations.
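
The same fix applies to the lazy Select version from the question. The sketch below is only an illustration: MaterializeDemo is a hypothetical class, its Hash body is a placeholder for the elided one, and the repeated foreach passes stand in for the assumption that QueryRunner (whose code is not shown) enumerates the hashes more than once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Counts invocations of Hash, like the question's debug counter.
    public static int HashCalls;

    // Stand-in for the question's Hash method; the body is a placeholder.
    public static int[] Hash(string line)
    {
        HashCalls++;
        return new[] { line.Length };
    }

    // Lazy query: every enumeration pass re-runs Hash for every line.
    public static int CallsWhenLazy(IEnumerable<string> lines, int passes)
    {
        HashCalls = 0;
        var hashes = lines.Select(line => Hash(line));
        for (var p = 0; p < passes; p++)
        {
            foreach (var hash in hashes) { }
        }
        return HashCalls;
    }

    // ToList() runs the query once; later passes read the stored results.
    public static int CallsWhenMaterialized(IEnumerable<string> lines, int passes)
    {
        HashCalls = 0;
        var hashes = lines.Select(line => Hash(line)).ToList();
        for (var p = 0; p < passes; p++)
        {
            foreach (var hash in hashes) { }
        }
        return HashCalls;
    }
}
```

With three input lines and four passes, CallsWhenLazy returns 12 while CallsWhenMaterialized returns 3. In the question's code, the equivalent change would be returning lines.Select(il => Hash(il)).ToList() from HashAll.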
