Using ToList() on Enumerable LINQ query results for large data sets - Efficiency Issue?

Problem description

I've been making a lot of use of LINQ queries in the application I'm currently writing, and one of the situations that I keep running into is having to convert the LINQ query results into lists for further processing (I have my reasons for wanting lists).

I'd like to have a better understanding of what happens in this list conversion in case there are inefficiencies, since I've now used it repeatedly. So, given I execute a line like this:

var matches = (from x in list1 join y in list2 on x equals y select x).ToList();

Questions:

  1. Is there any overhead here aside from the creation of a new list and its population with references to the elements in the Enumerable returned from the query?

  2. Would you consider this inefficient?

  3. Is there a way to get the LINQ query to directly generate a list to avoid the need for a conversion in this circumstance?

Recommended answer

Well, it creates a copy of the data. That could be inefficient, but it depends on what's going on. If you need a List<T> at the end, ToList is usually going to be close to as efficient as you'll get. The one exception is if you're only doing a one-to-one conversion and the source is already a List<T>: then ConvertAll will be more efficient, as it can create a backing array of the right size to start with.
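A minimal sketch of the two routes for converting an existing List<int> into a List<string>. The class and method names (ConversionDemo, ViaLinq, ViaConvertAll) are hypothetical, chosen only for illustration; Select, ToList and List<T>.ConvertAll are the real APIs. Whether ToList can presize its array for a projected list varies by runtime version, so treat the comments as the general idea rather than a guarantee.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class ConversionDemo
{
    // LINQ route: Select projects lazily, then ToList buffers the results.
    // Whether ToList can presize its array here depends on the runtime version.
    public static List<string> ViaLinq(List<int> source) =>
        source.Select(n => n.ToString()).ToList();

    // List<T>.ConvertAll already knows source.Count, so it can allocate
    // a backing array of the right size once and fill it in a single pass.
    public static List<string> ViaConvertAll(List<int> source) =>
        source.ConvertAll(n => n.ToString());
}
```

Both methods return the same elements; the difference is only in how the new list's backing array gets allocated.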

If you only need to stream the data - e.g. you're just going to do a foreach on it, and taking actions which don't affect the original data sources - then calling ToList is definitely a potential source of inefficiency. It will force the whole of list1 to be evaluated - and if that's a lazily-evaluated sequence (e.g. "the first 1,000,000 values from a random number generator") then that's not good. Note that as you're doing a join, list2 will be evaluated anyway as soon as you try to pull the first value from the sequence (whether that's in order to populate a list or not).
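A sketch of the difference between streaming and materializing the question's join, assuming two lazily-evaluated sources instead of list1 and list2. Counted, JoinQuery and StreamingDemo are hypothetical names; Counted simply records how many items each source has produced. With the Enumerable.Join implementations I'm aware of, pulling the first result reads the whole inner sequence (to build a lookup) but only one outer item, while ToList drains both.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, for illustration only.
public static class StreamingDemo
{
    // Hypothetical lazy source: yields 0..count-1 and calls onYield for each item produced.
    public static IEnumerable<int> Counted(int count, Action onYield)
    {
        for (int i = 0; i < count; i++)
        {
            onYield();
            yield return i;
        }
    }

    // Same shape as the query in the question, but over two lazy sources.
    static IEnumerable<int> JoinQuery(int size, Action onOuter, Action onInner) =>
        from x in Counted(size, onOuter)
        join y in Counted(size, onInner) on x equals y
        select x;

    // Streaming: take only the first match, then stop.
    public static (int outerPulled, int innerPulled) FirstMatchOnly(int size)
    {
        int outer = 0, inner = 0;
        JoinQuery(size, () => outer++, () => inner++).First();
        return (outer, inner);
    }

    // Buffering: ToList forces the entire outer sequence to be evaluated.
    public static (int outerPulled, int innerPulled) MaterializeAll(int size)
    {
        int outer = 0, inner = 0;
        JoinQuery(size, () => outer++, () => inner++).ToList();
        return (outer, inner);
    }
}
```

In this sketch, FirstMatchOnly(1000) reads 1 outer item and all 1000 inner items, whereas MaterializeAll(1000) reads all 1000 from both sources.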

You might want to read my Edulinq post on ToList to see what's going on - at least in one possible implementation - in the background.
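For a rough idea of what such an implementation does, here is a simplified, hypothetical stand-in (MyEnumerable.ToListSketch); it is not the Edulinq code or the BCL source. It shows the two cases that matter for efficiency: when the source's size is known up front the list can be allocated once, otherwise the list has to grow its backing array as items arrive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for Enumerable.ToList; not the BCL or Edulinq source.
public static class MyEnumerable
{
    public static List<T> ToListSketch<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Fast path: the count is known, so allocate the backing array once.
        if (source is ICollection<T> collection)
        {
            var presized = new List<T>(collection.Count);
            presized.AddRange(collection); // can copy an ICollection<T> in one step
            return presized;
        }

        // Slow path: the size is unknown (e.g. a lazy query), so Add grows the
        // backing array, copying existing elements each time it runs out of room.
        var result = new List<T>();
        foreach (T item in source)
        {
            result.Add(item);
        }
        return result;
    }
}
```

The result of a join query falls into the slow path, which is one reason a conversion like ConvertAll on an existing list can do better.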
