Split an IEnumerable<T> into fixed-sized chunks (return an IEnumerable<IEnumerable<T>> where the inner sequences are of fixed length)


Problem description

I want to take an IEnumerable<T> and split it up into fixed-sized chunks.

I have this, but it seems inelegant due to all the list creation/copying:

private static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    List<T> partition = new List<T>(partitionSize);
    foreach (T item in items)
    {
        partition.Add(item);
        if (partition.Count == partitionSize)
        {
            yield return partition;
            partition = new List<T>(partitionSize);
        }
    }
    // Cope with items.Count % partitionSize != 0
    if (partition.Count > 0) yield return partition;
}

Is there something more idiomatic?

Although this has been marked as a duplicate of "Divide array into an array of subsequence array", it is not: that question deals with splitting an array, whereas this one is about IEnumerable<T>. In addition, that question requires that the last subsequence be padded. The two questions are closely related but are not the same.

Recommended answer

You could try to implement the Batch method mentioned above on your own like this:

    static class MyLinqExtensions 
    { 
        public static IEnumerable<IEnumerable<T>> Batch<T>( 
            this IEnumerable<T> source, int batchSize) 
        { 
            using (var enumerator = source.GetEnumerator()) 
                while (enumerator.MoveNext()) 
                    yield return YieldBatchElements(enumerator, batchSize - 1); 
        } 

        private static IEnumerable<T> YieldBatchElements<T>( 
            IEnumerator<T> source, int batchSize) 
        { 
            yield return source.Current; 
            for (int i = 0; i < batchSize && source.MoveNext(); i++) 
                yield return source.Current; 
        } 
    }

I got this code from http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx.

UPDATE: Please note that this implementation lazily evaluates not only the batches but also the items inside each batch, which means it produces correct results only when each batch is enumerated after all previous batches have been fully enumerated. For example:

public static void Main(string[] args)
{
    var xs = Enumerable.Range(1, 20);
    Print(xs.Batch(5).Skip(1)); // should skip first batch with 5 elements
}

public static void Print<T>(IEnumerable<IEnumerable<T>> batches)
{
    foreach (var batch in batches)
    {
        Console.WriteLine($"[{string.Join(", ", batch)}]");
    }
}

This will output:

[2, 3, 4, 5, 6] //only first element is skipped.
[7, 8, 9, 10, 11]
[12, 13, 14, 15, 16]
[17, 18, 19, 20]
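
The output above follows from every batch reading from the same underlying enumerator: whichever batch is enumerated first consumes the next items, regardless of the order in which the batches were created. The following is a minimal sketch of that effect in isolation. The names SharedEnumeratorDemo and TakeFromShared are hypothetical and are not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedEnumeratorDemo
{
    // Hypothetical helper: lazily yields up to `count` items from an
    // enumerator that is shared with other callers.
    public static IEnumerable<int> TakeFromShared(IEnumerator<int> shared, int count)
    {
        for (int i = 0; i < count && shared.MoveNext(); i++)
            yield return shared.Current;
    }
}

public static class SharedEnumeratorProgram
{
    public static void Main()
    {
        using (var shared = Enumerable.Range(1, 10).GetEnumerator())
        {
            // Neither call consumes anything yet; both are deferred.
            var first = SharedEnumeratorDemo.TakeFromShared(shared, 3);
            var second = SharedEnumeratorDemo.TakeFromShared(shared, 3);

            // Enumerating `second` before `first` gives it items 1..3.
            Console.WriteLine(string.Join(", ", second)); // 1, 2, 3
            Console.WriteLine(string.Join(", ", first));  // 4, 5, 6
        }
    }
}
```

In the Skip(1) example, the skipped batch is never enumerated, so it consumes only the single element that the outer MoveNext call had already advanced past.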

So, if your use case guarantees that batches are evaluated sequentially, the lazy solution above will work. Otherwise, if you can't guarantee strictly sequential batch processing (e.g. when you want to process batches in parallel), you will probably need a solution that eagerly enumerates batch content, similar to the one in the question above or the one in MoreLINQ.
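
As a rough illustration of the eager alternative, here is a sketch that buffers each batch into a list before yielding it, much like the code in the question. The names EagerBatchExtensions and BatchEager are hypothetical, and the argument checks are an assumption about reasonable behaviour rather than something taken from the answer or from MoreLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerBatchExtensions
{
    // Hypothetical name. Each batch is fully buffered before it is yielded,
    // so batches can be skipped, stored, or processed out of order.
    public static IEnumerable<IReadOnlyList<T>> BatchEager<T>(
        this IEnumerable<T> source, int batchSize)
    {
        // Validate eagerly, outside the iterator, so errors surface at the call site.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return BatchIterator(source, batchSize);
    }

    private static IEnumerable<IReadOnlyList<T>> BatchIterator<T>(
        IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // fresh list; never reuse a yielded one
            }
        }
        // Final partial batch when the item count is not a multiple of batchSize.
        if (batch.Count > 0) yield return batch;
    }
}

public static class EagerBatchProgram
{
    public static void Main()
    {
        // Skipping the first batch now skips all five of its elements.
        foreach (var batch in Enumerable.Range(1, 20).BatchEager(5).Skip(1))
            Console.WriteLine($"[{string.Join(", ", batch)}]");
        // First line printed: [6, 7, 8, 9, 10]
    }
}
```

The outer sequence is still lazy, so only one batch is held in memory at a time unless the caller keeps references to earlier ones. On .NET 6 and later, the built-in Enumerable.Chunk method also splits a sequence into arrays of at most a given size, which may make a hand-written version unnecessary.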
