How can I hierarchically group data using LINQ?

Question

I have some data that has various attributes and I want to hierarchically group that data. For example:

public class Data
{
   public string A { get; set; }
   public string B { get; set; }
   public string C { get; set; }
   public string D { get; set; } // used by the four-level examples in the update below
}

I would like to group it as:

A1
 - B1
    - C1
    - C2
    - C3
    - ...
 - B2
    - ...
A2
 - B1
    - ...
...

Currently, I have been able to group this using LINQ such that the top group divides the data by A, then each subgroup divides by B, then each B subgroup contains subgroups by C, etc. The LINQ looks like this (assuming an IEnumerable<Data> sequence called data):

var hierarchicalGrouping =
            from x in data
            group x by x.A
                into byA
                let subgroupB = from x in byA
                                group x by x.B
                                    into byB
                                    let subgroupC = from x in byB
                                                    group x by x.C
                                    select new
                                    {
                                        B = byB.Key,
                                        SubgroupC = subgroupC
                                    }
                select new
                {
                    A = byA.Key,
                    SubgroupB = subgroupB
                };

As you can see, this gets somewhat messy the more subgrouping that's required. Is there a nicer way to perform this type of grouping? It seems like there should be and I'm just not seeing it.

Update
So far, I have found that expressing this hierarchical grouping by using the fluent LINQ APIs rather than query language arguably improves readability, but it doesn't feel very DRY.

There were two ways I did this: one using GroupBy with a result selector, the other using GroupBy followed by a Select call. Both can be formatted to be more readable than the query syntax, but they still don't scale well.

var withResultSelector =
    data.GroupBy(a => a.A, (aKey, aData) =>
        new
        {
            A = aKey,
            SubgroupB = aData.GroupBy(b => b.B, (bKey, bData) =>
                new
                {
                    B = bKey,
                    SubgroupC = bData.GroupBy(c => c.C, (cKey, cData) =>
                    new
                    {
                        C = cKey,
                        SubgroupD = cData.GroupBy(d => d.D)
                    })
                })
        });

var withSelectCall =
    data.GroupBy(a => a.A)
        .Select(aG =>
        new
        {
            A = aG.Key,
            SubgroupB = aG
                .GroupBy(b => b.B)
                .Select(bG =>
            new
            {
                B = bG.Key,
                SubgroupC = bG
                    .GroupBy(c => c.C)
                    .Select(cG =>
                new
                {
                    C = cG.Key,
                    SubgroupD = cG.GroupBy(d => d.D)
                })
            })
        });

What I'd like...
I can envisage a couple of ways that this could be expressed (assuming the language and framework supported it). The first would be a GroupBy extension that takes a series of function pairs for key selection and result selection, Func<TElement, TKey> and Func<TElement, TResult>. Each pair describes the next sub-group. This option falls down because each pair could require TKey and TResult types that differ from those of the other pairs, which would mean GroupBy would need a fixed number of parameters and a complex declaration.
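For a fixed depth, something close to this first option can already be written with ordinary overloads, one per depth, each with its own key type parameters. The sketch below is an illustration only, not part of the question or the answer: GroupByNested, Grouping and Nest are hypothetical names, and it drops the per-level result selectors so that every level stays an IGrouping. It also shows the cost described above: every extra level needs another overload and a longer return type.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class NestedGroupingExtensions
{
    // Minimal IGrouping implementation (hypothetical) that pairs a key with an
    // already-grouped child sequence; LINQ does not expose a public one.
    private sealed class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
    {
        private readonly IEnumerable<TElement> _items;

        public Grouping(TKey key, IEnumerable<TElement> items)
        {
            Key = key;
            _items = items;
        }

        public TKey Key { get; }
        public IEnumerator<TElement> GetEnumerator() => _items.GetEnumerator();
        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }

    private static IGrouping<TKey, TElement> Nest<TKey, TElement>(TKey key, IEnumerable<TElement> items) =>
        new Grouping<TKey, TElement>(key, items);

    // Two levels: group by k1, then group each bucket by k2.
    public static IEnumerable<IGrouping<TK1, IGrouping<TK2, T>>> GroupByNested<T, TK1, TK2>(
        this IEnumerable<T> source, Func<T, TK1> k1, Func<T, TK2> k2) =>
        source.GroupBy(k1, (key, items) => Nest(key, items.GroupBy(k2)));

    // Three levels: reuse the two-level overload for the inner part.
    // A fourth level would need yet another overload with one more type parameter.
    public static IEnumerable<IGrouping<TK1, IGrouping<TK2, IGrouping<TK3, T>>>> GroupByNested<T, TK1, TK2, TK3>(
        this IEnumerable<T> source, Func<T, TK1> k1, Func<T, TK2> k2, Func<T, TK3> k3) =>
        source.GroupBy(k1, (key, items) => Nest(key, items.GroupByNested(k2, k3)));
}
```

With the Data class from the question, a call would look like data.GroupByNested(x => x.A, x => x.B, x => x.C), and the key types can differ per level because each level has its own type parameter.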

The second option would be a SubGroupBy extension method that could be chained to produce sub-groups. SubGroupBy would be the same as GroupBy but the result would be the previous grouping further partitioned. For example:

var groupings = data
    .GroupBy(x=>x.A)
    .SubGroupBy(y=>y.B)
    .SubGroupBy(z=>z.C);

// This version has a custom result type that would be the grouping data.
// The element data at each stage would be the custom data at this point
// as the original data would be lost when projected to the results type.
var groupingsWithCustomResultType = data
    .GroupBy(a=>a.A, x=>new { ... })
    .SubGroupBy(b=>b.B, y=>new { ... })
    .SubGroupBy(c=>c.C, c=>new { ... });

The difficulty is implementing these methods efficiently: as I currently understand it, each level would re-create new objects in order to extend the previous objects. The first iteration would create groupings of A, the second would then create objects that have a key of A and groupings of B, and the third would redo all of that and add the groupings of C. This seems terribly inefficient (though I suspect my current options actually do this anyway). It would be nice if the calls passed around a meta-description of what was required and the instances were only created on the last pass, but that sounds difficult too. Note that this is similar to what can be done with GroupBy, but without the nested method calls.

Hopefully all that makes sense. I expect I am chasing rainbows here, but maybe not.

Update - another option
Another possibility that I think is more elegant than my previous suggestions relies on each parent group being just a key and a sequence of child items (as in the examples), much like IGrouping provides now. That means one option for constructing this grouping would be a series of key selectors and a single results selector.

If the keys were all limited to a single type, which is not unreasonable, then this could be expressed as a sequence of key selectors and a results selector, or as a results selector and a params array of key selectors. Of course, if the keys had to be of different types at different levels, this becomes difficult again, except for a fixed maximum depth of hierarchy, because of the way generic parameterization works.

Here are some illustrative examples of what I mean:

Either:

public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence,
    IEnumerable<Func<TElement, TKey>> keySelectors,
    Func<TElement, TResult> resultSelector)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    new Func<Data, string>[] {
                        x => x.A,
                        y => y.B,
                        z => z.C },
                    a => new { /*custom projection here for leaf items*/ });

Or:

public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence,
    Func<TElement, TResult> resultSelector,
    params Func<TElement, TKey>[] keySelectors)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    a => new { /*custom projection here for leaf items*/ },
                    x => x.A,
                    y => y.B,
                    z => z.C);

This does not solve the implementation inefficiencies, but it should solve the complex nesting. However, what would the return type of this grouping be? Would I need my own interface, or can I use IGrouping somehow? How much would I need to define, or does the variable depth of the hierarchy still make this impossible?

My guess is that this should be the same as the return type from any IGrouping call but how does the type system infer that type if it isn't involved in any of the parameters that are passed?
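Because the depth is only known at run time, a nested IGrouping<...> type cannot describe the result statically; one common way around this is a recursive node type. The sketch below is a minimal take on the params variant, assuming all keys share one type TKey. HierarchyNode, Children, Items and Build are hypothetical names, and projected leaf items are stored only on the last level.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical recursive node: a key plus either child nodes (inner levels)
// or projected leaf items (last level).
public sealed class HierarchyNode<TKey, TResult>
{
    public TKey Key { get; }
    public IReadOnlyList<HierarchyNode<TKey, TResult>> Children { get; }
    public IReadOnlyList<TResult> Items { get; }

    public HierarchyNode(TKey key, IReadOnlyList<HierarchyNode<TKey, TResult>> children, IReadOnlyList<TResult> items)
    {
        Key = key;
        Children = children;
        Items = items;
    }
}

public static class HierarchyExtensions
{
    public static IReadOnlyList<HierarchyNode<TKey, TResult>> SubgroupBy<TElement, TKey, TResult>(
        this IEnumerable<TElement> sequence,
        Func<TElement, TResult> resultSelector,
        params Func<TElement, TKey>[] keySelectors)
    {
        if (keySelectors.Length == 0)
            throw new ArgumentException("At least one key selector is required.", nameof(keySelectors));
        return Build(sequence, resultSelector, keySelectors, 0);
    }

    // Groups by the selector for this level, then recurses into each group
    // until the last selector, where the leaf items are projected.
    private static IReadOnlyList<HierarchyNode<TKey, TResult>> Build<TElement, TKey, TResult>(
        IEnumerable<TElement> elements,
        Func<TElement, TResult> resultSelector,
        Func<TElement, TKey>[] keySelectors,
        int level)
    {
        bool isLast = level == keySelectors.Length - 1;
        return elements
            .GroupBy(keySelectors[level])
            .Select(g => new HierarchyNode<TKey, TResult>(
                g.Key,
                isLast ? Array.Empty<HierarchyNode<TKey, TResult>>() : Build(g, resultSelector, keySelectors, level + 1),
                isLast ? g.Select(resultSelector).ToList() : (IReadOnlyList<TResult>)Array.Empty<TResult>()))
            .ToList();
    }
}
```

The trade-off is that every key must be of the same type (string in the question's Data class), which is exactly the restriction suggested above; in exchange, a single method handles any depth.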

This problem is stretching my understanding, which is great, but my brain hurts.

Answer

Here is a description of how you can implement a hierarchical grouping mechanism.

From this description:

Result class:

public class GroupResult
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable Items { get; set; }
    public IEnumerable<GroupResult> SubGroups { get; set; }
    public override string ToString() 
    { return string.Format("{0} ({1})", Key, Count); }
}

Extension method:

public static class MyEnumerableExtensions
{
    public static IEnumerable<GroupResult> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        if (groupSelectors.Length > 0)
        {
            var selector = groupSelectors.First();

            //reduce the list recursively until zero
            var nextSelectors = groupSelectors.Skip(1).ToArray();
            return
                elements.GroupBy(selector).Select(
                    g => new GroupResult
                    {
                        Key = g.Key,
                        Count = g.Count(),
                        Items = g,
                        SubGroups = g.GroupByMany(nextSelectors)
                    });
        }
        else
            return null;
    }
}

Usage:

var result = customers.GroupByMany(c => c.Country, c => c.City);

Here is an improved and properly typed version of the code.

public class GroupResult<TItem>
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable<TItem> Items { get; set; }
    public IEnumerable<GroupResult<TItem>> SubGroups { get; set; }
    public override string ToString() 
    { return string.Format("{0} ({1})", Key, Count); }
}

public static class MyEnumerableExtensions
{
    public static IEnumerable<GroupResult<TElement>> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        if (groupSelectors.Length > 0)
        {
            var selector = groupSelectors.First();

            //reduce the list recursively until zero
            var nextSelectors = groupSelectors.Skip(1).ToArray();
            return
                elements.GroupBy(selector).Select(
                    g => new GroupResult<TElement> {
                        Key = g.Key,
                        Count = g.Count(),
                        Items = g,
                        SubGroups = g.GroupByMany(nextSelectors)
                    });
        } else {
            return null;
        }
    }
}
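Two properties of the version above are worth knowing: the leaf level's SubGroups is null, and because the Select is deferred, the groupings are recomputed every time SubGroups is enumerated. The following is one possible variation, not part of the original answer, that materializes each level once and uses an empty list at the leaves. EagerGroupResult, GroupByManyEager and Print are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as GroupResult<TItem>, but with materialized lists.
public class EagerGroupResult<TItem>
{
    public object Key { get; set; }
    public int Count { get; set; }
    public List<TItem> Items { get; set; }
    public List<EagerGroupResult<TItem>> SubGroups { get; set; }
    public override string ToString() => $"{Key} ({Count})";
}

public static class EagerGroupingExtensions
{
    public static List<EagerGroupResult<TElement>> GroupByManyEager<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        // No selectors left: return an empty list instead of null.
        if (groupSelectors.Length == 0)
            return new List<EagerGroupResult<TElement>>();

        var selector = groupSelectors[0];
        var nextSelectors = groupSelectors.Skip(1).ToArray();

        return elements
            .GroupBy(selector)
            .Select(g =>
            {
                // Enumerate each group once and reuse the list for Count,
                // Items and the recursive call.
                var items = g.ToList();
                return new EagerGroupResult<TElement>
                {
                    Key = g.Key,
                    Count = items.Count,
                    Items = items,
                    SubGroups = items.GroupByManyEager(nextSelectors)
                };
            })
            .ToList();
    }

    // Writes one line per group, indented two spaces per level.
    public static void Print<TItem>(IEnumerable<EagerGroupResult<TItem>> groups, int depth = 0)
    {
        foreach (var group in groups)
        {
            Console.WriteLine(new string(' ', depth * 2) + group);
            Print(group.SubGroups, depth + 1);
        }
    }
}
```

The trade-off is that the whole hierarchy is built up front, so later changes to the source are not reflected; in exchange, callers can walk SubGroups without null checks and without regrouping.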
