多列数据转换 [英] Multi column data transformation

查看:101
本文介绍了多列数据转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在从数据源接收数据,我需要先将其旋转才能将信息发送到UI进行显示. I am new to concept of pivoting & I am not sure how to go about it.

I am receiving data from a data source which I need to pivot before I can send the information to UI for display. I am new to concept of pivoting & I am not sure how to go about it.

问题有两个部分:

  1. 形成标题
  2. 透视数据以匹配标题

谨记的事情:

  1. 我有一些我不想讨论的列.我称它们为static columns.

我需要旋转某些列以形成多级标题信息.我称它们为dynamic columns

I need to pivot certain columns to form multi level header info. I call them dynamic columns

某些列需要进行透视,其中包含实际值.我称他们为value columns.

Some columns needs to be pivoted which contains actual values. I called them value columns.

一个dynamic, static and value columns的个数没有没有限制.

假设,当数据到来时,我们将首先获得静态列的数据,然后是动态列&的数据.然后是值列.

It is assumed that, when data comes, we will first have data for static columns then dynamic columns & then for value columns.

请参阅附件中的图像以获取更多信息.

虚拟数据:

class Program
    {
        static void Main(string[] args)
        {
            var _staticColumnCount = 2; //Columns that should not be pivoted        
            var _dynamicColumnCount = 2; // Columns which needs to be pivoted to form header            
            var _valueColumnCount = 1; //Columns that represent Actual value        
            var valueColumnIndex = 4; //Assuming index starts with 0;

            List<List<string>> headerInfo = new List<List<string>>();
            headerInfo.Add(new List<string> {"Product Three", "Item Ten"});
            headerInfo.Add(new List<string> {"Product Two", "Item Five"});
            headerInfo.Add(new List<string> {"Product Two", "Item Seven"});
            headerInfo.Add(new List<string> {"Product Two", "Item Nine"});
            headerInfo.Add(new List<string> {"Product One", "Item One"});
            headerInfo.Add(new List<string> {"Product One", "Item Two"});
            headerInfo.Add(new List<string> {"Product One", "Item Four"});
            headerInfo.Add(new List<string> {"Product One", "Item Six"});
            headerInfo.Add(new List<string> {"Product One", "Item Eight"});
            headerInfo.Add(new List<string> {"Product One", "Item Eleven"});


            List<List<string>> data = new List<List<string>>();
            data.Add(new List<string> {"Global", "Europe", "Product One", "Item One", "579984.59"});
            data.Add(new List<string> {"Global", "North America", "Product One", "Item Two", "314586.73"});
            data.Add(new List<string> {"Global", "Asia", "Product One", "Item One", "62735.13"});
            data.Add(new List<string> {"Global", "Asia", "Product Two", "Item Five", "12619234.69"});
            data.Add(new List<string> {"Global", "North America", "Product Two", "Item Five", "8953713.39"});
            data.Add(new List<string> {"Global", "Europe", "Product One", "Item Two", "124267.4"});
            data.Add(new List<string> {"Global", "Asia", "Product One", "Item Four", "482338.49"});
            data.Add(new List<string> {"Global", "North America", "Product One", "Item Four", "809185.13"});
            data.Add(new List<string> {"Global", "Europe", "Product One", "Item Four", "233101"});
            data.Add(new List<string> {"Global", "Asia", "Product One", "Item Two", "120561.65"});
            data.Add(new List<string> {"Global", "North America", "Product One", "Item Six", "1517359.37"});
            data.Add(new List<string> {"Global", "Europe", "Product One", "Item Six", "382590.45"});
            data.Add(new List<string> {"Global", "North America", "Product One", "Item Eight", "661835.64"});
            data.Add(new List<string> {"Global", "Europe", "Product Three", "Item Three", "0"});
            data.Add(new List<string> {"Global", "Europe", "Product One", "Item Eight", "0"});
            data.Add(new List<string> {"Global", "Europe", "Product Two", "Item Five", "3478145.38"});
            data.Add(new List<string> {"Global", "Asia", "Product One", "Item Six", "0"});
            data.Add(new List<string> {"Global", "North America", "Product Two", "Item Seven", "4247059.97"});
            data.Add(new List<string> {"Global", "Asia", "Product Two", "Item Seven", "2163718.01"});
            data.Add(new List<string> {"Global", "Europe", "Product Two", "Item Seven", "2158782.48"});
            data.Add(new List<string> {"Global", "North America", "Product Two", "Item Nine", "72634.46"});
            data.Add(new List<string> {"Global", "Europe", "Product Two", "Item Nine", "127500"});
            data.Add(new List<string> {"Global", "North America", "Product One", "Item One", "110964.44"});
            data.Add(new List<string> {"Global", "Asia", "Product Three", "Item Ten", "2064.99"});
            data.Add(new List<string> {"Global", "Europe", "Product One", "Item Eleven", "0"});
            data.Add(new List<string> {"Global", "Asia", "Product Two", "Item Nine", "1250"});


        }
    }

推荐答案

您所说的static columns通常称为行组dynamic columns-列组value columns-值汇总或简单的.

What you call static columns is usually called row groups, dynamic columns - column groups and value columns - value aggregates or simple values.

为实现这一目标,我建议使用以下简单的数据结构:

For achieving the goal I would suggest the following simple data structure:

public class PivotData
{
    public IReadOnlyList<PivotValues> Columns { get; set; }
    public IReadOnlyList<PivotDataRow> Rows { get; set; }
}

public class PivotDataRow
{
    public PivotValues Data { get; set; }
    public IReadOnlyList<PivotValues> Values { get; set; }
}

PivotDataColumns成员将代表您所说的 header ,而Row成员-PivotDataRow对象的列表,其中PivotDataRow成员包含行组值和Values-相应Columns索引的(PivotDataRow.Values始终与PivotData.Columns.Count具有相同的Count).

The Columns member of PivotData will represent what you call header, while the Row member - a list of PivotDataRow objects with Data member containing the row group values and Values - the values for the corresponding Columns index (PivotDataRow.Values will always have the same Count as PivotData.Columns.Count).

以上数据结构可序列化/反序列化为JSON(已通过Newtosoft.Json测试),可用于以所需格式填充UI.

The above data structure is serializable/deserializable to JSON (tested with Newtosoft.Json) and can be used to populate UI with the desired format.

用于表示行组值,列组值和聚合值的核心数据结构是:

The core data structure used to represent both row group values, column group values and aggregate values is this:

public class PivotValues : IReadOnlyList<string>, IEquatable<PivotValues>, IComparable<PivotValues>
{
    readonly IReadOnlyList<string> source;
    readonly int offset, count;
    public PivotValues(IReadOnlyList<string> source) : this(source, 0, source.Count) { }
    public PivotValues(IReadOnlyList<string> source, int offset, int count)
    {
        this.source = source;
        this.offset = offset;
        this.count = count;
    }
    public string this[int index] => source[offset + index];
    public int Count => count;
    public IEnumerator<string> GetEnumerator()
    {
        for (int i = 0; i < count; i++)
            yield return this[i];
    }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    public override int GetHashCode()
    {
        unchecked
        {
            var comparer = EqualityComparer<string>.Default;
            int hash = 17;
            for (int i = 0; i < count; i++)
                hash = hash * 31 + comparer.GetHashCode(this[i]);
            return hash;
        }
    }
    public override bool Equals(object obj) => Equals(obj as PivotValues);
    public bool Equals(PivotValues other)
    {
        if (this == other) return true;
        if (other == null) return false;
        var comparer = EqualityComparer<string>.Default;
        for (int i = 0; i < count; i++)
            if (!comparer.Equals(this[i], other[i])) return false;
        return true;
    }
    public int CompareTo(PivotValues other)
    {
        if (this == other) return 0;
        if (other == null) return 1;
        var comparer = Comparer<string>.Default;
        for (int i = 0; i < count; i++)
        {
            var compare = comparer.Compare(this[i], other[i]);
            if (compare != 0) return compare;
        }
        return 0;
    }
    public override string ToString() => string.Join(", ", this); // For debugging
}

基本上,它表示具有相等性和顺序比较语义的string列表的范围(切片).前者允许在数据透视转换期间使用基于哈希的高效LINQ运算符,而后者则允许进行可选排序.同样,此数据结构允许不分配新列表而进行有效转换,同时从JSON反序列化时同时保留实际列表.

Basically it represents a range (slice) of a string list with equality and order comparison semantics. The former allows to use the efficient hash based LINQ operators during the pivot transformation while the later allows optional sorting. Also this data structure allows efficient transformation w/o allocating new lists, at the same time holding the actual lists when deserialized from JSON.

(通过实现IEquatable<PivotValues>接口-GetHashCodeEquals方法提供相等性比较.这样做可以根据内部指定范围内的值将两个PivotValues类实例视为相等输入List<List<string>>List<string>元素.类似地,通过实现IComparable<PivotValues>接口-CompareTo方法来提供排序))

(the equality comparison is provided by implementing IEquatable<PivotValues> interface - GetHashCode and Equals methods. By doing that it allows treating two PivotValues class instances as equal based on the values in specified range inside the List<string> elements of the input List<List<string>>. Similar, the ordering is provided by implementing the IComparable<PivotValues> interface - CompareTo method))

转换本身非常简单:

public static PivotData ToPivot(this List<List<string>> data, int rowDataCount, int columnDataCount, int valueDataCount)
{
    int rowDataStart = 0;
    int columnDataStart = rowDataStart + rowDataCount;
    int valueDataStart = columnDataStart + columnDataCount;

    var columns = data
        .Select(r => new PivotValues(r, columnDataStart, columnDataCount))
        .Distinct()
        .OrderBy(c => c) // Optional
        .ToList();

    var emptyValues = new PivotValues(new string[valueDataCount]); // For missing (row, column) intersection

    var rows = data
        .GroupBy(r => new PivotValues(r, rowDataStart, rowDataCount))
        .Select(rg => new PivotDataRow
        {
            Data = rg.Key,
            Values = columns.GroupJoin(rg,
                c => c,
                r => new PivotValues(r, columnDataStart, columnDataCount),
                (c, vg) => vg.Any() ? new PivotValues(vg.First(), valueDataStart, valueDataCount) : emptyValues
            ).ToList()
        })
        .OrderBy(r => r.Data) // Optional
        .ToList();

    return new PivotData { Columns = columns, Rows = rows };
}

首先,使用简单的LINQ Distinct运算符确定列(标题).然后,通过按行列对源集进行分组来确定行.通过将Columns与分组内容进行外部联接来确定每个行分组内的值.

First the columns (headers) are determined with simple LINQ Distinct operator. Then the rows are determined by grouping the source set by the row columns. The values inside each row grouping are determined by outer joining the Columns with the grouping content.

由于我们采用了数据结构,因此LINQ转换非常有效(对于时间和空间).列和行的顺序是可选的,如果不需要,可以简单地将其删除.

Due to our data structure implementation, the LINQ transformation is quite efficient (for both space and time). The column and row ordering is optional, simple remove it if you don't need it.

使用您的虚拟数据进行示例测试:

Sample test with your dummy data:

var pivotData = data.ToPivot(2, 2, 1);
var json = JsonConvert.SerializeObject(pivotData);
var pivotData2 = JsonConvert.DeserializeObject<PivotData>(json);

这篇关于多列数据转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆