Excel UDF 对表中的数据块进行逆透视(熔化、反向透视、展平、标准化) [英] Excel UDF to Unpivot (Melt, Reverse pivot, Flatten, Normalize) blocks of data within Tables
问题描述
这个问题会寻求多种方法LET/LAMBDA
VBA UDF
和Power Query Function
,所以不会有单一的正确答案,但是用作参考的方法征集.
This question will seek multiple approaches LET/LAMBDA
VBA UDF
and Power Query Function
, so there will be no single right answer, but a solicitation of approaches to be used as references.
Scott raised a question here about unpivoting a complex table that contains blocks of data instead of individual data points. The basic idea is illustrated in this table:
Jan | Jan | Jan | Jan | Feb | Feb | Feb | Feb | Mar | Mar | Mar | Mar | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
State | City | Pressure | Temp | Humidity | CO2 | Pressure | Temp | Humidity | CO2 | Pressure | Temp | Humidity | CO2 |
Georgia | Atlanta | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Massachusetts | Boston | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
Texas | Dallas | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 |
Louisiana | Jonesboro | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 |
California | San Francisco | 193 | 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 |
The data for each city is in blocks of four columns containing Pressure, Temperature, Humidity and CO2 (or PTHC). We want to unpivot the PTHC blocks of values according to their month by the State and City. Here is the desired output:
State | City | month | Pressure | Temp | Humidity | CO2 |
---|---|---|---|---|---|---|
Georgia | Atlanta | Jan | 1 | 2 | 3 | 4 |
Georgia | Atlanta | Feb | 5 | 6 | 7 | 8 |
Georgia | Atlanta | Mar | 9 | 10 | 11 | 12 |
Massachusetts | Boston | Jan | 49 | 50 | 51 | 52 |
Massachusetts | Boston | Feb | 53 | 54 | 55 | 56 |
Massachusetts | Boston | Mar | 57 | 58 | 59 | 60 |
Texas | Dallas | Jan | 97 | 98 | 99 | 100 |
Texas | Dallas | Feb | 101 | 102 | 103 | 104 |
Texas | Dallas | Mar | 105 | 106 | 107 | 108 |
Louisiana | Jonesboro | Jan | 145 | 146 | 147 | 148 |
Louisiana | Jonesboro | Feb | 149 | 150 | 151 | 152 |
Louisiana | Jonesboro | Mar | 153 | 154 | 155 | 156 |
California | San Francisco | Jan | 193 | 194 | 195 | 196 |
California | San Francisco | Feb | 197 | 198 | 199 | 200 |
California | San Francisco | Mar | 201 | 202 | 203 | 204 |
The order of the rows is not important, so long as they are complete - i.e. the output could be sorted by month, city, state, ... it does not matter. The output does not need to be a dynamic array that spills - i.e. in the case of a Power Query function, it clearly would not be.
It can be assumed that the PTHC block is always consistent, i.e.
- it never skips a field value, e.g. PTHC PTC PTHC...
- it never changes order, e.g. PTHC PCHT
The months are always presented in groups that are equally sized to the block (in this example, 4, so there will be four Jan columns, Feb columns, etc.). e.g. if there are 7 months, there will be 7 PTHC blocks or 28 columns of data.
However, the pattern of months can also be interleaved such that the months will increment and the PTHC block will be grouped (i.e. PPP TTT HHH CCC) like this:
Jan | Feb | Mar | Jan | Feb | Mar | Jan | Feb | Mar | Jan | Feb | Mar | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
State | City | Pressure | Pressure | Pressure | Temp | Temp | Temp | Humidity | Humidity | Humidity | CO2 | CO2 | CO2 |
The UDF would also have to accommodate more or less than 4 fields inside the block. The use of Months and PTHC are just illustrations, the attribute that represents months in this example will always be a single row (although a multi-row approach would be an interesting question - but a new and separate one). The attribute that represents the field values PTHC will also be a single row.
I will propose a LET function based on Scott's question, but there certainly can be better approaches and both VBA and Power Query have their own strengths. The objective is to create a collection of working approaches.
Powerquery version. A bit longer code to accommodate possibility of AAAABBBB instead of ABABABAB
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
// list of months
#"Unpivoted Other Columns" = List.Repeat(Table.UnpivotOtherColumns(Table.FirstN(Source,1), {"Column1", "Column2"}, "Attribute", "Value")[Value],Table.RowCount(Source)-2),
#"Converted to Table" = Table.AddIndexColumn(Table.FromList(#"Unpivoted Other Columns", Splitter.SplitByNothing(), null, null, ExtraValues.Error), "Index", 0, 1),
// list of PTHC
#"Unpivoted Other Columns2" = List.Repeat(Table.UnpivotOtherColumns(Table.FirstN(Table.Skip(Source,1) ,1), {"Column1", "Column2"}, "Attribute", "Value")[Value],Table.RowCount(Source)-2),
#"Converted to Table2" = Table.AddIndexColumn(Table.FromList(#"Unpivoted Other Columns2", Splitter.SplitByNothing(), null, null, ExtraValues.Error), "Index", 0, 1),
// all other data
#"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(Table.Skip(Source,2), {"Column1", "Column2"}, "Attribute", "Value"),
#"Added Index" = Table.AddIndexColumn(#"Unpivoted Other Columns1", "Index", 0, 1),
// merge in months and PTHC
#"Merged Queries" = Table.NestedJoin(#"Added Index",{"Index"},#"Converted to Table",{"Index"},"X1",JoinKind.LeftOuter),
#"Merged Queries2" = Table.NestedJoin(#"Merged Queries" ,{"Index"},#"Converted to Table2",{"Index"},"X2",JoinKind.LeftOuter),
#"Expanded X1" = Table.ExpandTableColumn(#"Merged Queries2", "X1", {"Column1"}, {"Month"}),
#"Expanded X2" = Table.ExpandTableColumn(#"Expanded X1", "X2", {"Column1"}, {"Type"}),
//extra work to pivot in correct format
#"Renamed Columns" = Table.RenameColumns(#"Expanded X2",{{"Column1", "State"}, {"Column2", "City"}}),
#"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Attribute","Index"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"State", Order.Ascending}, {"City", Order.Ascending}, {"Month", Order.Ascending}, {"Type", Order.Ascending}}),
#"Added Index1" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
TypeCount=List.Count(List.Distinct(#"Added Index1"[Type])),
#"Integer-Divided Column" = Table.TransformColumns(#"Added Index1", {{"Index", each Number.IntegerDivide(_, TypeCount), Int64.Type}}),
#"Pivoted Column" = Table.Pivot(#"Integer-Divided Column", List.Distinct(#"Integer-Divided Column"[Type]), "Type", "Value"),
#"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns1"
这篇关于Excel UDF 对表中的数据块进行逆透视(熔化、反向透视、展平、标准化)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!