由data.table在多列上长到宽 [英] Long to wide on multiple columns by data.table

查看:65
本文介绍了由data.table在多列上长到宽的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对 data.table 融化 dcast 有疑问。 code>用于多列。我在StackOverFlow上浏览,但是很多类似的帖子不是我想要的。我将在下面进行解释。

I have a question about data.table's melt and dcast for multiple columns. I browsed on StackOverFlow but many similar posts are not what I want. I will explain below.

首先,数据涉及问题的原因和价值。这是我的 data 部分:

First, data is about causes of problems and the value amount. Here is my part of data:

ID   Type    Problem1    Value1     Problem2    Value2    Problem3    Value3
1    A       X           500        Y           1000      Z           400
2    A       X           600        Z           700       
3    B       Y           700        Z           100
4    B       W           200        V           200
5    C       Z           500        V           500       
6    C       X           1000       W           100       V           900

第二个, ID 是唯一的。 Type 包含三个( A B C )。存在 5 个问题。

Second, ID is unique. Type contains three(A, B, and C). There are 5 problems.

ID == 1 为例。它是 Type A ,它包含3个问题( X Y Z )。其问题X 具有值500 问题Y 具有值1000 问题Z 的值值400 。以 ID == 5 为例。它是 Type C 并包含2个问题( Z V )。其问题Z 具有值500 ,而问题V 具有值500

Take ID == 1 as an example. It is Type A and it contains 3 problems(X, Y, and Z). Its Problem X has Value 500, Problem Y has Value 1000, Problem Z has Value 400. Take ID == 5 as an example. It is Type C and contains 2 problems(Z and V). Its Problem Z has Value 500 and Problem V has Value 500.

第三列, ID 类型问题1 问题2 问题3 字符 Value1 Value2 Value3 数字

Third, column ID, Type, Problem1, Problem2, and Problem3 are character. Value1, Value2, and Value3 are numeric.

我想要的结果是:

Type    X     Y     Z     W     V
A       1100  1000  1100  0     0   
B       0     700   100   200   200
C       1000  0     500   100   1400  

我不知道如何正确解释。我想将 Type 分组,然后对每个问题的总和进行汇总。我认为这是长到宽。我在此处此处。第二个可能有用。但是,我不知道从哪里开始。

I don't know how to explain here properly. I want to group the Type and then summation each problem's vlaue. I think this is about long to wide. I found reference here and here. Second one may be useful. However, I have no idea where to begin. Any suggestions?

# data
dt <- fread("
ID   Type    Problem1    Value1     Problem2    Value2    Problem3    Value3
1    A       X           500        Y           1000      Z           400
2    A       X           600        Z           700       
3    B       Y           700        Z           100
4    B       W           200        V           200
5    C       Z           500        V           500       
6    C       X           1000       W           100       V           900", fill = T)    


推荐答案

我们首先可以通过在度量模式来<< c $ c>融化 c $ c>转换为'long'格式,然后使用 fun.aggregate 作为<$ c $进行 dcast c> sum

We can first melt by specifying the patterns in measure to 'long' format and then do the dcast with the fun.aggregate as sum

dcast(melt(dt, measure = patterns("^Value", "^Problem"), 
    value.name = c("Value", "Problem"))[Problem != ""
     ][, Problem := factor(Problem, levels = c("X", "Y", "Z", "W", "V"))], 
     Type ~Problem, value.var = "Value", sum, na.rm = TRUE)
#   Type    X    Y    Z   W    V
#1:    A 1100 1000 1100   0    0
#2:    B    0  700  100 200  200
#3:    C 1000    0  500 100 1400

融化 $ c> data.table 可以在 measure 参数中采用多个模式。因此,当我们说 ^ Value 时,它将匹配所有名称以( ^ )开头的列值,并类似地为问题创建两个值列。在上面,我们使用 value.name 参数将这些列命名为 Value和 Problem。由于数据集有一些空白,因此长格式也包含空白元素,我们可以通过 Problem!= 删除它们。仅当我们需要按特定顺序排列列时,下一步才重要。因此,我们将问题更改为 factor 类,并按顺序指定级别。现在,融化部分已完成。现在,通过指定公式, value.var 列和< dcast ,将长格式更改为宽 code> fun.aggregate (此处为总和

melt from data.table can take multiple patterns in the measure argument. So, when we say "^Value" it matches all the columns that have names that start (^) with "Value" and similarly for "Problem" and create two 'value' columns. In the above, we are naming those columns as 'Value' and 'Problem' with value.name argument. As the dataset having some blanks, the long format also have the blank elements which we remove with Problem != "". The next step is only important if we need to have the columns in a specific order. So, we change the 'Problem' to factor class and specified the levels in that order. Now, the melt part is completed. The long format is now changed to 'wide' with dcast by specifying the formula, value.var column and the fun.aggregate (here it is sum)

这篇关于由data.table在多列上长到宽的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆