由data.table在多列上长到宽 [英] Long to wide on multiple columns by data.table
问题描述
我对 data.table
的融化
和 dcast 有疑问。 code>用于多列。我在StackOverFlow上浏览,但是很多类似的帖子不是我想要的。我将在下面进行解释。
I have a question about data.table
's melt
and dcast
for multiple columns. I browsed on StackOverFlow but many similar posts are not what I want. I will explain below.
首先,数据
涉及问题的原因和价值。这是我的 data
部分:
First, data
is about causes of problems and the value amount. Here is my part of data
:
ID Type Problem1 Value1 Problem2 Value2 Problem3 Value3
1 A X 500 Y 1000 Z 400
2 A X 600 Z 700
3 B Y 700 Z 100
4 B W 200 V 200
5 C Z 500 V 500
6 C X 1000 W 100 V 900
第二个, ID
是唯一的。 Type
包含三个( A
, B
和 C
)。存在 5 个问题。
Second, ID
is unique. Type
contains three(A
, B
, and C
). There are 5 problems.
以 ID == 1
为例。它是 Type A
,它包含3个问题( X
, Y
和 Z
)。其问题X
具有值500
,问题Y
具有值1000
,问题Z
的值值400
。以 ID == 5
为例。它是 Type C
并包含2个问题( Z
和 V
)。其问题Z
具有值500
,而问题V
具有值500
。
Take ID == 1
as an example. It is Type A
and it contains 3 problems(X
, Y
, and Z
). Its Problem X
has Value 500
, Problem Y
has Value 1000
, Problem Z
has Value 400
. Take ID == 5
as an example. It is Type C
and contains 2 problems(Z
and V
). Its Problem Z
has Value 500
and Problem V
has Value 500
.
第三列, ID
,类型
,问题1
,问题2
和问题3
是字符
。 Value1
, Value2
和 Value3
是数字
。
Third, column ID
, Type
, Problem1
, Problem2
, and Problem3
are character
. Value1
, Value2
, and Value3
are numeric
.
我想要的结果是:
Type X Y Z W V
A 1100 1000 1100 0 0
B 0 700 100 200 200
C 1000 0 500 100 1400
我不知道如何正确解释。我想将 Type
分组,然后对每个问题的总和进行汇总。我认为这是长到宽。我在此处和此处。第二个可能有用。但是,我不知道从哪里开始。
I don't know how to explain here properly. I want to group the Type
and then summation each problem's vlaue. I think this is about long to wide. I found reference here and here. Second one may be useful. However, I have no idea where to begin. Any suggestions?
# data
dt <- fread("
ID Type Problem1 Value1 Problem2 Value2 Problem3 Value3
1 A X 500 Y 1000 Z 400
2 A X 600 Z 700
3 B Y 700 Z 100
4 B W 200 V 200
5 C Z 500 V 500
6 C X 1000 W 100 V 900", fill = T)
推荐答案
我们首先可以通过在度量$中指定
c $ c>转换为'long'格式,然后使用模式
来<< c $ c>融化 fun.aggregate
作为<$ c $进行 dcast
c> sum
We can first melt
by specifying the patterns
in measure
to 'long' format and then do the dcast
with the fun.aggregate
as sum
dcast(melt(dt, measure = patterns("^Value", "^Problem"),
value.name = c("Value", "Problem"))[Problem != ""
][, Problem := factor(Problem, levels = c("X", "Y", "Z", "W", "V"))],
Type ~Problem, value.var = "Value", sum, na.rm = TRUE)
# Type X Y Z W V
#1: A 1100 1000 1100 0 0
#2: B 0 700 100 200 200
#3: C 1000 0 500 100 1400
融化
$ c> data.table 可以在 measure
参数中采用多个模式
。因此,当我们说 ^ Value
时,它将匹配所有名称以( ^
)开头的列值,并类似地为问题创建两个值列。在上面,我们使用 value.name
参数将这些列命名为 Value和 Problem。由于数据集有一些空白,因此长格式也包含空白元素,我们可以通过 Problem!=
删除它们。仅当我们需要按特定顺序排列列时,下一步才重要。因此,我们将问题更改为 factor
类,并按顺序指定级别
。现在,融化
部分已完成。现在,通过指定公式, value.var
列和< dcast
,将长格式更改为宽 code> fun.aggregate (此处为总和
)
melt
from data.table
can take multiple patterns
in the measure
argument. So, when we say "^Value"
it matches all the columns that have names that start (^
) with "Value" and similarly for "Problem" and create two 'value' columns. In the above, we are naming those columns as 'Value' and 'Problem' with value.name
argument. As the dataset having some blanks, the long format also have the blank elements which we remove with Problem != ""
. The next step is only important if we need to have the columns in a specific order. So, we change the 'Problem' to factor
class and specified the levels
in that order. Now, the melt
part is completed. The long format is now changed to 'wide' with dcast
by specifying the formula, value.var
column and the fun.aggregate
(here it is sum
)
这篇关于由data.table在多列上长到宽的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!