R + reshape: variance of columns of a data.frame
I'm using reshape in R to compute aggregate statistics over columns of a data.frame. Here's my data.frame:
> df
  a a b b ID
1 1 1 1 1  1
2 2 3 2 3  2
3 3 5 3 5  3
This is just a little test data.frame to try and understand the reshape package. I melt, and then cast, to try and find the mean of the a's and the b's:
> melt(df, id = "ID") -> df.m
> cast(df.m, ID ~ variable, fun = mean)
  ID a b
1  1 1 1
2  2 2 2
3  3 3 3
Argh! What? I was hoping the mean of c(2,3) would be 2.5, and so on. What's going on? Here's the melted data:
> df.m
   ID variable value
1   1        a     1
2   2        a     2
3   3        a     3
4   1        a     1
5   2        a     2
6   3        a     3
7   1        b     1
8   2        b     2
9   3        b     3
10  1        b     1
11  2        b     2
12  3        b     3
What's going on? Where did both my 5s go? Do I have a very basic misunderstanding here? If so, what is it?
I updated my answer to the related question "R: aggregate columns of a data.frame" to fix this.
Apparently, if your data frame doesn't have unique column names, they won't melt properly.
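A minimal base-R sketch of why the 5s disappear, assuming melt() looks each measure column up by name (which the melted output above suggests, though this is not confirmed from the package source here). With duplicated names, a name-based lookup returns only the first matching column. The variable names by_name and by_position are just for illustration.

```r
# Rebuild the question's data frame with duplicated column names.
df <- data.frame(1:3, c(1, 3, 5), 1:3, c(1, 3, 5), 1:3)
names(df) <- c("a", "a", "b", "b", "ID")

# Name-based lookup returns the first match only.
by_name <- df[["a"]]

# The second "a" column is still there, but only reachable by position.
by_position <- df[[2]]

print(by_name)      # values of the first "a" column
print(by_position)  # values of the second "a" column, including the 5
```

So any code that loops over names(df) and fetches each column by name would read the first a column twice and never see the second.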
Edit:
Instead of having column names of a a a b b, apparently you need to have unique column names for melt() to work properly. Minimally a.1 a.2 a.3 b.1 b.2, or something. After using melt(), your options to get sensible levels for variable are either to use gsub() on the levels of variable to eliminate the disambiguating values, or to use colsplit() to create two new columns. For the dummy names I just gave, that would look like:
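# Option 1: strip the ".N" suffix so the levels collapse back to "a" and "b"
levels(df.m$variable) <- gsub("\\..*", "", levels(df.m$variable))
# Option 2: split names like "a.1" into two new columns, Measure and N
df.m <- cbind(df.m, colsplit(df.m$variable, split = "\\.", names = c("Measure", "N")))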
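Putting the pieces together for the data frame from the question, a rough end-to-end sketch might look like the following. It assumes the original reshape package (not reshape2) is installed, and it uses base R's make.unique() to generate the distinct names, which is my choice here rather than something the answer prescribes; it produces a, a.1, b, b.1 instead of a.1, a.2, b.1, b.2, but the same gsub() pattern strips the suffix.

```r
library(reshape)  # assumes the original reshape package, not reshape2

# The question's data frame, with duplicated column names.
df <- data.frame(1:3, c(1, 3, 5), 1:3, c(1, 3, 5), 1:3)
names(df) <- c("a", "a", "b", "b", "ID")

# make.unique() appends .1, .2, ... to repeated names: a, a.1, b, b.1, ID
names(df) <- make.unique(names(df))

# Melt now sees four distinct measure columns.
df.m <- melt(df, id.vars = "ID")

# Collapse a / a.1 back to "a" (and likewise for b) before aggregating.
levels(df.m$variable) <- gsub("\\..*", "", levels(df.m$variable))

# Average the values that share an ID and a collapsed variable name.
res <- cast(df.m, ID ~ variable, fun.aggregate = mean)
print(res)
```

With the values from the question, the a and b means for IDs 1, 2 and 3 should come out as 1, 2.5 and 4, which is the result the question was expecting.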