R + reshape: variance of columns of a data.frame


Problem description


I'm using reshape in R to compute aggregate statistics over columns of a data.frame. Here's my data.frame:

> df
  a a b b ID
1 1 1 1 1  1
2 2 3 2 3  2
3 3 5 3 5  3

which is just a little test data.frame to try and understand the reshape package. I melt, and then cast, to try and find the mean of the as and the bs:

> melt(df, id = "ID") -> df.m
> cast(df.m, ID ~ variable, fun = mean)
  ID a b
1  1 1 1
2  2 2 2
3  3 3 3

Argh! What? Was hoping the mean of c(2,3) was 2.5 and so on. What's going on? Here's a thing:

> df.m
   ID variable value
1   1        a     1
2   2        a     2
3   3        a     3
4   1        a     1
5   2        a     2
6   3        a     3
7   1        b     1
8   2        b     2
9   3        b     3
10  1        b     1
11  2        b     2
12  3        b     3

What's going on? Where did both my 5s go? Do I have a very basic misunderstanding here? If so, what is it?

Solution

I updated my answer here to fix this: R: aggregate columns of a data.frame

Apparently, if your data frame doesn't have unique column names, they won't melt properly.
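To see why duplicate names are a problem, here is a minimal base R sketch. It rebuilds the data frame from the question (the construction via names() is my assumption about how it was made) and shows that selecting a column by name only reaches the first match, which is consistent with the melted output in the question. make.unique() is one base R way to generate distinct names; it is not part of the original answer.

```r
# Rebuild the question's data frame. names<- is used so the duplicate
# column names survive (data.frame() would make them unique by default).
df <- data.frame(1:3, c(1, 3, 5), 1:3, c(1, 3, 5), 1:3)
names(df) <- c("a", "a", "b", "b", "ID")

# TRUE when at least one column name is repeated.
has_duplicates <- anyDuplicated(names(df)) > 0

# Selecting by name only ever reaches the first column called "a",
# so the second "a" column (1, 3, 5) is never picked up this way.
first_a  <- df[, "a"]
second_a <- df[, 2]   # selecting by position still reaches it

# make.unique() appends .1, .2, ... to repeated names.
unique_names <- make.unique(names(df))
print(unique_names)
```

Any function that walks the columns by name will therefore read the first a twice and never see the second one, which would explain the missing 5s.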

Edit: Instead of having column names of a a a b b, apparently you need unique column names for melt() to work properly. Minimally a.1 a.2 a.3 b.1 b.2, or something like that. After using melt(), you have two options for getting sensible levels for variable: either use gsub() on the levels of variable to strip the disambiguating suffixes, or use colsplit() to create two new columns. For the dummy names I just gave, that would look like:

levels(df.m$variable) <- gsub("\\..*", "", levels(df.m$variable))
# or
df.m <- cbind(df.m, colsplit(df.m$variable, split = "\\.", names = c("Measure","N")))
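Putting the pieces together for the data frame in the question, a rough end-to-end sketch might look like the following. It assumes the original reshape package (not reshape2) is installed, and it uses make.unique() to generate the distinct names, which is my choice rather than something the answer prescribes. The factor() call is a defensive assumption in case melt() returns variable as a character vector.

```r
library(reshape)  # the original reshape package, assumed to be installed

# The question's data frame, with its duplicate column names.
df <- data.frame(1:3, c(1, 3, 5), 1:3, c(1, 3, 5), 1:3)
names(df) <- c("a", "a", "b", "b", "ID")

# Give every column a distinct name before melting:
# "a" "a.1" "b" "b.1" "ID"
names(df) <- make.unique(names(df))

df.m <- melt(df, id = "ID")

# Strip the ".1" suffixes so both a columns share the level "a"
# (and both b columns share "b"). Duplicate levels are merged.
df.m$variable <- factor(df.m$variable)  # no-op if it is already a factor
levels(df.m$variable) <- gsub("\\..*", "", levels(df.m$variable))

# Average the values within each ID / variable combination.
res <- cast(df.m, ID ~ variable, fun.aggregate = mean)
print(res)
```

With distinct names in place, the second a and b columns make it into df.m, so the mean for ID 2 is taken over c(2, 3) as the question intended.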
