For each group, summarise means for all variables in a data frame (ddply? split?)
Question
A week ago I would have done this manually: subset the data frame by group into new data frames, compute the mean of each variable for each of them, then rbind the results. Very clunky...
Now I have learned about split and plyr, and I guess there must be an easier way using these tools. Please don't prove me wrong.
test_data <- data.frame(cbind(
var0 = rnorm(100),
var1 = rnorm(100,1),
var2 = rnorm(100,2),
var3 = rnorm(100,3),
var4 = rnorm(100,4),
group = sample(letters[1:10],100,replace=T),
year = sample(c(2007,2009),100, replace=T)))
test_data$var1 <- as.numeric(as.character(test_data$var1))
test_data$var2 <- as.numeric(as.character(test_data$var2))
test_data$var3 <- as.numeric(as.character(test_data$var3))
test_data$var4 <- as.numeric(as.character(test_data$var4))
I am toying with ddply, but I can't produce what I want, i.e. a table like this for each group:
group a | 2007 | 2009 |
________|______|______|
var1    |  xx  |  xx  |
var2    |  xx  |  xx  |
etc.    | etc. | etc. |
Maybe d_ply and some odfweave output would work too. Input is very much appreciated.
P.S. I notice that data.frame converts the rnorm output to factors in my data frame. How can I avoid this? I(rnorm(100)) doesn't work, so I have to convert to numeric as done above.
Recommended answer
Given the format you want for the result, the reshape package will be more efficient than plyr.
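On the P.S.: the factor columns most likely come from wrapping the columns in cbind() before passing them to data.frame(). cbind() builds a matrix, and a matrix holds a single type, so mixing numbers with the letters in group turns everything into character strings; data.frame() then stores those strings as factors (or as character columns from R 4.0.0 onward). A minimal sketch with made-up vectors (m, bad and good are illustrative names only):

```r
# cbind() on vectors of mixed type returns a character matrix
m <- cbind(x = c(1.5, 2.5), g = c("a", "b"))
is.character(m)     # TRUE

# Building a data frame from that matrix keeps x non-numeric
bad <- data.frame(m)
is.numeric(bad$x)   # FALSE

# Passing the vectors straight to data.frame() keeps x numeric
good <- data.frame(x = c(1.5, 2.5), g = c("a", "b"))
is.numeric(good$x)  # TRUE
```

Calling data.frame() directly on the vectors, as the answer's code does next, avoids the coercion, so the as.numeric(as.character(...)) step is no longer needed.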
test_data <- data.frame(
  var0 = rnorm(100),
  var1 = rnorm(100, 1),
  var2 = rnorm(100, 2),
  var3 = rnorm(100, 3),
  var4 = rnorm(100, 4),
  group = sample(letters[1:10], 100, replace = TRUE),
  year = sample(c(2007, 2009), 100, replace = TRUE))
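library(reshape)
# Melt to long format: one row per (group, year, variable) combination
Molten <- melt(test_data, id.vars = c("group", "year"))
# Cast back to wide format, averaging the values in each group/variable/year cell
cast(Molten, group + variable ~ year, fun.aggregate = mean)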
The result looks like this:
group variable 2007 2009
1 a var0 0.003767891 0.340989068
2 a var1 2.009026385 1.162786943
3 a var2 1.861061882 2.676524736
4 a var3 2.998011426 3.311250399
5 a var4 3.979255971 4.165715967
6 b var0 -0.112883844 -0.179762343
7 b var1 1.342447279 1.199554144
8 b var2 2.486088196 1.767431740
9 b var3 3.261451449 2.934903824
10 b var4 3.489147597 3.076779626
11 c var0 0.493591055 -0.113469315
12 c var1 0.157424796 -0.186590644
13 c var2 2.366594176 2.458204041
14 c var3 3.485808031 2.817153628
15 c var4 3.681576886 3.057915666
16 d var0 0.360188789 1.205875725
17 d var1 1.271541181 0.898973536
18 d var2 1.824468264 1.944708165
19 d var3 2.323315162 3.550719308
20 d var4 3.852223640 4.647498956
21 e var0 -0.556751465 0.273865769
22 e var1 1.173899189 0.719520372
23 e var2 1.935402724 2.046313047
24 e var3 3.318669590 2.871462470
25 e var4 4.374478734 4.522511874
26 f var0 -0.258956555 -0.007729091
27 f var1 1.424479454 1.175242755
28 f var2 1.797948551 2.411030282
29 f var3 3.083169793 3.324584667
30 f var4 4.160641429 3.546527820
31 g var0 0.189038036 -0.683028110
32 g var1 0.429915866 0.827761101
33 g var2 1.839982321 1.513104866
34 g var3 3.106414330 2.755975622
35 g var4 4.599340239 3.691478466
36 h var0 0.015557352 -0.707257185
37 h var1 0.933199148 1.037655156
38 h var2 1.927442457 2.521369108
39 h var3 3.246734239 3.703213646
40 h var4 4.242387776 4.407960355
41 i var0 0.885226638 -0.288221276
42 i var1 1.216012653 1.502514588
43 i var2 2.302815441 1.905731471
44 i var3 2.026631277 2.836508446
45 i var4 4.800676814 4.772964668
46 j var0 -0.435661855 0.192703997
47 j var1 0.836814185 0.394505861
48 j var2 1.663523873 2.377640369
49 j var3 3.489536343 3.457597835
50 j var4 4.146020948 4.281599816
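If installing reshape is not an option, roughly the same per-group layout (variables as rows, years as columns) can be produced with base R alone. The following is only a sketch under stated assumptions: toy_data and its values are made up so the means are easy to check by hand, and vars and per_group are illustrative names, not anything from the question.

```r
# Hypothetical toy data: two groups, two years, two variables
toy_data <- data.frame(
  var1  = 1:8,
  var2  = seq(10, 80, by = 10),
  group = rep(c("a", "b"), each = 4),
  year  = rep(c(2007, 2009), times = 4)
)
vars <- c("var1", "var2")

# For each group, split its rows by year and take the column means;
# sapply() binds the per-year mean vectors into a matrix with
# variables as rows and years as columns
per_group <- lapply(split(toy_data, toy_data$group), function(d) {
  sapply(split(d[vars], d$year), colMeans)
})

per_group$a
#      2007 2009
# var1    2    3
# var2   20   30
```

The result is a named list with one small matrix per group, which is close to the "one table per group" shape asked for in the question.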