Tabulate a data frame in R
Question
I want to tabulate data so that a factor variable becomes the columns and the cells hold the values of another variable.
So I tried:
a <- rep(1:3, 3)
d <- rep(1:3, each = 3)
b <- rnorm(9)
c <- runif(9)
dt <- data.frame(a, d, b, c)
dt
  a d          b         c
1 1 1  0.3819762 0.5199602
2 2 1  0.3896063 0.9144730
3 3 1  2.4356972 0.2888464
4 1 2  1.2697016 0.9831191
5 2 2 -1.9844689 0.2046947
6 3 2  0.3473766 0.4766178
7 1 3 -1.5461235 0.6187189
8 2 3  1.0829027 0.9089551
9 3 3 -0.1305324 0.6326141
I looked at data.table, plyr, and reshape2 but could not find a way to do what I wanted, so I fell back on the old loop approach.
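mat <- matrix(NA, nrow = 3, ncol = 4)
for (i in 1:3) {
  mat[i, 1] <- i
  for (j in 1:3) {
    # look up b for this (a, d) pair inside dt rather than via the global a and d
    val <- dt[dt$a == i & dt$d == j, 3]
    mat[i, j + 1] <- val
  }
}
mat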
     [,1]      [,2]       [,3]       [,4]
[1,]    1 0.3819762  1.2697016 -1.5461235
[2,]    2 0.3896063 -1.9844689  1.0829027
[3,]    3 2.4356972  0.3473766 -0.1305324
... and it takes forever for big data.
Is there a better option?
Recommended answer
Base R can do this too:
reshape(dt,timevar="d",idvar="a",drop="c",direction="wide")
For your data, this gives...
  a       b.1        b.2        b.3
1 1 0.3819762  1.2697016 -1.5461235
2 2 0.3896063 -1.9844689  1.0829027
3 3 2.4356972  0.3473766 -0.1305324
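To make the role of each reshape argument concrete, here is a minimal sketch on a small deterministic data frame. The values in b and c are made up for illustration and are not the question's random draws; the column names follow the question.

```r
# Deterministic stand-in for the question's data (values are invented).
dt <- data.frame(
  a = rep(1:3, times = 3),                     # row identifier
  d = rep(1:3, each = 3),                      # its levels become columns
  b = c(10, 20, 30, 40, 50, 60, 70, 80, 90),   # values placed in the cells
  c = 1:9                                      # unrelated column, dropped below
)

wide <- reshape(
  dt,
  idvar     = "a",     # one output row per distinct value of a
  timevar   = "d",     # one output column per distinct value of d
  drop      = "c",     # remove c before reshaping
  direction = "wide"
)
print(wide)
```

The result has the columns a, b.1, b.2 and b.3, where the suffix is the value of d that the column corresponds to.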
Please use set.seed before drawing simulated data, so that it is easier to reproduce.
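As a small illustration of that advice, the sketch below resets the seed before each draw and checks that the two draws match. The seed value 42 is an arbitrary choice, not something from the question.

```r
set.seed(42)   # arbitrary seed, chosen only for this example
b1 <- rnorm(9)

set.seed(42)   # resetting the same seed replays the same random draws
b2 <- rnorm(9)

print(identical(b1, b2))
```

Anyone who runs the question's code after the same set.seed call gets the same b and c, so the posted output can be compared directly.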
I don't know whether this solution will be fast. Also, to use it in the future, you have to get used to its confusing argument names ("timevar", "idvar", etc.), which probably don't describe what you're actually doing most of the time.
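If the argument names are a sticking point, one alternative that avoids both reshape and the explicit loop is to fill a matrix through index pairs. This is only a sketch, not something proposed in the thread: it assumes each (a, d) pair occurs at most once (a duplicate would silently overwrite the earlier value), the data values are invented, and it has not been benchmarked against reshape.

```r
# Deterministic stand-in for the question's data (values are invented).
dt <- data.frame(
  a = rep(1:3, times = 3),
  d = rep(1:3, each = 3),
  b = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
)

# Distinct keys define the rows and columns of the result.
row_keys <- sort(unique(dt$a))
col_keys <- sort(unique(dt$d))

mat <- matrix(
  NA_real_,
  nrow = length(row_keys),
  ncol = length(col_keys),
  dimnames = list(a = as.character(row_keys), d = as.character(col_keys))
)

# A two-column index matrix assigns every (row, column) cell in one step.
# match() turns keys into positions, so keys need not be 1..n.
mat[cbind(match(dt$a, row_keys), match(dt$d, col_keys))] <- dt$b
print(mat)
```

Unlike the loop in the question, the row identifier ends up in the row names rather than in a first column; cbind(a = row_keys, mat) would put it back if needed.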