Crosstabs with data.table in R
I love the data.table package in R, and I think it could help me perform sophisticated cross-tabulation tasks, but I haven't figured out how to use the package to do tasks similar to table.
Here's some sample survey data for replication:
opinion <- c("gov", "market", "gov", "gov")
ID <- c("resp1", "resp2", "resp3", "resp4")
party <- c("GOP", "GOP", "democrat", "GOP")
df <- data.frame(ID, opinion, party)
With base R's table, counting the number of opinions by party is as simple as table(df$opinion, df$party).
I've managed to do something similar in data.table, but the result is clunky and it adds a separate column.
library(data.table)
dt <- data.table(df)
dt[, .N, by="party"]
There are a number of grouping operations in data.table that could be great for fast and sophisticated crosstabs of survey data, but I haven't found any tutorials on how to do it. Thanks for any help.
We can use dcast from data.table (see the "Efficient reshaping using data.tables" vignette on the project wiki or on the CRAN project page).
dcast(dt, opinion~party, value.var='ID', length)
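As a minimal sketch, the call above can be run end to end on the four-row sample from the question. The argument name fun.aggregate is spelled out here only for readability, and the comparison against base table() is an added sanity check rather than part of the original answer.

```r
library(data.table)

# Sample survey data from the question
dt <- data.table(
  ID      = c("resp1", "resp2", "resp3", "resp4"),
  opinion = c("gov", "market", "gov", "gov"),
  party   = c("GOP", "GOP", "democrat", "GOP")
)

# One row per opinion, one column per party; each cell counts the IDs
# that fall into that opinion/party combination
wide <- dcast(dt, opinion ~ party, value.var = "ID", fun.aggregate = length)
print(wide)

# Base R cross tabulation of the same data, for comparison
print(table(dt$opinion, dt$party))
```

Combinations that never occur (here, "market" with "democrat") come out as 0, because dcast fills empty cells by applying the aggregation function to a zero-length vector.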
Benchmarks
If we use a slightly bigger dataset, we can compare the speed of dcast from reshape2 and from data.table:
set.seed(24)
df <- data.frame(ID=1:1e6, opinion=sample(letters, 1e6, replace=TRUE),
party= sample(1:9, 1e6, replace=TRUE))
system.time(reshape2::dcast(df, opinion ~ party, value.var='ID', length))
# user system elapsed
# 0.278 0.013 0.293
system.time(dcast(setDT(df), opinion ~ party, value.var='ID', length))
# user system elapsed
# 0.022 0.000 0.023
system.time(setDT(df)[, .N, by = .(opinion, party)])
# user system elapsed
# 0.018 0.001 0.018
The third option is slightly faster, but its result is in 'long' format. If the OP wants a 'wide' format, data.table's dcast can be used.
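The two approaches can also be combined: count in long format first, then cast the counts to wide. The sketch below assumes the question's four-row sample; the variable names long and wide and the choice of fill = 0L are illustrative, not taken from the original answer.

```r
library(data.table)

# Sample survey data from the question
dt <- data.table(
  ID      = c("resp1", "resp2", "resp3", "resp4"),
  opinion = c("gov", "market", "gov", "gov"),
  party   = c("GOP", "GOP", "democrat", "GOP")
)

# Long format: one row per observed (opinion, party) pair, with count N
long <- dt[, .N, by = .(opinion, party)]
print(long)

# Wide format: spread the counts across party columns;
# pairs that never occur are filled with 0
wide <- dcast(long, opinion ~ party, value.var = "N", fill = 0L)
print(wide)
```

Because each (opinion, party) pair appears only once in the long table, no aggregation function is needed in the cast step.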
NOTE: I am using the devel version, i.e. v1.9.7, but the CRAN version should be fast enough.