Crosstabs with data.table in R


Question

I love the data.table package in R, and I think it could help me perform sophisticated cross-tabulation tasks, but I haven't figured out how to use the package to do tasks similar to table().

Here's some replication survey data:

opinion <- c("gov", "market", "gov", "gov")
ID <- c("resp1", "resp2", "resp3", "resp4")
party <- c("GOP", "GOP", "democrat", "GOP")

df <- data.frame(ID, opinion, party)

With base R's table(), counting the number of opinions by party is as simple as table(df$opinion, df$party).
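For reference, here is the shape being asked for. This is a minimal base-R sketch that uses the same example vectors. Naming the arguments of table() labels the two dimensions.

```r
opinion <- c("gov", "market", "gov", "gov")
party <- c("GOP", "GOP", "democrat", "GOP")

# Two-way contingency table: rows are opinions, columns are parties.
# Combinations that never occur (market/democrat) are reported as 0.
tab <- table(opinion, party)
print(tab)
```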

I've managed to do something similar in data.table, but the result is clunky and it adds a separate column.

library(data.table)
dt <- data.table(df)
dt[, .N, by="party"]
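Grouping by both columns, rather than by party alone, already gives the opinion-by-party counts in long format. The sketch below assumes the same four respondents as above and rebuilds them as a data.table so the block runs on its own. Unlike table(), it produces no row for a pair that never occurs (here market/democrat).

```r
library(data.table)

dt <- data.table(
  ID      = c("resp1", "resp2", "resp3", "resp4"),
  opinion = c("gov", "market", "gov", "gov"),
  party   = c("GOP", "GOP", "democrat", "GOP")
)

# .N is the number of rows in each group; grouping by two columns
# yields one row per (opinion, party) pair that occurs in the data.
long <- dt[, .N, by = .(opinion, party)]
print(long)
```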

There are a number of grouping operations in data.table that could be great for fast and sophisticated crosstabs of survey data, but I haven't found any tutorials on how to do it. Thanks for any help.

Solution

We can use dcast from data.table (see the Efficient reshaping using data.tables vignette on the project wiki or on the CRAN project page):

dcast(dt, opinion~party, value.var='ID', length)
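Here is a self-contained sketch of the same call on the example data, with the aggregate function passed by name. Because fun.aggregate is supplied, data.table fills empty cells by applying that function to a zero-length vector. The missing market/democrat pair therefore shows up as 0. The last lines convert the result to a matrix with as.matrix(..., rownames = "opinion") and check it against base table(). The check matches rows and columns by name because the column order can differ from table()'s.

```r
library(data.table)

dt <- data.table(
  ID      = c("resp1", "resp2", "resp3", "resp4"),
  opinion = c("gov", "market", "gov", "gov"),
  party   = c("GOP", "GOP", "democrat", "GOP")
)

# One row per opinion, one column per party; each cell counts the IDs in it.
wide <- dcast(dt, opinion ~ party, value.var = "ID", fun.aggregate = length)
print(wide)

# Compare with base R's table(), matching rows and columns by name.
tab <- table(dt$opinion, dt$party)
m <- as.matrix(wide, rownames = "opinion")
stopifnot(all(m[rownames(tab), colnames(tab)] == tab))
```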

Benchmarks

With a larger dataset (1e6 rows), we can compare the speed of dcast from reshape2 with that of dcast from data.table:

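library(reshape2)
library(data.table)

set.seed(24)
df <- data.frame(ID=1:1e6, opinion=sample(letters, 1e6, replace=TRUE),
  party= sample(1:9, 1e6, replace=TRUE))

# reshape2's dcast on the plain data.frame
system.time(reshape2::dcast(df, opinion ~ party, value.var='ID', length))
#   user  system elapsed 
#  0.278   0.013   0.293 

# data.table's dcast, after converting df to a data.table in place with setDT()
system.time(data.table::dcast(setDT(df), opinion ~ party, value.var='ID', length))
#   user  system elapsed 
# 0.022   0.000   0.023 

# plain grouping with .N: one row per (opinion, party) pair, i.e. 'long' format
system.time(setDT(df)[, .N, by = .(opinion, party)])
#  user  system elapsed 
# 0.018   0.001   0.018 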

The third option is slightly faster, but it returns the counts in 'long' format. If the OP wants a 'wide' format, data.table's dcast can be used.
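One way to get from the fast long result to the wide layout is to run dcast on the grouped counts. This uses N as the value column and fill = 0L for pairs that never occur. It is a sketch under assumptions. It reuses the benchmark's data-generating recipe with a smaller, arbitrarily chosen n so that it runs quickly. It then checks the result against the direct dcast(..., length) call.

```r
library(data.table)

set.seed(24)
n <- 1e4  # smaller than the benchmark's 1e6, just to keep the example quick
dt <- data.table(ID = seq_len(n),
                 opinion = sample(letters, n, replace = TRUE),
                 party = sample(1:9, n, replace = TRUE))

# Fast grouping: one row per (opinion, party) pair present in the data
long <- dt[, .N, by = .(opinion, party)]

# Reshape the counts to wide; any pair missing from 'long' becomes 0
wide <- dcast(long, opinion ~ party, value.var = "N", fill = 0L)

# Same counts as aggregating directly with dcast(..., length)
direct <- dcast(dt, opinion ~ party, value.var = "ID", fun.aggregate = length)
m_wide <- as.matrix(wide, rownames = "opinion")
m_direct <- as.matrix(direct, rownames = "opinion")
stopifnot(identical(dim(m_wide), dim(m_direct)), all(m_wide == m_direct))
```

Going through the long table first keeps the counting step in data.table's grouping, which was the quickest option in the timings above. dcast then only has to lay out at most 26 × 9 rows instead of the full data.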

NOTE: I am using the devel version, i.e. v1.9.7, but the CRAN version should be fast enough.
