Grouping a table by multiple factors and spreading it from long to wide format: the data.table way in R

Question

As an example, I will use the mtcars data available in R:

library(data.table)
data(mtcars)
setDT(mtcars)  # convert the data frame to a data.table by reference

Let's say I want to group the data by three variables, namely carb, cyl, and gear. I have done this as follows. However, I am sure there is a better way, as this is quite repetitive.

newDTcars <- mtcars[, mtcars[, mtcars[, .N, by = carb], by = cyl], by = gear]

Second, I would like to have the data in wide format, with a separate column for every gear level. For illustration purposes I have done this using tidyr; however, I would like to do it the "data.table" way.

newDTcars %>% tidyr::spread(gear, N)

The emphasis of this question is on keeping the solution within the data.table world, as I would also like to learn more about data.table.

Recommended answer

In data.table, we can group by multiple columns in a single call by listing them together in by, and we can reshape from long to wide with dcast.

library(data.table)
dcast(mtcars[, .N, .(carb, cyl, gear)], carb+cyl~gear, value.var = "N")

#   carb cyl  3  4  5
#1:    1   4  1  4 NA
#2:    1   6  2 NA NA
#3:    2   4 NA  4  2
#4:    2   8  4 NA NA
#5:    3   8  3 NA NA
#6:    4   6 NA  4 NA
#7:    4   8  5 NA  1
#8:    6   6 NA NA  1
#9:    8   8 NA NA  1
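
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its two steps. The names dt, counts_long, and counts_wide are illustrative and not part of the original answer, and the sketch works on a copy made with as.data.table rather than converting mtcars in place.

```r
library(data.table)

# Work on a copy so the built-in mtcars data frame is left untouched
dt <- as.data.table(mtcars)

# Step 1: count the rows for each carb/cyl/gear combination (long format)
counts_long <- dt[, .N, by = .(carb, cyl, gear)]

# Step 2: reshape to wide format, with one row per carb/cyl pair
# and one column per gear level holding the counts
counts_wide <- dcast(counts_long, carb + cyl ~ gear, value.var = "N")

print(counts_long)
print(counts_wide)
```

Combinations that never occur in the data have no row in counts_long, which is why they show up as NA after reshaping.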

You can use the fill argument of dcast to replace the NAs with 0 or any other value.
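
As a short sketch of the fill argument, the same reshape can be repeated with fill = 0 so that missing carb/cyl/gear combinations are reported as zero counts. The names dt and counts_wide0 are illustrative only.

```r
library(data.table)

# Work on a copy so the built-in mtcars data frame is left untouched
dt <- as.data.table(mtcars)

# Same grouping and reshape as in the answer, but combinations that do
# not occur in the data are filled with 0 instead of NA
counts_wide0 <- dcast(
  dt[, .N, by = .(carb, cyl, gear)],
  carb + cyl ~ gear,
  value.var = "N",
  fill = 0
)

print(counts_wide0)
```

For count data a zero is usually the natural fill value, since a missing combination simply means no rows were observed for it.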
