Grouping a table by multiple factors and spreading it from long format to wide: the data.table way in R
Question
As an example, I will be using the mtcars data available in R:
library(data.table)
data(mtcars)
setDT(mtcars)
Let's say I want to group the data by three variables, namely carb, cyl, and gear. I have done this as follows. However, I am sure there is a better way, as this is quite repetitive.
newDTcars <- mtcars [, mtcars[, mtcars[, .N , by = carb], by = cyl], by= gear]
Secondly, I would like to have the data in a wide format, with a separate column for every gear level. For illustration purposes I have done this using tidyr; however, I would like to have this done the "data.table" way.
newDTcars %>% tidyr::spread(gear, N)
The emphasis of this question is on keeping the solution within the data.table world, as I would like to learn more about data.table.
Recommended answer
In data.table, we can group by multiple columns at once, and to reshape from long to wide we can use dcast.
library(data.table)
dcast(mtcars[, .N, .(carb, cyl, gear)], carb+cyl~gear, value.var = "N")
#   carb cyl  3  4  5
#1:    1   4  1  4 NA
#2:    1   6  2 NA NA
#3:    2   4 NA  4  2
#4:    2   8  4 NA NA
#5:    3   8  3 NA NA
#6:    4   6 NA  4 NA
#7:    4   8  5 NA  1
#8:    6   6 NA NA  1
#9:    8   8 NA NA  1
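The one-liner above combines two steps, grouping and reshaping. A minimal sketch that separates them may make each step easier to follow. It assumes a fresh copy of mtcars converted with as.data.table; the names cars_dt, counts_long and counts_wide are illustrative and not part of the original answer.

```r
library(data.table)

# Work on a copy so the built-in data frame is left untouched
cars_dt <- as.data.table(mtcars)

# Step 1: count rows for every carb/cyl/gear combination (long format)
counts_long <- cars_dt[, .N, by = .(carb, cyl, gear)]

# Step 2: spread the gear levels into columns, one row per carb/cyl pair
counts_wide <- dcast(counts_long, carb + cyl ~ gear, value.var = "N")

print(counts_wide)
```

Passing several columns inside .() to by is what replaces the nested mtcars[, mtcars[, ...]] calls from the question.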
You may use the fill argument in dcast to replace NAs with 0 or any other number.
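As a sketch of the fill argument in use, the call below repeats the same reshaping with fill = 0. It assumes mtcars is converted with as.data.table; cars_dt and counts_wide are illustrative names.

```r
library(data.table)

cars_dt <- as.data.table(mtcars)

# fill = 0 puts 0 in the cells for carb/cyl/gear combinations that never occur
counts_wide <- dcast(
  cars_dt[, .N, by = .(carb, cyl, gear)],
  carb + cyl ~ gear,
  value.var = "N",
  fill = 0
)

print(counts_wide)
```

For counts, 0 is usually the natural fill value, since a missing combination means no rows were observed.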