Faster equivalent to group_by %>% expand in R
Problem description
I am trying to create a sequence of years for multiple IDs in R. My input table has a single row for each ID, and gives a Start_year. It looks like this:
ID Start_year
01 1999
02 2004
03 2015
04 2007
etc.
I need to create a table with multiple rows for each ID, showing each year from its Start_year up to 2015. I will then use this table in a join with another table. So in my example, ID 01 would have 17 rows (1999:2015), ID 02 would have 12 rows (2004:2015), ID 03 would have 1 row (2015), and ID 04 would have 9 rows (2007:2015).
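The row counts above follow from an inclusive range: each ID needs 2015 - Start_year + 1 rows. A minimal sketch of that arithmetic, using the four start years from the example table, also gives the total size of the expanded table:

```r
# Start years from the example table above
start_year <- c(1999L, 2004L, 2015L, 2007L)

# Inclusive count of years from Start_year up to 2015
rows_per_id <- 2015L - start_year + 1L
rows_per_id        # 17 12 1 9

# Total number of rows in the expanded table
sum(rows_per_id)   # 39
```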
For a subset of my data I can get this to work using the following code:
df %>% group_by(ID) %>% expand(year = Start_year:2015, Start_year) %>% select(-Start_year)
However, my full dataset has about 5 million IDs, and this command seems to be extremely slow, taking many hours.
I'm therefore looking for a faster implementation of this command in R. In my experience, data.table commands are often faster than dplyr/tidyr; however, I am quite unfamiliar with data.table syntax.
Recommended answer
With data.table, you could do:
out <- DT[, .(col = seq.int(Start_year, 2015L)), by = ID]
out
# ID col
# 1: 1 1999
# 2: 1 2000
# 3: 1 2001
# 4: 1 2002
# 5: 1 2003
# 6: 1 2004
# 7: 1 2005
# 8: 1 2006
# 9: 1 2007
# ...
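With about 5 million IDs, evaluating seq.int() once per group can itself become a bottleneck. A commonly used alternative, not part of the original answer, is to repeat each row the required number of times in one vectorised subset and then number the copies with data.table's rowid(). This sketch assumes every Start_year is at most 2015; the small table and the names `n_years` and `long` are illustrative only.

```r
library(data.table)

# Small illustrative table mirroring the question's example
DT <- data.table(ID = 1:4, Start_year = c(1999L, 2004L, 2015L, 2007L))

# Number of years each ID needs (assumes Start_year <= 2015)
n_years <- 2015L - DT$Start_year + 1L

# Repeat each row n_years times in a single vectorised subset
long <- DT[rep(seq_len(nrow(DT)), n_years)]

# rowid(ID) numbers the copies 1, 2, ... within each ID,
# so adding it to Start_year yields consecutive years
long[, year := Start_year + rowid(ID) - 1L]
long[, Start_year := NULL]
```

Because no R function is called per group, this pattern tends to scale better than `by = ID` when there are millions of small groups, though actual timings will depend on your data and machine.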
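If your data is a plain data.frame called df, you may first need to convert it to a data.table with setDT():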
setDT(df)[, .(col = seq.int(Start_year, 2015L)), by = ID]
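setDT() converts a data.frame to a data.table by reference, so no copy of a large table is made. A small self-contained sketch, with illustrative values in place of the real data:

```r
library(data.table)

# A plain data.frame, as returned by read.csv() and similar readers
df <- data.frame(ID = 1:4, Start_year = c(1999L, 2004L, 2015L, 2007L))

# Convert in place; df is now a data.table
setDT(df)

# data.table syntax, including by-group evaluation, now works on df
out <- df[, .(year = seq.int(Start_year, 2015L)), by = ID]
```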
A tidyverse way of doing the same thing:
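library(readr); library(dplyr); library(tidyr)
tbl <- read_table(text)
tbl %>%
  group_by(ID) %>%
  # replace each Start_year with the vector of years up to 2015
  mutate(Start_year = list(seq.int(Start_year, 2015L))) %>%
  # one row per year; tidyr >= 1.0 expects the list-column to be named
  unnest(cols = Start_year) %>%
  rename(year = Start_year)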
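Instead of building and unnesting a list-column, tidyr's uncount() can duplicate rows directly, and its `.id` argument numbers the copies of each original row. This is a swapped-in technique rather than the answer's own; it assumes every Start_year is at most 2015, and the helper column name `k` is arbitrary.

```r
library(dplyr)
library(tidyr)

# Illustrative input mirroring the question's table
tbl <- tibble(ID = 1:4, Start_year = c(1999L, 2004L, 2015L, 2007L))

out <- tbl %>%
  # duplicate each row once per year; .id = "k" numbers copies 1, 2, ...
  uncount(2015L - Start_year + 1L, .id = "k") %>%
  # turn the copy number into a calendar year
  mutate(year = Start_year + k - 1L) %>%
  # keep only the columns needed for the later join
  select(ID, year)
```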
Data
text <- "ID Start_year
01 1999
02 2004
03 2015
04 2007"
library(data.table)
DT <- fread(text)
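If you would rather avoid package dependencies, base R can build the same long table in one vectorised step with rep() and sequence(); the `from` argument of sequence() requires R 4.0.0 or later. This is an alternative technique, not part of the original answer, and it assumes every Start_year is at most 2015 (a later year would produce a negative repeat count and an error).

```r
# Illustrative input mirroring the question's table
ids        <- 1:4
start_year <- c(1999L, 2004L, 2015L, 2007L)

# Number of years per ID (assumes start_year <= 2015)
n <- 2015L - start_year + 1L

# rep() repeats each ID n times; sequence(n, from = ...) concatenates
# start_year[i]:(start_year[i] + n[i] - 1) for every i
out <- data.frame(
  ID   = rep(ids, n),
  year = sequence(n, from = start_year)
)
```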