Faster equivalent to group_by %>% expand in R

Views: 64  Published: 2020/10/15 20:13:59  Tags: r performance dplyr data.table data-manipulation


Problem description

I am trying to create a sequence of years for multiple IDs in R. My input table has a single row for each ID and gives a Start_year. It looks like this:

ID    Start_year
01          1999
02          2004
03          2015
04          2007

etc.

I need to create a table with multiple rows for each ID, showing each year from its Start_year up to 2015. I will then use this to join to another table. So in my example, ID1 would have 17 rows with the years 1999:2015, ID2 would have 12 rows (2004:2015), ID3 would have 1 row (2015), and ID4 would have 9 rows (2007:2015).
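The expected row counts follow from a simple calculation: each ID contributes 2015 - Start_year + 1 rows. A minimal base R sketch of that arithmetic on the four sample rows (the names `end_year` and `n_rows` are chosen here for illustration only):

```r
# Sample input from the example above; IDs kept as character to preserve leading zeros
df <- data.frame(
  ID = c("01", "02", "03", "04"),
  Start_year = c(1999L, 2004L, 2015L, 2007L)
)

end_year <- 2015L                        # last year to include
n_rows <- end_year - df$Start_year + 1L  # rows each ID should produce
setNames(n_rows, df$ID)                  # per-ID row counts, labelled by ID
sum(n_rows)                              # total rows in the expanded table
```

The same calculation on the full data gives the size of the expanded table before any expansion is attempted.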

For a subset of my data I can get this to work using the following code:

df %>% group_by(ID) %>% expand(year = Start_year:2015, Start_year) %>% select(-Start_year)

However, my full dataset has about 5 million IDs, and this command is extremely slow, taking many hours.

I'm therefore looking for a faster implementation of this command in R. In my experience, data.table commands often seem to be faster than dplyr/tidyr; however, I am quite unfamiliar with data.table syntax.

Recommended answer

You can do this with data.table:

out <- DT[, .(col = seq.int(Start_year, 2015L)), by = ID]
out
#    ID  col
# 1:  1 1999
# 2:  1 2000
# 3:  1 2001
# 4:  1 2002
# 5:  1 2003
# 6:  1 2004
# 7:  1 2005
# 8:  1 2006
# 9:  1 2007
# ...

If your data is a plain data.frame called df, you will probably need to convert it with setDT() first:

setDT(df)[, .(col = seq.int(Start_year, 2015L)), by = ID]
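As a possible alternative that is not part of the original answer, the per-ID call can be avoided entirely by building both columns with vectorized `rep()` and `sequence()`. The sketch below uses base R only and assumes every Start_year is at most 2015 (otherwise `rep()` fails on a negative count); the names `end_year` and `n` are illustrative.

```r
# Sample input with the same values as the example data
df <- data.frame(
  ID = c(1L, 2L, 3L, 4L),
  Start_year = c(1999L, 2004L, 2015L, 2007L)
)

end_year <- 2015L
n <- end_year - df$Start_year + 1L   # number of years per ID (assumed >= 1)

# rep() repeats each ID n times; sequence(n) yields 1..n for each ID in turn,
# so start + offset rebuilds Start_year:2015 without any per-group call
out <- data.frame(
  ID  = rep(df$ID, times = n),
  col = rep(df$Start_year, times = n) + sequence(n) - 1L
)

head(out, 3)
nrow(out)
```

The resulting data.frame can be passed to `setDT()` if a data.table is needed for the later join. Whether this is actually faster than the grouped version on 5 million IDs would have to be measured on the real data.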

A tidyverse way of doing the same thing:

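library(readr); library(dplyr); library(tidyr)
tbl <- read_table(text)  # `text` is defined in the Data section below

tbl %>% 
  group_by(ID) %>% 
  # store each ID's full sequence of years as a list-column
  mutate(Start_year = list(seq.int(Start_year, 2015L))) %>%
  # rename(new_col = Start_year)
  # name the column explicitly; a bare unnest() is deprecated in tidyr >= 1.0.0
  unnest(cols = Start_year)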

Data

text <- "ID    Start_year
01          1999
02          2004
03          2015
04          2007"

library(data.table)
DT <- fread(text)
