Adjust a function so that, instead of looping through all rows, it loops only through the rows within each group

Problem description

Consider the toy dataset and function below. For a given row i of the data frame df, match1 returns p201 if it is not missing; otherwise it looks for rows whose V2009 is within 1 of row i's value and returns the row number of the first such match. Applied to every row, this pairs each observation with the row number of one of its matches.

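dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250,
                                 254, 254, 254),
                      p201   = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009  = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

# Return p201 for row i if it is present; otherwise return the row number
# of the first row in df whose V2009 is within 1 of row i's V2009
# (falling back to i itself if nothing matches)
match1 <- function(i, df) {
  if (!is.na(df$p201[i])) {
    l <- df$p201[i]
  } else {
    k <- abs(df$V2009[i] - df$V2009) <= 1
    # which(k) can return several rows; keep the first match
    l <- if (any(k, na.rm = TRUE)) which(k)[1] else i
  }

  return(l)
}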

This is how I would apply the function:

library(dplyr)
library(purrr)

dataset2 <- dataset %>%
  group_by(id_dom,
           index = map_dbl(seq(nrow(.)),
                           ~ .x %>% match1(df = dataset))) %>%
  mutate(p201 = (first(na.omit(V2009)) - 1) * 100)
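For reference, here is a base-R sketch (no dplyr needed) of the index that the pipe above builds: because it uses seq(nrow(.)) together with df = dataset, every call searches the whole data set. The block restates dataset and match1 so it runs on its own, writing the match step as which(k)[1] rather than ifelse(any(k), which(k), i); both return the first match on this data. For rows with a missing p201, the index is a row position in the full data frame, which is why the two close V2009 values in household 250 (rows 5 and 6) both get index 5.

```r
# Toy data from the question
dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250, 254, 254, 254),
                      p201   = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009  = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

# match1 from the question: keep p201 if present, otherwise return the
# position of the first row of df whose V2009 is within 1 of row i's V2009
match1 <- function(i, df) {
  if (!is.na(df$p201[i])) {
    return(df$p201[i])
  }
  k <- abs(df$V2009[i] - df$V2009) <= 1
  if (any(k, na.rm = TRUE)) which(k)[1] else i
}

# Every row is matched against the whole data set, as in the question's pipe
index_all <- vapply(seq_len(nrow(dataset)), match1, numeric(1), df = dataset)
print(index_all)
# [1] 1 2 2 4 5 5 2 1 2
```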

As you can see, my ultimate goal is to pair observations by index and id_dom. For this reason, it would be much faster (and I think it would also yield slightly better results) if i ran only through the rows of each id_dom group rather than through the whole dataset.
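A rough comparison count shows where the speed-up comes from. For a row with a missing p201, match1 compares its V2009 with every row of the df it receives. Writing n for the total number of rows and n_g for the number of rows in household g (notation introduced here for illustration), the worst case, where every p201 is missing, is:

$$
C_{\text{whole}} = n \cdot n = n^2,
\qquad
C_{\text{grouped}} = \sum_{g} n_g \cdot n_g = \sum_{g} n_g^2 \le n^2
\quad \text{since } \sum_{g} n_g = n .
$$

For the toy data (n = 9, three households of 3 rows) that is 81 versus 3 · 3² = 27 comparisons. Only 4 rows actually have a missing p201 there, so the real counts are 4 · 9 = 36 versus 4 · 3 = 12.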

I am hoping for an answer that:

i) puts the grouping by id_dom in the pipe rather than in the match1 function, and
ii) allows me to write something like map_dbl(seq(nrow(.)), ~ .x %>% match1(df = .)), so that if I create the V2009 variable earlier in the pipe, I don't need to break up the chain before running the function.

Thanks!

Recommended answer

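After grouping by id_dom, we can pass cur_data() instead of dataset to match1. Inside a grouped verb, cur_data() returns only the rows of the current group (excluding the grouping columns), so the row numbers match1 returns are positions within that group.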

library(dplyr)
library(purrr)

dataset %>%
  # group by id_dom
  group_by(id_dom) %>%
  # add a second grouping variable, index, by looping over the rows of the
  # current group and applying match1 to that group's data only
  # (cur_data() is deprecated since dplyr 1.1.0; pick(everything()) replaces it)
  group_by(index = map_dbl(seq(n()),
                           ~ match1(.x, df = cur_data())),
           .add = TRUE) %>%
  # update p201
  mutate(p201 = (first(na.omit(V2009)) - 1) * 100)
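Outside dplyr, the same group-wise behaviour can be sketched with base R's ave(), which splits a vector by a grouping variable, applies a function to each piece and writes the results back in the original row order, much as a grouped mutate() does. This is an alternative technique rather than part of the answer; dataset and match1 are restated so the block runs on its own. Because each call only sees one household, the index for household 250 becomes 1, 2, 2 instead of 4, 5, 5.

```r
# Toy data from the question
dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250, 254, 254, 254),
                      p201   = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009  = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

# match1 from the question (first match taken with which(k)[1])
match1 <- function(i, df) {
  if (!is.na(df$p201[i])) {
    return(df$p201[i])
  }
  k <- abs(df$V2009[i] - df$V2009) <= 1
  if (any(k, na.rm = TRUE)) which(k)[1] else i
}

# For each id_dom group, `rows` holds that group's row numbers; match1 is
# run against only those rows, similar to cur_data() inside a grouped verb
index_grp <- ave(seq_len(nrow(dataset)), dataset$id_dom,
                 FUN = function(rows) {
                   vapply(seq_along(rows), match1, numeric(1),
                          df = dataset[rows, ])
                 })
print(index_grp)
# [1] 1 2 2 1 2 2 2 1 2
```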


Or, using group_split:

dataset %>%
  # one data frame per id_dom household
  group_split(id_dom) %>%
  # compute index within each household, then stack the results
  map_dfr(~ .x %>%
            group_by(index = map_dbl(row_number(),
                                     ~ match1(.x, df = cur_data()))) %>%
            mutate(p201 = (first(na.omit(V2009)) - 1) * 100))
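The group_split() version follows the split-apply-combine pattern, which can also be sketched in base R with split(), lapply() and do.call(rbind, ...). This is an illustrative alternative, not the answer's code: dataset and match1 are restated, and p201 is recomputed per (id_dom, index) pair with ave() using the same formula, (first non-missing V2009 - 1) * 100. The result is a plain data frame rather than a grouped tibble.

```r
# Toy data from the question
dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250, 254, 254, 254),
                      p201   = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009  = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

# match1 from the question (first match taken with which(k)[1])
match1 <- function(i, df) {
  if (!is.na(df$p201[i])) {
    return(df$p201[i])
  }
  k <- abs(df$V2009[i] - df$V2009) <= 1
  if (any(k, na.rm = TRUE)) which(k)[1] else i
}

# Split by household, process each piece on its own, then stack the pieces
pieces <- lapply(split(dataset, dataset$id_dom), function(g) {
  # index is computed against the rows of this household only
  g$index <- vapply(seq_len(nrow(g)), match1, numeric(1), df = g)
  # p201 = (first non-missing V2009 in the same index cell - 1) * 100
  g$p201 <- ave(g$V2009, g$index,
                FUN = function(v) (v[!is.na(v)][1] - 1) * 100)
  g
})
result <- do.call(rbind, unname(pieces))
rownames(result) <- NULL
print(result)
```

On the toy data this gives index 1, 2, 2, 1, 2, 2, 2, 1, 2 and p201 values 6200, 4100, 4100, 2500, 400, 400, 6800, 2900, 6800.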
