Adjust function so that, instead of looping through all rows, it loops only through the rows within each group
Problem description
Consider the toy dataset and function below. Basically, the function loops through the rows of the dataset df and looks for matches according to some criteria. If there is a match, the observation is assigned the row number of one of the matches.
dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250,
                                 254, 254, 254),
                      p201 = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009 = c(63, 42, 64, 26, 5, 4, 69, 30, 68)
)
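match1 <- function(i, df) {
  j <- 1:nrow(df)
  if (!is.na(df$p201[i])) {
    # p201 is observed: use it directly
    l <- df$p201[i]
  } else {
    # p201 is missing: flag rows whose V2009 is within 1 of row i's V2009
    k <- abs(df$V2009[i] - df$V2009[j]) <= 1
    # which(k) always includes i itself; ifelse() keeps only the first match
    l <- ifelse(any(k), which(k), i)
  }
  return(l)
}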
This is how I would apply the function:
library(dplyr)
library(purrr)

dataset2 <- dataset %>%
  group_by(id_dom,
           index = map_dbl(seq(nrow(.)),
                           ~ .x %>% match1(df = dataset))) %>%
  mutate(p201 = (first(na.omit(V2009)) - 1)*100)
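A minimal base-R sketch of what the pipeline above computes for index, using sapply() in place of map_dbl() (idx_all is an illustrative name). Because j spans every row of dataset, the returned numbers are row positions in the whole dataset, so rows from different id_dom groups are compared with each other.

```r
dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250, 254, 254, 254),
                      p201 = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009 = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

match1 <- function(i, df) {
  j <- 1:nrow(df)
  if (!is.na(df$p201[i])) {
    l <- df$p201[i]
  } else {
    k <- abs(df$V2009[i] - df$V2009[j]) <= 1
    l <- ifelse(any(k), which(k), i)
  }
  return(l)
}

# Apply match1 to every row, always searching the full dataset
idx_all <- sapply(seq_len(nrow(dataset)), match1, df = dataset)
idx_all
# [1] 1 2 2 4 5 5 2 1 2
```

Rows 5 and 6 (id_dom 250) both get 5 because their V2009 values (5 and 4) differ by 1, and row 4 keeps its own position in the full dataset, 4.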
As you can see, my ultimate goal is to pair observations by index and id_dom. For this reason, it would be much faster (and, I think, would also yield slightly better results) if i ran only through the rows of each id_dom group rather than through the whole dataset.
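For comparison, here is a base-R sketch of the per-group loop. It uses split(), lapply() and unsplit(), a swapped-in technique rather than the dplyr pipe the question asks for, and idx_by_group is an illustrative name. Each id_dom slice is passed to match1 as df, so i runs only over that group's rows and the returned numbers are positions within the group.

```r
dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250, 254, 254, 254),
                      p201 = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009 = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

match1 <- function(i, df) {
  j <- 1:nrow(df)
  if (!is.na(df$p201[i])) {
    l <- df$p201[i]
  } else {
    k <- abs(df$V2009[i] - df$V2009[j]) <= 1
    l <- ifelse(any(k), which(k), i)
  }
  return(l)
}

# Split the rows by id_dom, run match1 over each group's own rows,
# then put the results back in the original row order
idx_by_group <- unsplit(
  lapply(split(dataset, dataset$id_dom),
         function(g) sapply(seq_len(nrow(g)), match1, df = g)),
  dataset$id_dom
)
idx_by_group
# [1] 1 2 2 1 2 2 2 1 2
```

Row 4 now gets 1 (its position inside id_dom 250) instead of 4, and the search for V2009 neighbours never leaves the group.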
I'm hoping for an answer that:
i) does not put the grouping by id_dom in the match1 function, but in the pipe;
ii) lets me write something like map_dbl(seq(nrow(.)), ~ .x %>% match1(df = .)), so that if I create the V2009 variable earlier in the chain, I don't need to break up the chain before running the function.
Thanks!
Recommended answer
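After grouping by 'id_dom', we can use cur_data() instead of dataset in the match1 call. cur_data() returns the data of the current group (without the grouping columns), so match1 only compares rows that share the same id_dom.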
library(dplyr)
library(purrr)
dataset %>%
  # group by id_dom
  group_by(id_dom) %>%
  # loop over the row positions of each group and apply match1;
  # computed in mutate() so that it is evaluated per id_dom group
  mutate(index = map_dbl(seq(n()), ~ match1(.x, df = cur_data()))) %>%
  # add index as a second grouping variable
  group_by(index, .add = TRUE) %>%
  # update p201
  mutate(p201 = (first(na.omit(V2009)) - 1)*100)
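Since dplyr 1.1.0, cur_data() is deprecated in favour of pick(). The following is a sketch of the same idea with pick(everything()), assuming dplyr >= 1.1.0 and purrr are installed; res is an illustrative name. Like cur_data(), pick() leaves out the grouping column id_dom, which match1 does not need. index is computed in a separate mutate() on the data grouped by id_dom and only then added as a grouping variable, because the group_by() documentation states that computations inside group_by() itself are done on the ungrouped data.

```r
library(dplyr)
library(purrr)

dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250, 254, 254, 254),
                      p201 = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009 = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

match1 <- function(i, df) {
  j <- 1:nrow(df)
  if (!is.na(df$p201[i])) {
    l <- df$p201[i]
  } else {
    k <- abs(df$V2009[i] - df$V2009[j]) <= 1
    l <- ifelse(any(k), which(k), i)
  }
  return(l)
}

res <- dataset %>%
  group_by(id_dom) %>%
  # pick(everything()): the current id_dom group's non-grouping columns
  mutate(index = map_dbl(seq_len(n()),
                         ~ match1(.x, df = pick(everything())))) %>%
  group_by(index, .add = TRUE) %>%
  mutate(p201 = (first(na.omit(V2009)) - 1) * 100) %>%
  ungroup()

res$index
# [1] 1 2 2 1 2 2 2 1 2
```

The whole computation stays in one chain, so a V2009 column created by an earlier mutate() in the same pipe is visible to match1 through pick().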
Or, using group_split:
dataset %>%
  group_split(id_dom) %>%
  map_dfr(., ~ .x %>%
            group_by(index = map_dbl(row_number(),
                                     ~ match1(.x, df = cur_data()))) %>%
            mutate(p201 = (first(na.omit(V2009)) - 1)*100))
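purrr 1.0.0 supersedes map_dfr() in favour of map() followed by list_rbind(). The sketch below shows the group_split() route in that style, assuming purrr >= 1.0.0. add_index is a hypothetical helper name: naming the per-group function avoids the two nested lambdas that both use .x.

```r
library(dplyr)
library(purrr)

dataset <- data.frame(id_dom = c(20, 20, 20, 250, 250, 250, 254, 254, 254),
                      p201 = c(1, NA, 2, NA, NA, NA, 2, 1, 2),
                      V2009 = c(63, 42, 64, 26, 5, 4, 69, 30, 68))

match1 <- function(i, df) {
  j <- 1:nrow(df)
  if (!is.na(df$p201[i])) {
    l <- df$p201[i]
  } else {
    k <- abs(df$V2009[i] - df$V2009[j]) <= 1
    l <- ifelse(any(k), which(k), i)
  }
  return(l)
}

# Hypothetical helper: index and update one id_dom slice
add_index <- function(g) {
  g %>%
    # match1 only sees this slice, so i runs over the group's rows
    mutate(index = map_dbl(seq_len(nrow(g)), match1, df = g)) %>%
    group_by(index) %>%
    mutate(p201 = (first(na.omit(V2009)) - 1) * 100) %>%
    ungroup()
}

res_split <- dataset %>%
  group_split(id_dom) %>%
  map(add_index) %>%
  list_rbind()
```

group_split() returns the slices ordered by id_dom, which here matches the original row order, so res_split lines up row by row with dataset.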