Group and summarize with iterative filter using dplyr

Problem description

Apologies up front if this has been asked before; I have been searching all day and have not found an answer I can apply to my problem.

I am trying to solve this with dplyr (and related packages) because my previous approach (for loops) was too inefficient. I have a dataset of event times recorded at sites, with each event belonging to a group. I want to summarize the number (and proportion) of events that fall within a window that moves along a sequence of time steps.

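# Example data
library(dplyr)

set.seed(1)
sites = rep(letters[1:10], 10)               # 100 events at 10 sites (a-j)
groups = c('red', 'blue', 'green', 'yellow')
times = round(runif(length(sites), 1, 100))  # event times between 1 and 100

timePeriod = seq(1, 100)                     # time steps at which the window is centred

# Example dataframe: one row per event
df = data.frame(site = sites,
                group = rep(groups, length(sites) / length(groups)),
                time = times)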

This is my attempt, for a single time step i, to summarize the number of events from each group that fall within a given moving window of time. The goal is to move through each element of the vector timePeriod and summarize how many events in each group occurred within timePeriod[i] +/- half the window width (25 here). Ideally, the results would be stored in a data frame with one column per group and one row per time step.

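# Count events per group in the window centred on one time step, timePeriod[i]
i = 50  # illustrative step; in a for loop this would be the loop variable
df %>%
  filter(time > timePeriod[i] - 25 & time < timePeriod[i] + 25) %>%  # half-window of 25
  group_by(group) %>%
  summarise(count = n())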
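One detail of this single-step version: filter() runs before group_by(), so a group with no events inside the window gets no row at all rather than a count of 0. A minimal sketch that keeps such groups, assuming group is converted to a factor with the four levels from the example (centre, half_width and step_counts are illustrative names, not part of the original code):

```r
library(dplyr)

# Same example data as in the question
set.seed(1)
sites <- rep(letters[1:10], 10)
groups <- c('red', 'blue', 'green', 'yellow')
times <- round(runif(length(sites), 1, 100))
df <- data.frame(site = sites,
                 group = rep(groups, length(sites) / length(groups)),
                 time = times)

# Assumption: store group as a factor so count() knows every possible level
df$group <- factor(df$group, levels = groups)

centre <- 1        # illustrative window centre (near the edge, where groups may be absent)
half_width <- 25   # same half-window as the snippet above

step_counts <- df %>%
  filter(time > centre - half_width & time < centre + half_width) %>%
  # .drop = FALSE keeps factor levels with no rows, reporting them as 0
  count(group, .drop = FALSE, name = "count")

step_counts
```

Note that `.drop = FALSE` only fills in empty levels of a factor; with a plain character column, groups without events in the window would still be missing.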

How can I do this without looping through my sequence of time steps and storing the summary table for each group individually? Thanks!

Recommended answer

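Combining lapply and dplyr, you can do the following, which stays close to what you have so far. For each value i in timePeriod, the anonymous function keeps the events inside the window around i, counts them per group, and tags the result with step = i; bind_rows() then stacks the per-step tables into a single long data frame.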

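library(dplyr)

lapply(timePeriod, function(i) {
  df %>%
    filter(time > (i - 25) & time < (i + 25)) %>%  # events within +/- 25 of step i
    group_by(group) %>%
    summarise(count = n()) %>%                      # events per group in this window
    mutate(step = i)                                # record which time step this is
}) %>%
  bind_rows()                                       # stack the per-step tables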
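If you would rather avoid lapply() altogether, one alternative is to pair every event with every time step in a single cross join, filter once, and count. The sketch below also fills in zero counts and reshapes the result into the layout the question asks for (one row per time step, one column per group). It assumes the tidyr package is available; half_width and the object names are illustrative, and the proportion step assumes "proportion" means each group's share of the events in that window.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
set.seed(1)
sites <- rep(letters[1:10], 10)
groups <- c('red', 'blue', 'green', 'yellow')
times <- round(runif(length(sites), 1, 100))
timePeriod <- seq(1, 100)
df <- data.frame(site = sites,
                 group = rep(groups, length(sites) / length(groups)),
                 time = times)

half_width <- 25  # same half-window as the answer above

# Base R merge() with by = NULL is a cross join: every event paired with every step
counts_long <- merge(df, data.frame(step = timePeriod), by = NULL) %>%
  filter(time > step - half_width & time < step + half_width) %>%
  count(step, group, name = "count") %>%
  # (step, group) pairs with no events have no row yet; add them with count 0
  complete(step = timePeriod, group = groups, fill = list(count = 0L))

# Wide layout: one row per time step, one column per group
counts_wide <- counts_long %>%
  pivot_wider(names_from = group, values_from = count)

# Assumption: proportion = share of the window's events belonging to each group
# (NaN for a window that contains no events at all)
props_long <- counts_long %>%
  group_by(step) %>%
  mutate(prop = count / sum(count)) %>%
  ungroup()
```

The cross join materialises length(timePeriod) x nrow(df) rows (10,000 here) before filtering, so it trades memory for fewer dplyr calls; for much larger inputs that intermediate table can become the bottleneck.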
