Select rows before a filtered row using dplyr


Problem description

I'm working on a study where we placed a camera inside each nest box to determine when our study species laid its first egg. Some of the cameras weren't very reliable, and I'd like to check whether there were continuous photos leading up to the date the first egg was recorded. That way I can know for sure that this really is the first egg date. There are >165,000 photos and >200 nests, so I grouped by nest box ID, filtered the rows down to those that have at least 1 egg, and then used the slice function to select the first of those rows. Here's a reproducible example:

example <- structure(list(boxID = c("CA10", "CA10", "CA10", "CA10", "CA10", 
    "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", "CA10", 
    "CA10", "CA10"), visitType = c("Image", "Image", "Image", "Image", 
    "Image", "Image", "Image", "Image", "Image", "Image", "Image", 
    "Image", "Image", "Image", "Image"), day = c(25L, 25L, 25L, 26L, 
    26L, 26L, 27L, 27L, 27L, 28L, 28L, 28L, 29L, 29L, 29L), month = c("MAR", 
    "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", "MAR", 
    "MAR", "MAR", "MAR", "MAR", "MAR"), year = c(2018, 2018, 2018, 
    2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
    2018), timeChecked = c("02:59", "09:06", "15:13", "02:59", "09:07", 
    "15:14", "02:59", "09:07", "15:13", "02:58", "09:06", "15:12", 
    "02:58", "09:06", "15:12"), species = c("Empty", "Empty", "Empty", 
    "Empty", "Empty", "Empty", "Empty", "Empty", "American Kestrel", 
    "Empty", "American Kestrel", "American Kestrel", "American Kestrel", 
    "American Kestrel", "American Kestrel"), sexAdult = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, "Female", NA, "Female", "Female", "Female", 
    NA, NA), numEggs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    "1", "1", "1", "1", "1"), numNestlings = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), date = structure(c(17615, 
    17615, 17615, 17616, 17616, 17616, 17617, 17617, 17617, 17618, 
    17618, 17618, 17619, 17619, 17619), class = "Date")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -15L), .Names = c("boxID", 
    "visitType", "day", "month", "year", "timeChecked", "species", 
    "sexAdult", "numEggs", "numNestlings", "date"))

Here's the code I have to find the first row that has at least 1 egg:

example %>%
      mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>% # remove unknowns and other values that should be NA
      select(boxID, date, numEggs, visitType) %>%
      group_by(boxID) %>%
      filter(numEggs > 0) %>%
      slice(1) 

I'd like to look at the 5 or 10 rows before this first row with an egg to make sure there was continuous data up to that point in time. Is there a way to do this kind of row indexing with slice or some other dplyr function?

Recommended answer

Here's one way. match(TRUE, numEggs > 0) returns the position of the first row where numEggs > 0, and we then take that row together with the n_previous rows before it. We wrap the start index in max(1, ...) so that it never drops below 1 when the first such row falls within the first n_previous rows of its group.

library(dplyr)

n_previous <- 5  # number of rows to keep before the first row with an egg

example %>%
  mutate_at(vars(numEggs, numNestlings), na_if, 'unknown') %>% 
  select(boxID, date, numEggs, visitType) %>%
  group_by(boxID) %>%
  slice(max(1, match(TRUE, numEggs > 0) - n_previous):match(TRUE, numEggs > 0))

# A tibble: 6 x 4
# Groups:   boxID [1]
  boxID date       numEggs visitType
  <chr> <date>     <chr>   <chr>    
1 CA10  2018-03-26 <NA>    Image    
2 CA10  2018-03-27 <NA>    Image    
3 CA10  2018-03-27 <NA>    Image    
4 CA10  2018-03-27 <NA>    Image    
5 CA10  2018-03-28 <NA>    Image    
6 CA10  2018-03-28 1       Image  
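
The index arithmetic inside slice can be checked on its own in base R. The sketch below is only an illustration: rows_before_first is a hypothetical helper name, not a dplyr function. It also assumes that a nest box with no egg recorded should return no rows; in that case match gives NA, and the NA:NA range in the original expression would fail.

```r
# Hypothetical helper (not part of dplyr): indices of the first TRUE in `hit`
# plus up to `n_previous` positions before it.
rows_before_first <- function(hit, n_previous = 5) {
  first <- match(TRUE, hit)   # position of the first TRUE; NA entries never match
  if (is.na(first)) {
    return(integer(0))        # assumption: no egg in this group, so select nothing
  }
  seq(max(1, first - n_previous), first)  # clamp the start index at row 1
}

# Same numEggs column as in the example data (character, with NAs)
num_eggs <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "1", "1", "1", "1", "1")

rows_before_first(num_eggs > 0, n_previous = 5)   # 6 7 8 9 10 11
rows_before_first(num_eggs > 0, n_previous = 20)  # 1 through 11, clamped at 1
rows_before_first(c(NA, NA) > 0)                  # integer(0)
```

Inside the grouped pipeline this could presumably replace the range expression, as slice(rows_before_first(numEggs > 0, n_previous)), so that boxes without any egg yield zero rows instead of stopping the whole pipeline.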
