Subsetting data based on four columns using if else statements


Question


I am new to R and having some issues with subsetting my data based on certain criteria. I have five columns (Location, Site, Species, Date, Time).


What I want to do is subset my data based on whether the time of each record is at least one hour after the previous record with the same date, same species, same site, and same location. If any of those columns differ from the previous record, I want it to skip the time comparison and move on.


As an example, this data frame should show that the first two records are independent, because all columns before the Time column are the same and the records are 60 or more minutes apart, whereas the last record is only 59 minutes after the previous one and so is not independent.

  Location       Site     Species    Date     Time
  Twelve         2        frog       14-10    13:45
  Twelve         2        frog       14-10    14:45
  Twelve         2        frog       14-10    15:44 


In the following example, the first two lines are independent, but the third line is a different species, and the fourth line is the same as the third except for a different location. The fifth line has the same data as the fourth, except that it is again less than an hour later (58 minutes).


The result should contain the first two lines (both independent of each other); the third line, because it is independent of the lines before and after it (its species differs from the line before, and its location differs from the line after); and the fourth line, because it is independent of the line before. The last line should not appear in the subset, because it is not independent of the line before (less than 60 minutes apart).

  Location       Site     Species    Date     Time
  Twelve         2        frog       14-10    13:45
  Twelve         2        frog       14-10    14:45
  Twelve         2        badger     14-10    15:44
  Thirteen       2        badger     14-10    16:44
  Thirteen       2        badger     14-10    17:42
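A minimal base-R sketch of the rule described above, written with the if/else logic the title asks about. It assumes the rows are already sorted so that records sharing Location, Site, Species and Date sit next to each other in time order, and that Time is an "HH:MM" string; the to_minutes() helper is hypothetical, not part of any package.

```r
# The five-row example table from the question, typed in by hand
df <- data.frame(
  Location = c("Twelve", "Twelve", "Twelve", "Thirteen", "Thirteen"),
  Site     = c(2, 2, 2, 2, 2),
  Species  = c("frog", "frog", "badger", "badger", "badger"),
  Date     = c("14-10", "14-10", "14-10", "14-10", "14-10"),
  Time     = c("13:45", "14:45", "15:44", "16:44", "17:42"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM" strings to minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

mins <- to_minutes(df$Time)
n <- nrow(df)
independent <- logical(n)

for (i in seq_len(n)) {
  if (i == 1) {
    # Nothing before the first record, so it is independent
    independent[i] <- TRUE
  } else if (df$Location[i] != df$Location[i - 1] ||
             df$Site[i]     != df$Site[i - 1] ||
             df$Species[i]  != df$Species[i - 1] ||
             df$Date[i]     != df$Date[i - 1]) {
    # Any grouping column differs from the previous row: skip the time check
    independent[i] <- TRUE
  } else {
    # Same group as the previous row: independent only if 60 or more minutes later
    independent[i] <- (mins[i] - mins[i - 1]) >= 60
  }
}

# Keep only the independent records
df[independent, ]
```

For this table the loop keeps the first four rows and drops the 17:42 record (58 minutes after 16:44). A row-by-row loop is easy to follow but gets slow on large data, where a grouped, vectorised approach is usually preferable.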

Recommended answer

*** EDITED to include OP comments ***



filter() keeps only the independent rows, and select() hides the interim columns:

library(dplyr)

df %>% 
  group_by(Location, Site, Species, Date) %>% 
  # Hours since the previous record in the same group (NA for the first record)
  mutate(difftime = as.numeric(hms::as_hms(Time) - hms::as_hms(lag(Time, 1)))/3600) %>%
  # Independent if first in its group or at least 1 hour after the previous record
  mutate(independent = case_when(
    is.na(difftime) ~ TRUE,
    difftime >= 1 ~ TRUE,
    difftime < 1  ~ FALSE,
    TRUE ~ FALSE)
  ) %>% 
  filter(independent) %>%           # keep only the independent rows
  select(-difftime, -independent)   # drop the interim columns
#> # A tibble: 4 x 5
#> # Groups:   Location, Site, Species, Date [3]
#>   Location  Site Species Date  Time  
#>   <chr>    <dbl> <chr>   <chr> <time>
#> 1 Twelve       2 frog    14-10 13:45 
#> 2 Twelve       2 frog    14-10 14:45 
#> 3 Twelve       2 badger  14-10 15:44 
#> 4 Thirteen     2 badger  14-10 16:44 

library(dplyr)
df %>% 
  group_by(Location, Site, Species, Date) %>% 
  mutate(difftime = as.numeric(Time - lag(Time, 1))/3600) %>%
  mutate(independent = case_when(
    is.na(difftime) ~ TRUE,
    difftime >= 1 ~ TRUE,
    difftime < 1  ~ FALSE,
    TRUE ~ FALSE)
  )
#> # A tibble: 6 x 7
#> # Groups:   Location, Site, Species, Date [3]
#>   Location  Site Species Date  Time   difftime independent
#>   <chr>    <dbl> <chr>   <chr> <time>    <dbl> <lgl>      
#> 1 Twelve       2 frog    14-10 13:45    NA     TRUE       
#> 2 Twelve       2 frog    14-10 14:45     1     TRUE       
#> 3 Twelve       2 frog    14-10 15:44     0.983 FALSE      
#> 4 Twelve       2 badger  14-10 15:44    NA     TRUE       
#> 5 Thirteen     2 badger  14-10 16:44    NA     TRUE       
#> 6 Thirteen     2 badger  14-10 17:42     0.967 FALSE
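The case_when() above can likely be collapsed to a single logical expression, since its first three conditions already cover every value difftime can take (NA, at least 1, less than 1), so the final TRUE ~ FALSE branch never fires. Inside mutate() that would read independent = is.na(difftime) | difftime >= 1. A small base-R check of the equivalence, using the rounded difftime values from the printout above as plain numbers (the hours variable name is just for illustration):

```r
# Hours since the previous record, copied (rounded) from the printed difftime column;
# NA marks the first record of each group
hours <- c(NA, 1, 0.983, NA, NA, 0.967)

# First record in its group, or at least 1 hour after the previous record.
# For NA entries, is.na() is TRUE and TRUE | NA evaluates to TRUE.
independent <- is.na(hours) | hours >= 1
independent
```

This reproduces the independent column shown above (TRUE, TRUE, FALSE, TRUE, TRUE, FALSE).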


Wrapping Time in hms::as_hms() makes the warning message go away:

df %>% 
  group_by(Location, Site, Species, Date) %>% 
  mutate(difftime = as.numeric(hms::as_hms(Time) - hms::as_hms(lag(Time, 1)))/3600) %>%
  mutate(independent = case_when(
    is.na(difftime) ~ TRUE,
    difftime >= 1 ~ TRUE,
    difftime < 1  ~ FALSE,
    TRUE ~ FALSE)
  )
#> # A tibble: 6 x 7
#> # Groups:   Location, Site, Species, Date [3]
#>   Location  Site Species Date  Time   difftime independent
#>   <chr>    <dbl> <chr>   <chr> <time>    <dbl> <lgl>      
#> 1 Twelve       2 frog    14-10 13:45    NA     TRUE       
#> 2 Twelve       2 frog    14-10 14:45     1     TRUE       
#> 3 Twelve       2 frog    14-10 15:44     0.983 FALSE      
#> 4 Twelve       2 badger  14-10 15:44    NA     TRUE       
#> 5 Thirteen     2 badger  14-10 16:44    NA     TRUE       
#> 6 Thirteen     2 badger  14-10 17:42     0.967 FALSE

Your data:

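# read_table2() was deprecated in readr 2.0.0; read_table() now parses whitespace-separated text the same way
df <- readr::read_table("Location       Site     Species    Date     Time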
Twelve         2        frog       14-10    13:45
Twelve         2        frog       14-10    14:45
Twelve         2        frog       14-10    15:44
Twelve         2        badger     14-10    15:44
Thirteen       2        badger     14-10    16:44
Thirteen       2        badger     14-10    17:42")
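If you would rather avoid the dplyr and hms dependencies, here is a base-R sketch of the same grouped comparison. It swaps in ave() and as.difftime() for the answer's group_by()/lag() pipeline, so it is an alternative rather than the answer's method. It assumes Time is an "HH:MM" string and that rows are already in time order within each group (sort with order() first if they are not).

```r
# The same six records as above, built with base R; Time kept as "HH:MM" text
df <- data.frame(
  Location = c("Twelve", "Twelve", "Twelve", "Twelve", "Thirteen", "Thirteen"),
  Site     = c(2, 2, 2, 2, 2, 2),
  Species  = c("frog", "frog", "frog", "badger", "badger", "badger"),
  Date     = rep("14-10", 6),
  Time     = c("13:45", "14:45", "15:44", "15:44", "16:44", "17:42"),
  stringsAsFactors = FALSE
)

# Minutes since midnight for each record
mins <- as.numeric(as.difftime(df$Time, format = "%H:%M", units = "mins"))

# Minutes since the previous record in the same Location/Site/Species/Date group;
# NA for the first record of each group
gap <- ave(mins, df$Location, df$Site, df$Species, df$Date,
           FUN = function(m) c(NA, diff(m)))

# Independent: first record in its group, or at least 60 minutes after the previous one
independent <- is.na(gap) | gap >= 60
df[independent, ]
```

ave() applies the function within each combination of the grouping columns and returns a vector aligned with the original rows, which plays the role of group_by() plus lag() here.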

