Subsetting data based on four columns using if else statements
Problem description
I am new to R and am having trouble subsetting my data based on certain criteria. I have five columns (Location, Site, Species, Date, Time).
What I want to do is subset my data based on whether the time of a record is at least one hour after the previous record with the same date, same species, same site, and same location. If any of those four columns differs from the previous record, I want it to ignore the time check and move on.
As an example, in this data frame the first two records should count as independent, because all columns before the Time column are the same and the records are 60 minutes or more apart, whereas the last record is only 59 minutes after the previous one and so is not independent.
Location Site Species Date Time
Twelve 2 frog 14-10 13:45
Twelve 2 frog 14-10 14:45
Twelve 2 frog 14-10 15:44
In this example, the first two lines are independent, the third line is a different species, and the fourth line matches the third except for a different location. The fifth line matches the fourth in every column except the time, which is again less than 60 minutes later (58 minutes).
The result should contain the first two lines (independent of each other), the third line (independent of the lines before and after it, since its species differs from the line before and its location differs from the line after), and the fourth line (independent of the line before). The last line should not appear in the subset, because it is not independent of the line before (<60 minutes apart).
Location Site Species Date Time
Twelve 2 frog 14-10 13:45
Twelve 2 frog 14-10 14:45
Twelve 2 badger 14-10 15:44
Thirteen 2 badger 14-10 16:44
Thirteen 2 badger 14-10 17:42
Recommended answer
*** EDITED to include OP comments ***
Use filter to keep only the independent rows, and select to hide the interim columns.
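library(dplyr)

df %>%
  # compare each record only with records sharing the same four columns
  group_by(Location, Site, Species, Date) %>%
  # hours since the previous record in the group (NA for the first record)
  mutate(difftime = as.numeric(hms::as_hms(Time) - hms::as_hms(lag(Time, 1)))/3600) %>%
  mutate(independent = case_when(
    is.na(difftime) ~ TRUE,
    difftime >= 1 ~ TRUE,
    difftime < 1 ~ FALSE,
    TRUE ~ FALSE)
  ) %>%
  filter(independent) %>%
  select(-difftime, -independent)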
# A tibble: 4 x 5
# Groups: Location, Site, Species, Date [3]
Location Site Species Date Time
<chr> <dbl> <chr> <chr> <time>
1 Twelve 2 frog 14-10 13:45
2 Twelve 2 frog 14-10 14:45
3 Twelve 2 badger 14-10 15:44
4 Thirteen 2 badger 14-10 16:44
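For readers who would rather not depend on dplyr and hms, the same rule can be sketched in base R. This is only an illustrative sketch, not the answerer's method: it assumes Time is stored as "HH:MM" character strings and that rows are already sorted by time within each group. The helper to_minutes and the variables key, gap, independent, and result are hypothetical names introduced here.

```r
# Example data from the question, with Time kept as "HH:MM" strings (assumption).
df <- data.frame(
  Location = c("Twelve", "Twelve", "Twelve", "Twelve", "Thirteen", "Thirteen"),
  Site     = c(2, 2, 2, 2, 2, 2),
  Species  = c("frog", "frog", "frog", "badger", "badger", "badger"),
  Date     = c("14-10", "14-10", "14-10", "14-10", "14-10", "14-10"),
  Time     = c("13:45", "14:45", "15:44", "15:44", "16:44", "17:42"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM" to minutes since midnight.
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

mins <- to_minutes(df$Time)

# One key per Location/Site/Species/Date combination.
key <- paste(df$Location, df$Site, df$Species, df$Date, sep = "|")

# Gap in minutes to the previous record within each group;
# the first record of a group has no predecessor and gets NA.
gap <- ave(mins, key, FUN = function(m) c(NA, diff(m)))

# A record is kept if it starts a group or is at least 60 minutes
# after the previous record in the same group.
independent <- is.na(gap) | gap >= 60
result <- df[independent, ]
print(result)
```

Like the dplyr version, this compares each record with the record immediately before it in the same group, whether or not that earlier record was itself kept.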
library(dplyr)
df %>%
  group_by(Location, Site, Species, Date) %>%
  mutate(difftime = as.numeric(Time - lag(Time, 1))/3600) %>%
  mutate(independent = case_when(
    is.na(difftime) ~ TRUE,
    difftime >= 1 ~ TRUE,
    difftime < 1 ~ FALSE,
    TRUE ~ FALSE)
  )
#> # A tibble: 6 x 7
#> # Groups: Location, Site, Species, Date [3]
#> Location Site Species Date Time difftime independent
#> <chr> <dbl> <chr> <chr> <time> <dbl> <lgl>
#> 1 Twelve 2 frog 14-10 13:45 NA TRUE
#> 2 Twelve 2 frog 14-10 14:45 1 TRUE
#> 3 Twelve 2 frog 14-10 15:44 0.983 FALSE
#> 4 Twelve 2 badger 14-10 15:44 NA TRUE
#> 5 Thirteen 2 badger 14-10 16:44 NA TRUE
#> 6 Thirteen 2 badger 14-10 17:42 0.967 FALSE
Adding hms::as_hms makes the warning message go away.
df %>%
  group_by(Location, Site, Species, Date) %>%
  mutate(difftime = as.numeric(hms::as_hms(Time) - hms::as_hms(lag(Time, 1)))/3600) %>%
  mutate(independent = case_when(
    is.na(difftime) ~ TRUE,
    difftime >= 1 ~ TRUE,
    difftime < 1 ~ FALSE,
    TRUE ~ FALSE)
  )
#> # A tibble: 6 x 7
#> # Groups: Location, Site, Species, Date [3]
#> Location Site Species Date Time difftime independent
#> <chr> <dbl> <chr> <chr> <time> <dbl> <lgl>
#> 1 Twelve 2 frog 14-10 13:45 NA TRUE
#> 2 Twelve 2 frog 14-10 14:45 1 TRUE
#> 3 Twelve 2 frog 14-10 15:44 0.983 FALSE
#> 4 Twelve 2 badger 14-10 15:44 NA TRUE
#> 5 Thirteen 2 badger 14-10 16:44 NA TRUE
#> 6 Thirteen 2 badger 14-10 17:42 0.967 FALSE
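The division by 3600 in the code above turns a difference measured in seconds into hours, which is why a 60-minute gap shows up as 1 and a 59-minute gap as 0.983. A minimal base R sketch of that conversion follows; the year 2020 and the UTC time zone are placeholders chosen only to build complete timestamps, since the data contain just a day-month and a time.

```r
# Two timestamps one hour apart (year and time zone are placeholder assumptions).
t1 <- as.POSIXct("2020-10-14 13:45", tz = "UTC")
t2 <- as.POSIXct("2020-10-14 14:45", tz = "UTC")

# The same gap expressed in seconds and in hours.
gap_secs  <- as.numeric(difftime(t2, t1, units = "secs"))
gap_hours <- as.numeric(difftime(t2, t1, units = "hours"))

# Dividing seconds by 3600 gives the same value as asking for hours directly.
print(c(secs = gap_secs, hours = gap_hours, secs_over_3600 = gap_secs / 3600))
```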
Your data
df <- readr::read_table2("Location Site Species Date Time
Twelve 2 frog 14-10 13:45
Twelve 2 frog 14-10 14:45
Twelve 2 frog 14-10 15:44
Twelve 2 badger 14-10 15:44
Thirteen 2 badger 14-10 16:44
Thirteen 2 badger 14-10 17:42")