Drop ID with NA in a conditional group
Question
Extending this question:
I have some data prepared using the code below:
# # Data Preparation ----------------------
library(lubridate)
start_date <- "2018-10-30 00:00:00"
start_date <- as.POSIXct(start_date, origin="1970-01-01")
dates <- c(start_date)
for(i in 1:287) {
dates <- c(dates, start_date + minutes(i * 10))
}
dates <- as.POSIXct(dates, origin="1970-01-01")
date_val <- format(dates, '%d-%m-%Y')
weather.forecast.data <- data.frame(dateTime = dates, date = date_val)
weather.forecast.data <- rbind(weather.forecast.data, weather.forecast.data, weather.forecast.data, weather.forecast.data)
weather.forecast.data$id <- c(rep('GH1', 288), rep('GH2', 288), rep('GH3', 288), rep('GH4', 288))
weather.forecast.data$radiation <- round(runif(nrow(weather.forecast.data)), 2)
weather.forecast.data$hour <- as.integer(format(weather.forecast.data$dateTime, '%H'))
weather.forecast.data$day_night <- ifelse(weather.forecast.data$hour < 6, 'night', ifelse(weather.forecast.data$hour < 19, 'day', 'night'))
# # GH2: Total Morning missing # #
weather.forecast.data$radiation[(weather.forecast.data$id == 'GH2') & (weather.forecast.data$date == '30-10-2018') & (weather.forecast.data$day_night == 'day')] = NA
weather.forecast.data$hour <- NULL
weather.forecast.data$day_night <- NULL
My task is to remove ids from weather.forecast.data where, for a given id and date, the radiation values for the morning half (hours 06 through 18) are all missing (NA), using dplyr in R.
I want to eliminate the rows for a given id and date whose entire morning radiation is missing. That is, if an id has all of its morning radiation missing on a date, I drop every row with that particular id and date: all 144 records for that id and date, night rows included.
We can see that GH2 has its entire morning radiation missing on date 30-10-2018. We therefore drop all 144 records with id == 'GH2' and date == '30-10-2018'.
library(data.table)
setDT(weather.forecast.data)
weather.forecast.data[, sum(is.na(radiation)), .(id, date)]
    id       date V1
1: GH1 30-10-2018  0
2: GH1 31-10-2018  0
3: GH2 30-10-2018 78
4: GH2 31-10-2018  0
5: GH3 30-10-2018  0
6: GH3 31-10-2018  0
7: GH4 30-10-2018  0
8: GH4 31-10-2018  0
I have a working solution using data.table:
setDT(weather.forecast.data)
weather.forecast.data[, hour:= hour(dateTime)]
weather.forecast.data[, day_night:=c("night", "day")[(6 <= hour & hour < 19) + 1L]]
weather.forecast.data[, date_id := paste(date, id, sep = "__")]
weather.forecast.data[, all_is_na := all(is.na(radiation)), .(date_id, day_night)]
weather.forecast.data[!(date_id %in% unique(weather.forecast.data[(all_is_na == TRUE) & (day_night == 'day'), date_id]))]
I need the code using dplyr and I have tried the following. It drops more rows than required:
library(dplyr)
weather.forecast.data <- weather.forecast.data %>%
mutate(hour = as.integer(format(dateTime, '%H'))) %>%
mutate(day_night = ifelse(hour < 6, 'night', ifelse(hour < 19, 'day', 'night'))) %>%
group_by(date, day_night, id) %>%
filter((!all(is.na(radiation))) & (day_night == 'day')) %>%
select (-c(hour, day_night)) %>%
as.data.frame
Note: The output should return the data with the rows where id == 'GH2' and date == '30-10-2018' dropped.
Recommended answer
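I believe you are overcomplicating this a bit. Your attempt keeps only rows where day_night == 'day', so every night row is dropped as well, which is why too many rows disappear. The following code does what you describe in the question.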
library(lubridate)
library(dplyr)
weather.forecast.data %>%
mutate(hour = hour(dateTime),
day_night = c("night", "day")[(6 <= hour & hour < 19) + 1L]) %>%
group_by(date, id) %>%
# keep is TRUE when the (date, id) group has no missing daytime radiation
mutate(keep = all(!(is.na(radiation) & day_night == "day"))) %>%
ungroup() %>%
filter(keep) %>%
select(-hour, -day_night, -keep) %>%
as.data.frame() -> df1
Check that it removed the expected 144 rows.
nrow(weather.forecast.data) - nrow(df1)
#[1] 144
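One caveat: the filter above drops an id and date as soon as any daytime radiation value is NA, which gives the same result as the stated rule on this dataset because GH2 is missing the whole daytime block. If a group should be dropped only when every daytime value is missing, a stricter variant might look like the sketch below. The toy data frame and the names toy, is_day and kept are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Toy data (hypothetical): two ids on one date, one row per hour 0-23.
# "A" is missing every daytime value, "B" is missing only a single one.
toy <- expand.grid(hour = 0:23, id = c("A", "B"), stringsAsFactors = FALSE)
toy$date <- "30-10-2018"
toy$radiation <- 0.5
toy$radiation[toy$id == "A" & toy$hour >= 6 & toy$hour < 19] <- NA
toy$radiation[toy$id == "B" & toy$hour == 12] <- NA

kept <- toy %>%
  mutate(is_day = hour >= 6 & hour < 19) %>%
  group_by(id, date) %>%
  # Drop the whole (id, date) group only if it has daytime rows
  # and all of their radiation values are NA.
  filter(!(any(is_day) && all(is.na(radiation[is_day])))) %>%
  ungroup() %>%
  select(-is_day)

unique(kept$id)
nrow(kept)
```

Here "A" is removed entirely, night rows included, while "B" survives despite its single missing daytime value. On the question's data the same filter, grouped by id and date, should remove the same 144 GH2 rows.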
Data.
I repost the data generation code, simplified in two places and with a call to set.seed.
set.seed(4192)
start_date <- "2018-10-30 00:00:00"
start_date <- as.POSIXct(start_date, origin="1970-01-01")
dates <- start_date + minutes(0:287 * 10)
dates <- as.POSIXct(dates, origin="1970-01-01")
date_val <- format(dates, '%d-%m-%Y')
weather.forecast.data <- data.frame(dateTime = dates, date = date_val)
weather.forecast.data <- rbind(weather.forecast.data, weather.forecast.data, weather.forecast.data, weather.forecast.data)
weather.forecast.data$id <- c(rep('GH1', 288), rep('GH2', 288), rep('GH3', 288), rep('GH4', 288))
weather.forecast.data$radiation <- round(runif(nrow(weather.forecast.data)), 2)
weather.forecast.data$hour <- hour(weather.forecast.data$dateTime)
weather.forecast.data$day_night <- ifelse(weather.forecast.data$hour < 6, 'night', ifelse(weather.forecast.data$hour < 19, 'day', 'night'))
# # GH2: Total Morning missing # #
weather.forecast.data$radiation[(weather.forecast.data$id == 'GH2') & (weather.forecast.data$date == '30-10-2018') & (weather.forecast.data$day_night == 'day')] = NA
weather.forecast.data$hour <- NULL
weather.forecast.data$day_night <- NULL
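As an aside, the 288 ten-minute timestamps could also be built in base R without lubridate. This is only a sketch of an alternative, not part of the original answer, and it assumes UTC as the time zone for reproducibility (the code above uses the session's local time zone).

```r
# Base R alternative: a regular 10-minute sequence of 288 timestamps.
# tz = "UTC" is an assumption made here so the result is the same everywhere.
start_date <- as.POSIXct("2018-10-30 00:00:00", tz = "UTC")
dates <- seq(start_date, by = "10 min", length.out = 288)

length(dates)
range(dates)
```

The sequence covers two full days, so each date gets 144 rows per id, which is where the 144 dropped records come from.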