Subsetting data by multiple date ranges - R


Question

I'll get straight to the point: I have been given some data sets in .csv format containing regularly logged sensor data from a machine. However, this data set also contains measurements taken when the machine is turned off, which I would like to separate from the data logged from when it is turned on. To subset the relevant data I also have a file containing start and end times of these shutdowns. This file is several hundred rows long.

Examples of the relevant files for this problem:

file: sensor_data.csv

sens_name,time,measurement
sens_A,17/12/11 06:45,32.3321
sens_A,17/12/11 08:01,36.1290
sens_B,17/12/11 05:32,17.1122
sens_B,18/12/11 03:43,12.3189

##################################################

file: shutdowns.csv

shutdown_start,shutdown_end
17/12/11 07:46,17/12/11 08:23
17/12/11 08:23,17/12/11 09:00
17/12/11 09:00,17/12/11 13:30
18/12/11 01:42,18/12/11 07:43

To subset data in R, I have previously used the subset() function with simple conditions which has worked fine, but I don't know how to go about subsetting sensor data which fall outside multiple shutdown date ranges. I've already formatted the date and time data using as.POSIXlt().

I'm suspecting some scripting may be involved to come up with a good solution, but I'm afraid I am not yet experienced enough to handle this type of data.

Any help, advice, or solutions will be greatly appreciated. Let me know if there's anything else needed for a solution.

Recommended answer

I prefer the POSIXct format for date ranges within data frames. We build a logical index that is TRUE for readings taken outside every shutdown range, that is, where t < shutdown_start OR t > shutdown_end holds for all rows of shutdowns. With this index we can then subset the data as necessary:

# Parse "dd/mm/yy HH:MM" strings into POSIXct (local time zone by default)
posixct <- function(x) as.POSIXct(x, format="%d/%m/%y %H:%M")

sensor_data$time <- posixct(sensor_data$time)
shutdowns[] <- lapply(shutdowns, posixct)

# TRUE when a reading lies outside all shutdown ranges
ind1 <- sapply(sensor_data$time, function(t) {
  all(t < shutdowns[,1] | t > shutdowns[,2])})

#Measurements taken while the machine was running
sensor_data[ind1,]
#   sens_name                time measurement
# 1    sens_A 2011-12-17 06:45:00     32.3321
# 3    sens_B 2011-12-17 05:32:00     17.1122

#Measurements taken during a shutdown
sensor_data[!ind1,]
#   sens_name                time measurement
# 2    sens_A 2011-12-17 08:01:00     36.1290
# 4    sens_B 2011-12-18 03:43:00     12.3189
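
For longer logs, the same test can be done without a per-row loop. The following is a minimal, self-contained sketch and not part of the original answer. It assumes the sample rows from the question (loaded inline here instead of from the .csv files), a UTC time zone, and inclusive shutdown bounds. The names to_ct, inside, during_shutdown, running_data and shutdown_data are made up for illustration.

```r
# Sample rows copied from the question; in practice these would come from
# read.csv("sensor_data.csv") and read.csv("shutdowns.csv").
sensor_data <- read.csv(text = "sens_name,time,measurement
sens_A,17/12/11 06:45,32.3321
sens_A,17/12/11 08:01,36.1290
sens_B,17/12/11 05:32,17.1122
sens_B,18/12/11 03:43,12.3189", stringsAsFactors = FALSE)

shutdowns <- read.csv(text = "shutdown_start,shutdown_end
17/12/11 07:46,17/12/11 08:23
17/12/11 08:23,17/12/11 09:00
17/12/11 09:00,17/12/11 13:30
18/12/11 01:42,18/12/11 07:43", stringsAsFactors = FALSE)

# Assumption: tz = "UTC"; replace with the time zone the logger actually used.
to_ct <- function(x) as.POSIXct(x, format = "%d/%m/%y %H:%M", tz = "UTC")

# Work on plain seconds-since-epoch so the comparisons are simple numerics.
t_num     <- as.numeric(to_ct(sensor_data$time))
start_num <- as.numeric(to_ct(shutdowns$shutdown_start))
end_num   <- as.numeric(to_ct(shutdowns$shutdown_end))

# inside[i, j] is TRUE when reading i lies within shutdown j (bounds inclusive).
inside <- outer(t_num, start_num, ">=") & outer(t_num, end_num, "<=")

# A reading was taken during a shutdown if it lies inside at least one range.
during_shutdown <- rowSums(inside) > 0

running_data  <- sensor_data[!during_shutdown, ]
shutdown_data <- sensor_data[during_shutdown, ]
```

Note that outer() builds a matrix with one row per reading and one column per shutdown, so with millions of readings and several hundred shutdowns it may be worth processing the sensor data in chunks.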
