R: Subset data frame based on multiple values for multiple variables

Views: 110 Posted: 2017/4/8 18:08:09 Tags: r date subset

Problem description

I need to pull records from a first data set (called df1 here) based on a combination of specific dates, ID#s, event start time, and event end time that match with a second data set (df2). Everything works fine when there is just 1 date, ID, and event start and end time, but some of the matching records between the data sets contain multiple IDs, dates, or times, and I can't get the records from df1 to subset properly in those cases. I ultimately want to put this in a FOR loop or independent function since I have a rather large dataset. Here's what I've got so far:

I started just by matching the dates between the two data sets as follows:

match_dates <- as.character(intersect(df1$Date, df2$Date))

Then I selected the records in df2 based on the first matching date, also keeping the other columns so I have the other ID and time information I need:

records <- df2[which(df2$Date == match_dates[1]), ]

The date, ID, start time, and end time in records are then:

[1] "01-04-2009" "599091"     "12:00"      "17:21"

Finally, I subset df1 for the periods before and after the event, based on the date, ID, and times in records, and combined the two into a new data frame called final, which holds the df1 data I ultimately need.

before <- subset(df1, NUM==records$ID & Date==records$Date & Time<records$Start)
after <- subset(df1, NUM==records$ID & Date==records$Date & Time>records$End)
final <- rbind(before, after)
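For a single row of df2 this works because every comparison is against a length-one value. A minimal sketch wrapping the same logic in a function, using small hypothetical data: only the column names (NUM, Date, Time, ID, Start, End) come from the question; the Value column, the helper name subset_one_event, and all values are made up. It assumes times are zero-padded "HH:MM" strings, so that `<` and `>` order them correctly.

```r
# Hypothetical toy data: only the column names come from the question
df1 <- data.frame(
  NUM   = c(599091, 599091, 599091, 599091, 123456),
  Date  = rep("01-04-2009", 5),
  Time  = c("08:00", "13:00", "18:00", "11:00", "09:00"),
  Value = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
df2 <- data.frame(ID = 599091, Date = "01-04-2009",
                  Start = "12:00", End = "17:21",
                  stringsAsFactors = FALSE)

# Hypothetical helper: rows of df1 before and after ONE event (one row of df2).
# Assumes zero-padded "HH:MM" strings, so "<" and ">" order them correctly.
subset_one_event <- function(df1, rec) {
  stopifnot(nrow(rec) == 1)          # every comparison is against one value
  same   <- df1$NUM == rec$ID & df1$Date == rec$Date
  before <- df1[same & df1$Time < rec$Start, ]
  after  <- df1[same & df1$Time > rec$End, ]
  rbind(before, after)
}

final <- subset_one_event(df1, df2[1, ])
final$Value   # 1 4 3: 08:00 and 11:00 are before 12:00, 18:00 is after 17:21
```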

Here's the real problem: some of the matching dates have more than one corresponding row in df2 and return multiple IDs or times. Here is what an example of multiple records looks like:

records <- df2[which(df2$Date == match_dates[25]), ]

> records$ID
[1] 507646 680845 680845
> records$Date
[1] "04-02-2009" "04-02-2009" "04-02-2009"
> records$Start
[1] "09:43" "05:37" "11:59"
> records$End
[1] "05:19" "11:29" "16:47"

When I try to subset df1 based on this, I get these warnings:

before <- subset(df1, NUM==records$ID & Date==records$Date & Time<records$Start)
Warning messages:
1: In NUM == records$ID :
  longer object length is not a multiple of shorter object length
2: In Date == records$Date :
  longer object length is not a multiple of shorter object length
3: In Time < records$Start :
  longer object length is not a multiple of shorter object length
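The warnings come from R's element-wise comparison: `NUM == records$ID` compares the first element of NUM with the first ID, the second with the second, and so on, recycling the shorter vector, instead of checking each df1 row against every row of records. A small illustration with made-up values (only the three IDs come from the output above):

```r
x   <- c(507646, 680845, 507646, 680845, 680845)  # e.g. df1$NUM (made up)
ids <- c(507646, 680845, 680845)                  # records$ID from above

# "==" pairs x[1] with ids[1], x[2] with ids[2], x[3] with ids[3], then
# recycles: x[4] with ids[1], x[5] with ids[2]. Length 5 is not a multiple
# of 3, hence the warning.
elementwise <- suppressWarnings(x == ids)
elementwise   # TRUE TRUE FALSE FALSE TRUE

# Membership test: is each value of x anywhere in ids?
membership <- x %in% ids
membership    # TRUE TRUE TRUE TRUE TRUE
```

`%in%` fixes the ID test on its own, but the date and the start/end times still have to be checked against the same row of records, so the conditions need to be evaluated one record at a time (or after a join).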

Trying to do it manually for each ID-date-time combination would be way too tedious. I have 9 years' worth of data, all with multiple matching dates for a given year between the data sets, so ideally I would like to set this up as a FOR loop, or a function with a FOR loop in it, but I can't get past this. Thanks in advance for any tips!
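A minimal sketch of such a FOR loop, assuming the same column names as in the question and zero-padded "HH:MM" time strings. The data are hypothetical, loosely based on the 04-02-2009 example above. The loop processes every row of df2 on a matching date one at a time, so each comparison is against a single ID, date, and time window; the event column is an added, hypothetical name.

```r
# Hypothetical toy data, loosely based on the 04-02-2009 example
df1 <- data.frame(
  NUM  = c(507646, 507646, 680845, 680845, 680845, 599091),
  Date = c(rep("04-02-2009", 5), "01-04-2009"),
  Time = c("08:00", "10:00", "05:00", "12:00", "17:00", "18:00"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID    = c(507646, 680845, 680845, 599091),
  Date  = c(rep("04-02-2009", 3), "01-04-2009"),
  Start = c("09:43", "05:37", "11:59", "12:00"),
  End   = c("09:50", "11:29", "16:47", "17:21"),
  stringsAsFactors = FALSE
)

match_dates <- as.character(intersect(df1$Date, df2$Date))
records <- df2[df2$Date %in% match_dates, ]   # all matching rows, all dates

results <- vector("list", nrow(records))
for (i in seq_len(nrow(records))) {
  rec    <- records[i, ]                       # one event: single ID/date/times
  same   <- df1$NUM == rec$ID & df1$Date == rec$Date
  before <- df1[same & df1$Time < rec$Start, ]
  after  <- df1[same & df1$Time > rec$End, ]
  out    <- rbind(before, after)
  out$event <- rep(i, nrow(out))               # which row of records it matched
  results[[i]] <- out
}
final <- do.call(rbind, results)
final
```

Because each event is handled separately, a df1 row can appear once per event (here 05:00 and 17:00 for ID 680845 each show up twice); the event column keeps them apart. A `merge()` of df1 and df2 on NUM/ID and Date, followed by the same time filter, would do the same without an explicit loop.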