Create column to flag rows within a date period in R


Question

I need to create a "flag" column in my main data frame that flags rows where the date falls within a specific time range. That time range comes from a second data frame. I think I'm stuck on the ifelse (or if) statement, because the flag column ends up containing NA values. Maybe ifelse isn't the way to go. Here's some sample data:

# main data frame
date <- seq(as.Date("2014-07-21"), as.Date("2014-09-11"), by = "day") 
group <- letters[1:4]                           
datereps <- rep(date, length(group))                  
groupreps <- rep(group, each = length(date))    
value  <- rnorm(length(datereps))
df <- data.frame(Date = datereps, Group = groupreps, Value = value)  

# flag time period data frame
flag <- data.frame(Group = c("b", "d"), 
        start = c("2014-08-01", "2014-08-26"),
        end = c("2014-08-11", "2014-09-01"))

# Merge flag dates into main data frame
df2 <- merge(df, flag, by = "Group", all.x = TRUE)

# Execute ifelse statement on each row
df2$flag <- "something"
df2$flag <- ifelse(df2$Date >= as.Date(df2$start) & df2$Date <= as.Date(df2$end), "flag", "other")

The result is that rows with a specified "start" and "end" date are labeled "flag" or "other" as expected, but where "start" and "end" are NA, I get NA values for df2$flag. This happens even when I initialize df2$flag with "something". I want "other" for every row that is not labeled "flag". Look at rows 50:68.

df2[50:68,]


Recommended answer

Change your last line to:

for (i in seq_len(nrow(df2))) {
    if (is.na(df2$start[i])) {
        # No flag period for this group; comparing against NA would give NA
        df2$flag[i] <- "other"
    } else if (df2$Date[i] >= as.Date(df2$start[i]) && df2$Date[i] <= as.Date(df2$end[i])) {
        df2$flag[i] <- "flag"
    } else {
        df2$flag[i] <- "other"
    }
}

It's ugly, but it does the job. This code is not vectorized, so it's fine for your situation but would be slow on larger data.
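For larger data, a vectorized variant may be worth considering. The sketch below is not part of the original answer. It rebuilds the question's data, with a `set.seed` call added only for reproducibility, and then relies on the fact that `NA %in% TRUE` is `FALSE`. The name `in_range` is introduced here purely for illustration.

```r
# Rebuild the question's sample data (set.seed is an addition for reproducibility)
set.seed(1)
date  <- seq(as.Date("2014-07-21"), as.Date("2014-09-11"), by = "day")
group <- letters[1:4]
df <- data.frame(Date  = rep(date, length(group)),
                 Group = rep(group, each = length(date)),
                 Value = rnorm(length(date) * length(group)))

flag <- data.frame(Group = c("b", "d"),
                   start = c("2014-08-01", "2014-08-26"),
                   end   = c("2014-08-11", "2014-09-01"))

df2 <- merge(df, flag, by = "Group", all.x = TRUE)

# Comparing a Date against NA yields NA, and ifelse() passes that NA through.
# This is why groups without a flag period ended up as NA in the question.
in_range <- df2$Date >= as.Date(df2$start) & df2$Date <= as.Date(df2$end)

# `%in% TRUE` turns NA into FALSE, so rows without a period become "other".
df2$flag <- ifelse(in_range %in% TRUE, "flag", "other")

table(df2$Group, df2$flag)
```

With this sample data, groups a and c should come out entirely as "other", while b and d are flagged only inside their own periods. An equivalent spelling is `!is.na(in_range) & in_range`, if `%in% TRUE` reads as too terse.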
