Filter on multiple columns in R

Question

I have been working with some data at work and am trying to filter rows based on the values in specific columns, but so far without success. Can anyone help? Let me explain what I am trying to achieve. I have a dataset that contains the following information:

    person_id|custody_start|custody_end|contact_month|month_start     |month_end |contact_date
    13126321 |02/23/2020   |07/17/2020 |February 20  |02/01/2020      |02/28/2020|02/26/2020    
    13126321 |02/23/2020   |07/17/2020 |March 20     |03/01/2020      |03/31/2020|03/12/2020    
    13126321 |02/23/2020   |07/17/2020 |April 20     |04/01/2020      |04/30/2020|04/11/2020  
    13126321 |02/23/2020   |07/17/2020 |May 20       |05/01/2020      |05/31/2020|05/12/2020 
    13126321 |02/23/2020   |07/17/2020 |June 20      |06/01/2020      |06/30/2020|06/11/2020  
    13126321 |02/23/2020   |07/17/2020 |July 20      |07/01/2020      |07/31/2020|07/12/2020

I want to filter out the rows where contact_date falls in the same month as custody_start or custody_end. In this case the first and last rows should be removed, leaving only the data from March through June.

The final output should look like this:

    person_id|custody_start|custody_end|contact_month|month_start     |month_end |contact_date
    13126321 |02/23/2020   |07/17/2020 |March 20     |03/01/2020      |03/31/2020|03/12/2020    
    13126321 |02/23/2020   |07/17/2020 |April 20     |04/01/2020      |04/30/2020|04/11/2020  
    13126321 |02/23/2020   |07/17/2020 |May 20       |05/01/2020      |05/31/2020|05/12/2020 
    13126321 |02/23/2020   |07/17/2020 |June 20      |06/01/2020      |06/30/2020|06/11/2020 

Recommended answer

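library(tidyverse)
library(lubridate)

# Keep only rows whose contact month differs from both the
# custody start month and the custody end month
df %>%
  filter(month(contact_date) != month(custody_start),
         month(contact_date) != month(custody_end))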

Note: this requires the date columns involved to be stored as proper dates (class Date), or in a format that can be coerced to one.
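The dates in the question look like "mm/dd/yyyy" strings, so they would likely need parsing first. The following is a minimal sketch, assuming the columns are stored as character and using lubridate's mdy() for the conversion. The sample data frame is rebuilt by hand from the question (month_start and month_end are omitted for brevity), and the filter excludes both the custody_start and the custody_end month, as the question asks.

```r
library(dplyr)
library(lubridate)

# Sample data from the question, with dates still stored as "mm/dd/yyyy" strings
# (assumption: this is how the real columns are stored)
df <- data.frame(
  person_id     = 13126321,
  custody_start = "02/23/2020",
  custody_end   = "07/17/2020",
  contact_month = c("February 20", "March 20", "April 20",
                    "May 20", "June 20", "July 20"),
  contact_date  = c("02/26/2020", "03/12/2020", "04/11/2020",
                    "05/12/2020", "06/11/2020", "07/12/2020"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Parse the character columns into Date objects
  mutate(across(c(custody_start, custody_end, contact_date), mdy)) %>%
  # Drop rows whose contact month matches the custody start or end month
  filter(month(contact_date) != month(custody_start),
         month(contact_date) != month(custody_end))

result$contact_month
# [1] "March 20" "April 20" "May 20"   "June 20"
```

Comparing month() alone ignores the year, which is fine for the single-year sample but may need a year check for custody periods longer than twelve months.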

UPDATE (based on the asker's question in the comments):

Is there a way to check whether a record in the data frame was in custody for the full calendar month?

df <- data.frame(start = as.Date(c("2/23/2020", "2/1/2020", "2/1/2021", "7/1/1900"), "%m/%d/%Y"),
                 end   = as.Date(c("2/25/2020", "2/28/2020", "2/28/2021", "7/31/1900"),"%m/%d/%Y"))

#        start        end
# 1 2020-02-23 2020-02-25
# 2 2020-02-01 2020-02-28
# 3 2021-02-01 2021-02-28
# 4 1900-07-01 1900-07-31

library(tidyverse)
library(lubridate)

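df %>%
  # Helper columns: the day before the start and the day after the end
  mutate(start_help = start - days(1),
         end_help   = end   + days(1),
         # Full month: start and end share a month, the day before the start falls in
         # the previous month, and the day after the end falls in the next month;
         # the -11 / +11 terms handle the December/January wrap-around
         full_month = if_else(month(start) == month(end) &
                              (month(start) == month(start_help) + 1 | month(start) == month(start_help) - 11) &
                              (month(end)   == month(end_help)   - 1 | month(end)   == month(end_help)   + 11),
                              "yes",
                              "no")) %>%
  select(-start_help, -end_help)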

#        start        end full_month
# 1 2020-02-23 2020-02-25         no
# 2 2020-02-01 2020-02-28         no
# 3 2021-02-01 2021-02-28        yes
# 4 1900-07-01 1900-07-31        yes

Note: this is a relatively naive approach, i.e. it does not check whether the start and end dates also fall in the same year. Judging from your data above, however, you seem to stay within a single year, so it may be fine after all.
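If the year does matter, one possible alternative is to compare the dates directly against the first and last day of the start month. This is a sketch rather than part of the original answer: it uses lubridate's floor_date() and period arithmetic, the helper columns month_first and month_last are illustrative names, and the fourth sample row (February 2020 to February 2021) is added here only to show the cross-year case.

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  start = as.Date(c("2020-02-23", "2020-02-01", "2021-02-01", "2020-02-01")),
  end   = as.Date(c("2020-02-25", "2020-02-28", "2021-02-28", "2021-02-28"))
)

checked <- df %>%
  mutate(
    # First day of the month that contains the start date
    month_first = floor_date(start, "month"),
    # Last day of that same month (first of next month minus one day)
    month_last  = month_first + months(1) - days(1),
    # Full month only if the record spans exactly first..last of that month
    full_month  = if_else(start == month_first & end == month_last, "yes", "no")
  ) %>%
  select(-month_first, -month_last)

checked$full_month
# [1] "no"  "no"  "yes" "no"
```

Because whole dates are compared, a record that starts on 2020-02-01 and ends on 2021-02-28 is reported as "no", whereas a month-only comparison would report "yes".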
