Subsetting a range of values from a column variable based on the values of another value
Problem description
I am trying to keep all the rows in my data frame but drop the FORM 2 rows that do not fall within 2 years of a FORM 8 row.
library(tidyverse)
# tibble() replaces the deprecated data_frame()
forms <- tibble(
  CASEID = rep(01012, 5),
  VISIT = c(450, 450, 365, 365, 450),
  FORM = c(18, 8, 7, 2, 2),
  DTYvisit = c(2006, 2006, 2003, 2003, 2006)
)
> forms
# A tibble: 5 x 4
CASEID VISIT FORM YEAR
<dbl> <dbl> <dbl> <dbl>
1 1012 450 18 2006
2 1012 450 8 2006
3 1012 365 7 2003
4 1012 365 2 2003
5 1012 450 2 2004
6 1013 450 8 2003
7 1013 450 18 2003
8 1013 450 2 2003
9 1012 450 2 2009
Any suggestions on how I could drop the FORM 2 rows whose year is more than 2 years away from the FORM 8 DTYvisit?
This worked great:
form2.matchedOnForm8 <- forms %>%
  group_by(CASEID) %>%
  filter(FORM == 8) %>%
  select(CASEID, VISIT, DTYvisit) %>%
  left_join(filter(forms, FORM == 2), by = c("CASEID", "VISIT", "DTYvisit")) %>%
  bind_rows(filter(forms, FORM != 2))
but now I am losing observations.
I need the following:
> forms
# A tibble: 5 x 4
CASEID VISIT FORM YEAR
<dbl> <dbl> <dbl> <dbl>
1 1012 450 18 2006
2 1012 450 8 2006
3 1012 365 7 2003
4 1012 450 2 2004
5 1013 450 8 2003
6 1013 450 18 2003
7 1013 450 2 2003
Solution
Here is a solution that uses outer() to calculate the difference between a given YEAR and all the YEAR values that FORM 8 rows have.
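# Sample data
df <- read.table(text="
CASEID VISIT FORM YEAR
1 1012 450 18 2006
2 1012 450 8 2006
3 1012 365 7 2003
4 1012 365 2 2003
5 1012 450 2 2004
6 1013 450 8 2003
7 1013 450 18 2003
8 1013 450 2 2003
9 1012 450 2 2009
",header=TRUE, stringsAsFactors = FALSE)

# Smallest absolute gap between the first row's YEAR and any FORM 8 YEAR
min(abs(as.numeric(outer(df[df$FORM==8,'YEAR'],df[1,'YEAR'],'-'))))
[1] 0

# Same calculation for every row; abs() must be applied before min(),
# otherwise a large negative gap hides a closer FORM 8 year
df$diff <- apply(df, 1, function(x) min(abs(as.numeric(outer(df[df$FORM==8,'YEAR',drop=TRUE],as.numeric(x['YEAR']),'-')))))

library(dplyr)
# Drop the FORM 2 rows whose nearest FORM 8 year is more than 2 years away
df %>% group_by(CASEID) %>%
filter(!(FORM==2 & abs(diff)>2))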
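One thing to watch in the answer above: `diff` is computed by `apply()` before `group_by(CASEID)`, so each row is compared against the FORM 8 years of every case, not only its own. With the sample data, the 2003 FORM 2 row of case 1012 then matches the 2003 FORM 8 row of case 1013 and is kept, although the desired output drops it. The following is a minimal sketch of a per-case variant, not the original answerer's code. The helper name `nearest_form8_gap` is hypothetical, and dropping FORM 2 rows in a case that has no FORM 8 row at all is an assumption; adjust it if such rows should be kept.

```r
library(dplyr)

# The nine sample rows from the question
forms <- tibble(
  CASEID = c(1012, 1012, 1012, 1012, 1012, 1013, 1013, 1013, 1012),
  VISIT  = c(450, 450, 365, 365, 450, 450, 450, 450, 450),
  FORM   = c(18, 8, 7, 2, 2, 8, 18, 2, 2),
  YEAR   = c(2006, 2006, 2003, 2003, 2004, 2003, 2003, 2003, 2009)
)

# Hypothetical helper: for one group, the distance from each YEAR to the
# nearest FORM 8 YEAR in that group.
# Assumption: returns Inf for every row when the group has no FORM 8 row,
# so FORM 2 rows in such a group are dropped by the filter below.
nearest_form8_gap <- function(year, form) {
  form8_years <- year[form == 8]
  if (length(form8_years) == 0) return(rep(Inf, length(year)))
  # outer() builds a rows-by-FORM-8-years matrix of differences
  apply(abs(outer(year, form8_years, "-")), 1, min)
}

result <- forms %>%
  group_by(CASEID) %>%
  # Keep every non-FORM-2 row, and FORM 2 rows within 2 years of a FORM 8
  filter(FORM != 2 | nearest_form8_gap(YEAR, FORM) <= 2) %>%
  ungroup()

print(result)
```

Because the helper is called inside a grouped `filter()`, `YEAR` and `FORM` only contain the rows of the current CASEID, which is what restricts the comparison to FORM 8 rows of the same case.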