Matching IDs in two datasets

Problem description

I have two sets of data: pre-survey and post-survey responses. Respondents have unique IDs, and I want to create a subset that includes only those who responded to both surveys. Example dataset:

pre.data <- data.frame(ID = c(1:10), Y = sample(c("yes", "no"), 10, replace = TRUE),
  Survey = 1)

post.data <- data.frame(ID = c(1:3,6:10), Y = sample(c("yes", "no"), 8, replace = TRUE),
  Survey = 2)

all.data <- rbind(pre.data, post.data)

I have the following function:

match <- function(dat1, dat2, dat3){
  # dat1 is the whole dataset (both surveys stitched together)
  # dat2 is the pre dataset; dat3 is the post dataset
  selectedRows <- (dat1$ID %in% dat2$ID & 
                     dat1$ID %in% dat3$ID)

  matchdata <- dat1[selectedRows,]
  return(matchdata)
}

prepost.match.data <- match(all.data, pre.data, post.data)

I think there must be a better way of doing this than my function, but I can't think how. My approach seems a bit messy. It works and does what I want, but I can't help thinking there's a cleaner way.

My apologies if this has already been asked in a similar way but I was unable to find it - in which case please do point me towards a relevant answer.

Solution

Note: Arun posted the same answer in a comment a bit earlier than I did.

You can use intersect like this:

all.data[all.data$ID %in% intersect(pre.data$ID, post.data$ID),]

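Which gives the following (the Y values will differ from run to run, because the example data is drawn with sample()):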

   ID   Y Survey
1   1 yes      1
2   2  no      1
3   3  no      1
6   6 yes      1
7   7 yes      1
8   8 yes      1
9   9  no      1
10 10 yes      1
11  1  no      2
12  2 yes      2
13  3  no      2
14  6  no      2
15  7 yes      2
16  8 yes      2
17  9  no      2
18 10 yes      2
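
To check that the intersect one-liner really selects the same rows as the function in the question, here is a minimal sketch. The seed value and the names both.ids, matched and original are my own additions for illustration and are not part of the original post; the seed only makes the random Y column repeatable.

```r
# Reproducible version of the example data (the seed value is an arbitrary assumption)
set.seed(42)
pre.data  <- data.frame(ID = 1:10,
                        Y = sample(c("yes", "no"), 10, replace = TRUE),
                        Survey = 1)
post.data <- data.frame(ID = c(1:3, 6:10),
                        Y = sample(c("yes", "no"), 8, replace = TRUE),
                        Survey = 2)
all.data  <- rbind(pre.data, post.data)

# IDs that appear in both surveys
both.ids <- intersect(pre.data$ID, post.data$ID)

# Keep only the rows that belong to those respondents
matched <- all.data[all.data$ID %in% both.ids, ]

# The two-condition filter from the question, written inline for comparison
original <- all.data[all.data$ID %in% pre.data$ID &
                       all.data$ID %in% post.data$ID, ]

# TRUE when both approaches select exactly the same rows
identical(matched, original)
```

Both filters build the same logical index, so the choice between them is mostly about readability. As a side note, a user-defined function called match masks base R's match() in the workspace, so a different name may be the safer choice if you keep the function.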
