Deleting rows from a data frame that are not present in another data frame in R
Question
I'm new to R but from what I've been reading this one is a bit hard for me. I have two data frames, say DF1 and DF2, both of which have a variable of interest, say idFriends, and I want to create a new data frame where all the rows that do not appear in DF2 are deleted from DF1 based on the values of idFriends.
The thing is that in DF2 each value appears only once while DF1 has thousands of values, many of them repeated. BUT I don't want R to delete repetitions, I just want it to search DF2, see if EACH value of DF1 exists in DF2, and if it doesn't exist delete that row and if it exists leave it as is, and do the same for each row in DF1.
I hope this is clear.
Recommended answer
dplyr has a semi_join function that does this: it keeps every row of DF1 whose idFriends value has a match in DF2, without removing or adding repeated rows.
library(dplyr)
DF1 %>% semi_join(DF2, by = "idFriends") # keep DF1 rows whose idFriends appears in DF2 (repeats kept)
DF1 %>% anti_join(DF2, by = "idFriends") # keep DF1 rows whose idFriends does not appear in DF2
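If you would rather not load a package, the same filtering can be sketched in base R with %in%. This is only a minimal illustration: the toy values and the extra score column below are made up, and it assumes idFriends has the same type in both data frames.

```r
# Toy data (hypothetical values, for illustration only)
DF1 <- data.frame(idFriends = c(1, 2, 2, 3, 4, 4),
                  score     = c(10, 20, 30, 40, 50, 60))
DF2 <- data.frame(idFriends = c(2, 4, 5))

# Keep every DF1 row whose idFriends occurs in DF2.
# Repeated rows in DF1 are kept as they are.
kept <- DF1[DF1$idFriends %in% DF2$idFriends, , drop = FALSE]

# The complement: DF1 rows whose idFriends is absent from DF2
dropped <- DF1[!(DF1$idFriends %in% DF2$idFriends), , drop = FALSE]

print(kept)
print(dropped)
```

With this toy data, kept holds the four rows with idFriends 2, 2, 4, 4 and dropped holds the rows with idFriends 1 and 3. drop = FALSE is there so the result stays a data frame even if DF1 has only one column.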