Filter one column by matching to another column


Question

I have a data frame in which one variable names the element to drop whenever it matches the element in another variable. See the small example below:

df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3), 
                 value = seq(1, 12, 2), 
                 drop = c("no", "no", "dog", "dog", "cat", "cat"))

  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
3    2    dog     5  dog
4    2    cat     7  dog
5    3    dog     9  cat
6    3    cat    11  cat

I'm trying to filter the data frame according to whether the value of animal matches the value of drop. I want something like filter(df, animal != drop) that removes only the rows where the value of animal matches the value of drop:

  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
4    2    cat     7  dog
5    3    dog     9  cat

I also tried writing a simple loop that tests whether animal matches drop in each row and removes the row if it does, but I couldn't get it working. (I'm not very confident with loops and would prefer to avoid one if possible, since my data frame is very large, but I was getting desperate!)

for(i in nrow(df)){
  if(df$animal[i] == df$drop[i]){
    df <- df[-i,]
    return(df)
  }
}

Is there a way of doing this using dplyr?

Recommended answer

Using filter(df, animal != drop) is correct. However, because you haven't specified stringsAsFactors = FALSE in your data.frame() call, all strings are converted to factors (the default before R 4.0.0), and comparing two factors with different level sets raises an error. Adding stringsAsFactors = FALSE should solve this:

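library(dplyr)

# Keep the string columns as character vectors instead of factors
df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = FALSE)

# Keep only the rows where animal differs from drop
df %>%
  filter(animal != drop)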

  pair animal value drop
1    1    dog     1   no
2    1    cat     3   no
3    2    cat     7  dog
4    3    dog     9  cat
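
To see where the failure comes from, here is a minimal base R sketch. It assumes the data was built with stringsAsFactors = TRUE, as older R versions did by default; the names df_factor, comparison_error, keep and kept are only illustrative.

```r
# Rebuild the question's data with strings forced to factors,
# which was the data.frame() default before R 4.0.0.
df_factor <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                        animal = rep(c("dog", "cat"), 3),
                        value = seq(1, 12, 2),
                        drop = c("no", "no", "dog", "dog", "cat", "cat"),
                        stringsAsFactors = TRUE)

# The two columns carry different level sets
levels(df_factor$animal)  # "cat" "dog"
levels(df_factor$drop)    # "cat" "dog" "no"

# Comparing the factors directly fails; capture the message instead of stopping
comparison_error <- tryCatch(
  df_factor$animal != df_factor$drop,
  error = function(e) conditionMessage(e)
)

# Comparing the underlying strings works and keeps rows 1, 2, 4 and 5
keep <- as.character(df_factor$animal) != as.character(df_factor$drop)
kept <- df_factor[keep, ]
```

The same character comparison is what dplyr's filter() performs once the columns are no longer factors.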

To avoid this unwanted string-to-factor conversion altogether, I highly recommend using tibble.
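
As a rough sketch of that suggestion, assuming the tibble and dplyr packages are installed, the same data can be built with tibble(), which never converts strings to factors. The variable name result is only illustrative.

```r
library(tibble)
library(dplyr)

# tibble() leaves character columns as character, so no extra argument is needed
df <- tibble(pair = c(1, 1, 2, 2, 3, 3),
             animal = rep(c("dog", "cat"), 3),
             value = seq(1, 12, 2),
             drop = c("no", "no", "dog", "dog", "cat", "cat"))

# Keep only the rows where animal differs from drop
result <- df %>%
  filter(animal != drop)
```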

In case you cannot change how the data is created, I include @akrun's solution here:

library(dplyr)

df %>% 
  mutate_at(vars(animal, drop), as.character) %>%       
  filter(animal != drop)
#  pair animal value drop
#1    1    dog     1   no
#2    1    cat     3   no
#3    2    cat     7  dog
#4    3    dog     9  cat
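
In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A possible equivalent, assuming such a dplyr version and a data frame whose string columns are factors (result is an illustrative name), might look like this:

```r
library(dplyr)

# Question's data with the string columns stored as factors
df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = TRUE)

# Convert both columns to character, then filter on the comparison
result <- df %>%
  mutate(across(c(animal, drop), as.character)) %>%
  filter(animal != drop)
```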
