Comparing two similar dataframes and finding different values between them

Problem description

This is a seemingly basic question, and I apologize in advance if it is a duplicate. I looked around and didn't see anything.

I have two dataframes full of strings. I'd like to see if they are EXACT duplicates of each other.

If they are not, I'd like to determine which values are different.

Specifically, given this dataframe:

| x | y |
|---|---|
| a | e |
| b | f |
| c | g |
| d | h |

and this dataframe:

| x | y |
|---|---|
| a | l |
| b | m |
| j | g |
| k | h |

I would like to generate this result (a df full of non-matching values):

| x | y |
|---|---|
|   | l |
|   | m |
| j |   |
| k |   |

This question is super close to what I'm thinking, but it wants to find full rows that are the same, not values.

1) I don't think I have any choice other than to iterate across each value, one by one, testing via string matching. I know this df1 %in% df2 will test for rows. But how do I test for each element?

2) After I can test each element, I'd need to construct a dataframe to store the non-matches. I'm not sure how to do it.

It seems like a simple idea, but breaking it down, the implementation actually seems rather complex. Any bumps in the right direction would be greatly appreciated.

My data:

df1 <- data.frame(
  x = c('a', 'b', 'c', 'd'),
  y = c('e', 'f', 'g', 'h')
)


df2 <- data.frame(
  x = c('a', 'b', 'j', 'k'),
  y = c('l', 'm', 'g', 'h')
)

Solution

You could do:

> df2[mapply(function(x, y) x %in% y, df1, df2)] <- NA
> df2
     x    y
1 <NA>    l
2 <NA>    m
3    j <NA>
4    k <NA>

This modifies df2 in place, so make a copy first if you still need the original.
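
If the original df2 has to survive, the same assignment can be made on a copy. The following is a minimal sketch using the data from the question; the name `diffs` is chosen here for illustration, and `stringsAsFactors = FALSE` is set explicitly on the assumption that the columns should be plain character vectors regardless of R version.

```r
# Data from the question, kept as character columns
df1 <- data.frame(x = c("a", "b", "c", "d"),
                  y = c("e", "f", "g", "h"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c("a", "b", "j", "k"),
                  y = c("l", "m", "g", "h"),
                  stringsAsFactors = FALSE)

# 'diffs' is a hypothetical name; assigning df2 to it gives an
# independent object, so the NA assignment below leaves df2 untouched
diffs <- df2
diffs[mapply(function(x, y) x %in% y, df1, df2)] <- NA

diffs  # matched values blanked out as NA
df2    # still holds the original values
```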

Explanation:
mapply() applies %in% to the first column of df1 and the first column of df2, then to the second columns, and so on if there are more.
This gives:

> mapply(function(x, y) x %in% y, df1, df2)
         x     y
[1,]  TRUE FALSE
[2,]  TRUE FALSE
[3,] FALSE  TRUE
[4,] FALSE  TRUE

TRUE marks the values that matched; these are the ones we want to change to NA.
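
One caveat: %in% checks whether a value occurs anywhere in the other column, not whether it sits in the same row. For this data the two give the same result, but if a strict position-by-position comparison is what is wanted, the `==` operator on two data frames of identical shape may be a closer fit. The sketch below is an alternative to the answer above, not part of it; it assumes both data frames have the same dimensions and character (not factor) columns, and the names `same`, `matched`, and `diffs` are illustrative. It also uses `identical()` for the "are they exact duplicates" part of the question.

```r
# Data from the question, kept as character columns
df1 <- data.frame(x = c("a", "b", "c", "d"),
                  y = c("e", "f", "g", "h"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = c("a", "b", "j", "k"),
                  y = c("l", "m", "g", "h"),
                  stringsAsFactors = FALSE)

# Are the two data frames exact duplicates of each other?
same <- identical(df1, df2)

# Compare cell by cell; the result is a logical matrix
# with one entry per (row, column) position
matched <- df1 == df2

# Blank out the matching cells on a copy, leaving only the differences
diffs <- df2
diffs[matched] <- NA

# Row and column indices of the cells that differ
which(!matched, arr.ind = TRUE)
```

The `which(..., arr.ind = TRUE)` call is optional; it lists where the differences are rather than what they are.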
