How to check if two data frames are equal

Problem description

Say I have large datasets in R and I just want to know whether two of them are the same. I do this often when I'm experimenting with different algorithms to achieve the same result. For example, say we have the following datasets:

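# Two identical data frames without missing values
df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
# Two identical data frames whose last row has an NA in column num
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3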

This is the comparison I do:

table(x == y, useNA = 'ifany')  # x and y are the two data frames being compared

It works great when the datasets have no NAs:

> table(df1 == df2, useNA = 'ifany')
TRUE 
  10 

But not so much when they have NAs:

> table(df3 == df4, useNA = 'ifany')
TRUE <NA> 
  11    1 

In the example, it's easy to dismiss the NA as not a problem, since we know that both data frames are equal. The problem is that NA == <anything> yields NA, so whenever one of the datasets has an NA, it doesn't matter what the other one has in that same position: the result is always going to be NA.
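
To illustrate the NA propagation described above, here is a minimal sketch, assuming you want two NAs in the same cell to count as equal and a single NA to count as a mismatch. The helper na_aware_equal() is hypothetical (it is not part of base R); it patches the logical matrix returned by == so that table() no longer reports NA cells.

```r
# Example data from the question
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# Any comparison involving NA yields NA, even NA == NA
NA == NA   # NA
NA == 1    # NA

# Hypothetical helper (not base R): element-wise equality where two NAs
# in the same cell count as equal and a lone NA counts as a difference.
na_aware_equal <- function(x, y) {
  stopifnot(identical(dim(x), dim(y)))
  eq <- as.matrix(x == y)        # NA wherever either side is NA
  na_x <- is.na(x)
  na_y <- is.na(y)
  eq[na_x & na_y] <- TRUE        # both missing: treat as equal
  eq[xor(na_x, na_y)] <- FALSE   # only one missing: a real difference
  eq
}

# All 12 cells of df3/df4 now compare as TRUE
table(na_aware_equal(df3, df4), useNA = 'ifany')
```

This keeps the cell-by-cell view that table() gives, which can help locate differences, but it still only covers values: it ignores column types and attributes.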

So using table() to compare datasets doesn't seem ideal to me. How can I better check if two data frames are identical?

P.S.: Note that this is not a duplicate of "R - comparing several datasets", "Comparing 2 datasets in R" or "Compare datasets in R".

Recommended answer

Look up all.equal. It has some caveats, but it might work for you.

all.equal(df3,df4)
# [1] TRUE
all.equal(df2,df1)
# [1] TRUE
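
A short sketch of how all.equal() behaves in practice, using the data from the question plus two small illustrative data frames a and b (not from the original post). all.equal() returns either TRUE or a character vector describing the differences, never FALSE, so it is usually wrapped in isTRUE() when a single logical value is needed. For contrast, identical() applies no numeric tolerance, whereas all.equal() by default tolerates relative differences of about 1.5e-8.

```r
# Example data from the question
df1 <- data.frame(num = 1:5, let = letters[1:5])
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# all.equal() returns TRUE or a character vector of differences,
# so isTRUE() turns it into a single TRUE/FALSE.
isTRUE(all.equal(df3, df4))  # TRUE: NAs in matching positions compare as equal
isTRUE(all.equal(df1, df3))  # FALSE: the data frames have different numbers of rows
all.equal(df1, df3)          # character vector describing the differences

# identical() is stricter: it applies no numeric tolerance.
a <- data.frame(x = c(0.1 + 0.2, NA))
b <- data.frame(x = c(0.3, NA))
isTRUE(all.equal(a, b))      # TRUE: the difference is within the default tolerance
identical(a, b)              # FALSE: the two doubles are not bit-for-bit equal
identical(df3, df4)          # TRUE
```

Which one fits depends on whether tiny floating-point differences between the outputs of your algorithms should count as a mismatch.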
