How to check if two data frames are equal
Question
Say I have large datasets in R and I just want to know whether two of them are the same. I use this often when I'm experimenting with different algorithms to achieve the same result. For example, say we have the following datasets:
df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3
This is the comparison I make (x and y stand for the two datasets):
table(x == y, useNA = 'ifany')
Which works great when the datasets have no NAs:
> table(df1 == df2, useNA = 'ifany')
TRUE
10
But not so much when they have NAs:
> table(df3 == df4, useNA = 'ifany')
TRUE <NA>
11 1
In the example, it's easy to dismiss the NA as not a problem, since we know that both data frames are equal. The problem is that NA == <anything> yields NA, so whenever one of the datasets has an NA, it doesn't matter what the other one has in that same position: the result is always going to be NA.
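A minimal sketch of the NA behaviour described above, plus one possible NA-aware elementwise comparison. The helper `na_equal` is hypothetical (not part of base R); it assumes both data frames have the same dimensions and treats two NAs in the same position as equal.

```r
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# Any comparison involving NA yields NA, even NA == NA
NA == NA   # NA
NA == 1    # NA

# Hypothetical helper: elementwise equality where NA matches NA.
# Assumes x and y have the same dimensions; returns a logical matrix with no NAs.
na_equal <- function(x, y) {
  eq <- as.matrix(x == y)           # NA wherever either side is NA
  both_na <- is.na(x) & is.na(y)    # TRUE where both sides are NA
  eq[is.na(eq)] <- FALSE            # only one side NA: treat as different
  eq | both_na
}

table(na_equal(df3, df4), useNA = 'ifany')
all(na_equal(df3, df4))   # TRUE
```

Because the result contains no NAs, `all()` gives a single TRUE/FALSE answer, and `table(..., useNA = 'ifany')` no longer reports a spurious `<NA>` count.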
So using table() to compare datasets doesn't seem ideal to me. How can I better check if two data frames are identical?
P.S.: Note this is not a duplicate of "R - comparing several datasets", "Comparing 2 datasets in R" or "Compare datasets in R".
Answer
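Look up all.equal. It has some caveats (numeric values are compared with a small tolerance, and when the objects differ it returns a character vector describing the differences rather than FALSE), but it might work for you.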
all.equal(df3,df4)
# [1] TRUE
all.equal(df2,df1)
# [1] TRUE
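For a single TRUE/FALSE answer, base R's `identical()` and `isTRUE(all.equal())` are two options. A minimal sketch of how they differ: `identical()` requires exact equality (including attributes such as row names and column types), while `all.equal()` allows a small numeric tolerance and returns a character description instead of FALSE when the objects differ, so it is usually wrapped in `isTRUE()` when used inside `if ()`.

```r
df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# Exact comparison: NAs in matching positions count as equal
identical(df1, df2)   # TRUE
identical(df3, df4)   # TRUE

# all.equal() describes differences instead of returning FALSE
res <- all.equal(df1, df3)
is.character(res)     # TRUE: a description of the mismatch
isTRUE(res)           # FALSE: safe to use in if ()

# Tolerance: tiny floating-point differences pass all.equal() but not identical()
a <- data.frame(x = 0.1 + 0.2)
b <- data.frame(x = 0.3)
identical(a, b)           # FALSE
isTRUE(all.equal(a, b))   # TRUE
```

Which one to prefer depends on whether floating-point noise between two algorithms should count as "the same result": `identical()` is strict, `all.equal()` is forgiving within its tolerance.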