How to find the set difference between two Pandas DataFrames


Problem description

I'd like to check the difference between the columns of two DataFrames. I tried the following command:

np.setdiff1d(train.columns, train_1.columns)

This results in an empty array:

array([], dtype=object)

However, the two DataFrames have different numbers of columns:

len(train.columns), len(train_1.columns)  # (51, 56)

This means the two DataFrames are clearly different.

What is going wrong here?

Recommended answer

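The result is correct: setdiff1d is not symmetric, so the order of its arguments matters. It returns only the elements of the first input array that do not occur in the second. An empty result here means every column of train also appears in train_1; the extra columns belong to train_1, which this call never reports.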
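The asymmetry can be reproduced on a small scale. The sketch below uses two hypothetical empty DataFrames (the column names are invented for illustration and are not from the question), where every column of train also exists in train_1:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: every column of `train` also exists in `train_1`,
# which has two extra columns. The names are made up for this example.
train = pd.DataFrame(columns=["id", "age", "income"])
train_1 = pd.DataFrame(columns=["id", "age", "income", "age_sq", "log_income"])

# Columns of train that are missing from train_1: there are none,
# so this array is empty, just like in the question.
only_in_train = np.setdiff1d(train.columns, train_1.columns)

# Swapping the arguments reveals the extra columns of train_1.
only_in_train_1 = np.setdiff1d(train_1.columns, train.columns)

print(only_in_train)
print(only_in_train_1)
```

The first call comes back empty even though the column counts differ, while the second lists the two columns that exist only in train_1.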

If you do not care which of the DataFrames has the unique columns, you can use setxor1d. It returns "the unique values that are in only one (not both) of the input arrays"; see the NumPy documentation.

import numpy

colsA = ['a', 'b', 'c', 'd']
colsB = ['b','c']

c = numpy.setxor1d(colsA, colsB)

This returns an array containing 'a' and 'd'.

If you want to use setdiff1d, you need to check for differences in both directions:

import numpy as np

# columns in train.columns that are not in train_1.columns
c1 = np.setdiff1d(train.columns, train_1.columns)

# columns in train_1.columns that are not in train.columns
c2 = np.setdiff1d(train_1.columns, train.columns)
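As an alternative not mentioned in the original answer, the same comparisons can probably be done without NumPy, because DataFrame.columns is a pandas Index and Index provides difference and symmetric_difference methods. A minimal sketch, again with hypothetical column names:

```python
import pandas as pd

# Hypothetical frames for illustration; the column names are invented.
train = pd.DataFrame(columns=["id", "age", "income"])
train_1 = pd.DataFrame(columns=["id", "age", "extra"])

# One-directional differences, analogous to the two setdiff1d calls.
only_in_train = train.columns.difference(train_1.columns)
only_in_train_1 = train_1.columns.difference(train.columns)

# Columns present in exactly one frame, analogous to setxor1d.
in_exactly_one = train.columns.symmetric_difference(train_1.columns)

print(list(only_in_train))
print(list(only_in_train_1))
print(list(in_exactly_one))
```

These methods return Index objects rather than NumPy arrays, which can be convenient if the result is used to select columns afterwards.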
