Pandas "Can only compare identically-labeled DataFrame objects" error
Question
I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod): ...
uat = uat[['Customer Number','Product']]
prod = prod[['Customer Number','Product']]
print(uat['Customer Number'] == prod['Customer Number'])
print(uat['Product'] == prod['Product'])
print(uat == prod)
The first two match exactly:
74357 True
74356 True
Name: Customer Number, dtype: bool
74357 True
74356 True
Name: Product, dtype: bool
For the third print, I get an error: Can only compare identically-labeled DataFrame objects. If the first two compared fine, what's wrong with the third?
Thanks
Recommended answer
The == comparison raises when the two objects do not carry the same labels in the same order. Here's a small example to demonstrate this (it applied only to DataFrames, not Series, until pandas 0.19, from which version it applies to both):
In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])
In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
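For readers on a recent pandas, the same failure can be reproduced as a standalone script. This is a minimal sketch that assumes pandas 0.19 or later, where the error surfaces as a ValueError and a Series comparison behaves the same way. The exact message wording varies between versions, so the script only prints whatever message it receives.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
# Same rows as df1, but the index labels are in a different order.
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

# Comparing the DataFrames raises because the index order differs.
try:
    df1 == df2
    frame_error = None
except ValueError as exc:
    frame_error = str(exc)

# Since pandas 0.19 a Series comparison is just as strict about labels.
try:
    df1[0] == df2[0]
    series_error = None
except ValueError as exc:
    series_error = str(exc)

print(frame_error)
print(series_error)
```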
One solution is to sort the index first (Note: some functions require sorted indexes):
In [4]: df2.sort_index(inplace=True)
In [5]: df1 == df2
Out[5]:
      0     1
0  True  True
1  True  True
Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):
In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]:
      0     1
0  True  True
1  True  True
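The two-axis sort can be wrapped in a small helper so that both operands are always normalised the same way. This is only a sketch: sort_labels is a hypothetical name, and the named columns "a" and "b" are made up for illustration rather than taken from the question.

```python
import pandas as pd


def sort_labels(df):
    """Return a copy of df with rows and columns sorted by label."""
    return df.sort_index().sort_index(axis=1)


df1 = pd.DataFrame({"a": [1, 3], "b": [2, 4]}, index=[0, 1])
# Same data, with both the rows and the columns in a different order.
df2 = pd.DataFrame({"b": [4, 2], "a": [3, 1]}, index=[1, 0])

# After sorting, both frames share identical labels, so == no longer raises.
result = sort_labels(df1) == sort_labels(df2)
print(result)
```

Because sort_index returns a new object by default, neither input frame is modified, unlike the inplace=True call shown earlier.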
Note: This can still raise (if the index/columns aren't identically labelled after sorting).
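When the labels really do differ, sorting cannot help, and which comparison is appropriate depends on what "equal" should mean for your data. The sketch below shows two possible options rather than a recommended fix. It assumes that either a label-aligned comparison (DataFrame.eq) or a purely positional one (dropping the index with reset_index(drop=True)) is acceptable, and the frames left and right are invented for the example.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2]}, index=[0, 1])
# Index label 2 has no counterpart in `left`, so sorting cannot fix this.
right = pd.DataFrame({"x": [1, 2]}, index=[0, 2])

# Check up front whether the sorted labels match on both axes.
same_labels = (
    left.sort_index().index.equals(right.sort_index().index)
    and left.sort_index(axis=1).columns.equals(right.sort_index(axis=1).columns)
)

# Label-aligned comparison: rows present on only one side compare as False.
aligned = left.eq(right)

# Positional comparison: discard the index and compare row by row.
positional = left.reset_index(drop=True) == right.reset_index(drop=True)

print(same_labels)
print(aligned)
print(positional)
```

The positional variant only makes sense if both frames have the same shape and their rows are already in a comparable order.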