Find equal columns between two dataframes
Question
I have two pandas data frames, `a` and `b`:
a1 a2 a3 a4 a5 a6 a7
1 3 4 5 3 4 5
0 2 0 3 0 2 1
2 5 6 5 2 1 2
and
b1 b2 b3 b4 b5 b6 b7
3 5 4 5 1 4 3
0 1 2 3 0 0 2
2 2 1 5 2 6 5
The two data frames contain exactly the same data, but in a different column order and with different column names. Based on the numbers in the two data frames, I would like to match each column name in `a` to the corresponding column name in `b`.
It is not as easy as simply comparing the first row of `a` with the first row of `b`, because there are duplicated values: for example, both `a4` and `a7` have the value 5 in the first row, so it is not possible to immediately match them to either `b2` or `b4`.
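A quick way to see the ambiguity is to build the two example frames and, for each column of `a`, list the columns of `b` that share its first-row value. This is only an illustrative sketch; the name `first_row_candidates` is ours, not from the question.

```python
import pandas as pd

# The example frames from the question
a = pd.DataFrame([[1, 3, 4, 5, 3, 4, 5],
                  [0, 2, 0, 3, 0, 2, 1],
                  [2, 5, 6, 5, 2, 1, 2]],
                 columns=[f"a{k}" for k in range(1, 8)])
b = pd.DataFrame([[3, 5, 4, 5, 1, 4, 3],
                  [0, 1, 2, 3, 0, 0, 2],
                  [2, 2, 1, 5, 2, 6, 5]],
                 columns=[f"b{k}" for k in range(1, 8)])

# For each column of a, list the columns of b whose first-row value is the same
first_row_candidates = {
    col_a: [col_b for col_b in b.columns if b.loc[0, col_b] == a.loc[0, col_a]]
    for col_a in a.columns
}
print(first_row_candidates["a4"])
print(first_row_candidates["a7"])
```

Both `a4` and `a7` get the same two candidates (`b2` and `b4`), and `a2` and `a5` likewise both point at `b1` and `b7`, so the remaining rows are needed to break the tie.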
What is the best way to do this?
Recommended answer
Here's one way, leveraging broadcasting to check for equality between the two dataframes and then taking `all` over the rows to find the column pairs where every row matches. We can then obtain indexing arrays for both dataframes' column names from the result of `np.where` (with @piR's contribution):
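```python
import numpy as np
# a.values[:, None] is (3, 1, 7) and b.values[:, :, None] is (3, 7, 1); they broadcast to (3, 7, 7),
# and .all(axis=0) keeps only the (b column, a column) pairs that agree on every row
i, j = np.where((a.values[:, None] == b.values[:, :, None]).all(axis=0))
dict(zip(a.columns[j], b.columns[i]))
# {'a5': 'b1', 'a7': 'b2', 'a6': 'b3', 'a4': 'b4', 'a1': 'b5', 'a3': 'b6', 'a2': 'b7'}
```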
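The one-liner can be wrapped in a self-contained function and run on the question's data. The helper names `match_columns_broadcast` and `match_columns_lookup` are hypothetical. The second function uses a different technique that is not part of the answer above: it keys each column of `b` by the tuple of its values and looks up each column of `a`, which avoids building the `(rows, cols_b, cols_a)` boolean array that the broadcasting version creates. It assumes `b` has no two identical columns.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([[1, 3, 4, 5, 3, 4, 5],
                  [0, 2, 0, 3, 0, 2, 1],
                  [2, 5, 6, 5, 2, 1, 2]],
                 columns=[f"a{k}" for k in range(1, 8)])
b = pd.DataFrame([[3, 5, 4, 5, 1, 4, 3],
                  [0, 1, 2, 3, 0, 0, 2],
                  [2, 2, 1, 5, 2, 6, 5]],
                 columns=[f"b{k}" for k in range(1, 8)])


def match_columns_broadcast(a, b):
    """Hypothetical helper: the answer's broadcasting approach as a function."""
    # (rows, 1, ncols_a) == (rows, ncols_b, 1) -> (rows, ncols_b, ncols_a)
    eq = a.to_numpy()[:, None] == b.to_numpy()[:, :, None]
    # A (b column, a column) pair matches only if it agrees on every row
    i, j = np.where(eq.all(axis=0))
    return dict(zip(a.columns[j], b.columns[i]))


def match_columns_lookup(a, b):
    """Hypothetical alternative (not from the answer): dictionary lookup on column values."""
    # Assumes no two columns of b are identical; a later duplicate would overwrite an earlier one
    by_values = {tuple(b[col]): col for col in b.columns}
    return {col: by_values[tuple(a[col])]
            for col in a.columns if tuple(a[col]) in by_values}


print(match_columns_broadcast(a, b))
print(match_columns_lookup(a, b))
```

On this data every column of `a` has exactly one partner in `b`, so both functions return the same seven pairs (only the key order differs). If `b` contained two identical columns, both would pair with the same `a` column and either function would keep only the later one in the resulting `dict`.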