Intersection of two or more DataFrame columns


Question


I am trying to find the intersection of the columns of three DataFrames, but np.intersect1d does not accept three of them.

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('BCDE'))
df3 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('CDEF'))

inclusive_list = np.intersect1d(df1.columns, df2.columns, df3.columns)

Error:

ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


The inclusive_list should only include column names C & D. Any help would be appreciated. Thank you.

Answer


Why your current approach doesn't work:


np.intersect1d does not take N arrays; it only compares two:

numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)


From the signature you can see that your third array is passed as the assume_unique parameter. numpy then evaluates it as a single boolean, and because a pandas Index refuses to be reduced to one truth value, you get a ValueError.
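To confirm that diagnosis, here is a minimal sketch using column labels shaped like the question's three frames (the names a, b and c are only illustrative): the positional three-argument call fails with the same ambiguous-truth-value error, while the same function given exactly two arrays works.

```python
import numpy as np
import pandas as pd

# Column labels shaped like the question's df1, df2 and df3
a = pd.Index(list("ABCD"))
b = pd.Index(list("BCDE"))
c = pd.Index(list("CDEF"))

# Called positionally, c lands in the assume_unique slot, and numpy
# needs its truth value, which a pandas Index will not provide.
try:
    np.intersect1d(a, b, c)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With exactly two arrays, intersect1d works as designed.
pair = np.intersect1d(a, b)
print(error_message)
print(pair)
```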


You can extend intersect1d to work on N arrays by applying it pairwise with functools.reduce:

from functools import reduce
# Equivalent to np.intersect1d(np.intersect1d(df1.columns, df2.columns), df3.columns)
reduce(np.intersect1d, (df1.columns, df2.columns, df3.columns))

array(['C', 'D'], dtype=object)
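If you need this for an arbitrary number of DataFrames, the same reduce call can be wrapped in a small function. This is a sketch only: common_columns is a hypothetical helper, and the frames are filled with zeros because only their column labels matter here. Like intersect1d itself, it returns a sorted, de-duplicated NumPy array.

```python
from functools import reduce

import numpy as np
import pandas as pd


def common_columns(*frames: pd.DataFrame) -> np.ndarray:
    """Column labels shared by every DataFrame in frames (hypothetical helper)."""
    if not frames:
        raise ValueError("common_columns needs at least one DataFrame")
    # Seed with the first frame's labels as a sorted, de-duplicated array so
    # a single frame returns the same type as the multi-frame case.
    start = np.unique(frames[0].columns)
    # Fold intersect1d pairwise over the remaining frames' columns.
    return reduce(np.intersect1d, (df.columns for df in frames[1:]), start)


df1 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("ABCD"))
df2 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("BCDE"))
df3 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("CDEF"))

shared = common_columns(df1, df2, df3)
print(shared)
```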

A better approach


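However, the easiest approach is to call the intersection method on the column Index objects directly:

df1.columns.intersection(df2.columns).intersection(df3.columns)

Index(['C', 'D'], dtype='object')

Older answers write this as df1.columns & df2.columns & df3.columns, but using & as a set operation on an Index was deprecated during the pandas 1.x series and no longer computes an intersection in pandas 2.0, so prefer intersection().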
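Index.intersection takes one other Index at a time, so for any number of DataFrames it can be folded with functools.reduce just like intersect1d. A minimal sketch, assuming the next step is to keep only the shared columns in each frame; the names frames, shared and trimmed are illustrative.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4]], columns=list("ABCD"))
df2 = pd.DataFrame([[5, 6, 7, 8]], columns=list("BCDE"))
df3 = pd.DataFrame([[9, 10, 11, 12]], columns=list("CDEF"))

frames = [df1, df2, df3]

# Fold Index.intersection over every frame's columns. A lambda is used
# instead of pd.Index.intersection so subclasses such as RangeIndex
# dispatch to their own implementation.
shared = reduce(lambda left, right: left.intersection(right),
                (df.columns for df in frames))

# Keep only the shared columns in each frame.
trimmed = [df[shared] for df in frames]
print(shared)
```

Unlike the intersect1d version, the result stays a pandas Index, and with the default sort=False it is not explicitly sorted.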
