Pandas multiple column intersection
Problem description
I have a data frame as follows:
import pandas as pd

data={'NAME':['JOHN','MARY','CHARLIE'],
'A':[[1,2,3],[2,3,4],[3,4,5]],
'B':[[2,3,4],[3,4,5],[4,5,6]],
'C':[[2,4],[3,4],[6,7]] }
df=pd.DataFrame(data)
df=df[['NAME','A','B','C']]
NAME A B C
0 JOHN [1, 2, 3] [2, 3, 4] [2, 4]
1 MARY [2, 3, 4] [3, 4, 5] [3, 4]
2 CHARLIE [3, 4, 5] [4, 5, 6] [6, 7]
For each row, I need the intersection of the lists in columns A, B, and C.
I tried the following code, but it did not work:
df['D']=list(set(df['A'])&set(df['B'])&set(df['C']))
The desired output is as follows:
NAME A B C D
0 JOHN [1, 2, 3] [2, 3, 4] [2, 4] [2]
1 MARY [2, 3, 4] [3, 4, 5] [3, 4] [3, 4]
2 CHARLIE [3, 4, 5] [4, 5, 6] [6, 7] []
Recommended answer
Option 1:
The intersection syntax set(A)&set(B).. is correct, but you need to tweak it a bit so that it is applied row by row on a dataframe, as follows:
df.assign(D=df.transform(
lambda x: list(set(x.A)&set(x.B)&set(x.C)),
axis=1))
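To illustrate why the original attempt fails, here is a minimal sketch (the variable names column_error and first_row_common are illustrative, not part of the question). set(df['A']) iterates over the cells of the whole column, and each cell is a list, which is unhashable, so Python raises a TypeError before any intersection is computed. Building the sets from the cells of a single row works as expected.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['JOHN', 'MARY', 'CHARLIE'],
    'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
    'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]],
    'C': [[2, 4], [3, 4], [6, 7]],
})

# Column-wise: the elements of df['A'] are lists, and lists cannot be set members.
try:
    set(df['A'])
    column_error = None
except TypeError as exc:
    column_error = str(exc)
print(column_error)

# Row-wise: each cell is a list of hashable ints, so the sets can be built.
first_row = df.iloc[0]
first_row_common = set(first_row['A']) & set(first_row['B']) & set(first_row['C'])
print(first_row_common)
```

This is why the fix has to run the set logic once per row rather than once per column.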
Option 2:
You can also do it as follows:
df.assign(D=df.transform(
lambda x: list(set(x.A).intersection(set(x.B)).intersection(set(x.C))),
axis=1))
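or, with apply instead of transform (likely the more portable form, since recent pandas versions may reject a transform whose result is not shaped like its input):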
df.assign(D=df.apply(
lambda x: list(set(x.A).intersection(set(x.B)).intersection(set(x.C))),
axis=1))
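For reference, here is a self-contained sketch of the apply variant that can be pasted into a fresh session. The helper name row_intersection is hypothetical, and wrapping the result in sorted() is an addition not present in the answer: a set has no guaranteed iteration order, so sorting keeps the values in column D deterministic.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['JOHN', 'MARY', 'CHARLIE'],
    'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
    'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]],
    'C': [[2, 4], [3, 4], [6, 7]],
})

def row_intersection(row):
    """Return the sorted values common to columns A, B and C of one row."""
    common = set(row['A']) & set(row['B']) & set(row['C'])
    return sorted(common)  # sorted() returns a list in a deterministic order

# axis=1 passes each row to the function as a Series indexed by column name.
result = df.assign(D=df.apply(row_intersection, axis=1))
print(result)
```

Note that assign returns a new dataframe; write df = df.assign(...) or df['D'] = ... if the column should be kept on df itself.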
Option 3:
from functools import reduce  # reduce is not a builtin in Python 3
df.assign(D=df.transform(
lambda x: list(reduce(set.intersection, map(set,x.tolist()[1:]))),  # [1:] skips NAME
axis=1))
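Option 3 relies on NAME being the first column, because x.tolist()[1:] drops whatever sits in position 0. As a hedged variation, the sketch below selects the list columns by name before reducing, so the column order no longer matters. The names list_columns and intersect_cells are assumptions introduced for illustration, and apply is used in place of transform.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    'NAME': ['JOHN', 'MARY', 'CHARLIE'],
    'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
    'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]],
    'C': [[2, 4], [3, 4], [6, 7]],
})

# Assumed for this example: the names of the columns that hold lists.
list_columns = ['A', 'B', 'C']

def intersect_cells(row):
    """Intersect every list-valued cell in the row, however many there are."""
    # map(set, row) turns each cell into a set; reduce folds them pairwise.
    return sorted(reduce(set.intersection, map(set, row)))

result = df.assign(D=df[list_columns].apply(intersect_cells, axis=1))
print(result)
```

Adding another list column then only requires extending list_columns.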
What this does is:
- For each row, get the intersection by chaining set(x.A).intersection(set(x.B))..
- Convert the result to a list
- Do this for every row in the dataframe
Execution details:
In [76]: df.assign(D=df.transform(
...: lambda x: list(set(x.A).intersection(set(x.B)).intersection(set(x.C))),
...: axis=1))
Out[76]:
NAME A B C D
0 JOHN [1, 2, 3] [2, 3, 4] [2, 4] [2]
1 MARY [2, 3, 4] [3, 4, 5] [3, 4] [3, 4]
2 CHARLIE [3, 4, 5] [4, 5, 6] [6, 7] []