pandas 比较两个数据框并确定匹配的值 [英] Pandas Compare two dataframes and determine the matched values
本文介绍了 pandas 比较两个数据框并确定匹配的值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有以下数据框:
print(dfa)
ID Value
AA12 101 BB101 CC01 DE06 1
AA11 102 BB101 CC01 234 EE07 2
AA10 202 BB101 CC01 345 EE09 3
AA13 103 BB101 CC02 123 4
AA14 203 BB101 CC02 456 5
AA15 204 BB102 CC03 567 6
print(dfb)
ID Value
AA10 202 BB101 CC01 EE09 345 3
AA11 102 BB101 CC01 EE07 234 2
AA12 101 BB101 CC01 DE06 1
AA13 103 BB101 CC02 123 4
AA18 203 BB103 CC01 456 5
AA15 204 BB201 CC11 678 7
我想将(dfa.ID,dfa.Value)中的字符串与(dfb。 ID,dfb.Value)。如果它们完全匹配(即使子字符串的顺序不同),我也想在数据框 dfa的新 ID Match?和 Value Match?列上打印 Yes。
I would like to compare the string in (dfa.ID, dfa.Value) to the one in (dfb.ID, dfb.Value). If they match exactly (even when the order of substring is not identical), I would like to print 'Yes' on new 'ID Matched?' and 'Value Matched?' columns in dataframe 'dfa'.
所需的输出将是:
ID Value ID Matched? Value Matched?
AA12 101 BB101 CC01 DE06 1 Yes Yes
AA11 102 BB101 CC01 234 EE07 2 Yes Yes
AA10 202 BB101 CC01 345 EE09 3 Yes Yes
AA13 103 BB101 CC02 123 4 Yes Yes
AA14 203 BB101 CC02 456 5 No Yes
AA15 204 BB102 CC03 567 6 No No
推荐答案
您可以执行以下操作:
In [40]: pd.merge(a.assign(x=a.ID.str.split().apply(sorted).str.join(' ')),
...: b.assign(x=b.ID.str.split().apply(sorted).str.join(' ')),
...: on=['x','Value'],
...: how='outer',
...: indicator=True)
...:
Out[40]:
ID_x Value x \
0 AA12 101 BB101 CC01 DE06 1 101 AA12 BB101 CC01 DE06
1 AA11 102 BB101 CC01 234 EE07 2 102 234 AA11 BB101 CC01 EE07
2 AA10 202 BB101 CC01 345 EE09 3 202 345 AA10 BB101 CC01 EE09
3 AA13 103 BB101 CC02 123 4 103 123 AA13 BB101 CC02
4 AA14 203 BB101 CC02 456 5 203 456 AA14 BB101 CC02
5 AA15 204 BB102 CC03 567 6 204 567 AA15 BB102 CC03
6 NaN 5 203 456 AA18 BB103 CC01
7 NaN 7 204 678 AA15 BB201 CC11
ID_y _merge
0 AA12 101 BB101 CC01 DE06 both
1 AA11 102 BB101 CC01 EE07 234 both
2 AA10 202 BB101 CC01 EE09 345 both
3 AA13 103 BB101 CC02 123 both
4 NaN left_only
5 NaN left_only
6 AA18 203 BB103 CC01 456 right_only
7 AA15 204 BB201 CC11 678 right_only
说明:
In [43]: a.ID.str.split()
Out[43]:
0 [AA12, 101, BB101, CC01, DE06]
1 [AA11, 102, BB101, CC01, 234, EE07]
2 [AA10, 202, BB101, CC01, 345, EE09]
3 [AA13, 103, BB101, CC02, 123]
4 [AA14, 203, BB101, CC02, 456]
5 [AA15, 204, BB102, CC03, 567]
Name: ID, dtype: object
In [44]: a.ID.str.split().apply(sorted)
Out[44]:
0 [101, AA12, BB101, CC01, DE06]
1 [102, 234, AA11, BB101, CC01, EE07]
2 [202, 345, AA10, BB101, CC01, EE09]
3 [103, 123, AA13, BB101, CC02]
4 [203, 456, AA14, BB101, CC02]
5 [204, 567, AA15, BB102, CC03]
Name: ID, dtype: object
In [45]: a.assign(x=a.ID.str.split().apply(sorted).str.join(' '))
Out[45]:
ID Value x
0 AA12 101 BB101 CC01 DE06 1 101 AA12 BB101 CC01 DE06
1 AA11 102 BB101 CC01 234 EE07 2 102 234 AA11 BB101 CC01 EE07
2 AA10 202 BB101 CC01 345 EE09 3 202 345 AA10 BB101 CC01 EE09
3 AA13 103 BB101 CC02 123 4 103 123 AA13 BB101 CC02
4 AA14 203 BB101 CC02 456 5 203 456 AA14 BB101 CC02
5 AA15 204 BB102 CC03 567 6 204 567 AA15 BB102 CC03
这篇关于 pandas 比较两个数据框并确定匹配的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文