Pandas or Numpy: how to get matching data entries to do data manipulation
Question
Say, for example, that I have two data relations like this:
Data1:
C1 C2
"Peter" "kiwi"
"John" "banana"
"Susan" "peach"
"Joe" "apple"
Data2:
C3 C4
"apple" 4
"banana" 7
"apple" 4
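For anyone who wants to reproduce this, the two relations above could be built as pandas DataFrames roughly as follows. This is only a sketch: the variable names data1 and data2 are assumed, and the strings are stored without the literal quote characters shown in the listing.

```python
import pandas as pd

# Data1: one row per person, C2 is the key to look up in Data2.
data1 = pd.DataFrame({
    "C1": ["Peter", "John", "Susan", "Joe"],
    "C2": ["kiwi", "banana", "peach", "apple"],
})

# Data2: the key column C3 may contain repeated values.
data2 = pd.DataFrame({
    "C3": ["apple", "banana", "apple"],
    "C4": [4, 7, 4],
})

print(data1)
print(data2)
```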
For each row in Data1, I would like to find all the possible matches on the common attribute (say, between Data1[C2] and Data2[C3]) and sum the Data2[C4] values over all matches found.
More concretely, for the example above, I want the resulting computation to be:
"Peter": 0 (no match for "kiwi")
"John": 7 (one match for "banana", it's just 7)
"Susan": 0 (no match for "peach")
"Joe": 8 (two matches for "apple", they're 4+4)
How can I accomplish this efficiently using a pandas DataFrame or numpy?
Thank you so much for the help. I'll edit the question title to be more relevant after I get more input on the technical details involved in this problem.
Recommended answer
One way is to use merge and groupby:
data1.merge(data2, left_on='C2', right_on='C3', how='left')[['C1', 'C4']]\
.fillna(0)\
.groupby('C1')\
.sum()
Output:
C1 C4
0 "Joe" 8.0
1 "John" 7.0
2 "Peter" 0.0
3 "Susan" 0.0
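As a self-contained check of the merge and groupby approach, here is a minimal sketch that rebuilds the sample data (unquoted strings, assumed variable names) and runs the same chain. Passing as_index=False is my addition; it keeps C1 as a regular column, which matches the layout of the output shown above.

```python
import pandas as pd

data1 = pd.DataFrame({
    "C1": ["Peter", "John", "Susan", "Joe"],
    "C2": ["kiwi", "banana", "peach", "apple"],
})
data2 = pd.DataFrame({
    "C3": ["apple", "banana", "apple"],
    "C4": [4, 7, 4],
})

# Left merge keeps every row of data1; unmatched rows get NaN in C4,
# and rows with several matches are repeated once per match.
merged = data1.merge(data2, left_on="C2", right_on="C3", how="left")

# Replace NaN with 0, then sum C4 per person.
totals = (
    merged[["C1", "C4"]]
    .fillna(0)
    .groupby("C1", as_index=False)
    .sum()
)
print(totals)
```

Note that groupby sorts the group keys, so the rows come back in alphabetical order of C1 rather than in the original order of data1.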
To get dictionary output:
data1.merge(data2, left_on='C2', right_on='C3', how='left')[['C1', 'C4']]\
.fillna(0)\
.groupby('C1')\
.sum()\
.T\
.to_dict('records')
Output:
[{'"Joe"': 8.0, '"John"': 7.0, '"Peter"': 0.0, '"Susan"': 0.0}]
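If a plain name-to-total dictionary is enough, rather than a one-element list of records, one possible variant is to reduce to a Series first and call to_dict on it. This is a sketch on the sample data, with assumed variable names and unquoted strings.

```python
import pandas as pd

data1 = pd.DataFrame({
    "C1": ["Peter", "John", "Susan", "Joe"],
    "C2": ["kiwi", "banana", "peach", "apple"],
})
data2 = pd.DataFrame({
    "C3": ["apple", "banana", "apple"],
    "C4": [4, 7, 4],
})

merged = data1.merge(data2, left_on="C2", right_on="C3", how="left")

# Fill only the C4 column, sum per person, and convert the
# resulting Series (index: C1, values: totals) to a dict.
result = merged.fillna({"C4": 0}).groupby("C1")["C4"].sum().to_dict()
print(result)
```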
Another way is to use map with sum:
data1['Score'] = data1['C2'].map(data2.groupby('C3')['C4'].sum()).fillna(0)
data1[['C1', 'Score']]
Output:
C1 Score
0 "Peter" 0.0
1 "John" 7.0
2 "Susan" 0.0
3 "Joe" 8.0
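Since the question also asks about numpy, here is a rough numpy-only sketch of the same lookup. It is not part of the original answer; the array names are made up for illustration. The idea is to sum C4 per distinct key with np.unique and np.bincount, then locate each C2 value among the sorted keys with np.searchsorted.

```python
import numpy as np

# Hypothetical arrays holding the columns of the sample data.
c1 = np.array(["Peter", "John", "Susan", "Joe"])
c2 = np.array(["kiwi", "banana", "peach", "apple"])
c3 = np.array(["apple", "banana", "apple"])
c4 = np.array([4, 7, 4])

# Distinct keys in C3 (sorted) and, for each row, the index of its key.
keys, inverse = np.unique(c3, return_inverse=True)

# Sum of C4 for each distinct key.
sums = np.bincount(inverse, weights=c4)

# Candidate position of each C2 value in the sorted keys;
# clip so that values past the last key still index safely.
pos = np.clip(np.searchsorted(keys, c2), 0, len(keys) - 1)

# Keep the sum only where the key at that position really matches.
matched = keys[pos] == c2
scores = np.where(matched, sums[pos], 0.0)

for name, score in zip(c1, scores):
    print(name, score)
```

Unlike the groupby version, this keeps the rows in the original order of Data1, at the cost of being less readable than the pandas one-liner.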
Additional comment (this also lists the index of each matching Data2 row):
data1.merge(data2.rename_axis('d2_idx').reset_index(), left_on='C2', right_on='C3', how='left')\
.groupby('C1')[['d2_idx','C4']]\
.agg({'d2_idx':lambda x: ', '.join(x.astype(str)), 'C4':'sum'})
Output:
d2_idx C4
C1
"Joe" 0.0, 2.0 8.0
"John" 1.0 7.0
"Peter" nan 0.0
"Susan" nan 0.0
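The indices above print as floats (and as nan where nothing matched) because the left merge introduces NaN into the d2_idx column. If integer indices and an empty string for "no match" are preferred, a variant using named aggregation might look like this. It is a sketch on the sample data with assumed variable names, not the original answer's code.

```python
import pandas as pd

data1 = pd.DataFrame({
    "C1": ["Peter", "John", "Susan", "Joe"],
    "C2": ["kiwi", "banana", "peach", "apple"],
})
data2 = pd.DataFrame({
    "C3": ["apple", "banana", "apple"],
    "C4": [4, 7, 4],
})

# Expose data2's row index as a regular column before merging.
merged = data1.merge(
    data2.rename_axis("d2_idx").reset_index(),
    left_on="C2", right_on="C3", how="left",
)

summary = merged.groupby("C1").agg(
    # Drop the NaN from unmatched rows, then format the rest as integers.
    d2_idx=("d2_idx", lambda s: ", ".join(s.dropna().astype(int).astype(str))),
    C4=("C4", "sum"),
)
print(summary)
```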