Pandas or Numpy: how to get matching data entries to do data manipulation

Views: 216 Published: 2020/5/18 21:36:23 python pandas numpy


Problem description

Say, for example, that I have two data relations like this:

Data1:
   C1      C2
"Peter"  "kiwi"
"John"   "banana"
"Susan"  "peach"
"Joe"    "apple"

Data2:
   C3      C4
"apple"     4 
"banana"    7
"apple"     4

For each row in Data1, I would like to find all possible matches on the common attribute (say, between Data1[C2] and Data2[C3]) and sum the Data2[C4] values of all matches found.

More concretely, for the example above, I want the result of the computation to be:

"Peter":  0 (no match for "kiwi")
"John":   7 (one match for "banana", it's just 7)
"Susan":  0 (no match for "peach")
"Joe":    8 (two matches for "apple", they're 4+4)

How can I accomplish this efficiently using a pandas DataFrame or numpy?

Thank you so much for your help. I'll edit the question title to be more relevant after I get more input on the technical details involved in this problem.

Recommended answer

One way is to use `merge` and `groupby`:

data1.merge(data2, left_on='C2', right_on='C3', how='left')[['C1', 'C4']]\
     .fillna(0)\
     .groupby('C1')\
     .sum()\
     .reset_index()

Output:

        C1   C4
0    "Joe"  8.0
1   "John"  7.0
2  "Peter"  0.0
3  "Susan"  0.0
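For readers who want to run this end to end, here is a minimal self-contained sketch of the same `merge` and `groupby` idea. The frame construction is an assumption: the question only shows the tables, so the names are built here as plain strings without the literal quote characters.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed: plain strings, no literal quotes)
data1 = pd.DataFrame({"C1": ["Peter", "John", "Susan", "Joe"],
                      "C2": ["kiwi", "banana", "peach", "apple"]})
data2 = pd.DataFrame({"C3": ["apple", "banana", "apple"],
                      "C4": [4, 7, 4]})

# A left merge keeps every row of data1; rows without a match get NaN in C4
merged = data1.merge(data2, left_on="C2", right_on="C3", how="left")

# Replace NaN with 0, then sum C4 per person, keeping C1 as a regular column
totals = (merged[["C1", "C4"]]
          .fillna(0)
          .groupby("C1", as_index=False)
          .sum())
print(totals)
```

Note that `groupby` sorts the result by `C1`, so the rows come back in alphabetical order rather than in the original order of `data1`.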

To get the output as a dictionary:

data1.merge(data2, left_on='C2', right_on='C3', how='left')[['C1', 'C4']]\
     .fillna(0)\
     .groupby('C1')\
     .sum()\
     .T\
     .to_dict('records')  # the short form 'r' is no longer accepted by recent pandas

Output:

[{'"Joe"': 8.0, '"John"': 7.0, '"Peter"': 0.0, '"Susan"': 0.0}]
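If a flat name-to-total mapping is preferred over a one-element list, a possible variation (not part of the original answer) is to select the `C4` column after grouping and call `Series.to_dict()`. The sample frames are again an assumed reconstruction of the question's tables, without the literal quote characters.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed: plain strings, no literal quotes)
data1 = pd.DataFrame({"C1": ["Peter", "John", "Susan", "Joe"],
                      "C2": ["kiwi", "banana", "peach", "apple"]})
data2 = pd.DataFrame({"C3": ["apple", "banana", "apple"],
                      "C4": [4, 7, 4]})

merged = data1.merge(data2, left_on="C2", right_on="C3", how="left")
merged["C4"] = merged["C4"].fillna(0)  # rows without a match count as 0

# Selecting a single column yields a Series indexed by C1,
# which converts directly to a {name: total} dictionary
result = merged.groupby("C1")["C4"].sum().to_dict()
print(result)
```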

Another way is to use `map` with `sum`:

# sum(level=1) was removed in pandas 2.0; groupby(level=1).sum() is the equivalent
data1['Score'] = data1['C2'].map(data2.set_index('C3', append=True)['C4']\
                            .groupby(level=1).sum()).fillna(0)
data1[['C1', 'Score']]

Output:

        C1  Score
0  "Peter"    0.0
1   "John"    7.0
2  "Susan"    0.0
3    "Joe"    8.0
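The `map` approach can be written a little more directly by grouping `data2` on `C3` first. The following is a sketch under the same assumption as before (sample frames rebuilt from the question, without the literal quote characters); the intermediate name `per_key` is introduced here only for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed: plain strings, no literal quotes)
data1 = pd.DataFrame({"C1": ["Peter", "John", "Susan", "Joe"],
                      "C2": ["kiwi", "banana", "peach", "apple"]})
data2 = pd.DataFrame({"C3": ["apple", "banana", "apple"],
                      "C4": [4, 7, 4]})

# Total C4 per distinct C3 value: a Series indexed by the key
per_key = data2.groupby("C3")["C4"].sum()

# Look up each C2 value in that Series; keys that are absent give NaN, replaced by 0
data1["Score"] = data1["C2"].map(per_key).fillna(0)
print(data1[["C1", "Score"]])
```

Unlike the `merge` and `groupby` version, this keeps the rows in the original order of `data1` and never materializes the joined table, which may matter when `data2` has many duplicate keys.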

Additional comment (also listing the index of each matching Data2 row):

data1.merge(data2.rename_axis('d2_idx').reset_index(), left_on='C2', right_on='C3', how='left')\
     .groupby('C1')[['d2_idx','C4']]\
     .agg({'d2_idx':lambda x: ', '.join(x.astype(str)), 'C4':'sum'})

Output:

           d2_idx   C4
C1                    
"Joe"    0.0, 2.0  8.0
"John"        1.0  7.0
"Peter"       nan  0.0
"Susan"       nan  0.0

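Since the question also asks about numpy, here is one possible pure-NumPy sketch. It is not from the original answer: it swaps in `np.unique`, `np.bincount` and `np.searchsorted` to do the same per-key sum and lookup, and it assumes the columns are available as plain arrays (the array names below are illustrative).

```python
import numpy as np

# Columns of the two tables as arrays (assumed: plain strings, no literal quotes)
c1 = np.array(["Peter", "John", "Susan", "Joe"])
c2 = np.array(["kiwi", "banana", "peach", "apple"])
c3 = np.array(["apple", "banana", "apple"])
c4 = np.array([4, 7, 4])

# Sorted unique keys of c3, plus the position of each c3 entry among those keys
keys, inverse = np.unique(c3, return_inverse=True)

# Sum c4 per unique key (bincount with weights returns floats)
sums = np.bincount(inverse, weights=c4)

# Binary-search each c2 value in the sorted keys; clip so indexing stays in range
pos = np.clip(np.searchsorted(keys, c2), 0, len(keys) - 1)

# A c2 value matches only if the key found at that position is identical
matched = keys[pos] == c2
result = np.where(matched, sums[pos], 0.0)

for name, total in zip(c1, result):
    print(name, total)
```

Whether this is faster than the pandas versions depends on the data sizes and dtypes, so it is worth timing on real data before committing to it.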