Pandas: Custom group-by function

Views: 61. Posted: 2021/5/13 19:47:45. Tags: python, pandas, group-by


Problem description

I am looking for a custom group-by function that groups the rows such that:

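If there is a number and 0, the result is that number.
If there are two numbers (they are always the same), the result is that number.
If there are two NaNs, the result is NaN.
If there is a number and a NaN, the result is that number.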
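
The four rules above can be sketched for the values of one column within a single group. This is only an illustration of the rules, not a fast solution for a large DataFrame. The helper name `combine_group` is hypothetical, and returning 0 for a group that contains only zeros and NaN is an assumption, since the question does not cover that case.

```python
import math


def combine_group(values):
    """Collapse one group's values for one column into a single value.

    Hypothetical helper that illustrates the rules in the question.
    """
    # A number always wins over NaN, so drop the NaN entries first.
    numbers = [v for v in values if not math.isnan(v)]
    if not numbers:
        # The group contains only NaN: the result is NaN.
        return math.nan
    # A non-zero number wins over 0. The question states that the numbers
    # in a group are always the same, so the first one is enough.
    nonzero = [v for v in numbers if v != 0]
    if nonzero:
        return nonzero[0]
    # Assumption: a group with only zeros (and possibly NaN) yields 0.
    return 0.0


print(combine_group([4, 4]))                # id 1, column foo
print(combine_group([math.nan, 7]))         # id 2, column foo
print(combine_group([math.nan, math.nan]))  # id 3, column foo
print(combine_group([0, 9, 9]))             # id 4, column foo
```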

An example to make things clearer:

import numpy as np
import pandas as pd

start_df = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3, 4, 4, 4, 5],
                         "foo": [4, 4, np.nan, 7, np.nan, np.nan, 0, 9, 9, 7],
                         "bar": [np.nan, np.nan, 0, 4, 0, 1, 6, 6, 0, 4]})

    id  foo  bar
0   1   4.0  NaN
1   1   4.0  NaN
2   2   NaN  0.0
3   2   7.0  4.0
4   3   NaN  0.0
5   3   NaN  1.0
6   4   0.0  6.0
7   4   9.0  6.0
8   4   9.0  0.0
9   5   7.0  4.0

After the custom group-by on id:

result_df = pd.DataFrame({"id": [1,2,3,4,5], "foo": [4, 7, np.nan, 9, 7], "bar": [np.nan, 4, 1, 6, 4]})


    id  foo  bar
0   1   4.0  NaN
1   2   7.0  4.0
2   3   NaN  1.0
3   4   9.0  6.0
4   5   7.0  4.0

One solution that I am aware of is:

start_df.groupby("id").max().reset_index()

However, this is too slow for my case because the DataFrame I am working with is huge. On the other hand, the following alternative does not handle the edge case where both elements are numbers:

start_df.groupby("id").sum(min_count=1).reset_index()
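
To make the edge case concrete, the sketch below rebuilds `start_df` and `result_df` from the example and runs both approaches. With `sum(min_count=1)`, the two equal values in the group with id 1 are added together, so `foo` becomes 8 rather than 4, while `max()` can be compared directly against the expected result. The names `by_max` and `by_sum` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

start_df = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3, 4, 4, 4, 5],
                         "foo": [4, 4, np.nan, 7, np.nan, np.nan, 0, 9, 9, 7],
                         "bar": [np.nan, np.nan, 0, 4, 0, 1, 6, 6, 0, 4]})
result_df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                          "foo": [4, 7, np.nan, 9, 7],
                          "bar": [np.nan, 4, 1, 6, 4]})

# max() skips NaN within a group and returns NaN only for all-NaN groups.
by_max = start_df.groupby("id").max().reset_index()

# sum(min_count=1) also keeps all-NaN groups as NaN,
# but it adds repeated numbers together instead of keeping one of them.
by_sum = start_df.groupby("id").sum(min_count=1).reset_index()

# Compare the max-based result with the expected frame.
print(by_max.equals(result_df))

# Inspect foo for id 1, where the group holds the value 4 twice.
print(by_sum.loc[by_sum["id"] == 1, "foo"].iloc[0])
```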

Looking forward to your help!
