Pandas: Custom group-by function
Question
I am looking for a custom group-by function that groups the rows so that:
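- If there is a number and a 0, the result is that number.
- If there are two numbers (they are always the same), the result is that number, not their sum.
- If there are only NaNs, the result is NaN.
- If there is a number and a NaN, the result is that number.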
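The rules above amount to a small per-group reducer. Here is a minimal sketch in plain Python, assuming each group arrives as a list of floats; the function name `combine` is hypothetical, and the all-zero case (which the rules do not cover) is assumed to return 0.

```python
import math


def combine(values):
    """Hypothetical reducer for one group's values, following the rules above."""
    # A NaN never wins over a number, so drop NaNs first.
    numbers = [v for v in values if not math.isnan(v)]
    if not numbers:
        return float("nan")  # the group held only NaNs
    # A number wins over 0; repeated numbers are equal, so keeping one is enough.
    nonzero = [v for v in numbers if v != 0]
    # Assumption: a group of only zeros yields 0.0 (not covered by the rules).
    return nonzero[0] if nonzero else 0.0
```

For example, `combine([0.0, 9.0, 9.0])` gives 9.0 and `combine([float("nan"), 7.0])` gives 7.0. Passing such a Python-level function to `groupby(...).agg` would work, but it is likely to be slower than a built-in aggregation on a large frame.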
An example to make things clearer:
import numpy as np
import pandas as pd

start_df = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3, 4, 4, 4, 5],
                         "foo": [4, 4, np.nan, 7, np.nan, np.nan, 0, 9, 9, 7],
                         "bar": [np.nan, np.nan, 0, 4, 0, 1, 6, 6, 0, 4]})
id foo bar
0 1 4.0 NaN
1 1 4.0 NaN
2 2 NaN 0.0
3 2 7.0 4.0
4 3 NaN 0.0
5 3 NaN 1.0
6 4 0.0 6.0
7 4 9.0 6.0
8 4 9.0 0.0
9 5 7.0 4.0
After the custom group-by on id:
result_df = pd.DataFrame({"id": [1,2,3,4,5], "foo": [4, 7, np.nan, 9, 7], "bar": [np.nan, 4, 1, 6, 4]})
id foo bar
0 1 4.0 NaN
1 2 7.0 4.0
2 3 NaN 1.0
3 4 9.0 6.0
4 5 7.0 4.0
One solution I know of is:
start_df.groupby("id").max().reset_index()
But it is too slow for my case, because the DataFrame I am working with is huge. On the other hand, the following solution does not cover the edge case where both elements are numbers, because it adds them instead of keeping one:
start_df.groupby("id").sum(min_count=1).reset_index()
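To make the edge case concrete, the sketch below rebuilds the example frame and compares the two aggregations for a group that holds repeated numbers. `min_count=1` keeps an all-NaN group as NaN instead of 0, but it cannot stop equal numbers from being added together.

```python
import numpy as np
import pandas as pd

start_df = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3, 4, 4, 4, 5],
                         "foo": [4, 4, np.nan, 7, np.nan, np.nan, 0, 9, 9, 7],
                         "bar": [np.nan, np.nan, 0, 4, 0, 1, 6, 6, 0, 4]})

summed = start_df.groupby("id").sum(min_count=1)
maxed = start_df.groupby("id").max()

# id 4 has foo values 0, 9, 9: sum() adds the two 9s (18.0), max() keeps one (9.0).
print(summed.loc[4, "foo"], maxed.loc[4, "foo"])
# id 3 has only NaNs in foo: with min_count=1 the sum stays NaN instead of 0.
print(summed.loc[3, "foo"])
```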
Any help is appreciated!
Answer
Maybe not what you expected, but this should work:
start_df.groupby('id').max()
Use reset_index if you want to bring 'id' back into the columns.
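Why max works here: within each group the values are only NaN, 0, or one repeated number, and groupby max skips NaN. Assuming the numbers are never negative, the maximum is the number whenever one exists, 0 if only zeros remain, and NaN if the group is all NaN. The sketch below checks that max reproduces result_df exactly, then shows a hypothetical negative value where the assumption breaks.

```python
import numpy as np
import pandas as pd

start_df = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3, 4, 4, 4, 5],
                         "foo": [4, 4, np.nan, 7, np.nan, np.nan, 0, 9, 9, 7],
                         "bar": [np.nan, np.nan, 0, 4, 0, 1, 6, 6, 0, 4]})
result_df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "foo": [4, 7, np.nan, 9, 7],
                          "bar": [np.nan, 4, 1, 6, 4]})

# groupby().max() ignores NaN; a group with only NaNs stays NaN.
out = start_df.groupby("id").max().reset_index()
pd.testing.assert_frame_equal(out, result_df)  # raises if they differ

# Caveat (hypothetical data): a negative number next to 0 makes max() pick 0,
# although the first rule asks for the number.
neg = pd.DataFrame({"id": [1, 1], "foo": [-3.0, 0.0]})
neg_max = neg.groupby("id").max()
print(neg_max)
```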