Python pandas unique value ignoring NaN
Question
I want to use unique in a groupby aggregation, but I don't want NaN in the unique result.
Example dataframe:
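import numpy as np
import pandas as pd

# pd.np was removed from recent pandas releases, so NumPy is imported directly
df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3], 'b': [0, 0, 1, 1, 1, 1, 1],
                   'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})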
        a  b    c
0  1.0000  0  foo
1  2.0000  0  NaN
2  1.0000  1  bar
3  1.0000  1  foo
4     nan  1  baz
5  3.0000  1  foo
6  3.0000  1  bar
And the groupby:
df.groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
The result is:
        a                               c
      min     max           unique  first  last           unique
b
0  1.0000  2.0000       [1.0, 2.0]    foo   foo       [foo, nan]
1  1.0000  3.0000  [1.0, nan, 3.0]    bar   bar  [bar, foo, baz]
But I want it without NaN:
        a                          c
      min     max      unique  first  last           unique
b
0  1.0000  2.0000  [1.0, 2.0]    foo   foo            [foo]
1  1.0000  3.0000  [1.0, 3.0]    bar   bar  [bar, foo, baz]
How can I do that? I have several columns to aggregate, and each column needs different aggregation functions, so I don't want to run the unique aggregations one by one, separately from the other aggregations.
Thanks!
Recommended answer
Try ffill:
df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})
      c                            a
  first  last           unique   min  max      unique
b
0   foo   foo            [foo]   1.0  2.0  [1.0, 2.0]
1   bar   bar  [bar, foo, baz]   1.0  3.0  [1.0, 3.0]
If NaN is the first element of a group, the above solution breaks: ffill runs before the grouping, so it copies the last value of the previous group into the current one (and a NaN in the very first row stays unfilled). @IanS's solution is better in the long run.
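@IanS's solution is not reproduced on this page, so the following is only a minimal sketch of one NaN-safe alternative, not necessarily what @IanS proposed. It assumes a hypothetical helper, unique_no_nan, that drops missing values inside each group before collecting the distinct ones, and it uses a variant of the example data where NaN is the first element of group 1 (the case where ffill fails). The helper returns a plain list rather than a NumPy array, since some pandas versions reject array-valued results from a custom aggregation function.

```python
import numpy as np
import pandas as pd

# Variant of the example data (an assumption for illustration):
# the NaN in column 'a' is now the FIRST row of group b == 1.
df = pd.DataFrame({
    'a': [1, 2, np.nan, 1, 1, 3, 3],
    'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar'],
})


def unique_no_nan(s):
    """Hypothetical helper: distinct non-null values of a Series, as a list."""
    return s.dropna().unique().tolist()


# The helper is passed alongside the built-in aggregations, so every
# column is still aggregated in a single agg call.
result = df.groupby('b').agg({
    'a': ['min', 'max', unique_no_nan],
    'c': ['first', 'last', unique_no_nan],
})

# For comparison: ffill before grouping leaks the value 2.0 from
# group 0 into group 1.
leaked = df.ffill().groupby('b')['a'].unique().loc[1].tolist()

print(result)
print(leaked)
```

The custom columns are labelled with the function's name (unique_no_nan) instead of unique. Because the NaN values are removed within each group, the result does not depend on row order or on where the NaN sits in the group.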