Python pandas unique value ignoring NaN

Problem description

I want to use unique in groupby aggregation, but I don't want nan in the unique result.

An example dataframe:

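import numpy as np
import pandas as pd

# pd.np was deprecated in pandas 1.0 and later removed; use numpy directly
df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3], 'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})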

       a  b    c
0 1.0000  0  foo
1 2.0000  0  NaN
2 1.0000  1  bar
3 1.0000  1  foo
4    nan  1  baz
5 3.0000  1  foo
6 3.0000  1  bar

And the groupby:

df.groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Its result is:

       a                             c                      
     min    max           unique first last           unique
b                                                           
0 1.0000 2.0000       [1.0, 2.0]   foo  foo       [foo, nan]
1 1.0000 3.0000  [1.0, nan, 3.0]   bar  bar  [bar, foo, baz]

But I want it without nan:

       a                        c                      
     min    max      unique first last           unique
b                                                           
0 1.0000 2.0000  [1.0, 2.0]   foo  foo            [foo]
1 1.0000 3.0000  [1.0, 3.0]   bar  bar  [bar, foo, baz]

How can I do that? I have several columns to aggregate, and each column needs different aggregation functions, so I don't want to run the unique aggregations one by one, separately from the other aggregations.

Thank you!

Solution

Update 23 November 2020

This answer is terrible, don't use it. Please refer to @IanS's answer.

Earlier

Try ffill

df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

      c                          a                 
  first last           unique  min  max      unique
b                                                  
0   foo  foo            [foo]  1.0  2.0  [1.0, 2.0]
1   bar  bar  [bar, foo, baz]  1.0  3.0  [1.0, 3.0]

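If NaN is the first element of a group, the above solution breaks: ffill runs on the whole frame before the grouping, so the gap is filled with the last value of the preceding group (or stays NaN if it is in the very first row).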
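
The answer referred to above is not reproduced on this page. As a minimal sketch of one alternative that avoids ffill, the missing values can be dropped inside each group before the unique values are taken. The helper name `unique_no_nan` and the second frame `df2` are made up for illustration and are not part of the original question or answers. The helper returns a plain list rather than a NumPy array, on the assumption that list results are accepted by `agg` more consistently across pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 1, 1, np.nan, 3, 3],
    'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar'],
})

def unique_no_nan(s):
    # Hypothetical helper (not from the original answer): drop missing values,
    # then return the remaining unique values as a plain Python list.
    return s.dropna().unique().tolist()

# The helper sits next to the built-in aggregations in the same agg call,
# so every column can still get its own set of functions.
result = df.groupby('b').agg({
    'a': ['min', 'max', unique_no_nan],
    'c': ['first', 'last', unique_no_nan],
})

# Illustrative frame where NaN is the first element of group 1:
# ffill copies 2.0 over from group 0, dropna does not.
df2 = pd.DataFrame({'a': [1, 2, np.nan, 3, 3], 'b': [0, 0, 1, 1, 1]})
leaky = df2.ffill().groupby('b')['a'].unique()
clean = df2.groupby('b')['a'].agg(unique_no_nan)
```

Because a named function is passed, the resulting column is labelled `unique_no_nan` rather than `unique`; it can be renamed afterwards if the original label is needed.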
