Collapsing rows with NaN entries in pandas dataframe


Question

I have a pandas DataFrame with rows of data:

# objectID        grade  OS     method
object_id_0001    AAA    Mac    organic
object_id_0001    AAA    Mac    NA
object_id_0001    AAA    NA     organic
object_id_0002    NA     NA     NA
object_id_0002    ABC    Win    NA

That is, there are often multiple entries for the same objectID, but some of those entries contain NAs.

So I'm looking for a way to combine rows on objectID and report the non-NA entries. For example, the data above would collapse to:

object_id_0001    AAA    Mac    organic
object_id_0002    ABC    Win    NA

Recommended answer

Quick and Dirty

This works, and has for a long time. However, some claim that this behavior is a bug that may be fixed. As currently implemented, first returns, for each column, the first non-null element in each group if one exists.

df.groupby('objectID', as_index=False).first()

         objectID grade   OS   method
0  object_id_0001   AAA  Mac  organic
1  object_id_0002   ABC  Win      NaN
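
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the NA entries in the question are real missing values (np.nan) rather than strings; the variable name collapsed is only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question, with NA as real missing values
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', np.nan, 'ABC'],
    'OS': ['Mac', 'Mac', np.nan, np.nan, 'Win'],
    'method': ['organic', np.nan, 'organic', np.nan, np.nan],
})

# One row per objectID; each column takes its first non-null value in the group
collapsed = df.groupby('objectID', as_index=False).first()
print(collapsed)
```

Note that each column is resolved independently, so a collapsed row can combine values that came from different original rows.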


pd.concat

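# DataFrame.lookup was removed in pandas 2.0; d.at[row, col] reads the same cells (assumes a unique row index)
pd.concat([
    pd.DataFrame([[d.at[i, c] for c, i in d.notna().idxmax().items()]], columns=d.columns)
    for _, d in df.groupby('objectID')
], ignore_index=True)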

         objectID grade   OS   method
0  object_id_0001   AAA  Mac  organic
1  object_id_0002   ABC  Win      NaN


stack

# relies on stack() dropping NaN values, which is its long-standing default behavior
df.set_index('objectID').stack().groupby(level=[0, 1]).head(1).unstack()

               grade   OS   method
objectID                          
object_id_0001   AAA  Mac  organic
object_id_0002   ABC  Win     None


If those happen to be strings ('NA')

df.mask(df.astype(str).eq('NA')).groupby('objectID', as_index=False).first()
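
As a rough illustration of the string case, the sketch below rebuilds the sample data with the literal text 'NA' in place of missing values and applies the same one-liner. The names df_str and collapsed_str are made up for this example.

```python
import pandas as pd

# Same sample data, but missing entries are the literal string 'NA'
df_str = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', 'NA', 'ABC'],
    'OS': ['Mac', 'Mac', 'NA', 'NA', 'Win'],
    'method': ['organic', 'NA', 'organic', 'NA', 'NA'],
})

# mask() blanks out every cell equal to 'NA', so first() can skip those cells
collapsed_str = (
    df_str.mask(df_str.astype(str).eq('NA'))
          .groupby('objectID', as_index=False)
          .first()
)
print(collapsed_str)
```

Without the mask step, first() would treat 'NA' as an ordinary value and could return it.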
