pandas: Filling missing values within a group

Question

I have some data from an experiment. Within each trial there are single values, surrounded by NaNs, that I want to fill out to the entire trial:

import numpy as np
import pandas as pd

df = pd.DataFrame({'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                               np.nan, 'A1', np.nan, np.nan, np.nan],
                   'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})
Out[177]: 
   cs_name  trial
0      NaN      1
1       A1      1
2      NaN      1
3      NaN      1
4      NaN      2
5      NaN      2
6       B2      2
7      NaN      2
8       A1      3
9      NaN      3
10     NaN      3
11     NaN      3

I can fill these values across the whole trial by calling both ffill() and bfill() on each group, but I'm wondering whether there is a better way to achieve this.

df['cs_name'] = df.groupby('trial')['cs_name'].ffill()
df['cs_name'] = df.groupby('trial')['cs_name'].bfill()
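
The two assignments above can also be collapsed into a single pass per group. This is a minimal sketch, assuming the same data as in the question; the name `filled` is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan],
})

# Forward-fill, then back-fill, inside each trial in one transform call.
filled = df.groupby('trial')['cs_name'].transform(lambda s: s.ffill().bfill())
```

Note that if a trial ever held more than one distinct value, ffill/bfill would propagate the nearest value rather than a single value for the whole trial.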

Expected output:

   cs_name  trial
0       A1      1
1       A1      1
2       A1      1
3       A1      1
4       B2      2
5       B2      2
6       B2      2
7       B2      2
8       A1      3
9       A1      3
10      A1      3
11      A1      3

Recommended answer

An alternative approach is to use first_valid_index with a transform:

In [11]: g = df.groupby('trial')

In [12]: g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Out[12]: 
0     A1
1     A1
2     A1
3     A1
4     B2
5     B2
6     B2
7     B2
8     A1
9     A1
10    A1
11    A1
Name: cs_name, dtype: object

This should be more efficient than doing an ffill followed by a bfill.

Then use it to overwrite the cs_name column:

df['cs_name'] = g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
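
One caveat: first_valid_index() returns None when a group has no non-null value at all, and s.loc[None] then raises a KeyError. The sketch below guards against that case. The helper name `first_valid` and the smaller sample frame (with trial 3 entirely missing) are assumptions made for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

def first_valid(s):
    """Return the first non-null value of s, or NaN if s has none."""
    idx = s.first_valid_index()
    return s.loc[idx] if idx is not None else np.nan

# Hypothetical data: trial 3 contains only missing values.
df = pd.DataFrame({
    'trial': [1, 1, 2, 2, 3, 3],
    'cs_name': [np.nan, 'A1', 'B2', np.nan, np.nan, np.nan],
})

# Each group's scalar result is broadcast back to the group's rows.
result = df.groupby('trial')['cs_name'].transform(first_valid)
```

Rows of a trial with no recorded value stay missing instead of raising an error.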

Note: I think it would be a nice enhancement to have a method that grabs the first non-null object in pandas; in numpy it's an open request. I don't think there is currently such a method (I could be wrong!).
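
For what it's worth, in the pandas versions I'm aware of, the groupby `first` aggregation skips missing values by default, so passing it to transform by name may cover this use case without a lambda. A minimal sketch under that assumption, using the data from the question (the name `first_per_trial` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan],
})

# 'first' takes the first non-null entry of each group; transform
# broadcasts that value back onto every row of the group.
first_per_trial = df.groupby('trial')['cs_name'].transform('first')
```

It is worth checking the behaviour against the pandas version you have installed before relying on it.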
