pandas: Filling missing values within a group
Question
I have some data from an experiment, and within each trial there are some single values, surrounded by NAs, that I want to fill out to the entire trial:
import numpy as np
import pandas as pd
df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                               np.nan, 'A1', np.nan, np.nan, np.nan]})
Out[177]:
cs_name trial
0 NaN 1
1 A1 1
2 NaN 1
3 NaN 1
4 NaN 2
5 NaN 2
6 B2 2
7 NaN 2
8 A1 3
9 NaN 3
10 NaN 3
11 NaN 3
I'm able to fill these values within the whole trial by using both bfill() and ffill(), but I'm wondering if there is a better way to achieve this.
df['cs_name'] = df.groupby('trial')['cs_name'].ffill()
df['cs_name'] = df.groupby('trial')['cs_name'].bfill()
Expected output:
cs_name trial
0 A1 1
1 A1 1
2 A1 1
3 A1 1
4 B2 2
5 B2 2
6 B2 2
7 B2 2
8 A1 3
9 A1 3
10 A1 3
11 A1 3
Recommended answer
An alternative approach is to use first_valid_index and a transform:
In [11]: g = df.groupby('trial')
In [12]: g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Out[12]:
0 A1
1 A1
2 A1
3 A1
4 B2
5 B2
6 B2
7 B2
8 A1
9 A1
10 A1
11 A1
Name: cs_name, dtype: object
This should be more efficient than doing an ffill followed by a bfill.
And use this to change the cs_name column:
df['cs_name'] = g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
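One caveat with the lambda above: if a trial contains no valid value at all, first_valid_index() returns None and the .loc lookup fails. Below is a minimal sketch of a guarded variant. The helper name first_valid and the extra all-NaN trial 4 are made up for illustration, and it assumes a reasonably recent pandas. It also shows transform('first'), which relies on the groupby first() reduction skipping missing values by default.

```python
import numpy as np
import pandas as pd

# Same data as in the question, plus a hypothetical trial 4 with no valid value.
df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, np.nan],
})

def first_valid(s):
    """Return the first non-null value of s, or NaN if every value is null."""
    idx = s.first_valid_index()
    return np.nan if idx is None else s.loc[idx]

g = df.groupby('trial')['cs_name']

# Guarded version of the lambda: the scalar is broadcast over each group.
filled = g.transform(first_valid)

# Built-in reduction: first() skips missing values, and transform broadcasts it.
via_first = g.transform('first')
```

Both results leave trial 4 as missing rather than raising. Either one can be assigned back with df['cs_name'] = filled.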
Note: I think it would be a nice enhancement to have a method to grab the first non-null object in pandas; in numpy it's an open request. I don't think there is currently a method (I could be wrong!)...