(pandas) Why does .bfill().ffill() act differently than ffill().bfill() on groups?
Problem description
I think I'm missing something basic conceptually, but I'm not able to find the answer in the docs.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3], 'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})
>>> df
   a    b
0  1  5.0
1  1  NaN
2  2  6.0
3  2  NaN
4  3  NaN
5  3  NaN
Using ffill() and then bfill():
>>> df.groupby('a')['b'].ffill().bfill()
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Using bfill() and then ffill():
>>> df.groupby('a')['b'].bfill().ffill()
0 5.0
1 5.0
2 6.0
3 6.0
4 6.0
5 6.0
Doesn't the second way break the groupings? Will the first way always make sure that the values are filled in only with other values in that group?
Solution
I think you need:
print(df.groupby('a')['b'].apply(lambda x: x.ffill().bfill()))
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Name: b, dtype: float64
print(df.groupby('a')['b'].apply(lambda x: x.bfill().ffill()))
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Name: b, dtype: float64
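This is because in your sample only the first ffill or bfill is a groupby method (DataFrameGroupBy.ffill or DataFrameGroupBy.bfill; on a single selected column such as ['b'], the SeriesGroupBy equivalent). The second call works on the Series returned by the first one. So it breaks the groups, because a Series has no groups.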
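One way to confirm this, as a minimal sketch using the same sample data, is to split the chain into two steps and inspect what the first step returns. The names step1 and step2 are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                   'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})

# First step: a groupby method, so values are filled within each group of 'a'.
step1 = df.groupby('a')['b'].bfill()
print(type(step1).__name__)  # Series: the grouping is gone at this point

# Second step: plain Series.ffill(), which knows nothing about column 'a',
# so the NaNs of group 3 pick up the 6.0 left over from group 2.
step2 = step1.ffill()
print(step2.tolist())
```

The same reasoning applies to ffill().bfill(); it only looks group-safe here because no values follow the all-NaN group, so the ungrouped bfill() has nothing to pull in.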
print(df.groupby('a')['b'].ffill())
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Name: b, dtype: float64
print(df.groupby('a')['b'].bfill())
0 5.0
1 NaN
2 6.0
3 NaN
4 NaN
5 NaN
Name: b, dtype: float64
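If both fills should stay inside each group, another option is transform, which runs the function once per group and returns a result aligned with the original index. This is only a sketch on the sample data above, not the one required approach. It assumes the flat 0..5 index should be kept; in recent pandas versions (2.0 and later, as far as I know) apply may add the group key as an extra index level unless group_keys=False is passed to groupby.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                   'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})

# Both fills run inside the lambda, i.e. on one group's Series at a time,
# so nothing can leak from one group into another.
filled = df.groupby('a')['b'].transform(lambda s: s.bfill().ffill())
print(filled)
```

Group 3 has no non-missing value at all, so it stays NaN whichever order the two fills are applied in.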