ffill behaves strangely when there are duplicate column names
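I have a DataFrame like the one below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [np.nan, 1, 1, np.nan], 'B': [2, np.nan, 2, 2]}, index=[1, 1, 2, 2])
df.columns = ['A', 'A']  # both columns now share the same name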
Now I want to ffill the values, grouped by the index. First I tried:
df.groupby(level=0).ffill()
which raises the following error:
> ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
It looks like a bug. I then tried apply, which returns the expected output:
df.groupby(level=0).apply(lambda x : x.ffill())
A A
1 NaN 2.0
1 1.0 2.0
2 1.0 2.0
2 1.0 2.0
For reference, when the column names are unique it runs without an error (see question 2), but it creates an extra column holding the index values, and that column's name is NaN:
df.columns=['C','D']
df.groupby(level=0).ffill()
NaN C D
1 1 NaN 2.0
1 1 1.0 2.0
2 2 1.0 2.0
2 2 1.0 2.0
Questions:
1. Is this a bug? Why does apply still work in this situation?
2. Why does groupby on the index, combined with ffill, create the additional column?
It sure looks bugged. Just wanted to note that, according to the pandas documentation, the .ffill() method is a synonym for .fillna(method='ffill'). Using the latter generates your expected output for both of your examples in pandas version 0.23.4, without any errors or additional columns. Hope that helps.
import pandas as pd
import numpy as np
df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])
df.columns=['A','A'] #dup column names
df.groupby(level=0).fillna(method='ffill')
Output:
A A
1 NaN 2.0
1 1.0 2.0
2 1.0 2.0
2 1.0 2.0
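If neither ffill nor fillna cooperates on your pandas version, another possible workaround is to forward-fill one column at a time, selecting each column by position so the duplicate labels never have to be resolved. This is only a sketch and not something I have checked against 0.23.4; the names filled_columns and result are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [np.nan, 1, 1, np.nan], 'B': [2, np.nan, 2, 2]},
    index=[1, 1, 2, 2],
)
df.columns = ['A', 'A']  # duplicate column names

# Select each column by position (iloc), so the duplicated label 'A'
# is never used for lookup, and forward-fill within each index group.
filled_columns = [
    df.iloc[:, pos].groupby(level=0).ffill()
    for pos in range(df.shape[1])
]

# Put the filled columns back side by side; each Series keeps its
# original name, so the duplicate column names are preserved.
result = pd.concat(filled_columns, axis=1)
print(result)
```

The first value in the first column stays NaN because there is nothing earlier in its group to fill from, which matches the apply output in the question.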