Using pandas fillna() on multiple columns
Question
I'm a new pandas user (as of yesterday), and have found it at times both convenient and frustrating.
My current frustration is in trying to use df.fillna() on multiple columns of a dataframe. For example, I've got two sets of data (a newer set and an older set) which partially overlap. For the cases where we have new data, I just use that, but I also want to use the older data if there isn't anything newer. It seems I should be able to use fillna() to fill the newer columns with the older ones, but I'm having trouble getting that to work.
To try a concrete example:
df.ix[:,['newcolumn1','newcolumn2']].fillna(df.ix[:,['oldcolumn1','oldcolumn2']], inplace=True)
But this doesn't work as expected - numbers show up in the new columns that had been NaNs, but not the ones that were in the old columns (in fact, looking through the data, I have no idea where the numbers it picked came from, as they don't exist in either the new or old data anywhere).
Is there a way to fill in NaNs of specific columns in a DataFrame with values from other specific columns of the DataFrame?
Recommended answer
To answer your question: yes. Look at using the value argument of fillna, along with the to_dict() method on the other dataframe.
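As a minimal sketch of the value argument, here is the case from the question, where the old and new columns live in the same DataFrame. The column names come from the question; the data values are made up for illustration. The dict maps each column to fill onto the Series that supplies its replacements, and the result is assigned to a new name rather than filled in place on a selection.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the question did not include a sample.
df = pd.DataFrame({
    "newcolumn1": [1.0, np.nan, 3.0],
    "newcolumn2": [np.nan, 5.0, np.nan],
    "oldcolumn1": [10.0, 20.0, 30.0],
    "oldcolumn2": [40.0, 50.0, np.nan],
})

# Map each column to fill onto the Series that supplies the fill values.
# Values are matched by index label, row by row.
fill_values = {
    "newcolumn1": df["oldcolumn1"],
    "newcolumn2": df["oldcolumn2"],
}

# fillna returns a new DataFrame; df itself is left unchanged.
filled = df.fillna(value=fill_values)
print(filled)
```

Rows where both the new and the old column are NaN stay NaN, and the old columns are untouched because they are not keys in the dict.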
But to really solve your problem, have a look at the update() method of the DataFrame. Assuming your two dataframes are similarly indexed, I think it's exactly what you want.
In [36]: df = pd.DataFrame({'A': [0, np.nan, 2, 3, np.nan, 5], 'B': [1, 0, 1, np.nan, np.nan, 1]})
In [37]: df
Out[37]:
    A   B
0   0   1
1 NaN   0
2   2   1
3   3 NaN
4 NaN NaN
5   5   1
In [38]: df2 = pd.DataFrame({'A': [0, np.nan, 2, 3, 4, 5], 'B': [1, 0, 1, 1, 0, 0]})
In [40]: df2
Out[40]:
    A  B
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  0
In [52]: df.update(df2, overwrite=False)
In [53]: df
Out[53]:
    A  B
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  1
Notice that all the NaNs in df were replaced except for (1, A), since that was also NaN in df2. Also, some of the values, like (5, B), differed between df and df2. With overwrite=False, update() keeps the value from df.
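The same behaviour can be checked in a standalone script. This is a sketch that reuses the df and df2 from the session above and adds the imports the session assumes. One thing worth knowing: update() modifies df in place and returns None, so its result should not be assigned back to df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, np.nan, 2, 3, np.nan, 5],
                   "B": [1, 0, 1, np.nan, np.nan, 1]})
df2 = pd.DataFrame({"A": [0, np.nan, 2, 3, 4, 5],
                    "B": [1, 0, 1, 1, 0, 0]})

# overwrite=False: only NaN cells in df are filled from df2;
# existing values in df are kept even where df2 differs.
result = df.update(df2, overwrite=False)

print(result)  # update() works in place and returns None
print(df)
```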
Based on the comments, it seems you're looking for a solution where the column names don't match across the two DataFrames (it'd be helpful if you posted sample data). Let's try that, replacing column A with C and B with D.
In [33]: df = pd.DataFrame({'A': [0, np.nan, 2, 3, np.nan, 5], 'B': [1, 0, 1, np.nan, np.nan, 1]})
In [34]: df2 = pd.DataFrame({'C': [0, np.nan, 2, 3, 4, 5], 'D': [1, 0, 1, 1, 0, 0]})
In [35]: df
Out[35]:
    A   B
0   0   1
1 NaN   0
2   2   1
3   3 NaN
4 NaN NaN
5   5   1
In [36]: df2
Out[36]:
    C  D
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  0
In [37]: d = {'A': df2.C, 'B': df2.D} # pass these values to fillna
In [38]: df
Out[38]:
    A   B
0   0   1
1 NaN   0
2   2   1
3   3 NaN
4 NaN NaN
5   5   1
In [40]: df.fillna(value=d)
Out[40]:
    A  B
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  1
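If you'd rather stay with update() when the column names differ, one option is to rename the columns of the second DataFrame first so they line up. This is a sketch under the same assumptions as above (matching indexes, and C/D standing in for A/B); the `mapping` name is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, np.nan, 2, 3, np.nan, 5],
                   "B": [1, 0, 1, np.nan, np.nan, 1]})
df2 = pd.DataFrame({"C": [0, np.nan, 2, 3, 4, 5],
                    "D": [1, 0, 1, 1, 0, 0]})

# Hypothetical mapping from df2's column names to the df columns they should fill.
mapping = {"C": "A", "D": "B"}

# rename() returns a relabelled copy, so df2 itself is not modified.
df.update(df2.rename(columns=mapping), overwrite=False)
print(df)
```

This gives the same values as the fillna(value=d) call above, except that df is changed in place instead of a new DataFrame being returned.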
I think if you invest the time to learn pandas you'll hit fewer moments of frustration. It's a massive library though, so it takes time.