What is the difference between combine_first and fillna?
Question
These two functions seem equivalent to me. In the code below they accomplish the same goal, since columns c and d come out equal. So when should I use one over the other?
Here is an example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))
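df['a'] = df['a'].astype(float)  # cast first so the integer column can hold NaN without an implicit upcast
df.loc[::2, 'a'] = np.nan        # blank out every other row of column a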
Returns:
a b
0 NaN 4
1 2.0 6
2 NaN 8
3 0.0 4
4 NaN 4
5 0.0 8
6 NaN 7
7 2.0 2
8 NaN 9
9 7.0 2
This is my starting point. Now I will add two columns, one using combine_first and one using fillna, and they produce the same result:
df['c'] = df.a.combine_first(df.b)
df['d'] = df['a'].fillna(df['b'])
Returns:
a b c d
0 NaN 4 4.0 4.0
1 8.0 7 8.0 8.0
2 NaN 2 2.0 2.0
3 3.0 0 3.0 3.0
4 NaN 0 0.0 0.0
5 2.0 4 2.0 2.0
6 NaN 0 0.0 0.0
7 2.0 6 2.0 2.0
8 NaN 4 4.0 4.0
9 4.0 6 4.0 4.0
Credit to this question for the data set: Combine Pandas data frame column values into new column
Recommended answer
combine_first is intended for cases where the two objects have non-overlapping indices. It fills in nulls and also supplies values for index labels and columns that did not exist in the first object.
dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
w x y
a 1.0 2.0 3.0
b 4.0 NaN 5.0
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])
x y z
b 1.0 2.0 3.0
c 3.0 4.0 5.0
dfa.combine_first(dfb)
w x y z
a 1.0 2.0 3.0 NaN
b 4.0 1.0 5.0 3.0 # 1.0 filled from `dfb`; 5.0 was in `dfa`; 3.0 new column
c NaN 3.0 4.0 5.0 # whole new index
Notice that all indices and columns from both frames are included in the result.
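The same union behavior can be seen on a pair of Series. The sketch below is only an illustration: the Series s1 and s2 and their values are made up for this example and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical Series: labels 'b' and 'c' overlap, 'a' and 'd' do not.
s1 = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([20.0, 30.0, 40.0], index=['b', 'c', 'd'])

# combine_first keeps s1's non-null values, fills s1's null at 'b' from s2,
# and adds label 'd', which exists only in s2.
combined = s1.combine_first(s2)
print(combined)
```

The result is indexed by the union of both indices, so it has four rows even though s1 has three.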
Now, if we use fillna instead:
dfa.fillna(dfb)
w x y
a 1 2.0 3
b 4 1.0 5 # 1.0 filled in from `dfb`
Notice that no new columns or indices from dfb are included. fillna only filled the null value at a position where dfa and dfb share both an index label and a column.
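One way to check this relationship, as a minimal sketch using the dfa and dfb frames from this answer: the fillna result should match the combine_first result once the latter is restricted to dfa's own labels. The variable names combined, filled, restricted and same are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

combined = dfa.combine_first(dfb)  # union of both frames' labels
filled = dfa.fillna(dfb)           # keeps dfa's labels only

print(combined.shape, filled.shape)

# Restrict the combine_first result to dfa's rows and columns, then compare
# values as floats so that integer vs. float column dtypes do not matter.
restricted = combined.loc[dfa.index, dfa.columns]
same = np.array_equal(restricted.to_numpy(dtype=float),
                      filled.to_numpy(dtype=float))
print(same)
```

In other words, for these frames fillna behaves like combine_first without the extra row 'c' and column 'z'.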
In your case, you use fillna and combine_first on a single column, with both Series sharing the same index. There are no new labels to add, so the two calls do effectively the same thing.
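A quick way to confirm this for the question's setup is sketched below. It assumes a seeded random generator (the seed 0 is arbitrary) so the run is repeatable, and it casts column a to float before inserting NaN; neither detail comes from the original question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame(rng.integers(0, 10, size=(10, 2)), columns=list('ab'))
df['a'] = df['a'].astype(float)  # make room for NaN
df.loc[::2, 'a'] = np.nan        # blank out every other row of column a

c = df['a'].combine_first(df['b'])
d = df['a'].fillna(df['b'])

# Both Series share df's index, so the results line up row by row.
print(c.index.equals(d.index))
print((c == d).all())
```

Because df['a'] and df['b'] share one index, combine_first has nothing to add beyond filling the nulls, which is exactly what fillna does.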