pandas - 与缺失值合并 [英] pandas - merging with missing values
问题描述
pandas 合并功能似乎有一个怪癖.它认为 NaN
值相等,并将 NaN
s 与其他 NaN
s 合并:
这与我见过的任何 RDB 都不一样,通常缺失的值会被视为不可知论,并且不会被合并在一起,就好像它们是相等的一样.这对于具有稀疏数据的数据集尤其成问题(每个 NaN 都将合并到其他每个 NaN,从而产生一个巨大的 DataFrame!)
有没有办法在合并过程中忽略缺失值而不先将它们切掉?
你可以从 bar
中排除值(如果你愿意,也可以排除 foo
) where id
在合并期间为空.不过,不确定这是你想要的,因为它们被切掉了.
(我从你的左连接假设你有兴趣保留所有的 foo
,但只想合并 bar
匹配的部分并且是不为空.)
foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')出[11]:id x y z0 1 2 31 b 4 5 NaN2 7 8 93 南 10 11 南
There appears to be a quirk with the pandas merge function. It considers NaN
values to be equal, and will merge NaN
s with other NaN
s:
>>> foo = DataFrame([
['a',1,2],
['b',4,5],
['c',7,8],
[np.NaN,10,11]
], columns=['id','x','y'])
>>> bar = DataFrame([
['a',3],
['c',9],
[np.NaN,12]
], columns=['id','z'])
>>> pd.merge(foo, bar, how='left', on='id')
Out[428]:
id x y z
0 a 1 2 3
1 b 4 5 NaN
2 c 7 8 9
3 NaN 10 11 12
[4 rows x 4 columns]
This is unlike any RDB I've seen, normally missing values are treated with agnosticism and won't be merged together as if they are equal. This is especially problematic for datasets with sparse data (every NaN will be merged to every other NaN, resulting in a huge DataFrame!)
Is there a way to ignore missing values during a merge without first slicing them out?
You could exclude values from bar
(and indeed foo
if you wanted) where id
is null during the merge. Not sure it's what you're after, though, as they are sliced out.
(I've assumed from your left join that you're interested in retaining all of foo
, but only want to merge the parts of bar
that match and are not null.)
foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')
Out[11]:
id x y z
0 a 1 2 3
1 b 4 5 NaN
2 c 7 8 9
3 NaN 10 11 NaN
这篇关于 pandas - 与缺失值合并的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!