pandas - merging with missing values
Question
There appears to be a quirk in the pandas merge function: it considers NaN values to be equal, and will merge NaNs with other NaNs:
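>>> import numpy as np
>>> import pandas as pd
>>> foo = pd.DataFrame([
...     ['a', 1, 2],
...     ['b', 4, 5],
...     ['c', 7, 8],
...     [np.nan, 10, 11]   # row with a missing key
... ], columns=['id', 'x', 'y'])
>>> bar = pd.DataFrame([
...     ['a', 3],
...     ['c', 9],
...     [np.nan, 12]       # row with a missing key
... ], columns=['id', 'z'])
>>> pd.merge(foo, bar, how='left', on='id')
Out[428]:
    id   x   y    z
0    a   1   2    3
1    b   4   5  NaN
2    c   7   8    9
3  NaN  10  11   12
[4 rows x 4 columns]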
This is unlike any RDB I've seen: normally, missing values are treated with agnosticism and are not merged together as if they were equal. This is especially problematic for sparse datasets, where every NaN is merged with every other NaN, resulting in a huge DataFrame!
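To make the blow-up concrete, here is a minimal sketch with made-up frames (`left`, `right`, and their values are illustrative, not taken from the data above). Each NaN key on the left pairs with each NaN key on the right, so the NaN rows multiply:

```python
import numpy as np
import pandas as pd

# Illustrative data: three NaN keys on the left, two on the right.
left = pd.DataFrame({"id": ["a", np.nan, np.nan, np.nan], "x": [1, 2, 3, 4]})
right = pd.DataFrame({"id": ["a", np.nan, np.nan], "z": [10, 20, 30]})

merged = left.merge(right, how="left", on="id")

# One row for the "a" match, plus 3 * 2 rows from NaN-to-NaN pairings.
n_nan_rows = int(merged["id"].isna().sum())
print(len(merged), n_nan_rows)
```

With m missing keys on one side and n on the other, the merge produces m * n rows from the missing keys alone.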
Is there a way to ignore missing values during a merge without first slicing them out?
Recommended answer
You could exclude the rows of bar (and indeed of foo, if you wanted) where id is null during the merge. I'm not sure that's what you're after, though, as they are still sliced out.
(I've assumed from your left join that you're interested in retaining all of foo, but only want to merge the parts of bar that match and are not null.)
foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')
Out[11]:
    id   x   y    z
0    a   1   2    3
1    b   4   5  NaN
2    c   7   8    9
3  NaN  10  11  NaN
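If this filter is needed in several places, it could be wrapped in a small helper. The sketch below is one possible shape, not a pandas feature: `merge_skip_null_keys` is a hypothetical name, and it assumes a single key column and a left join, as in the question.

```python
import numpy as np
import pandas as pd

def merge_skip_null_keys(left, right, on, how="left"):
    """Hypothetical helper: merge on one key column so that null keys never match.

    Rows of `right` whose key is null are dropped before merging, so a
    null key in `left` has nothing to pair with.
    """
    right_valid = right[right[on].notna()]
    return left.merge(right_valid, how=how, on=on)

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame([["a", 3], ["c", 9], [np.nan, 12]], columns=["id", "z"])

result = merge_skip_null_keys(foo, bar, on="id")
print(result)
```

All four rows of foo are kept, and the row with the missing id gets a missing z. Note that with how='right' or how='outer' the null-key rows of bar would be lost from the result, so the helper is only a faithful substitute for the left-join case.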