pandas - merging with missing values

Question

There appears to be a quirk with the pandas merge function. It considers NaN values to be equal, and will merge NaNs with other NaNs:

>>> import numpy as np
>>> import pandas as pd

>>> foo = pd.DataFrame([
    ['a', 1, 2],
    ['b', 4, 5],
    ['c', 7, 8],
    [np.nan, 10, 11]
], columns=['id', 'x', 'y'])

>>> bar = pd.DataFrame([
    ['a', 3],
    ['c', 9],
    [np.nan, 12]
], columns=['id', 'z'])

>>> pd.merge(foo, bar, how='left', on='id')
Out[428]: 
    id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11  12

[4 rows x 4 columns]

This is unlike any relational database (RDB) I've seen: normally, missing values are treated agnostically and are not matched as if they were equal. This is especially problematic for sparse datasets, because every NaN key is matched with every other NaN key, which can produce a huge DataFrame!
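To make the size problem concrete, here is a minimal sketch with made-up data (the frames `left` and `right` are illustrative assumptions, not part of the original example). It only shows how quickly rows multiply when several keys are NaN on both sides.

```python
import numpy as np
import pandas as pd

# Illustrative data only: three NaN keys on the left, two on the right.
left = pd.DataFrame({"id": ["a", np.nan, np.nan, np.nan], "x": [1, 2, 3, 4]})
right = pd.DataFrame({"id": ["a", np.nan, np.nan], "z": [10, 20, 30]})

merged = left.merge(right, how="left", on="id")

# 1 match for 'a' + 3 * 2 NaN-to-NaN matches = 7 rows
print(len(merged))
print(merged)
```

With thousands of NaN keys on each side, the same product grows into millions of rows.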

Is there a way to ignore missing values during a merge without first slicing them out?

Recommended answer

You could exclude the rows of bar (and indeed of foo, if you wanted) where id is null during the merge. I'm not sure it's what you're after, though, as they are still sliced out.

(I've assumed from your left join that you're interested in retaining all of foo, but only want to merge the parts of bar that match and are not null.)

foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')

Out[11]: 
    id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11 NaN
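If you need this in more than one place, the same idea could be wrapped in a small function. The sketch below is only one possible way to do it: `merge_skip_null_keys` is a hypothetical helper name, not a pandas API, and it assumes a single key column and a left or inner join (for right or outer joins, dropping the null-key rows of the right frame would lose data).

```python
import numpy as np
import pandas as pd

def merge_skip_null_keys(left, right, on, how="left"):
    """Hypothetical helper: merge so that null keys never match each other.

    Assumes a single key column `on` and a left or inner join.
    """
    if how not in ("left", "inner"):
        raise ValueError("this sketch only supports how='left' or how='inner'")
    # Drop null-key rows from the right frame so NaN on the left finds no partner.
    right_valid = right[right[on].notna()]
    return left.merge(right_valid, how=how, on=on)

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame([["a", 3], ["c", 9], [np.nan, 12]], columns=["id", "z"])

result = merge_skip_null_keys(foo, bar, on="id")
print(result)
```

All four rows of foo are kept, and the row whose id is NaN gets NaN for z instead of picking up 12 from bar.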
