pandas - 与缺失值合并 [英] pandas - merging with missing values

查看:30
本文介绍了 pandas - 与缺失值合并的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

pandas 合并功能似乎有一个怪癖.它认为 NaN 值相等,并将 NaNs 与其他 NaNs 合并:

<预><代码>>>>foo = 数据帧([['a',1,2],['b',4,5],['c',7,8],[np.NaN,10,11]], columns=['id','x','y'])>>>条 = 数据帧([['a',3],['c',9],[np.NaN,12]], columns=['id','z'])>>>pd.merge(foo, bar, how='left', on='id')出[428]:id x y z0 1 2 31 b 4 5 NaN2 7 8 93 南 10 11 12[4 行 x 4 列]

这与我见过的任何 RDB 都不一样,通常缺失的值会被视为不可知论,并且不会被合并在一起,就好像它们是相等的一样.这对于具有稀疏数据的数据集尤其成问题(每个 NaN 都将合并到其他每个 NaN,从而产生一个巨大的 DataFrame!)

有没有办法在合并过程中忽略缺失值而不先将它们切掉?

解决方案

你可以从 bar 中排除值(如果你愿意,也可以排除 foo) where id 在合并期间为空.不过,不确定这是你想要的,因为它们被切掉了.

(我从你的左连接假设你有兴趣保留所有的 foo,但只想合并 bar 匹配的部分并且是不为空.)

foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')出[11]:id x y z0 1 2 31 b 4 5 NaN2 7 8 93 南 10 11 南

There appears to be a quirk with the pandas merge function. It considers NaN values to be equal, and will merge NaNs with other NaNs:

>>> foo = DataFrame([
    ['a',1,2],
    ['b',4,5],
    ['c',7,8],
    [np.NaN,10,11]
], columns=['id','x','y'])

>>> bar = DataFrame([
    ['a',3],
    ['c',9],
    [np.NaN,12]
], columns=['id','z'])

>>> pd.merge(foo, bar, how='left', on='id')
Out[428]: 
    id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11  12

[4 rows x 4 columns]

This is unlike any RDB I've seen, normally missing values are treated with agnosticism and won't be merged together as if they are equal. This is especially problematic for datasets with sparse data (every NaN will be merged to every other NaN, resulting in a huge DataFrame!)

Is there a way to ignore missing values during a merge without first slicing them out?

解决方案

You could exclude values from bar (and indeed foo if you wanted) where id is null during the merge. Not sure it's what you're after, though, as they are sliced out.

(I've assumed from your left join that you're interested in retaining all of foo, but only want to merge the parts of bar that match and are not null.)

foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')

Out[11]: 
id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11 NaN

这篇关于 pandas - 与缺失值合并的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆