NumPy equivalent of merge

Question

I'm transitioning some stuff from R to Python and am curious about merging efficiently. I've found some stuff on concatenate in NumPy (using NumPy for operations, so I'd like to stick with it), but it doesn't work as expected.

Take two datasets:

import numpy as np

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])

ID      Label
1a2     0
2dd     0
z83     1
fz3     0

d2 = np.array([['1a2', '33.3', '22.2'], 
               ['43m', '66.6', '66.6'], 
               ['z83', '12.2', '22.1']])

ID     val1   val2
1a2    33.3   22.2
43m    66.6   66.6
z83    12.2   22.1

I want to merge these together so that the result is

d3

ID    Label    val1    val2
1a2   0        33.3    22.2
z83   1        12.2    22.1

So it identifies the rows that match on the ID column and then concatenates them. This is relatively simple in R using merge, but in NumPy it's less obvious to me.
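
Put precisely, the desired operation is an inner join on the first column: keep each d1 row whose ID also appears in d2, then append the remaining columns of the matching d2 row. As a reference point for checking NumPy versions against, here is a plain-Python sketch of those semantics using a dict lookup. It assumes the IDs in d2 are unique; the names `lookup` and `d3` are illustrative only.

```python
import numpy as np

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
d2 = np.array([['1a2', '33.3', '22.2'],
               ['43m', '66.6', '66.6'],
               ['z83', '12.2', '22.1']])

# Map each ID in d2 to the rest of its row (assumes the IDs in d2 are unique)
lookup = {row[0]: row[1:] for row in d2.tolist()}

# Inner join: keep only d1 rows whose ID is also a key of the lookup,
# and append the matching d2 values to them
d3 = [row + lookup[row[0]] for row in d1.tolist() if row[0] in lookup]
print(d3)
# [['1a2', '0', '33.3', '22.2'], ['z83', '1', '12.2', '22.1']]
```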

Is there a way to do this natively in NumPy that I am missing?
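
NumPy does ship one key-based join, although it works only on structured arrays (rows with named fields), not on plain 2-D string arrays: `numpy.lib.recfunctions.join_by`. Below is a sketch that assumes the data are first rewritten as structured arrays; the field names and dtypes are chosen for illustration and are not part of the question.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same data as d1/d2, re-expressed as structured arrays with named fields
# (field names and dtypes are assumptions made for this sketch)
s1 = np.array([('1a2', 0), ('2dd', 0), ('z83', 1), ('fz3', 0)],
              dtype=[('ID', 'U3'), ('Label', 'i8')])
s2 = np.array([('1a2', 33.3, 22.2), ('43m', 66.6, 66.6), ('z83', 12.2, 22.1)],
              dtype=[('ID', 'U3'), ('val1', 'f8'), ('val2', 'f8')])

# Inner join on the 'ID' field; usemask=False returns a plain ndarray
# rather than a masked array
d3 = rfn.join_by('ID', s1, s2, jointype='inner', usemask=False)
print(d3.dtype.names)
print(d3)
```

`join_by` sorts its output by the key and expects the key to be unique within each input. A side benefit is that the numeric columns come back as real numbers rather than strings.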

Answer

Here's one NumPy-based solution using masking:

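import numpy as np

def numpy_merge_bycol0(d1, d2):
    # Mask of matches in d1 against d2
    d1mask = np.isin(d1[:,0], d2[:,0])

    # Mask of matches in d2 against d1
    d2mask = np.isin(d2[:,0], d1[:,0])

    # Mask respective arrays and concatenate for final o/p.
    # Rows are paired positionally, so this assumes the shared IDs are
    # unique and appear in the same order in d1 and d2.
    return np.c_[d1[d1mask], d2[d2mask,1:]]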

Sample run:

In [43]: d1
Out[43]: 
array([['1a2', '0'],
       ['2dd', '0'],
       ['z83', '1'],
       ['fz3', '0']], dtype='|S3')

In [44]: d2
Out[44]: 
array([['1a2', '33.3', '22.2'],
       ['43m', '66.6', '66.6'],
       ['z83', '12.2', '22.1']], dtype='|S4')

In [45]: numpy_merge_bycol0(d1, d2)
Out[45]: 
array([['1a2', '0', '33.3', '22.2'],
       ['z83', '1', '12.2', '22.1']], dtype='|S4')
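
The masking version pairs the k-th matched row of d1 with the k-th matched row of d2. With `d2[::-1]` as input, for example, it would attach z83's values to 1a2. If the row order of the two arrays is not guaranteed to agree, one order-independent alternative is to sort d2's IDs once and look each d1 ID up with `np.searchsorted`. The helper name `merge_by_col0_sorted` is hypothetical, and the sketch assumes d2 is non-empty and its IDs are unique.

```python
import numpy as np

def merge_by_col0_sorted(d1, d2):
    """Inner-join d1 and d2 on column 0 without relying on row order.

    Hypothetical helper: assumes d2 is non-empty and its IDs are unique.
    Output rows follow the order of d1.
    """
    order = np.argsort(d2[:, 0])           # permutation that sorts d2 by ID
    keys = d2[order, 0]                     # d2's IDs in sorted order
    pos = np.searchsorted(keys, d1[:, 0])   # insertion point of each d1 ID
    pos = np.clip(pos, 0, len(keys) - 1)    # keep the lookups in bounds
    found = keys[pos] == d1[:, 0]           # True where the ID exists in d2
    rows2 = order[pos[found]]               # matching row numbers in original d2
    return np.c_[d1[found], d2[rows2, 1:]]

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
d2 = np.array([['1a2', '33.3', '22.2'],
               ['43m', '66.6', '66.6'],
               ['z83', '12.2', '22.1']])

# Same data as the sample run, but with d2's rows in reverse order
d2_rev = d2[::-1]
print(merge_by_col0_sorted(d1, d2_rev).tolist())
# [['1a2', '0', '33.3', '22.2'], ['z83', '1', '12.2', '22.1']]
```

Sorting costs O(m log m) for m rows of d2, and each of the n lookups is a binary search. Memory stays linear, unlike the n×m comparison table used by the broadcasting approach.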

We could also use broadcasting to get the matching index pairs and then use integer indexing in place of masking, like so:

# Each row of idx is an (i, j) pair with d1[i,0] == d2[j,0]
idx = np.argwhere(d1[:,0,None] == d2[:,0])
out = np.c_[d1[idx[:,0]], d2[idx[:,1],1:]]
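
To make the broadcasting step concrete: `d1[:,0,None]` has shape (4, 1) and `d2[:,0]` has shape (3,), so `==` produces a (4, 3) boolean table, and `np.argwhere` lists its True cells in row-major order. Because every matching pair is emitted, this variant tolerates any row order and also behaves like a many-to-many inner join when an ID repeats. The d2 below is a hypothetical reordering of the question's data with an extra, made-up 'z83' row, used only to show that behavior.

```python
import numpy as np

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
# Hypothetical d2: rows reordered, and 'z83' appears twice (made-up values)
d2 = np.array([['z83', '12.2', '22.1'],
               ['1a2', '33.3', '22.2'],
               ['z83', '99.9', '88.8']])

# (4, 1) == (3,) broadcasts to a (4, 3) table: eq[i, j] is True
# when row i of d1 and row j of d2 share the same ID
eq = d1[:, 0, None] == d2[:, 0]

# One (d1 row, d2 row) index pair per match, in row-major order
idx = np.argwhere(eq)
out = np.c_[d1[idx[:, 0]], d2[idx[:, 1], 1:]]
print(idx.tolist())  # [[0, 1], [2, 0], [2, 2]]
print(out.tolist())
```

The price is memory: the comparison table has len(d1) × len(d2) entries, so this suits modest array sizes.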
