NumPy equivalent of merge

Question

I'm transitioning some stuff from R to Python and am curious about merging efficiently. I've found some stuff on concatenate in NumPy (using NumPy for operations, so I'd like to stick with it), but it doesn't work as expected.

Take two datasets:

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])

ID      Label
1a2     0
2dd     0
z83     1
fz3     0

d2 = np.array([['1a2', '33.3', '22.2'], 
               ['43m', '66.6', '66.6'], 
               ['z83', '12.2', '22.1']])

ID     val1   val2
1a2    33.3   22.2
43m    66.6   66.6
z83    12.2   22.1

I want to merge these together so that the result is

d3

ID    Label    val1    val2
1a2   0        33.3    22.2
z83   1        12.2    22.1

So it's identified rows that match on the ID column and then concatenated these together. This is relatively simple in R using merge, but in NumPy it's less obvious to me.

Is there a way to do this natively in NumPy that I am missing?

Recommended answer

Here's one NumPy based solution using masking -

import numpy as np

def numpy_merge_bycol0(d1, d2):
    # Mask of matches in d1 against d2
    d1mask = np.isin(d1[:,0], d2[:,0])

    # Mask of matches in d2 against d1
    d2mask = np.isin(d2[:,0], d1[:,0])

    # Mask respective arrays and concatenate for final o/p
    return np.c_[d1[d1mask], d2[d2mask,1:]]

Sample run -

In [43]: d1
Out[43]: 
array([['1a2', '0'],
       ['2dd', '0'],
       ['z83', '1'],
       ['fz3', '0']], dtype='|S3')

In [44]: d2
Out[44]: 
array([['1a2', '33.3', '22.2'],
       ['43m', '66.6', '66.6'],
       ['z83', '12.2', '22.1']], dtype='|S4')

In [45]: numpy_merge_bycol0(d1, d2)
Out[45]: 
array([['1a2', '0', '33.3', '22.2'],
       ['z83', '1', '12.2', '22.1']], dtype='|S4')
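One caveat with the masking approach: after filtering, rows are paired purely by position, so it relies on the shared IDs being unique and appearing in the same relative order in both arrays. If that is not guaranteed, a minimal sketch along the following lines may help. It assumes NumPy 1.15 or later (for the `return_indices` argument of `np.intersect1d`) and unique IDs in each array; the name `merge_by_id` is made up for illustration, and the rows of `d2` are deliberately shuffled here.

```python
import numpy as np

def merge_by_id(d1, d2):
    # Hypothetical helper; assumes the IDs in column 0 are unique within each array.
    # intersect1d returns the sorted common IDs, plus the position of each
    # common ID in d1 and in d2.
    _, i1, i2 = np.intersect1d(d1[:, 0], d2[:, 0], return_indices=True)
    # Pair rows by explicit index, dropping d2's duplicate ID column.
    return np.hstack([d1[i1], d2[i2, 1:]])

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
d2 = np.array([['z83', '12.2', '22.1'],   # rows in a different order than d1
               ['43m', '66.6', '66.6'],
               ['1a2', '33.3', '22.2']])

d3 = merge_by_id(d1, d2)
print(d3)
```

The output rows come back sorted by ID rather than in the original order of `d1`, which may or may not matter for a given use case.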

We could also use broadcasting to get the indices and then integer-indexing in place of masking, like so -

idx = np.argwhere(d1[:,0,None] == d2[:,0])
out = np.c_[d1[idx[:,0]], d2[idx[:,1],1:]]
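To make the broadcasting idea concrete, here is a self-contained sketch on the question's data that exposes the intermediate match matrix. The variable name `eq` is introduced only for illustration. Each row of `idx` is a (row in `d1`, row in `d2`) pair, so the first column indexes `d1` and the second indexes `d2`.

```python
import numpy as np

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
d2 = np.array([['1a2', '33.3', '22.2'],
               ['43m', '66.6', '66.6'],
               ['z83', '12.2', '22.1']])

# Compare every ID in d1 against every ID in d2.
# eq has shape (len(d1), len(d2)); eq[i, j] is True where d1[i, 0] == d2[j, 0].
eq = d1[:, 0, None] == d2[:, 0]

# Each row of idx is one matching (i, j) pair.
idx = np.argwhere(eq)

# Take the matched rows of d1, then the value columns of the matched rows of d2.
out = np.c_[d1[idx[:, 0]], d2[idx[:, 1], 1:]]
print(out)
```

Because rows are paired by explicit index, this variant does not depend on the two arrays listing the shared IDs in the same order. The trade-off is the `len(d1)` by `len(d2)` comparison matrix, which can get large for big inputs.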
