How to compare two numpy arrays of different sizes and return the index column for the common elements?

Question

I have two numpy arrays which, for obvious reasons, differ in size: one has an index column along with x, y, z coordinates, and the other contains just the coordinates. (Please ignore the leading serial numbers; I couldn't figure out the formatting.) The second array has fewer coordinates, and I need the indexes (atomID) of those coordinates from the first array.

Array1 (with index column):

    serialNo. moleculeID atomID x y z

  1. 1 1 2 0 7.7590151 7.2925348 12.5933323
  2. 2 1 2 0 7.123642 6.1970949 11.5622416
  3. 3 1 6 0 6.944543 7.0390449 12.0713224
  4. 4 1 2 0 8.8900348 11.5477333 13.5633965
  5. 5 1 2 0 7.857268 12.8062735 13.4357052
  6. 6 1 6 0 8.2124357 12.1004238 14.0486889

Array2 (just the coordinates):

x          y             z

  1. 7.7590151 7.2925348 12.5933323
  2. 7.123642 6.1970949 11.5622416
  3. 6.944543 7.0390449 12.0713224
  4. 8.8900348 11.5477333 13.5633965

The array with the index column (atomID) has the indexes 2, 2, 6, 2, 2 and 6. How can I get the indexes for the coordinates that are common to Array1 and Array2? I expect to get 2 2 6 2 back as a list, and then concatenate it with the second array. Any easy ideas?

Update:

I tried the following code, but it doesn't seem to work.

import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])

b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

print a
print b

for i in range(len(b)):
 for j in range(len(a)):
    if a[j,1]==b[i,0]:
        x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
        #continue
    else:
        print 'not true'
print x 

It outputs the following:

not true
not true
not true
not true
not true
not true
not true
not true
not true
[[ 3.   2.2  5. ]
 [ 3.  -6.3  0. ]
 [ 3.   3.6  8. ]]

But the expected output is:

    [[ 4.   2.2  5. ]
     [ 2.  -6.3  0. ]
     [ 3.   3.6  8. ]]

Recommended answer

The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient, vectorized manner:

import numpy_indexed as npi
print(a[npi.contains(b, a[:, 1:])])

The currently accepted answer strikes me as incorrect for points that differ in their latter coordinates. Performance should be much improved here as well: not only is this solution vectorized, but its worst-case performance is O(N log N), as opposed to the quadratic time complexity of the currently accepted answer.
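
For readers who cannot install numpy_indexed, the same row-membership idea can be sketched in plain NumPy. This is not how numpy_indexed is implemented, only an illustration under stated assumptions: the helper name rows_in is hypothetical, coordinates are matched by exact floating-point equality, and every coordinate column is compared, not just the first one. Because it relies on the sort inside np.unique, it should also stay at roughly N log N.

```python
import numpy as np

def rows_in(candidates, reference):
    """Boolean mask: True where a row of `candidates` also occurs in `reference`.

    Hypothetical helper for illustration; rows are compared with exact equality.
    """
    candidates = np.asarray(candidates, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Stack both sets and give every distinct row an integer label.
    stacked = np.vstack([reference, candidates])
    _, labels = np.unique(stacked, axis=0, return_inverse=True)
    labels = labels.ravel()  # some NumPy versions return a 2-D inverse here
    ref_labels = labels[:len(reference)]
    cand_labels = labels[len(reference):]
    # A candidate row is common if its label also appears among the reference rows.
    return np.isin(cand_labels, ref_labels)

# Sample data from the question: column 0 of `a` is the index column.
a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

mask = rows_in(a[:, 1:], b)   # compare only the coordinate columns of `a`
result = a[mask]              # rows of `a` (index column included) found in `b`
print(result)
```

On the sample data the mask selects the first three rows of a, which is the output the question expects. Note that the rows come back in the order of a, not of b, so this fits when the goal is to filter the indexed array and not to re-order it to match b. Coordinates that have gone through different rounding would need a tolerance-based comparison instead.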
