How to compare two numpy arrays of different sizes and return the index column for their common elements?
Problem description
For obvious reasons, I have two numpy arrays of different sizes: one with an index column along with x, y, z coordinates, and the other containing just the coordinates. (Please ignore the first serial no.; I can't figure out the formatting.) The second array has fewer coordinates, and I need the indexes (atomID) of those coordinates from the first array.
Array1 (with index column):
serialNo. moleculeID atomID x y z
- 1 1 2 0 7.7590151 7.2925348 12.5933323
- 2 1 2 0 7.123642 6.1970949 11.5622416
- 3 1 6 0 6.944543 7.0390449 12.0713224
- 4 1 2 0 8.8900348 11.5477333 13.5633965
- 5 1 2 0 7.857268 12.8062735 13.4357052
- 6 1 6 0 8.2124357 12.1004238 14.0486889
Array2 (just the coordinates):
x y z
- 7.7590151 7.2925348 12.5933323
- 7.123642 6.1970949 11.5622416
- 6.944543 7.0390449 12.0713224
- 8.8900348 11.5477333 13.5633965
The array with the index column (atomID) has the indexes 2, 2, 6, 2, 2 and 6. How can I get the indexes for the coordinates that are common to Array1 and Array2? I expect to get 2 2 6 2 back as a list and then concatenate it with the second array. Any easy ideas?
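A minimal pure-NumPy sketch of one way to do this, assuming (hypothetically) that Array1 has been reduced to four columns, atomID followed by x, y, z; the names `array1`, `array2`, `atom_ids` and `result` are illustrative only. It compares every row of Array2 with every row of Array1 by broadcasting, using np.isclose in case the two files round the floats slightly differently:

```python
import numpy as np

# Assumed layout (hypothetical): column 0 is atomID, columns 1-3 are x, y, z.
array1 = np.array([
    [2, 7.7590151, 7.2925348, 12.5933323],
    [2, 7.123642, 6.1970949, 11.5622416],
    [6, 6.944543, 7.0390449, 12.0713224],
    [2, 8.8900348, 11.5477333, 13.5633965],
    [2, 7.857268, 12.8062735, 13.4357052],
    [6, 8.2124357, 12.1004238, 14.0486889],
])
array2 = np.array([
    [7.7590151, 7.2925348, 12.5933323],
    [7.123642, 6.1970949, 11.5622416],
    [6.944543, 7.0390449, 12.0713224],
    [8.8900348, 11.5477333, 13.5633965],
])

coords1 = array1[:, 1:]
# Broadcast to shape (len(array2), len(array1), 3) and require all 3 coordinates to match
match = np.isclose(array2[:, None, :], coords1[None, :, :]).all(axis=2)
found = match.any(axis=1)           # rows of array2 that occur somewhere in array1
rows = match.argmax(axis=1)         # index of the first matching row in array1
atom_ids = array1[rows[found], 0]   # atomID of each matched row
result = np.column_stack([atom_ids, array2[found]])  # prepend atomID to array2
print(atom_ids.tolist())  # [2.0, 2.0, 6.0, 2.0]
```

The broadcast builds a len(array2) by len(array1) by 3 boolean array, so memory grows quadratically; this is fine for a few thousand atoms but not for very large systems.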
Update:
Tried using the following code, but it doesn't seem to be working.
import numpy as np
a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])
print(a)
print(b)
for i in range(len(b)):
    for j in range(len(a)):
        if a[j,1]==b[i,0]:
            x = np.insert(b, 0, a[i,0], axis=1) #(input array, position to insert, value to insert, axis)
            #continue
        else:
            print('not true')
print(x)
It outputs the following:
not true
not true
not true
not true
not true
not true
not true
not true
not true
[[ 3. 2.2 5. ]
[ 3. -6.3 0. ]
[ 3. 3.6 8. ]]
But the expected output is:
[[ 4. 2.2 5. ]
[ 2. -6.3 0. ]
[ 3. 3.6 8. ]]
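The loop above has two bugs: it reads a[i,0] (i indexes b, not a) instead of a[j,0], and each np.insert call fills the whole new column with a single scalar and overwrites x, so only the last value, 3, survives. It also compares only the first coordinate. A minimal corrected sketch, still quadratic but matching on all coordinate columns, could look like this:

```python
import numpy as np

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

ids = []
for row in b:
    for candidate in a:
        # compare every coordinate column, not just the first one
        if np.array_equal(candidate[1:], row):
            ids.append(candidate[0])  # index column of the matching row of a
            break                     # keep the first match only
    else:
        ids.append(np.nan)            # no row of a matched this row of b
x = np.column_stack([ids, b])         # one index per row, prepended as column 0
print(x)
```

For coordinates read from different files, np.allclose is a safer comparison than np.array_equal.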
Recommended answer
The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient/vectorized manner:
import numpy_indexed as npi
# npi.contains(b, a[:, 1:]) is True for each row of a whose coordinates appear as a row of b
print(a[npi.contains(b, a[:, 1:])])
The currently accepted answer strikes me as incorrect for points that differ only in their later coordinates. Performance should also be much better here: not only is this solution vectorized, but its worst-case complexity is O(N log N), as opposed to the quadratic time complexity of the currently accepted answer.
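For readers who cannot install numpy_indexed, here is a sketch of a sort-based row-membership test in plain NumPy. It is a swapped-in technique, not the package's implementation, and `contains_rows` is a hypothetical helper name. np.unique(..., axis=0) sorts the stacked rows (O(N log N)) and gives identical rows the same integer label; np.isin then checks which labels of `that` also occur in `this`. Matching is exact, so the coordinates must be exactly equal in both arrays.

```python
import numpy as np

def contains_rows(this, that):
    """For each row of `that`, True if it appears as a row of `this` (exact match)."""
    stacked = np.vstack([this, that])
    # Sorting the rows groups duplicates; inverse gives each input row its unique-row label
    _, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # the inverse's shape has varied across NumPy versions
    this_labels = inverse[:len(this)]
    that_labels = inverse[len(this):]
    return np.isin(that_labels, this_labels)

a = np.array([[4, 2.2, 5], [2, -6.3, 0], [3, 3.6, 8], [5, -9.8, 50]])
b = np.array([[2.2, 5], [-6.3, 0], [3.6, 8]])

mask = contains_rows(b, a[:, 1:])  # same argument order as npi.contains(b, a[:, 1:])
print(a[mask])
```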