Python equivalent of MATLAB's "ismember" function

Question

After many attempts at optimizing my code, it seems the last resort is to run the code below on multiple cores. I don't know exactly how to convert or restructure it so that it runs much faster that way, and I would appreciate guidance toward that goal. The end goal is to run this code as fast as possible for arrays A and B that each hold about 700,000 elements. Here is the code using small arrays; the 700k-element arrays are commented out.

import numpy as np

def ismember(a, b):
    # For each element of a, scan all of b for matches (one linear scan per element).
    for i in a:
        index = np.where(b == i)[0]
        if index.size == 0:
            yield 0
        else:
            yield index[0]  # lowest matching index in b


def f(A, gen_obj):
    # Collect one generator value per element so the output has the same size as A.
    my_array = np.arange(len(A))
    for i in range(len(A)):
        my_array[i] = next(gen_obj)
    return my_array


#A = np.arange(700000)
#B = np.arange(700000)
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])

gen_obj = ismember(A,B)

result = f(A, gen_obj)

print('done')
# if we print result the output will be: [4 0 0 4 3]
# notice that the output array needs to be kept the same size as array A.

What I am trying to do is mimic the MATLAB function ismember, specifically the form [Lia,Locb] = ismember(A,B). I only need the Locb part.

From MATLAB: Locb contains the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.

One of the main problems is that I need to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements. Creating a generator and iterating through its values doesn't get the job done fast enough.

Recommended answer

Before worrying about multiple cores, I would eliminate the linear scan in your ismember function by using a dictionary:

def ismember(a, b):
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]  # None can be replaced by any other "not in b" value
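
As a quick check, the function above can be applied to the small arrays from the question. This is only an illustrative sketch: the function body is repeated so the snippet runs on its own, and plain Python lists stand in for the NumPy arrays.

```python
def ismember(a, b):
    # Map each value in b to the index of its first occurrence.
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    # Look up every element of a; None marks "not in b".
    return [bind.get(itm, None) for itm in a]


A = [3, 4, 4, 3, 6]
B = [2, 5, 2, 6, 3]

locb = ismember(A, B)
print(locb)  # [4, None, None, 4, 3]
```

The result has the same length as A, with 0-based indices into B and None where the question's version produced 0.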

Your original implementation requires a full scan of the elements in B for each element in A, making it O(len(A)*len(B)). The above code requires one full scan of B to build the dict bind. With a dict, looking up each element of A in B takes constant time on average, making the whole operation O(len(A)+len(B)). If this is still too slow, then worry about making the above function run on multiple cores.
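
Since the inputs in the question are NumPy arrays, a vectorized variant may also be worth trying before reaching for multiple cores. The sketch below is not part of the answer above; it swaps the dict for np.unique plus np.searchsorted, which costs roughly O((len(A)+len(B)) log len(B)) but avoids the Python-level loop. The function name ismember_locb and the missing parameter are hypothetical, chosen only for illustration.

```python
import numpy as np


def ismember_locb(a, b, missing=-1):
    """Hypothetical helper: 0-based index of the first occurrence of each a[i] in b."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0:
        # Nothing can be found in an empty b.
        return np.full(a.shape, missing, dtype=np.intp)
    # Sorted unique values of b and the index of each value's first occurrence.
    uniq, first_idx = np.unique(b, return_index=True)
    # Binary-search every element of a in the sorted unique values.
    pos = np.searchsorted(uniq, a)
    # searchsorted can return len(uniq); clip so the comparison below stays in bounds.
    pos = np.clip(pos, 0, len(uniq) - 1)
    found = uniq[pos] == a
    return np.where(found, first_idx[pos], missing)


A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
print(ismember_locb(A, B))  # [ 4 -1 -1  4  3]
```

Whether this is actually faster than the dict version for 700k elements depends on the data and would need to be measured.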

I've also modified your indexing slightly. MATLAB uses 0 because all of its arrays start at index 1. Python/NumPy arrays start at 0, so if your data set looks like this

A = [2378, 2378, 2378, 2378]
B = [2378, 2379]

and you return 0 for no element, then every element of A, each of which matches at index 0, would appear to be missing from B. The above routine returns None for no index instead of 0. Returning -1 is an option, but Python will interpret that as the last element when it is used as an index. None raises a TypeError if it's used as an index into a Python list (a NumPy array instead treats None as np.newaxis, so it will not raise there). If you'd like different behavior, change the second argument in the bind.get(itm, None) expression to the value you want returned.
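
The difference between the candidate "not found" values can be seen in a small sketch. It reuses the dictionary approach from above with one assumption added for illustration: a hypothetical missing parameter that is passed through to bind.get. The list C is also made up for the example.

```python
def ismember(a, b, missing=None):
    # Same dictionary approach as above; `missing` is a hypothetical extra parameter.
    bind = {}
    for i, elt in enumerate(b):
        bind.setdefault(elt, i)  # keeps only the first (lowest) index
    return [bind.get(itm, missing) for itm in a]


A = [2378, 2378, 2378, 2378]
B = [2378, 2379]
print(ismember(A, B))  # [0, 0, 0, 0]: here 0 is a real index, not "missing"

C = [2379, 5]  # 5 does not occur in B
idx = ismember(C, B, missing=-1)
print(idx)                  # [1, -1]
print([B[i] for i in idx])  # [2379, 2379]: -1 silently picks the last element

idx_none = ismember(C, B)   # [1, None]
try:
    [B[i] for i in idx_none]
except TypeError as exc:
    # Indexing a list with None fails loudly instead of returning wrong data.
    print("TypeError:", exc)
```

With None, a forgotten "not found" check surfaces as an error on a Python list, whereas -1 quietly returns the wrong element.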
