Python equivalent of MATLAB's "ismember" function

Question

After many attempts at optimizing my code, it seems the last resort is to run the code below on multiple cores. I don't know exactly how to convert or restructure my code so that it runs much faster using multiple cores, and I would appreciate guidance toward the end goal: running this code as fast as possible for arrays A and B, each holding about 700,000 elements. Here is the code using small arrays; the 700k-element arrays are commented out.

import numpy as np

def ismember(a, b):
    # For each element of a, yield the lowest index where it occurs in b, or 0 if absent.
    for i in a:
        index = np.where(b == i)[0]  # linear scan of b for every element of a
        if index.size == 0:
            yield 0
        else:
            yield index[0]


def f(A, gen_obj):
    # Collect the generator's values into an array the same size as A.
    my_array = np.arange(len(A))
    for i in range(len(A)):
        my_array[i] = next(gen_obj)
    return my_array


#A = np.arange(700000)
#B = np.arange(700000)
A = np.array([3,4,4,3,6])
B = np.array([2,5,2,6,3])

gen_obj = ismember(A,B)

f(A, gen_obj)

print('done')
# if we print f(A, gen_obj) the output will be: [4 0 0 4 3]
# notice that the output array needs to be kept the same size as array A.

What I am trying to do is mimic the MATLAB function ismember (the form [Lia,Locb] = ismember(A,B)). I only need the Locb part.

From the MATLAB documentation: Locb contains the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.

One of the main problems is that I need to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements. Creating a generator and iterating over its values doesn't get the job done fast enough.

Recommended answer

Before worrying about multiple cores, I would eliminate the linear scan in your ismember function by using a dictionary:

def ismember(a, b):
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]  # None can be replaced by any other "not in b" value

Your original implementation requires a full scan of the elements in B for each element in A, making it O(len(A)*len(B)). The above code requires one full scan of B to build the dict bind, which maps each value to the index of its first occurrence. With a dict, each lookup into B takes constant time on average for every element of A, making the whole operation O(len(A)+len(B)). If this is still too slow, then worry about making the above function run on multiple cores.
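Since the inputs in the question are NumPy arrays, another option to try before reaching for multiple cores is a vectorized lookup. The sketch below is not the dictionary approach above: it swaps in a sort-and-binary-search technique built on np.unique and np.searchsorted, costing roughly O((len(A)+len(B)) log len(B)). The function name ismember_locb and its missing parameter are hypothetical names chosen for illustration, and the -1 default is only an assumption about what "not in b" value is convenient.

```python
import numpy as np

def ismember_locb(a, b, missing=-1):
    """Hypothetical helper: lowest 0-based index in b of each element of a, else `missing`."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Sorted unique values of b, plus the index of each value's first occurrence in b.
    b_unique, first_idx = np.unique(b, return_index=True)
    if b_unique.size == 0:
        # Nothing can match an empty b.
        return np.full(a.shape, missing)
    # Binary search: where each element of a would be inserted into b_unique.
    pos = np.searchsorted(b_unique, a)
    # Clip positions past the end so they can still be used for the equality check.
    pos = np.minimum(pos, b_unique.size - 1)
    found = b_unique[pos] == a
    return np.where(found, first_idx[pos], missing)

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
locb = ismember_locb(A, B)
print(locb)
```

The result has the same length as A, as the question requires. Whether this beats the dictionary version on 700k elements depends on the data and dtype, so it is worth timing both.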

I've also modified your indexing slightly. MATLAB uses 0 for "not found" because all of its arrays start at index 1. Python/NumPy arrays start at 0, so if your data set looks like this

A = [2378, 2378, 2378, 2378]
B = [2378, 2379]

and you return 0 for a missing element, then a genuine match at index 0 is indistinguishable from "not found", so your results would appear to exclude all elements of A. The above routine therefore returns None instead of 0 when there is no index. Returning -1 is an option, but Python interprets -1 as the last element of a sequence. None raises a TypeError if it is used as an index into a Python list (a NumPy array instead treats None as np.newaxis, so check for it explicitly there). If you'd like different behavior, change the second argument in the bind.get(itm, None) expression to the value you want returned.
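To make the sentinel trade-off concrete, here is a small sketch using the same dictionary approach as the routine above. The missing parameter and the variable names are hypothetical additions for illustration, and the sample data is a slight variation on the example so that one element of A is genuinely absent from B.

```python
def ismember(a, b, missing=None):
    # Same dictionary approach as the answer; `missing` is a hypothetical parameter.
    first_index = {}
    for i, elt in enumerate(b):
        if elt not in first_index:
            first_index[elt] = i
    return [first_index.get(itm, missing) for itm in a]

A = [2378, 2380, 2378]
B = [2378, 2379]

with_zero = ismember(A, B, missing=0)        # a match at index 0 and a miss look identical
with_none = ismember(A, B)                   # the miss is unambiguous
with_minus_one = ismember(A, B, missing=-1)

# -1 silently selects the last element of B instead of failing.
silently_wrong = B[with_minus_one[1]]

# With None as the sentinel, filter out the misses before indexing.
matched = [B[i] for i in with_none if i is not None]

# MATLAB-style Locb: 1-based positions, with 0 meaning "not a member".
locb_matlab = [0 if i is None else i + 1 for i in with_none]
```

Converting to 1-based positions at the very end, as in the last line, is one way to reproduce MATLAB's output exactly while keeping 0-based indices inside the Python code.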
