Equivalent of Matlab 'ismember' in numpy (Python)?


Question

Unfortunately, calls to ismember tend to be where most of the time is spent in my Matlab scripts, so I want to find an efficient NumPy equivalent.

The basic pattern consists of mapping a subset onto a larger grid. I have a set of key-value pairs stored as parallel arrays, and I want to insert these values into a larger list of key-value pairs stored in the same way.

For concreteness, say I have quarterly GDP data that I map onto a monthly time grid as follows.

quarters = [200712 200803 200806 200809 200812 200903];
gdp_q = [10.1 10.5 11.1 11.8 10.9 10.3];
months = 200801 : 200812;
gdp_m = NaN(size(months));
[tf, loc] = ismember(quarters, months);
gdp_m(loc(tf)) = gdp_q(tf);

Note that not all the quarters appear in the list of months, so both the tf and the loc variables are required.

I have seen similar questions on StackOverflow, but they either give a pure Python solution (here) or, where numpy is used, the loc argument isn't returned (here).

In my application area, this code pattern arises over and over again and uses up most of the CPU time of my functions, so an efficient solution here is really crucial for me.

Comments or redesign suggestions are also welcome.

Recommended answer

If months is sorted, use np.searchsorted. Otherwise, sort and then use np.searchsorted:

import numpy as np
quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
months = np.arange(200801, 200813)
loc = np.searchsorted(months, quarters)
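
For the unsorted case, one possible sketch is to pass a sort order to np.searchsorted through its sorter argument and then map the positions back to the original array. The shuffled months_unsorted grid and the names order, pos, and loc_unsorted are made up for illustration, and the example assumes every queried value is present in the grid.

```python
import numpy as np

# Hypothetical unsorted grid: the same twelve months in shuffled order.
months_unsorted = np.array([200807, 200801, 200812, 200803, 200810, 200802,
                            200806, 200811, 200804, 200809, 200805, 200808])
# Assumption: every queried value is present in the grid.
quarters_present = np.array([200803, 200806, 200809, 200812])

# Indices that would sort the grid.
order = np.argsort(months_unsorted)
# Insertion positions relative to the sorted view of the grid.
pos = np.searchsorted(months_unsorted, quarters_present, sorter=order)
# Map sorted positions back to positions in the original, unsorted array.
loc_unsorted = order[pos]
```

Indexing months_unsorted with loc_unsorted should then give back quarters_present. If some values may be missing, pos can equal the grid length, so it would need clipping before the order[pos] lookup.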

np.searchsorted returns the insertion position, not a membership test. If some of your data may fall outside the range of months, you will want to check for that afterwards:

# A range check is enough here only because months is a contiguous run of integers
valid = (quarters <= months.max()) & (quarters >= months.min())
loc = loc[valid]
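
When the grid may have gaps, a range check no longer proves membership. A minimal sketch of a fuller equivalent that returns both tf and loc follows; the helper name ismember_sorted is hypothetical, and it assumes the second array is sorted, unique, and non-empty. Note that loc is 0-based here, unlike Matlab's 1-based indices.

```python
import numpy as np

def ismember_sorted(a, b_sorted):
    """Hypothetical helper: return (tf, loc) for a in b_sorted.

    Assumes b_sorted is sorted, unique, and non-empty.
    """
    loc = np.searchsorted(b_sorted, a)
    # An insertion position equal to len(b_sorted) would be out of bounds,
    # so clip it before using it as an index.
    loc = np.minimum(loc, len(b_sorted) - 1)
    # A value is a member only if the element at its insertion position matches.
    tf = b_sorted[loc] == a
    return tf, loc

quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
gdp_q = np.array([10.1, 10.5, 11.1, 11.8, 10.9, 10.3])
months = np.arange(200801, 200813)

tf, loc = ismember_sorted(quarters, months)
gdp_m = np.full(months.shape, np.nan)
gdp_m[loc[tf]] = gdp_q[tf]
```

Entries of loc where tf is False are not meaningful and should be ignored, which is why the assignment indexes with loc[tf].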

This is an O(N log N) solution. If this is still a bottleneck in your programme, you could write this one subroutine in C(++) using a hashing scheme, which would be O(N) (as well as avoiding some constant factors, of course).
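
To illustrate the hashing idea without leaving Python, here is a rough sketch that uses a dict as the hash table. It is not the C(++) routine suggested above, and the function name ismember_hash is hypothetical. Lookups are O(1) on average, but the Python-level loops carry large constant factors, so this may well be slower than np.searchsorted for arrays of moderate size.

```python
import numpy as np

def ismember_hash(a, b):
    """Hypothetical helper: return (tf, loc) for a in b via a dict lookup.

    b does not need to be sorted. loc is -1 where the value is absent.
    """
    index_of = {}
    for i, v in enumerate(b.tolist()):
        # setdefault keeps the first index if a value repeats in b.
        index_of.setdefault(v, i)
    loc = np.array([index_of.get(v, -1) for v in a.tolist()], dtype=np.intp)
    tf = loc >= 0
    return tf, loc

quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
months = np.arange(200801, 200813)
tf_h, loc_h = ismember_hash(quarters, months)
```

The -1 sentinel is a design choice for this sketch; it must be masked out with tf_h before indexing, since -1 is a valid (last element) index in NumPy.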
