How to select the inverse of indexes of a numpy array?

Problem Description

I have a large set of data in which I need to compare the distances of a set of samples from this array with all the other elements of the array. Below is a very simple example of my data set.

import numpy as np
import scipy.spatial.distance as sd

data = np.array(
    [[ 0.93825827,  0.26701143],
     [ 0.99121108,  0.35582816],
     [ 0.90154837,  0.86254049],
     [ 0.83149103,  0.42222948],
     [ 0.27309625,  0.38925281],
     [ 0.06510739,  0.58445673],
     [ 0.61469637,  0.05420098],
     [ 0.92685408,  0.62715114],
     [ 0.22587817,  0.56819403],
     [ 0.28400409,  0.21112043]]
)


sample_indexes = [1,2,3]

# I'd rather not make this
other_indexes = list(set(range(len(data))) - set(sample_indexes))

sample_data = data[sample_indexes]
other_data = data[other_indexes]

# compare them
dists = sd.cdist(sample_data, other_data)

Is there a way to index a numpy array for indexes that are NOT the sample indexes? In my example above I make a list called other_indexes. I'd rather not have to do this for various reasons (large data set, threading, a very VERY small amount of memory on the system this is running on, etc.). Is there a way to do something like:

other_data = data[ indexes not in sample_indexes]

I read that numpy masks can do this, but I tried:

other_data = data[~sample_indexes]

And this gives me an error. Do I have to create a mask?
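To see why data[~sample_indexes] fails, here is a small sketch on hypothetical stand-in data (np.arange instead of the real array). Unary ~ on a Python list raises TypeError; on an integer array it is bitwise NOT, which silently produces negative indices; only on a boolean array does it invert a selection.

```python
import numpy as np

data = np.arange(20).reshape(10, 2)  # hypothetical stand-in for the 10x2 data
sample_indexes = [1, 2, 3]

# On a plain Python list, unary ~ is not defined, so this raises TypeError.
raised_type_error = False
try:
    data[~sample_indexes]
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)

# On an integer array, ~ is bitwise NOT (~i == -i - 1), not "everything else".
bitwise = ~np.array(sample_indexes)
print(bitwise)     # [-2 -3 -4], i.e. rows 8, 7 and 6 counted from the end

# ~ means "invert the selection" only for a boolean array.
mask = np.zeros(len(data), dtype=bool)
mask[sample_indexes] = True
complement = np.flatnonzero(~mask)
print(complement)  # [0 4 5 6 7 8 9]
```

So yes: to use ~ as "not these rows", the indexes first have to be turned into a boolean mask.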

Recommended Answer

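mask = np.ones(len(data), dtype=bool)  # one True per row (np.bool was removed in NumPy 1.24)
mask[sample_indexes] = False           # switch off the sample rows
other_data = data[mask]                # keep every row that is not a sample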

Not the most elegant solution for what perhaps should be a single-line statement, but it's fairly efficient, and the memory overhead is minimal too: the mask costs one byte per row.
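A self-contained sketch of the mask approach, using random stand-in data (an assumption, not the asker's array). It checks that the mask selects the same rows as the question's set-based other_indexes and compares the mask's size with an explicit integer index array of the remaining rows.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((10, 2))  # hypothetical stand-in for the question's 10x2 data
sample_indexes = [1, 2, 3]

# Boolean mask over the rows: True means "not a sample row".
mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False
other_data = data[mask]  # rows 0, 4, 5, ..., 9, in their original order

# Same rows as the question's set-based approach (sorted to fix the order).
other_indexes = sorted(set(range(len(data))) - set(sample_indexes))
assert np.array_equal(other_data, data[other_indexes])

# Memory: the mask costs 1 byte per row; an integer index array costs
# itemsize bytes (typically 8 on 64-bit builds) per kept row.
index_array = np.flatnonzero(mask)
print(mask.nbytes, index_array.nbytes)
```

The mask's cost scales with the total number of rows, while an index array scales with the number of kept rows times the integer width, so for the usual case of a few samples in a large array the mask is the smaller of the two.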

If memory is your prime concern, np.delete saves you from building the mask yourself, and since fancy indexing (including boolean masking) creates a copy anyway, you lose nothing by letting np.delete make that copy.
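The copy claim can be checked with np.shares_memory. In this sketch (hypothetical np.arange data), the masked result and the np.delete result are both freshly allocated, while a basic slice, included only for contrast, is a view into the original buffer.

```python
import numpy as np

data = np.arange(20, dtype=float).reshape(10, 2)  # hypothetical example data
sample_indexes = [1, 2, 3]

mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False

masked = data[mask]                                # boolean (fancy) indexing
deleted = np.delete(data, sample_indexes, axis=0)  # np.delete on rows
sliced = data[4:]                                  # basic slicing, for contrast

# Fancy indexing and np.delete both allocate a new array; a slice is a view.
print(np.shares_memory(data, masked))   # False
print(np.shares_memory(data, deleted))  # False
print(np.shares_memory(data, sliced))   # True
```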

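On second thought: np.delete does not modify the existing array (it returns a new one), so other_data = np.delete(data, sample_indexes, axis=0) is pretty much exactly the single-line statement you are looking for. The axis=0 matters here; without it, np.delete works on the flattened array.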
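A minimal sketch of the one-liner on hypothetical stand-in data, showing that the input array is left untouched and what happens when axis is omitted:

```python
import numpy as np

data = np.arange(20).reshape(10, 2)  # hypothetical stand-in for the 10x2 data
sample_indexes = [1, 2, 3]

# axis=0 deletes whole rows and returns a new array.
other_data = np.delete(data, sample_indexes, axis=0)
print(other_data.shape)  # (7, 2)
print(data.shape)        # (10, 2): the original is unchanged

# Without axis, the array is flattened first and single elements are removed.
flat = np.delete(data, sample_indexes)
print(flat.shape)        # (17,)
```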