Numpy: How to get rid of the minima along axis=1, given the indices - in an efficient way?

Question

Given a matrix A with shape (1000000, 6), I have figured out how to get the index of the rightmost minimum in each row and implemented it in this function:

import numpy

def calculate_row_minima_indices(h):  # h is the given matrix.
    """Returns the indices of the rightmost minimum per row for matrix h."""
    flipped = numpy.fliplr(h)  # flip the matrix to get the rightmost minimum.
    flipped_indices = numpy.argmin(flipped, axis=1)
    # Map column positions in the flipped matrix back to the original columns.
    indices = (h.shape[1] - 1) - flipped_indices
    return indices

h = numpy.random.rand(10, 6)  # small example input; the real matrix is (1000000, 6)
indices = calculate_row_minima_indices(h)
for row, col in enumerate(indices):
    print(row, col, h[row, col])  # row index, column index and value of the minimum to be removed.
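
The flip trick works because np.argmin returns the first occurrence of the minimum, so searching the column-reversed array finds the last one. The sketch below (the helper name rightmost_argmin and the sample array are illustrative only) contrasts leftmost and rightmost minima on rows with ties:

```python
import numpy as np

def rightmost_argmin(a):
    """Column index of the rightmost minimum in each row of a 2-D array."""
    ncols = a.shape[1]
    # np.argmin picks the FIRST minimum; on the reversed columns that is the
    # last minimum of the original row, which we map back to its original index.
    return (ncols - 1) - np.argmin(a[:, ::-1], axis=1)

h = np.array([[3, 1, 4, 1, 5, 9],
              [2, 2, 2, 2, 2, 2],
              [0, 7, 0, 7, 0, 7]])
print(np.argmin(h, axis=1))   # leftmost minima: [1 0 0]
print(rightmost_argmin(h))    # rightmost minima: [3 5 4]
```

Note that the column count is taken from the array itself, so the same code works for 6 columns or any other width.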

Each row has a minimum. So what I need now is to remove the entry with the minimum and shrink the matrix with shape (1000000, 6) to a matrix with shape (1000000, 5).

I would generate a new matrix with lower dimension and populate it with the values I want it to carry using a for loop, but I am worried about the runtime. So is there some built-in way or some trick to shrink the matrix by removing the minimum from each row?

Perhaps this information is of use: the values are all greater than or equal to 0.0.
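
For reference, the for-loop approach mentioned in the question might look roughly like the sketch below (drop_row_minima_loop is a hypothetical name). It runs one Python-level iteration per row, so on 1,000,000 rows it is expected to be far slower than a vectorized solution, but it is handy as a correctness baseline:

```python
import numpy as np

def drop_row_minima_loop(h):
    """Naive baseline: drop the rightmost minimum of each row with a Python loop."""
    nrows, ncols = h.shape
    out = np.empty((nrows, ncols - 1), dtype=h.dtype)
    for i in range(nrows):
        row = h[i]
        j = (ncols - 1) - np.argmin(row[::-1])  # rightmost minimum in this row
        out[i, :j] = row[:j]                    # entries left of the minimum
        out[i, j:] = row[j + 1:]                # entries right of the minimum
    return out

h = np.array([[5, 8, 9, 5, 0, 0],
              [1, 7, 6, 9, 2, 4]])
print(drop_row_minima_loop(h))
```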

Answer

Assuming you have enough memory to hold a boolean mask the shape of your original array as well as the new array, here's one way to do it:
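
To put that assumption in numbers for the (1000000, 6) case, here is a quick back-of-the-envelope calculation, assuming the data is float64 (the question does not state the dtype):

```python
import numpy as np

nrows, ncols = 1_000_000, 6
itemsize = np.dtype(np.float64).itemsize            # 8 bytes, assuming float64 data
original = nrows * ncols * itemsize                 # input array
mask = nrows * ncols * np.dtype(bool).itemsize      # boolean mask, 1 byte per entry
result = nrows * (ncols - 1) * itemsize             # shrunken output array
print(original, mask, result)                       # 48000000 6000000 40000000
```

So the mask adds about 6 MB on top of roughly 48 MB of input and 40 MB of output; the boolean-indexed result is already contiguous, so the final reshape returns a view rather than another copy.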

import numpy as np

def main():
    np.random.seed(1)  # For reproducibility
    data = generate_data((10, 6))

    indices = rightmost_min_col(data)
    new_data = pop_col(data, indices)

    print('Original data...')
    print(data)
    print('Modified data...')
    print(new_data)

def generate_data(shape):
    return np.random.randint(0, 10, shape)

def rightmost_min_col(data):
    nrows, ncols = data.shape[:2]
    # argmin returns the first minimum, so search the column-reversed array
    # and map the result back to get the rightmost minimum.
    min_indices = np.fliplr(data).argmin(axis=1)
    min_indices = (ncols - 1) - min_indices
    return min_indices

def pop_col(data, col_indices):
    nrows, ncols = data.shape[:2]
    col_indices = col_indices[:, np.newaxis]    # shape (nrows, 1)
    all_cols = np.arange(ncols)[np.newaxis, :]  # shape (1, ncols)
    mask = col_indices != all_cols              # shape (nrows, ncols); False at each removed entry
    # Each row keeps exactly ncols - 1 entries, so the flattened result reshapes cleanly.
    return data[mask].reshape((nrows, ncols - 1))

if __name__ == '__main__':
    main()
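
An alternative formulation (not part of the original answer) builds, for each row, the list of column positions to keep and gathers them with np.take_along_axis (available since NumPy 1.15). It is sketched below with the hypothetical name pop_col_take; note that the (nrows, ncols - 1) integer index array it creates typically takes more memory than the boolean mask, so it is a different trade-off rather than a strict improvement:

```python
import numpy as np

def pop_col_take(data, col_indices):
    """Remove data[i, col_indices[i]] from every row i, keeping column order."""
    nrows, ncols = data.shape
    keep = np.arange(ncols - 1)[np.newaxis, :]          # shape (1, ncols - 1)
    # Shift every position at or past the removed column one step right,
    # so each row's keep-list skips exactly its own removed column.
    keep = keep + (keep >= col_indices[:, np.newaxis])  # shape (nrows, ncols - 1)
    return np.take_along_axis(data, keep, axis=1)

data = np.array([[5, 2, 4, 2, 4, 7],
                 [0, 3, 9, 2, 0, 4]])
idx = (data.shape[1] - 1) - np.fliplr(data).argmin(axis=1)
print(pop_col_take(data, idx))
```

For these two rows the result matches the corresponding rows of the output below: [5 2 4 4 7] and [0 3 9 2 4].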

This yields:

Original data...
[[5 8 9 5 0 0]
 [1 7 6 9 2 4]
 [5 2 4 2 4 7]
 [7 9 1 7 0 6]
 [9 9 7 6 9 1]
 [0 1 8 8 3 9]
 [8 7 3 6 5 1]
 [9 3 4 8 1 4]
 [0 3 9 2 0 4]
 [9 2 7 7 9 8]]
Modified data...
[[5 8 9 5 0]
 [7 6 9 2 4]
 [5 2 4 4 7]
 [7 9 1 7 6]
 [9 9 7 6 9]
 [1 8 8 3 9]
 [8 7 3 6 5]
 [9 3 4 8 4]
 [0 3 9 2 4]
 [9 7 7 9 8]]

One of the less readable tricks I'm using here is exploiting numpy's broadcasting during array comparisons. As a quick example, consider the following:

import numpy as np
a = np.array([[1, 2, 3]])
b = np.array([[1], [2], [3]])
print(a == b)

This yields:

[[ True False False]
 [False  True False]
 [False False  True]]

So, if we know the column index of the item we want removed, we can build the mask for every row at once from an array of column indices, which is what pop_col does.
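
Here is the same mask construction on a tiny 2 x 4 array (the values and the column choices are made up for illustration), showing both the broadcast mask and the reshaped result:

```python
import numpy as np

data = np.array([[10, 11, 12, 13],
                 [20, 21, 22, 23]])
col_indices = np.array([2, 0])  # column to drop in each row

# A (2, 1) column vector compared with a (1, 4) row vector broadcasts to a (2, 4) mask.
mask = col_indices[:, np.newaxis] != np.arange(data.shape[1])[np.newaxis, :]
print(mask)

# Boolean indexing returns the kept entries flattened in row-major order; every row
# contributes exactly ncols - 1 values, so the reshape restores the row structure.
result = data[mask].reshape(data.shape[0], data.shape[1] - 1)
print(result)  # [[10 11 13]
               #  [21 22 23]]
```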
