How to obtain only the first True value from each row of a numpy array?

Problem description

I have a 4x3 boolean numpy array, and I'm trying to return a same-sized array which is all False, except for the location of the first True value on each row of the original. So if I have a starting array of

all_bools = np.array([[False, True, True],[True, True, True],[False, False, True],[False,False,False]])
all_bools
array([[False,  True,  True], # First true value = index 1
       [ True,  True,  True], # First true value = index 0
       [False, False,  True], # First true value = index 2
       [False, False, False]]) # No True Values

then I'd like to return

[[False, True, False],
 [True, False, False],
 [False, False, True],
 [False, False, False]]

so indices 1, 0 and 2 on the first three rows have been set to True and nothing else. Essentially, every True value beyond the first on each row of the original array has been set to False.

I've been fiddling around with this with np.where and np.argmax and I haven't yet found a good solution - any help gratefully received. This needs to run many, many times so I'd like to avoid iterating.

Solution

You can use cumsum, and find the first bool by comparing the result with 1.

all_bools.cumsum(axis=1).cumsum(axis=1) == 1 
array([[False,  True, False],
       [ True, False, False],
       [False, False,  True],
       [False, False, False]])

This also accounts for the issue @a_guest pointed out. The second cumsum call is needed to avoid matching all False values between the first and second True value.
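To make the role of the second cumsum concrete, here is a minimal sketch comparing one cumulative sum with two. The row `gap_row` and the names `single` and `double` are illustrative only and not part of the original answer.

```python
import numpy as np

# Illustrative row with a False between two True values.
gap_row = np.array([[True, False, True]])

# One cumsum gives a running count of True values: [1, 1, 2].
# Comparing with 1 also marks the False that follows the first True.
single = gap_row.cumsum(axis=1) == 1

# A second cumsum turns the counts into [1, 2, 4], so only the
# position of the first True still equals 1.
double = gap_row.cumsum(axis=1).cumsum(axis=1) == 1

print(single)  # [[ True  True False]]
print(double)  # [[ True False False]]
```

The example array in the question has no False after a True within a row, so a single cumsum would appear to work there; the difference only shows up on rows like the one above.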


If performance is important, use argmax and set values:

y = np.zeros_like(all_bools, dtype=bool)
idx = np.arange(len(all_bools)), all_bools.argmax(axis=1)
y[idx] = all_bools[idx]

y
array([[False,  True, False],
       [ True, False, False],
       [False, False,  True],
       [False, False, False]])
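One detail worth spelling out: for a row with no True value, argmax returns 0, so the code copies the original value at that position instead of assigning True. A small self-contained sketch of this, where the function name `first_true_per_row` is a hypothetical wrapper chosen for illustration:

```python
import numpy as np

def first_true_per_row(x):
    # Hypothetical helper name; the body follows the argmax approach above.
    y = np.zeros_like(x, dtype=bool)
    rows = np.arange(len(x))
    cols = x.argmax(axis=1)  # index of the first True, or 0 if the row has none
    # Copy the original value: True where a first True exists,
    # False for rows that are entirely False.
    y[rows, cols] = x[rows, cols]
    return y

all_bools = np.array([[False, True, True],
                      [True, True, True],
                      [False, False, True],
                      [False, False, False]])

cols = all_bools.argmax(axis=1)
result = first_true_per_row(all_bools)
print(cols)    # [1 0 2 0]
print(result)
```

Writing `y[rows, cols] = True` instead would wrongly set the first column of the all-False row.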


Perfplot Performance Timings
I'll take this opportunity to show off perfplot, with some timings, since it is good to see how our solutions vary with different sized inputs.

import numpy as np
import perfplot

def cs1(x):
    return  x.cumsum(axis=1).cumsum(axis=1) == 1 

def cs2(x):
    y = np.zeros_like(x, dtype=bool)
    idx = np.arange(len(x)), x.argmax(axis=1)
    y[idx] = x[idx]
    return y

def a_guest(x):
    b = np.zeros_like(x, dtype=bool)
    i = np.argmax(x, axis=1)
    b[np.arange(i.size), i] = np.logical_or.reduce(x, axis=1)
    return b

perfplot.show(
    setup=lambda n: np.random.randint(0, 2, size=(n, n)).astype(bool),
    kernels=[cs1, cs2, a_guest],
    labels=['cs1', 'cs2', 'a_guest'],
    n_range=[2**k for k in range(1, 8)],
    xlabel='N'
)

The trend carries forward to larger N: cumsum is very expensive, while the gap between my second solution and @a_guest's is a constant time difference.
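Before comparing timings, it can help to confirm that the three kernels return the same result. The following is a minimal sketch of such a check on random input. The seed, the array size, and the name `outputs_agree` are arbitrary choices for illustration, and perfplot is not required to run it.

```python
import numpy as np

def cs1(x):
    return x.cumsum(axis=1).cumsum(axis=1) == 1

def cs2(x):
    y = np.zeros_like(x, dtype=bool)
    idx = np.arange(len(x)), x.argmax(axis=1)
    y[idx] = x[idx]
    return y

def a_guest(x):
    b = np.zeros_like(x, dtype=bool)
    i = np.argmax(x, axis=1)
    b[np.arange(i.size), i] = np.logical_or.reduce(x, axis=1)
    return b

# Random boolean input with a fixed seed (arbitrary) so the check is repeatable.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(64, 64)).astype(bool)

expected = cs1(x)
outputs_agree = all(np.array_equal(expected, f(x)) for f in (cs2, a_guest))
print(outputs_agree)
```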
