How do you efficiently sum the occurrences of a value in one array at positions in another array?

Problem description

I'm looking for an efficient solution that avoids a Python for loop for an array-related problem I'm having. I have a huge 1D array A (size 250,000) with values between 0 and 40 that I want to use as indices along one dimension, and an array B of the same size with values between 0 and 9995 that I want to use as indices along a second dimension.

The result should be an array of shape (41, 9996) in which each entry [i, j] holds the number of positions k where A[k] == i and B[k] == j.

Example:

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
which should result in:
[[0, 1, 0],
 [0, 0, 0],
 [0, 0, 1],
 [0, 0, 2],
 [1, 0, 0]]

The naive way is too slow because the amount of data is huge. What you could do is:

out = np.zeros((41, 9996))
for i, j in zip(A, B):
    out[i, j] += 1

which needs one Python-level iteration for each element of A and B... I've tried this instead, which only partially works:

out = np.zeros((41, 9996))
out[A, B] += 1

This puts a 1 at every indexed position, regardless of how many times each (A, B) pair occurs.

Does anyone have a clue how to fix this? Thanks in advance!

Solution

You are looking for a sparse tensor:

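import torch

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
idx = torch.tensor([A, B])  # 2 x N index tensor: row indices, then column indices
# Duplicate (row, col) entries are summed when the sparse tensor is densified
torch.sparse_coo_tensor(idx, torch.ones(idx.shape[1]), (5, 3)).to_dense()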

Output:

tensor([[0., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.],
        [0., 0., 2.],
        [1., 0., 0.]])

You can also do the same with a SciPy sparse matrix:

import numpy as np
from scipy.sparse import coo_matrix

coo_matrix((np.ones(len(A)), (np.array(A), np.array(B))), shape=(5,3)).toarray()

Output:

array([[0., 1., 0.],
       [0., 0., 0.],
       [0., 0., 1.],
       [0., 0., 2.],
       [1., 0., 0.]])
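
For readers who would rather stay within NumPy, here is a minimal sketch of two alternatives that are not part of the answer above: `np.add.at`, which is the unbuffered counterpart of the `out[A, B] += 1` attempt from the question, and `np.bincount` applied to flattened indices. The helper names `count_pairs_add_at` and `count_pairs_bincount` and the random test data are made up for illustration.

```python
import numpy as np

def count_pairs_add_at(A, B, shape):
    """Count (A[k], B[k]) pairs with unbuffered in-place addition."""
    out = np.zeros(shape, dtype=np.int64)
    # Unlike out[A, B] += 1, np.add.at applies every repeated index.
    np.add.at(out, (np.asarray(A), np.asarray(B)), 1)
    return out

def count_pairs_bincount(A, B, shape):
    """Count (A[k], B[k]) pairs by histogramming flattened indices."""
    flat = np.ravel_multi_index((np.asarray(A), np.asarray(B)), shape)
    return np.bincount(flat, minlength=shape[0] * shape[1]).reshape(shape)

# Small example from the question
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
small = count_pairs_add_at(A, B, (5, 3))
print(small)

# Hypothetical random data at the sizes mentioned in the question
rng = np.random.default_rng(0)
big_A = rng.integers(0, 41, size=250_000)
big_B = rng.integers(0, 9996, size=250_000)
big = count_pairs_bincount(big_A, big_B, (41, 9996))
print(big.shape, big.sum())
```

Both helpers should reproduce the 5 x 3 matrix from the example. Which one is faster on real data is something to measure rather than assume.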

Sometimes it is better to leave the matrix in its sparse representation, rather than forcing it to be "dense" again.
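
As a rough illustration of that last point, the sketch below builds the count matrix at the sizes from the question and keeps it sparse. The random input data and the variable names are assumptions for illustration only; the conversion to CSR is what merges repeated (row, column) entries into a single summed value.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-in data with the sizes from the question
rng = np.random.default_rng(0)
A = rng.integers(0, 41, size=250_000)
B = rng.integers(0, 9996, size=250_000)

counts = coo_matrix(
    (np.ones(len(A), dtype=np.int64), (A, B)), shape=(41, 9996)
)
# Converting to CSR sums duplicate (row, col) entries.
counts_csr = counts.tocsr()

# Row sums are available without densifying the matrix.
row_totals = np.asarray(counts_csr.sum(axis=1)).ravel()
print(counts_csr.shape, counts_csr.nnz, row_totals.sum())
```

Whether staying sparse pays off depends on what happens next. Here at most 250,000 of the 409,836 cells can be nonzero, so a dense array may be just as practical for this particular shape.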
