numpy elementwise outer product with sparse matrices


Problem description


I want to take the element-wise outer product of three (or four) large 2D arrays in Python (values are float32 rounded to 2 decimals). They all have the same number of rows "n", but different numbers of columns "i", "j", "k".
The resulting array should be of shape (n, i*j*k). Then, I want to sum each column of the result to end up with a 1D array of shape (i*j*k).

np.shape(a) = (75466, 10)
np.shape(b) = (75466, 28)
np.shape(c) = (75466, 66)

np.shape(intermediate_result) = (75466, 18480)
np.shape(result) = (18480,)
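
As a tiny, self-contained illustration of the operation (made-up 2-row arrays, with two matrices instead of three):

import numpy as np

# The row-wise outer product of shapes (2, 2) and (2, 3) gives (2, 2*3);
# summing over axis 0 then gives a 1D array of length 6.
x = np.array([[1., 2.], [3., 4.]])
y = np.array([[5., 6., 7.], [8., 9., 10.]])
intermediate = (x[:, :, None] * y[:, None, :]).reshape(2, -1)  # shape (2, 6)
result = intermediate.sum(axis=0)                              # shape (6,)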


Thanks to ruankesi and divakar, I got a piece of code that works:

import numpy as np

# Multiply the first two matrices (row-wise outer product)
first_multi = a[..., None] * b[:, None]
# could use np.einsum('ij,ik->ijk', a, b), which is slightly faster
ab_fills = first_multi.reshape(a.shape[0], a.shape[1] * b.shape[1])

# Multiply the result with the third matrix
second_multi = ab_fills[..., None] * c[:, None]
abc_fills = second_multi.reshape(ab_fills.shape[0], ab_fills.shape[1] * c.shape[1])

# Get the result: sum each column (reduce over axis 0) to get a 1D array
# of length 10*28*66 = 18480
result = np.sum(abc_fills, axis=0)


Problem 1: Performance

This takes about 3 seconds, but I have to repeat this operation many times, and some of the matrices are even larger (in number of rows). It is acceptable, but making it faster would be nice.


Indeed, for instance, "a" contains 70% 0s. I tried to play with scipy csc_matrix, but really could not get a working version. (To get the element-wise outer product here, I go via a conversion to a 3D array, which scipy sparse matrices do not support.)
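
For reference, a minimal sketch (an illustration of one possible workaround, not the original attempt) of how the zeros in "a" could be exploited with scipy.sparse without any 3D sparse array: since result[i, j, k] = sum over n of a[n, i]*b[n, j]*c[n, k], the reduction over rows is a matrix product between a.T and the row-wise outer product of b and c, so only "a" needs to be sparse. The name sparse_first_factor is made up; the b-c outer product stays dense and remains the memory bottleneck.

import numpy as np
from scipy import sparse

def sparse_first_factor(a, b, c):
    # Reduction over rows as a.T (i, n) @ bc (n, j*k); the sparse format
    # skips the zero entries of a during the product.
    n = a.shape[0]
    bc = (b[:, :, None] * c[:, None, :]).reshape(n, -1)  # dense, (n, j*k)
    a_sp = sparse.csr_matrix(a)
    return (a_sp.T @ bc).ravel()  # (i*j*k,), same ordering as the dense code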

If I try to also work with a 4th matrix, I run into memory issues.


I imagine that converting this code to sparse_matrix would save a lot of memory, and make the calculation faster by ignoring the numerous 0 values. Is that true? If yes, can someone help me?
Of course, if you have any suggestion for a better implementation, I am also very interested. I don't need any of the intermediate results, just the final 1D result.
I've been stuck on this part of the code for weeks, and I am going nuts!

Thank you!




Approach #1:
Very nice one-liner, but surprisingly slower than the original approach (?).
On my test dataset, approach #1 takes 4.98 s ± 3.06 ms per loop (no speedup with optimize=True).
The original decomposed approach took 3.01 s ± 16.5 ms per loop.




Approach #2:
Absolutely great, thank you! What an impressive speedup!
62.6 ms ± 233 µs per loop




About numexpr: I try to avoid requirements on external modules as much as possible, and I don't plan to use multiple cores/threads. The task is "embarrassingly" parallelizable; with hundreds of thousands of objects to analyze, I'll just spread the list across the available CPUs in production. I will give it a try for memory optimization.
As a brief test of numexpr restricted to 1 thread, performing a single multiplication, I get a runtime of 40 ms without numexpr and 52 ms with it.
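
For reference, a minimal sketch (illustrative, not from the original post) of how numexpr can be pinned to a single thread for such a like-for-like test:

import numexpr as ne

# Cap numexpr at one thread for a fair comparison with single-threaded
# NumPy; set_num_threads returns the previous thread count.
ne.set_num_threads(1)
first_multi = ne.evaluate('A*B', {'A': a[..., None], 'B': b[:, None]})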
Thanks again!!

Recommended answer

Approach #1


We can use np.einsum to do sum-reductions in one go -

result = np.einsum('ij,ik,il->jkl', a, b, c).ravel()


Also, play around with the optimize flag in np.einsum by setting it to True to use BLAS.
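
For instance:

result = np.einsum('ij,ik,il->jkl', a, b, c, optimize=True).ravel()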

Approach #2


We can use broadcasting to do the first step, as also mentioned in the posted code, and then leverage tensor-matrix multiplication with np.tensordot -

def broadcast_dot(a, b, c):
    # first_multi has shape (n, i, j); tensordot then sums over axis 0
    # (the shared rows) against c, yielding an (i, j, k) tensor.
    first_multi = a[..., None] * b[:, None]
    return np.tensordot(first_multi, c, axes=(0, 0)).ravel()
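
As a quick sanity check (an illustrative addition), the two approaches should agree up to floating-point tolerance:

assert np.allclose(np.einsum('ij,ik,il->jkl', a, b, c).ravel(),
                   broadcast_dot(a, b, c))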


We can also use the numexpr module, which supports multi-core processing and is more memory-efficient, to get first_multi. This gives us a modified solution, like so -

import numexpr as ne

def numexpr_broadcast_dot(a, b, c):
    # Same as broadcast_dot, but the broadcasted multiply is evaluated by
    # numexpr (multi-threaded, with better memory behavior).
    first_multi = ne.evaluate('A*B', {'A': a[..., None], 'B': b[:, None]})
    return np.tensordot(first_multi, c, axes=(0, 0)).ravel()


Timings on random float data with given dataset sizes -

In [36]: %timeit np.einsum('ij,ik,il->jkl',a,b,c).ravel()
4.57 s ± 75.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit broadcast_dot(a,b,c)
270 ms ± 103 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit numexpr_broadcast_dot(a,b,c)
172 ms ± 63.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Just to give a sense of improvement with numexpr -

In [7]: %timeit a[...,None] * b[:,None]
80.4 ms ± 2.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [8]: %timeit ne.evaluate('A*B',{'A':a[...,None],'B':b[:,None]})
25.9 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


This improvement should be substantial when extending the solution to a higher number of inputs.

