How to perform efficient sparse matrix multiplication by using tf.matmul?

Problem description

I'm trying to perform a sparse matrix multiplication by using tf.matmul().

However, inference is much slower than with dense matrix multiplication.

According to the description of tf.sparse_matmul():

  • The breakeven for using this versus a dense matrix multiply on one platform was 30% zero values in the sparse matrix.

Thus, I made the sparse matrix with 7/8 of its values equal to zero.

Here is my code:

import time

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

# Dense 250x4 matrix.
a = tf.Variable(np.arange(1000).reshape(250, 4), dtype=tf.float32)
# 4x2 matrix in which 7 of the 8 entries are zero.
b = tf.Variable(np.array([0, 0, 0, 0, 0, 0, 0, 1], dtype=np.float32).reshape(4, 2), dtype=tf.float32)
# b_is_sparse=True hints that b is mostly zeros.
c = tf.matmul(a, b, b_is_sparse=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_iteration = 5000
    num_burnin = 50
    duration = 0

    for i in range(num_iteration + num_burnin):
        start_time = time.time()
        result = sess.run(c)
        end_time = time.time()
        # Skip the first num_burnin warm-up runs so exactly num_iteration runs are timed.
        if i >= num_burnin:
            duration += end_time - start_time

    print("Average Inference Time = %.3f ms" % (duration * 1000 / num_iteration))

With b_is_sparse=True (sparse matrix multiplication), it takes about 0.380 ms on my GeForce GTX 960M.

However, with b_is_sparse=False (dense matrix multiplication), it takes about 0.280 ms.
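
The timing loop above can be factored into a small helper so that both settings are measured in exactly the same way. This is only a sketch: the name `average_time_ms` is made up for illustration, and it uses `time.perf_counter()` from the standard library rather than `time.time()` because it is intended for measuring short intervals.

```python
import time


def average_time_ms(fn, num_iteration=5000, num_burnin=50):
    """Return the mean wall-clock time of fn() in milliseconds.

    The first num_burnin calls are executed but not timed, so one-off
    costs (setup, memory allocation, caches) do not skew the mean.
    """
    for _ in range(num_burnin):
        fn()  # warm-up calls, not measured

    start = time.perf_counter()
    for _ in range(num_iteration):
        fn()  # measured calls
    elapsed = time.perf_counter() - start

    return elapsed * 1000.0 / num_iteration
```

Inside the session from the question, it could be called as `average_time_ms(lambda: sess.run(c))`, once for each value of b_is_sparse.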

I have also tried tf.sparse_tensor_dense_matmul and tf.embedding_lookup_sparse to perform the sparse matrix multiplication, but inference is still slower than with dense matrix multiplication.

Is there something wrong with my code, or is there another way to perform sparse matrix multiplication?

Any advice would be greatly appreciated!

Solution

The relative performance depends on many factors. Sparse multiplication can (hopefully) be faster than dense multiplication of the same matrices, but you are right that it can also be slower.

For one thing, it depends on the size of your matrix.
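
One plausible way to see why size matters is to count the arithmetic involved. The sketch below is plain Python, not TensorFlow's implementation, and the helper names `dense_matmul`, `to_coo` and `dense_times_coo` are made up for illustration. It multiplies the matrices from the question both ways and counts the multiply-adds.

```python
def dense_matmul(a, b):
    """Triple-loop product; returns (result, number of multiply-adds)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    ops = 0
    for i in range(rows):
        for k in range(inner):
            for j in range(cols):
                out[i][j] += a[i][k] * b[k][j]
                ops += 1
    return out, ops


def to_coo(b):
    """Collect the nonzero entries of b as (row, col, value) triples."""
    return [(k, j, v) for k, row in enumerate(b) for j, v in enumerate(row) if v != 0]


def dense_times_coo(a, coo, cols):
    """Multiply dense a by a sparse matrix given as (row, col, value) triples."""
    out = [[0.0] * cols for _ in range(len(a))]
    ops = 0
    for k, j, v in coo:
        for i in range(len(a)):
            out[i][j] += a[i][k] * v
            ops += 1
    return out, ops


# Same shapes and values as in the question: a is 250x4, b is 4x2 with one nonzero.
a = [[float(4 * i + k) for k in range(4)] for i in range(250)]
b = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

dense_result, dense_ops = dense_matmul(a, b)
sparse_result, sparse_ops = dense_times_coo(a, to_coo(b), cols=2)

print(dense_ops, sparse_ops)  # prints "2000 250"
```

For the shapes in the question, the sparse path saves only 1750 multiply-adds per call. That is very little work compared with the fixed cost of running an op at all, so the extra bookkeeping of a sparse kernel can plausibly outweigh the savings until the matrices get much larger.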

As an experiment, I multiplied two square matrices, one random and one filled with zeros, and recorded the computation time for dense and sparse multiplication.

The results show that, even with a completely zero matrix, sparse multiplication can be slower than dense multiplication for small matrices; in fact, it is almost three times slower for matrices of about 120x120. In this experiment on my computer, sparse matrix multiplication starts to win at sizes of about 700x700 and ends up about 2 times faster. Of course, YMMV depending on your configuration.
