Sparse Efficiency Warning while changing the column
Problem description
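def tdm_modify(feature_names, tdm):
    # Terms whose weights should be forced to zero
    non_useful_words = ['kill', 'stampede', 'trigger', 'cause', 'death', 'hospital',
                        'minister', 'said', 'told', 'say', 'injury', 'victim', 'report']
    # Column index of each term (assumes feature_names is a Python list)
    indexes = [feature_names.index(word) for word in non_useful_words]
    for index in indexes:
        tdm[:, index] = 0  # on a csr_matrix this emits SparseEfficiencyWarning
    return tdm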
I want to manually set zero weights for some terms in the tdm matrix. Using the code above, I get the following warning, and I don't understand why. Is there a better way to do this?
C:\Anaconda\lib\site-packages\scipy\sparse\compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
Recommended answer
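First, this is not an error; it's a warning. The next time you perform this action in the same session, it will run without the warning, because by default Python reports a given warning only the first time it is triggered from a particular code location.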
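A small sketch of that default behaviour, using only the standard `warnings` module. The function `touch_structure` is hypothetical: it emits a `SparseEfficiencyWarning` itself to stand in for a scipy call that changes the sparsity structure, so the example does not depend on scipy internals.

```python
import warnings

from scipy.sparse import SparseEfficiencyWarning


def touch_structure():
    # Hypothetical stand-in for a scipy operation that changes a
    # csr_matrix's sparsity structure; it emits the same warning category.
    warnings.warn("Changing the sparsity structure of a csr_matrix is expensive.",
                  SparseEfficiencyWarning, stacklevel=2)


def count_shown_warnings(n_calls):
    """Trigger the warning n_calls times from one line; count how many are shown."""
    with warnings.catch_warnings(record=True) as caught:
        # "default" shows only the first occurrence per (message, category, line)
        warnings.simplefilter("default")
        for _ in range(n_calls):
            touch_structure()
    return len(caught)


print(count_shown_warnings(5))  # 1
```

Only one of the five calls produces a visible warning; the rest are suppressed because they come from the same line.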
To me the message is clear:
Changing the sparsity structure of a csr_matrix is expensive.
lil_matrix is more efficient.
tdm is a csr_matrix. Because of the way data is stored in that format, it takes quite a bit of extra computation to set a batch of elements to 0 (or, vice versa, to change them from 0). As the message says, the lil_matrix format is better if you need to make this sort of change frequently.
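To make the storage point concrete: a CSR matrix keeps three arrays, `indptr`, `indices` and `data`. Overwriting a value that is already stored only touches `data`, but turning an unstored zero into a stored entry means inserting into `indices` and `data` and shifting every later `indptr` offset. The sketch below is a possible alternative, not the method this answer proposes, and `zero_columns_csr` is a hypothetical name: it zeroes columns by editing only the stored values, so the structure is never extended, then calls `eliminate_zeros()` to drop the explicit zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix


def zero_columns_csr(tdm, cols):
    """Return a copy of a csr_matrix with the given columns set to zero.

    Only already-stored values are changed, so indptr/indices never grow
    and no SparseEfficiencyWarning is emitted.
    """
    out = tdm.copy()
    # out.indices[k] is the column of the k-th stored value out.data[k]
    hit = np.isin(out.indices, cols)
    out.data[hit] = 0
    # Remove the stored entries that are now zero
    out.eliminate_zeros()
    return out


tdm = csr_matrix(np.array([[1, 0, 2],
                           [0, 3, 4],
                           [5, 0, 6]]))
print(tdm.indptr, tdm.indices, tdm.data)  # [0 2 4 6] [0 2 1 2 0 2] [1 2 3 4 5 6]
print(zero_columns_csr(tdm, [2]).toarray())
```

For the question's code, `zero_columns_csr(tdm, indexes)` would replace the loop over `tdm[:, index] = 0`.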
Try some timing tests on sample matrices. tdm.tolil() will convert the matrix to lil format.
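A rough timing sketch along those lines. The matrix size, density and column list are arbitrary assumptions, and `zero_columns_via_lil` and `zero_columns_csr_direct` are hypothetical helper names. The direct csr version will print the `SparseEfficiencyWarning` once; timings depend on your machine and scipy version, so none are quoted here.

```python
import timeit

import scipy.sparse as sp


def zero_columns_via_lil(tdm, cols):
    """Convert to lil, zero the columns there, then convert back to csr."""
    lil = tdm.tolil()
    for c in cols:
        lil[:, c] = 0
    return lil.tocsr()


def zero_columns_csr_direct(tdm, cols):
    """Zero the columns directly on a csr copy, as the question's code does."""
    out = tdm.copy()
    for c in cols:
        out[:, c] = 0
    return out


# Arbitrary sample: 2000 x 500 with about 1% non-zeros
sample = sp.random(2000, 500, density=0.01, format="csr", random_state=0)
cols = [3, 17, 42, 99]

for name, func in [("lil round trip", zero_columns_via_lil),
                   ("csr direct", zero_columns_csr_direct)]:
    seconds = timeit.timeit(lambda: func(sample, cols), number=10)
    print(f"{name}: {seconds / 10 * 1e3:.2f} ms per call")
```

Both helpers produce the same dense values; only the cost of getting there differs.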
I could go into how the data is stored and why changing a csr matrix is less efficient than changing a lil matrix.
I'd suggest reviewing the sparse formats and their respective pros and cons.
A simple way to think about it: csr (and csc) are designed for fast numerical calculations, especially matrix multiplication; they were developed for linear algebra problems. coo is a convenient way of defining sparse matrices, and lil is a convenient way of building matrices incrementally.
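A minimal sketch of that division of labour, with made-up values: define one matrix with coo, build another incrementally with lil, and convert both to csr before doing the arithmetic.

```python
import numpy as np
import scipy.sparse as sp

# coo: define a matrix directly from (row, col, value) triplets
rows = np.array([0, 1, 2])
cols = np.array([0, 2, 1])
vals = np.array([1.0, 2.0, 3.0])
A = sp.coo_matrix((vals, (rows, cols)), shape=(3, 3))

# lil: build a matrix one entry at a time; adding entries is cheap here
B = sp.lil_matrix((3, 3))
for i in range(3):
    B[i, i] = 10.0

# csr: convert once construction is finished, then do the linear algebra
C = A.tocsr() @ B.tocsr()
print(C.toarray())
```

Since `B` is 10 times the identity, `C` is simply `A` scaled by 10.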
How are you constructing tdm initially?
In scipy test files (e.g. scipy/sparse/linalg/dsolve/tests/test_linsolve.py), I find code that does:
import warnings
from scipy.sparse import (spdiags, SparseEfficiencyWarning, csc_matrix,
                          csr_matrix, isspmatrix, dok_matrix, lil_matrix, bsr_matrix)
# Silence SparseEfficiencyWarning for the rest of the process
warnings.simplefilter('ignore', SparseEfficiencyWarning)
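The test-file code above turns the warning off for the rest of the process. A narrower variant, sketched here under the assumption that you only want to hide it around the column assignment, wraps the assignment in `warnings.catch_warnings()` so the previous filters are restored afterwards; `zero_columns_quietly` is a hypothetical name. This only hides the message; the assignment itself costs the same.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_matrix


def zero_columns_quietly(tdm, cols):
    """Zero the given columns of a csr_matrix copy, hiding only this warning."""
    out = tdm.copy()
    with warnings.catch_warnings():
        # This "ignore" filter applies only inside the with-block;
        # the previous warning filters are restored on exit.
        warnings.simplefilter("ignore", SparseEfficiencyWarning)
        for c in cols:
            out[:, c] = 0
    return out


m = csr_matrix(np.arange(9).reshape(3, 3))
print(zero_columns_quietly(m, [0, 2]).toarray())
```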
And in scipy/sparse/base.py:
class SparseWarning(Warning):
    pass


class SparseFormatWarning(SparseWarning):
    pass


class SparseEfficiencyWarning(SparseWarning):
    pass
These warnings subclass the standard Python Warning class, so the standard Python mechanisms for controlling how warnings are shown (the filters in the warnings module) apply to them.
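Because of that class hierarchy, a filter on `SparseWarning` also matches `SparseEfficiencyWarning`. One standard-library technique, sketched below, is to turn sparse warnings into exceptions temporarily so you can see exactly which operation changes the structure. `report_sparse_warning` and `set_value` are hypothetical helper names; the example relies on assigning to an unstored position of a csr_matrix emitting the warning, while overwriting an already-stored value does not.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, SparseWarning, csr_matrix


def report_sparse_warning(func, *args):
    """Run func(*args) with sparse warnings raised as exceptions.

    Returns the warning text if one was raised, otherwise None.
    """
    with warnings.catch_warnings():
        # Also matches SparseEfficiencyWarning, since it subclasses SparseWarning
        warnings.simplefilter("error", SparseWarning)
        try:
            func(*args)
        except SparseWarning as exc:
            return str(exc)
    return None


def set_value(m, i, j, value):
    """Assign a single element of a sparse matrix in place."""
    m[i, j] = value


print(issubclass(SparseEfficiencyWarning, SparseWarning))  # True
m = csr_matrix(np.eye(3))
print(report_sparse_warning(set_value, m, 0, 0, 5.0))  # None: (0, 0) is already stored
print(report_sparse_warning(set_value, m, 0, 1, 5.0))  # warning text: (0, 1) is not stored
```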