Sparse Efficiency Warning while changing the column


Question

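def tdm_modify(feature_names, tdm):
    # Terms whose weights should be forced to zero.
    non_useful_words = ['kill', 'stampede', 'trigger', 'cause', 'death', 'hospital',
                        'minister', 'said', 'told', 'say', 'injury', 'victim', 'report']
    # Column index of each term; feature_names must be a list for .index() to work.
    indexes = [feature_names.index(word) for word in non_useful_words]
    for index in indexes:
        tdm[:, index] = 0  # on a csr_matrix, this assignment emits the warning below
    return tdm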

I want to manually set zero weights for some terms in the tdm matrix. With the code above I get the warning below, and I don't understand why. Is there a better way to do this?

C:\Anaconda\lib\site-packages\scipy\sparse\compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)

Recommended answer

First, it is not an error; it is a warning. By default Python prints a given warning only once per code location, so the next time you perform this action in the same session it will run without the warning.

To me the message is clear:

Changing the sparsity structure of a csr_matrix is expensive. 
lil_matrix is more efficient.

tdm is a csr_matrix. Because of the way this format stores its data, setting a bunch of elements to 0 (or, vice versa, changing them from 0) takes quite a bit of extra computation. As the message says, the lil_matrix format is better if you need to make this sort of change frequently.

Try some timing tests on a sample matrix. tdm.tolil() will convert the matrix to lil format.
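As a minimal sketch of the tolil() route, the snippet below zeroes whole columns on a lil copy and converts back to csr afterwards. The helper name zero_columns_via_lil, the four-word vocabulary, and the 3x4 matrix are all made up for illustration; they are not from the question.

```python
import numpy as np
from scipy.sparse import csr_matrix


def zero_columns_via_lil(tdm, column_indexes):
    """Hypothetical helper: zero whole columns without assigning into CSR."""
    lil = tdm.tolil()              # LIL handles structural changes cheaply
    for index in column_indexes:
        lil[:, index] = 0
    return lil.tocsr()             # back to CSR for fast arithmetic


# Made-up vocabulary and term-document matrix, for illustration only.
feature_names = ['kill', 'rain', 'said', 'flood']
tdm = csr_matrix(np.array([[1, 0, 2, 3],
                           [0, 4, 5, 0],
                           [6, 0, 0, 7]], dtype=float))

indexes = [feature_names.index(word) for word in ('kill', 'said')]
cleaned = zero_columns_via_lil(tdm, indexes)
```

Whether the two conversions pay off depends on the matrix size and on how many columns change, so it is worth timing both versions (for example with timeit) on data that resembles your own.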

I could get into how the data is stored and why changing a csr matrix is less efficient than changing a lil one.
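As a brief illustration of that storage difference, the sketch below prints the internal arrays of a small, made-up 3x3 matrix in both formats. csr keeps three flat arrays for the whole matrix, while lil keeps one Python list of column indices and one list of values per row.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small made-up matrix, only to show the internal layout.
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 6]])

csr = csr_matrix(dense)
csr_data = csr.data.tolist()        # nonzero values, stored row by row
csr_indices = csr.indices.tolist()  # column index of each stored value
csr_indptr = csr.indptr.tolist()    # where each row starts in data/indices

lil = csr.tolil()
lil_rows = [list(cols) for cols in lil.rows]   # per-row column indices
lil_data = [list(vals) for vals in lil.data]   # per-row values

print(csr_data, csr_indices, csr_indptr)
print(lil_rows, lil_data)
```

Adding a new nonzero to row 1 of the csr matrix means inserting into the middle of data and indices, which shifts every later entry, and then updating indptr for all following rows. In lil the same change only touches the two lists that belong to row 1.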

I'd suggest reviewing the sparse formats and their respective pros and cons.

A simple way to think about it: csr (and csc) are designed for fast numerical calculations, especially matrix multiplication. They were developed for linear algebra problems. coo is a convenient way of defining sparse matrices. lil is a convenient way of building matrices incrementally.
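That division of labour can be sketched as follows: build the matrix element by element in lil, then convert once to csr before doing arithmetic. The values and variable names here are arbitrary.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Build incrementally in LIL, where single-element assignment is cheap.
builder = lil_matrix((3, 3))
builder[0, 0] = 2.0
builder[1, 2] = 3.0
builder[2, 1] = 4.0

# Convert once to CSR, which is the format meant for multiplication.
matrix = builder.tocsr()
vector = np.array([1.0, 2.0, 3.0])
product = matrix @ vector
```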

How are you constructing tdm initially?

In the scipy test files (e.g. scipy/sparse/linalg/dsolve/tests/test_linsolve.py) I find code that does this:

import warnings
from scipy.sparse import (spdiags, SparseEfficiencyWarning, csc_matrix,
    csr_matrix, isspmatrix, dok_matrix, lil_matrix, bsr_matrix)
warnings.simplefilter('ignore',SparseEfficiencyWarning)

In scipy/sparse/base.py:

class SparseWarning(Warning):
    pass
class SparseFormatWarning(SparseWarning):
    pass
class SparseEfficiencyWarning(SparseWarning):
    pass

These warnings derive from the standard Python Warning class, so the standard Python methods for controlling their display apply.
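One such standard method is warnings.catch_warnings, which limits the filter to a single block instead of silencing the warning for the whole session. The sketch below is one possible way to use it; the helper name zero_columns_quietly and the sample matrix are assumptions made for illustration.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_matrix


def zero_columns_quietly(tdm, column_indexes):
    """Hypothetical helper: zero columns in place, hiding only this warning."""
    with warnings.catch_warnings():
        # The filter is active only inside this with-block.
        warnings.simplefilter('ignore', SparseEfficiencyWarning)
        for index in column_indexes:
            tdm[:, index] = 0
    return tdm


# Made-up sample matrix, for illustration only.
sample = csr_matrix(np.array([[1, 0, 2],
                              [0, 3, 0]], dtype=float))
quiet_result = zero_columns_quietly(sample, [0, 2])
```

This only hides the message; the assignment costs the same as before, so it suits cases where the change happens rarely and the extra computation is acceptable.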
