SciPy PearsonR ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


Problem description

I'm running into some issues while using the pearsonr method from SciPy. I tried to keep it as simple as possible (note the gorgeous N^2 loop), but I'm still running up against this problem. I don't entirely understand where I'm going wrong. My arrays are getting selected correctly and have the same dimensionality.

The code I'm running is:

import numpy as np
from scipy import stats
from sklearn.preprocessing import LabelBinarizer, Binarizer
from sklearn.feature_extraction.text import CountVectorizer

ny_cluster = LabelBinarizer().fit_transform(ny_raw.clusterid.values)
ny_vocab = Binarizer().fit_transform(CountVectorizer().fit_transform(ny_raw.text.values))

ny_vc_phi = np.zeros((ny_vocab.shape[1], ny_cluster.shape[1]))
for i in xrange(ny_vc_phi.shape[0]):
    for j in xrange(ny_vc_phi.shape[1]):
        ny_vc_phi[i,j] = stats.pearsonr(ny_vocab[:,i].todense(), ny_cluster[:,j])[0]

This produces the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/data/TweetClusters/TweetsLocationBayesClf/<ipython-input-29-ff1c3ac4156d> in <module>()
      3 for i in xrange(ny_vc_phi.shape[0]):
      4     for j in xrange(ny_vc_phi.shape[1]):
----> 5         ny_vc_phi[i,j] = stats.pearsonr(ny_vocab[:,i].todense(), ny_cluster[:,j])[0]
      6 

/usr/lib/python2.7/dist-packages/scipy/stats/stats.pyc in pearsonr(x, y)
   2201     # Presumably, if abs(r) > 1, then it is only some small artifact of floating
   2202     # point arithmetic.
-> 2203     r = max(min(r, 1.0), -1.0)
   2204     df = n-2
   2205     if abs(r) == 1.0:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I really don't understand where this is going wrong. Of course, it doesn't help that I don't know how the r variable is calculated. Could it be that I am somehow messing up my inputs?

Recommended answer

Check that the arguments to pearsonr are one-dimensional arrays. That is, both ny_vocab[:,i].todense() and ny_cluster[:,j] should be 1-d. Try:

    ny_vc_phi[i,j] = stats.pearsonr(ny_vocab[:,i].todense().ravel(), ny_cluster[:,j].ravel())[0]

(I added a call to ravel() to each of the arguments of pearsonr.)
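The shape issue behind the error can be reproduced on a small scale. The following is a minimal sketch, not the asker's actual data: `vocab` and `cluster` are hypothetical stand-ins for ny_vocab and ny_cluster, and the code is written for Python 3. It shows that a column sliced from a sparse matrix is still 2-D after todense(), and that the built-in min() raises the same ValueError when handed an array with more than one element. One caveat worth checking in your own session: when todense() returns an np.matrix, calling ravel() on it yields a matrix of shape (1, N) rather than a 1-D array, so the sketch uses toarray().ravel() as one possible alternative.

```python
import numpy as np
from scipy import sparse, stats

# Hypothetical stand-ins for ny_vocab (sparse, binary) and ny_cluster (dense, binary).
vocab = sparse.csr_matrix(np.array([[1, 0], [0, 1], [1, 1], [0, 0]]))
cluster = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

# A single column of a sparse matrix is still two-dimensional once densified.
col_dense = vocab[:, 0].todense()
print(col_dense.shape)

# np.matrix.ravel() keeps two dimensions; toarray().ravel() gives a 1-D ndarray.
print(col_dense.ravel().shape)
col_flat = vocab[:, 0].toarray().ravel()
print(col_flat.shape)

# The built-in min() needs a single True/False from its comparison, so an
# array with more than one element raises the ValueError seen in the traceback.
message = None
try:
    min(np.array([0.5, 0.7]), 1.0)
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)

# With 1-D inputs, pearsonr returns a scalar correlation and a p-value.
r, p_value = stats.pearsonr(col_flat, cluster[:, 0].ravel())
print(r)
```

Printing the `.shape` of both arguments just before the pearsonr call inside the loop is a quick way to confirm whether they are really 1-D.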
