Python error: len() of unsized object while using statsmodels with one row of data


Question

I'm able to use statsmodels' WLS (weighted least squares regression, http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.WLS.html) fine when I have lots of data points. However, I run into a problem with the NumPy arrays when I try to use WLS on a single sample from the dataset.

What I mean is: if I have a dataset X that is a 2D array with lots of rows, WLS works fine. It does not work if I try to run it on a single row. You'll see what I mean in the code below:

import sys
from sklearn.externals.six.moves import xrange
from sklearn.metrics import accuracy_score
import pylab as pl
from sklearn.externals.six.moves import zip
import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std

# this is my dataset X, with 10 rows
X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])
# this is my response vector, y, also with 10 rows
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
# weights, 10 rows
weights = np.array([ 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1 ])

# the line below, using all 10 rows of X, gives no errors but is commented out
# mod_wls = sm.WLS(y, X, weights)
# and this is the line I need, which is giving errors:
mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]]))

The last line above was originally just mod_wls = sm.WLS(y[0], X[0], weights[0])

But that gave me errors like object of type 'numpy.float64' has no len(), so I turned the arguments into arrays. Now I keep getting this error instead:

Traceback (most recent call last):
  File "C:\Users\app\Documents\Python Scripts\test.py", line 53, in <module>
    mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]]))
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 383, in __init__
    weights=weights, hasconst=hasconst)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 79, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 136, in __init__
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 52, in __init__
    self.data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 401, in handle_data
    return klass(endog, exog=exog, missing=missing, hasconst=hasconst, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 78, in __init__
    self._check_integrity()
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 249, in _check_integrity
    print len(self.endog)
TypeError: len() of unsized object

So, to see what was wrong with the lengths, I did this:

print "y size: "
print len(np.array([y[0]]))
print "X size"
print len (np.array([X[0]]))
print "weights size"
print len(np.array([weights[0]]))

and got this output:

y size: 
1
X size
1
weights size
1

I then tried this:

print "x shape"
print X[0].shape
print "y shape"
print y[0].shape

and the output was:

x shape
(3L,)
y shape
()

Line 249 in data.py, which the error refers to, is inside this function. I added a few print statements for the sizes to see what was happening:

def _check_integrity(self):
    if self.exog is not None:
        print "exog size: " 
        print len(self.exog)            
        print "endog size"
        print len(self.endog) # <-- this, and the line below are causing the error
        if len(self.exog) != len(self.endog):
            raise ValueError("endog and exog matrices are different sizes")

It appears there's something wrong with len(self.endog). Yet when I printed len(np.array([y[0]])), it simply gave the output 1. Somehow, when y goes into the _check_integrity function and becomes endog, it doesn't behave the same way. Or is something else going on?
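One detail worth checking here: the failing call passes np.array(y[0]) (no inner brackets), while the diagnostic prints np.array([y[0]]) (with inner brackets). The following is a minimal sketch of that difference using NumPy only; it assumes nothing about statsmodels internals beyond the len() call visible in the traceback, and the variable names zero_d and one_d are made up for illustration.

```python
import numpy as np

y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])

zero_d = np.array(y[0])    # as in the failing sm.WLS call: a 0-d array
one_d = np.array([y[0]])   # as in the diagnostic print: a 1-d array of length 1

print(zero_d.shape)  # ()
print(one_d.shape)   # (1,)
print(len(one_d))    # 1

# len() is undefined for a 0-d array, which matches the TypeError in the traceback.
try:
    len(zero_d)
    zero_d_has_len = True
except TypeError as exc:
    zero_d_has_len = False
    print("TypeError:", exc)
```

If this reading is right, passing y[[0]] or np.array([y[0]]) as the first argument should give endog a length of 1 and get past the length check, although that alone does not make a one-observation fit meaningful.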

What should I do? I'm using an algorithm where I really do need to run WLS for each row of X separately.

Recommended answer

There's no such thing as WLS for one observation. The single weight would simply become 1 when the weights are normalized to sum to 1. If you want to do this, though I suspect you don't, just use OLS. The solution will be a consequence of the SVD, though, not of any actual relationship in the data.
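A rough way to see why the weight drops out: weighted least squares is equivalent to ordinary least squares on data scaled by the square root of the weights, and with a single observation that scaling multiplies both sides of the one equation by the same constant. The sketch below illustrates this with np.linalg.lstsq rather than statsmodels; the names x0, y0 and w0 are placeholders for the first row of the question's data.

```python
import numpy as np

x0 = np.array([[1.0, 2.0, 3.0]])  # first row of X, kept 2-D with shape (1, 3)
y0 = np.array([1.0])              # first response value, shape (1,)
w0 = 0.1                          # first weight from the question

# Unweighted least squares on the single observation.
beta_ols, _, _, _ = np.linalg.lstsq(x0, y0, rcond=None)

# Weighted version: scale the row and the response by sqrt(weight).
scale = np.sqrt(w0)
beta_wls, _, _, _ = np.linalg.lstsq(scale * x0, scale * y0, rcond=None)

print(beta_ols)
print(beta_wls)
print(np.allclose(beta_ols, beta_wls))
```

Any positive value of w0 gives the same coefficients, so the weight carries no information when there is only one row.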

OLS solution using pinv/SVD:

np.dot(np.linalg.pinv(X[[0]]), y[0])

You could, however, just make up any answer that works and get the same result. I'm not sure offhand exactly what properties the SVD solution has compared with the other non-unique solutions.

[~/]
[26]: beta = [-.5, .25, 1/3.]

[~/]
[27]: np.dot(beta, X[0])
[27]: 1.0
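On the open point about the SVD solution: the pseudoinverse returns the least-squares solution with the smallest Euclidean norm, which is one way to tell it apart from the other coefficient vectors that fit equally well. The sketch below compares it with the made-up beta from the session above; the variable names are illustrative, and the data are the first row of the question's X and y written as floats.

```python
import numpy as np

x0 = np.array([[1.0, 2.0, 3.0]])  # X[[0]] from the question, as floats
y0 = np.array([1.0])              # y[[0]] from the question

beta_pinv = np.linalg.pinv(x0) @ y0             # minimum-norm solution, shape (3,)
beta_made_up = np.array([-0.5, 0.25, 1 / 3.0])  # the arbitrary beta from the session

# Both coefficient vectors reproduce the single response exactly.
fit_pinv = (x0 @ beta_pinv)[0]
fit_made_up = (x0 @ beta_made_up)[0]

# The pinv solution has the smaller Euclidean norm of the two.
norm_pinv = np.linalg.norm(beta_pinv)
norm_made_up = np.linalg.norm(beta_made_up)

print(fit_pinv, fit_made_up)
print(norm_pinv, norm_made_up)
```

For a single row x, the pinv solution works out to x * y / (x . x), so here it is proportional to [1, 2, 3] and says nothing about how the regressors relate to y across the dataset.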
