Non-NDFrame object error using pandas.SparseSeries.from_coo() function

Problem description

I am trying to convert a COO sparse matrix (from scipy.sparse) to a pandas sparse series. The documentation (http://pandas.pydata.org/pandas-docs/stable/sparse.html) says to use SparseSeries.from_coo(A). This seems to work, but when I try to look at the resulting series, this is what happens.
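For readers on a current pandas: SparseSeries was removed in pandas 1.0, and since pandas 0.25 the COO constructor is available as pd.Series.sparse.from_coo, which returns an ordinary Series with a Sparse dtype. A minimal sketch of the question's conversion under that assumption (pandas 0.25 or later plus SciPy); the seeded generator replaces np.random.random only so the run is reproducible:

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

# Same construction as in the question; a seeded generator is used
# here (an assumption) so the example is reproducible.
rng = np.random.default_rng(0)
n = 100
row = rng.integers(0, n, size=n)
col = rng.integers(0, n, size=n)
val = rng.random(n) * n
A = ss.coo_matrix((val, (row, col)), shape=(n, n))

# pandas >= 0.25: the COO constructor lives on the .sparse accessor.
# The result is a plain Series with a Sparse dtype and a two-level
# (row, col) MultiIndex, one entry per stored COO value.
pss = pd.Series.sparse.from_coo(A)

print(pss.dtype)
print(len(pss) == A.nnz)
print(pss)  # 100 entries: the truncated repr prints without error
```

On a current pandas the 100x100 case prints a truncated listing with an ellipsis instead of raising.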

10x10 seems OK.

import pandas as pd 
import scipy.sparse as ss 
import numpy as np 
row = (np.random.random(10)*10).astype(int) 
col = (np.random.random(10)*10).astype(int) 
val = np.random.random(10)*10 
sparse = ss.coo_matrix((val,(row,col)),shape=(10,10)) 
pss = pd.SparseSeries.from_coo(sparse)
print pss
0  7    1.416631
   9    5.833902
1  0    4.131919
2  3    2.820531
   7    2.227009
3  1    9.205619
4  4    8.309077
6  0    4.376921
7  6    8.444013
   7    7.383886
dtype: float64
BlockIndex
Block locations: array([0])
Block lengths: array([10])

But not 100x100.

import pandas as pd 
import scipy.sparse as ss 
import numpy as np 
row = (np.random.random(100)*100).astype(int) 
col = (np.random.random(100)*100).astype(int) 
val = np.random.random(100)*100 
sparse = ss.coo_matrix((val,(row,col)),shape=(100,100)) 
pss = pd.SparseSeries.from_coo(sparse)
print pss

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-790-f0c22a601b93> in <module>()
      7 sparse = ss.coo_matrix((val,(row,col)),shape=(100,100))
      8 pss = pd.SparseSeries.from_coo(sparse)
----> 9 print pss
     10 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\base.pyc in __str__(self)
     45         if compat.PY3:
     46             return self.__unicode__()
---> 47         return self.__bytes__()
     48 
     49     def __bytes__(self):

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\base.pyc in __bytes__(self)
     57 
     58         encoding = get_option("display.encoding")
---> 59         return self.__unicode__().encode(encoding, 'replace')
     60 
     61     def __repr__(self):

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\sparse\series.pyc in __unicode__(self)
    287     def __unicode__(self):
    288         # currently, unicode is same as repr...fixes infinite loop
--> 289         series_rep = Series.__unicode__(self)
    290         rep = '%s\n%s' % (series_rep, repr(self.sp_index))
    291         return rep

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
    895 
    896         self.to_string(buf=buf, name=self.name, dtype=self.dtype,
--> 897                        max_rows=max_rows)
    898         result = buf.getvalue()
    899 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in to_string(self, buf, na_rep, float_format, header, length, dtype, name, max_rows)
    960         the_repr = self._get_repr(float_format=float_format, na_rep=na_rep,
    961                                   header=header, length=length, dtype=dtype,
--> 962                                   name=name, max_rows=max_rows)
    963 
    964         # catch contract violations

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in _get_repr(self, name, header, length, dtype, na_rep, float_format, max_rows)
    989                                         na_rep=na_rep,
    990                                         float_format=float_format,
--> 991                                         max_rows=max_rows)
    992         result = formatter.to_string()
    993 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\format.pyc in __init__(self, series, buf, length, header, na_rep, name, float_format, dtype, max_rows)
    145         self.dtype = dtype
    146 
--> 147         self._chk_truncate()
    148 
    149     def _chk_truncate(self):

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\format.pyc in _chk_truncate(self)
    158             else:
    159                 row_num = max_rows // 2
--> 160                 series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
    161             self.tr_row_num = row_num
    162         self.tr_series = series

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\tools\merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    752                        keys=keys, levels=levels, names=names,
    753                        verify_integrity=verify_integrity,
--> 754                        copy=copy)
    755     return op.get_result()
    756 

C:\Users\ej\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\tools\merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    803         for obj in objs:
    804             if not isinstance(obj, NDFrame):
--> 805                 raise TypeError("cannot concatenate a non-NDFrame object")
    806 
    807             # consolidate

TypeError: cannot concatenate a non-NDFrame object

I don't really understand the error message. I think I am following the example in the documentation to the letter, just using my own COO matrix. Could it be the size?

Thanks

Recommended answer

I have an older pandas version. It has the sparse code, but not the COO conversion functions. The pandas issue filed in connection with this is: https://github.com/pydata/pandas/issues/10818

But I found this on GitHub:

from pandas import Series, MultiIndex

def _coo_to_sparse_series(A, dense_index=False):
    """ Convert a scipy.sparse.coo_matrix to a SparseSeries.
    Use the defaults given in the SparseSeries constructor. """
    # one entry per stored value, indexed by its (row, col) position
    s = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
    s = s.sort_index()
    s = s.to_sparse()  # TODO: specify kind?
    # ...
    return s
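Series.to_sparse() was removed in pandas 1.0 together with SparseSeries, so the helper above no longer runs as written. A sketch of the same three steps on a current pandas, with to_sparse() swapped for astype(pd.SparseDtype(...)); the name coo_to_sparse_series is hypothetical, and the dense_index handling elided in the quoted source is left out:

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

def coo_to_sparse_series(A):
    """Hypothetical helper: COO matrix -> Series with a Sparse dtype.

    Mirrors the quoted pandas helper: one entry per stored value,
    indexed by its (row, col) position, sorted by that index.
    """
    s = pd.Series(A.data, index=pd.MultiIndex.from_arrays((A.row, A.col)))
    s = s.sort_index()
    # Modern replacement for the removed Series.to_sparse()
    return s.astype(pd.SparseDtype(s.dtype))

# The answer's 10x7 example: arange has a single zero, so 69 stored values
A = ss.coo_matrix(np.arange(10 * 7).reshape(10, 7))
s = coo_to_sparse_series(A)
print(len(s))
print(s)  # longer than the default display.max_rows (60), so truncated
```

The result should match pd.Series.sparse.from_coo(A) in dtype, index and values.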

With a smallish sparse matrix, I can construct and display the series without problems:

In [259]: Asml=sparse.coo_matrix(np.arange(10*5).reshape(10,5))
In [260]: s=pd.Series(Asml.data,pd.MultiIndex.from_arrays((Asml.row,Asml.col)))
In [261]: s=s.sort_index()
In [262]: s
Out[262]: 
0  1     1
   2     2
   3     3
   4     4
1  0     5
   1     6
   2     7
 [...  mine]
   3    48
   4    49
dtype: int32
In [263]: ssml=s.to_sparse()
In [264]: ssml
Out[264]: 
0  1     1
   2     2
   3     3
   4     4
1  0     5
  [...  mine]
   2    47
   3    48
   4    49
dtype: int32
BlockIndex
Block locations: array([0])
Block lengths: array([49])

But with a larger array (more nonzero elements) I get a display error. I'm guessing it happens when the display of the (plain) series starts to use an ellipsis (...). I'm running Py3, so I get a different error message:

....\pandas\core\base.pyc in __str__(self)
     45         if compat.PY3:
     46             return self.__unicode__()   # py3
     47         return self.__bytes__()         # py2 route

For example:

In [265]: Asml=sparse.coo_matrix(np.arange(10*7).reshape(10,7))
In [266]: s=pd.Series(Asml.data,pd.MultiIndex.from_arrays((Asml.row,Asml.col)))
In [267]: s=s.sort_index()
In [268]: s
Out[268]: 
0  1     1
   2     2
   3     3
   4     4
   5     5
   6     6
1  0     7
   1     8
   2     9
   3    10
   4    11
   5    12
   6    13
2  0    14
   1    15
...
7  6    55
8  0    56
   1    57
[... mine]
Length: 69, dtype: int32
In [269]: ssml=s.to_sparse()
In [270]: ssml
Out[270]: <repr(<pandas.sparse.series.SparseSeries at 0xaff6bc0c>) failed: AttributeError: 'SparseArray' object has no attribute '_get_repr'>
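The entry counts in this thread line up with the ellipsis guess. In the question's traceback, _chk_truncate only reaches the concat of series.iloc[:row_num] and series.iloc[-row_num:] once the series is longer than display.max_rows, and the sizes that printed fine (10 and 49 entries) are below 60 while the ones that failed (69 and 100) are above it. Sixty is the default display.max_rows in current pandas; assuming the old version used the same value, the length test looks roughly like this (repr_will_truncate is a hypothetical name, not a pandas function):

```python
import pandas as pd

def repr_will_truncate(n_entries, max_rows=None):
    """Hypothetical helper: mirror the length test that sends a Series
    repr down the head/tail truncation path seen in the traceback."""
    if max_rows is None:
        max_rows = pd.get_option("display.max_rows")
    # a max_rows of None means "never truncate"
    return max_rows is not None and n_entries > max_rows

# Entry counts from this thread, checked against a max_rows of 60
for label, n in [("question 10x10", 10), ("answer 10x5", 49),
                 ("answer 10x7", 69), ("question 100x100", 100)]:
    print(label, n, repr_will_truncate(n, max_rows=60))
```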

I'm not sufficiently familiar with pandas code and structures to deduce much more for now.
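Given that diagnosis, one workaround is to keep the sparse object out of the truncation code when printing. A sketch assuming pandas 0.25 or later, where the .sparse accessor exists: either print a dense copy (on the old pandas in this thread, SparseSeries.to_dense() should play the same role), or disable row truncation with pd.option_context for a single print. Neither fixes the underlying bug tracked in the issue linked above, and on a current pandas the sparse repr works without either:

```python
import numpy as np
import pandas as pd
import scipy.sparse as ss

A = ss.coo_matrix(np.arange(10 * 7).reshape(10, 7))
s = pd.Series.sparse.from_coo(A)  # 69 entries, more than display.max_rows

# Option 1: print a dense copy (old pandas: pss.to_dense()).
dense = s.sparse.to_dense()
print(dense)

# Option 2: disable row truncation just for this one repr.
with pd.option_context("display.max_rows", None):
    text = repr(s)
print(text)
```

Option 2 prints every entry, so it is only practical for series of modest length.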
