Scipy hstack results in "TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))"

Views: 923 | Published: 2020/10/16 23:27:47 | Tags: python python-3.x numpy pandas dataframe

Problem description

I am trying to run hstack to join a column of integer values to a list of columns created by a TF-IDF (so I can eventually use all of these columns/features in a classifier).

I'm reading in the column using pandas, checking for any NA values and converting them to the largest value in the dataframe like so:

  OtherColumn = p.read_csv('file.csv', delimiter=";", na_values=['?'])[["OtherColumn"]]
  OtherColumn = OtherColumn.fillna(OtherColumn.max())
  OtherColumn = OtherColumn.convert_objects(convert_numeric=True)

Then I read in my text column and run TF-IDF to create loads of features:

  X = list(np.array(p.read_csv('file.csv', delimiter=";"))[:,2])

  tfv = TfidfVectorizer(min_df=3,  max_features=None, strip_accents='unicode',  
        analyzer='word',token_pattern=r'\w{1,}',ngram_range=(1, 2), use_idf=1,smooth_idf=1,sublinear_tf=1)
  tfv.fit(X)

Finally, I want to join them all together. This is where the error occurs and the program stops. I am also unsure whether I am using the StandardScaler appropriately here:

  X =  sp.sparse.hstack((X, OtherColumn.values)) #error here
  sc = preprocessing.StandardScaler().fit(X)
  X = sc.transform(X)
  X_test = sc.transform(X_test)

Full error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-79d1e70bc1bc> in <module>()
---> 47 X =  sp.sparse.hstack((X, OtherColumn.values))
     48 sc = preprocessing.StandardScaler().fit(X)
     49 X = sc.transform(X)

C:\Users\Simon\Anaconda\lib\site-packages\scipy\sparse\construct.pyc in hstack(blocks, format, dtype)
    421 
    422     """
--> 423     return bmat([blocks], format=format, dtype=dtype)
    424 
    425 

C:\Users\Simon\Anaconda\lib\site-packages\scipy\sparse\construct.pyc in bmat(blocks, format, dtype)
    537     nnz = sum([A.nnz for A in blocks[block_mask]])
    538     if dtype is None:
--> 539         dtype = upcast(*tuple([A.dtype for A in blocks[block_mask]]))
    540 
    541     row_offsets = np.concatenate(([0], np.cumsum(brow_lengths)))

C:\Users\Simon\Anaconda\lib\site-packages\scipy\sparse\sputils.pyc in upcast(*args)
     58             return t
     59 
---> 60     raise TypeError('no supported conversion for types: %r' % (args,))
     61 
     62 

TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

Recommended answer

As discussed in "Numpy hstack - ValueError: all the input arrays must have same number of dimensions - but they do", you may need to explicitly cast the inputs to sparse.hstack. The sparse code is not as robust as the core numpy code.

If X is a sparse array with dtype=float, and A is dense with dtype=object, several options are possible.

  sparse.hstack((X, A))                 # error: float64 and object have no common supported dtype
  sparse.hstack((X.astype(object), A))  # cast X to object; return object
  sparse.hstack((X, A.astype(float)))   # cast A to float; return float
  np.hstack((X.toarray(), A))           # make X dense, result will be type object
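As a minimal sketch of the "cast A to float" option, the snippet below uses small made-up arrays in place of the question's TF-IDF matrix and CSV column (these values are assumptions, not data from the post). It also wraps the dense block in csr_matrix before stacking, which is one way of making the inputs explicit and is not required by the option above.

```python
import numpy as np
from scipy import sparse

# Made-up stand-ins for the question's data (assumptions, not from the post).
X = sparse.csr_matrix(np.array([[0.0, 0.5], [0.3, 0.0], [0.0, 0.9]]))  # float64, like TF-IDF output
A = np.array([[1], [np.nan], [3]], dtype=object)  # object column holding numbers and a NaN

# Cast the dense block to float before stacking, so both blocks share a numeric dtype.
A_float = A.astype(float)

# Wrapping the dense block in csr_matrix makes both inputs sparse explicitly.
stacked = sparse.hstack((X, sparse.csr_matrix(A_float))).tocsr()

print(X.dtype, A.dtype, A_float.dtype)
print(stacked.shape, stacked.dtype)
```

The NaN survives the cast, so it still has to be filled (for example with the column maximum, as in the question) before the result is passed to a classifier.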

A.astype(float) will work if A is dtype=object only because it contains some NaN values. See http://pandas.pydata.org/pandas-docs/stable/gotchas.html regarding NaN. If A is object for some other reason (e.g. ragged lists), then we'll have to revisit the issue.

Another possibility is to use pandas's concat (see http://pandas.pydata.org/pandas-docs/stable/merging.html). I assume pandas has paid more attention to these issues than the sparse coders.
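The answer does not spell out how the pandas route would look, so the following is only a sketch under assumptions: the data is made up, the column names tfidf_0 and tfidf_1 are hypothetical, and it relies on DataFrame.sparse.from_spmatrix, which exists in recent pandas versions. The object column is coerced to numbers with pd.to_numeric first, so the combined frame holds only numeric columns.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Made-up stand-ins (assumptions): a small sparse float matrix and a messy column.
X = sparse.csr_matrix(np.array([[0.0, 0.5], [0.3, 0.0], [0.0, 0.9]]))
other = pd.DataFrame({"OtherColumn": ["1", None, "3"]})  # non-numeric dtype, as from a messy CSV

# Coerce the column to numbers; entries that cannot be parsed become NaN.
other["OtherColumn"] = pd.to_numeric(other["OtherColumn"], errors="coerce")
other = other.fillna(other.max())  # same NaN policy as the question: use the column maximum

# Wrap the sparse matrix in a DataFrame with sparse columns, then join column-wise.
features = pd.DataFrame.sparse.from_spmatrix(X, columns=["tfidf_0", "tfidf_1"])
combined = pd.concat([features, other], axis=1)

print(combined.shape)
print(combined.dtypes)
```

Whether this is preferable to sparse.hstack depends on what the downstream estimator accepts; many estimators expect a SciPy sparse matrix or a dense array rather than a DataFrame with sparse columns.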
