Dask: TypeError: Column assignment doesn't support type numpy.ndarray, whereas pandas works fine

Views: 213 | Published: 2020/5/18 23:35:57 | Tags: python pandas numpy dask

Problem description

I'm using Dask to read in a CSV of more than 10 million rows and perform some calculations. So far it's proving to be 10x faster than pandas.

I have a piece of code, below, that works fine with pandas but throws a TypeError with Dask. I am unsure how to overcome the TypeError. It seems that, with Dask, the select function hands an array back to the dataframe column, but not with pandas? I don't want to switch the whole thing back to pandas and lose the 10x performance benefit.

This code came out of help from others on Stack Overflow, but I think this question has diverged far enough from the original one that it is altogether different. Code below.

PANDAS: works. Time taken excluding AndHeathSolRadFact: 40 seconds

import pandas as pd
import numpy as np

from timeit import default_timer as timer
start = timer()
df = pd.read_csv(r'C:\Users\i5-Desktop\Downloads\Weathergrids.csv')
df['DateTime'] = pd.to_datetime(df['Date'], format='%Y-%d-%m %H:%M')
df['Month'] = df['DateTime'].dt.month
df['Grass_FMC'] = (97.7+4.06*df['RH'])/(df['Temperature']+6)-0.00854*df['RH']+3000/df['Curing']-30


df["AndHeathSolRadFact"] = np.select(
    [
    (df['Month'].between(8,12)),
    (df['Month'].between(1,2) & (df['CloudCover']>30))
    ],  #list of conditions
    [1, 1],     #list of results
    default=0)    #default if no match



print(df.head())
#print(df.tail())
end = timer()
print(end - start)
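
A side note on the second condition: in Python, `&` binds more tightly than `>`, so a mask combined with a comparison needs the comparison in its own parentheses. The following is only a small sketch with made-up values (the four rows are hypothetical, not taken from the real file) to show the difference.

```python
import pandas as pd

# Scalar illustration of the precedence rule: `&` is evaluated before `>`.
unparenthesized = True & 45 > 30   # parsed as (True & 45) > 30, i.e. 1 > 30
parenthesized = True & (45 > 30)   # the intended meaning

print(unparenthesized, parenthesized)

# Hypothetical rows, only to show the parenthesized form on a DataFrame.
df = pd.DataFrame({"Month": [1, 2, 2, 9], "CloudCover": [45.0, 10.0, 31.0, 80.0]})
mask = df["Month"].between(1, 2) & (df["CloudCover"] > 30)
print(mask.tolist())
```

Without the inner parentheses the expression still runs in some setups, which makes the mistake easy to miss.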

DASK: broken. Time taken excluding AndHeathSolRadFact: 4 seconds

import dask.dataframe as dd  # Dask DataFrames implement the pandas API
import numpy as np

from timeit import default_timer as timer
start = timer()
ddf = dd.read_csv(r'C:\Users\i5-Desktop\Downloads\Weathergrids.csv')
ddf['DateTime'] = dd.to_datetime(ddf['Date'], format='%Y-%d-%m %H:%M')
ddf['Month'] = ddf['DateTime'].dt.month
ddf['Grass_FMC'] = (97.7+4.06*ddf['RH'])/(ddf['Temperature']+6)-0.00854*ddf['RH']+3000/ddf['Curing']-30



ddf["AndHeathSolRadFact"] = np.select(
    [
    (ddf['Month'].between(8,12)),
    (ddf['Month'].between(1,2) & (ddf['CloudCover']>30))
    ],  #list of conditions
    [1, 1],     #list of results
    default=0)    #default if no match



print(ddf.head())
#print(ddf.tail())
end = timer()
print(end - start)

Error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-86c08f38bce6> in <module>
     29     ],  #list of conditions
     30     [1, 1],     #list of results
---> 31     default=0)    #default if no match
     32 
     33 

~\Anaconda3\lib\site-packages\dask\dataframe\core.py in __setitem__(self, key, value)
   3276             df = self.assign(**{k: value for k in key})
   3277         else:
-> 3278             df = self.assign(**{key: value})
   3279 
   3280         self.dask = df.dask

~\Anaconda3\lib\site-packages\dask\dataframe\core.py in assign(self, **kwargs)
   3510                 raise TypeError(
   3511                     "Column assignment doesn't support type "
-> 3512                     "{0}".format(typename(type(v)))
   3513                 )
   3514             if callable(v):

TypeError: Column assignment doesn't support type numpy.ndarray

Weathergrids CSV sample

Location,Date,Temperature,RH,WindDir,WindSpeed,DroughtFactor,Curing,CloudCover
1075,2019-20-09 04:00,6.8,99.3,143.9,5.6,10.0,93.0,1.0 
1075,2019-20-09 05:00,6.4,100.0,93.6,7.2,10.0,93.0,1.0
1075,2019-20-09 06:00,6.7,99.3,130.3,6.9,10.0,93.0,1.0
1075,2019-20-09 07:00,8.6,95.4,68.5,6.3,10.0,93.0,1.0
1075,2019-20-09 08:00,12.2,76.0,86.4,6.1,10.0,93.0,1.0
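
To reproduce the setup without the full file, the sample rows can be loaded from a string. This is only a sketch: it assumes the `Date` column is year-day-month, as the format string `'%Y-%d-%m %H:%M'` in the question implies, so `2019-20-09` is read as 20 September 2019.

```python
import io

import pandas as pd

# The five sample rows from the question, embedded as text.
sample = """Location,Date,Temperature,RH,WindDir,WindSpeed,DroughtFactor,Curing,CloudCover
1075,2019-20-09 04:00,6.8,99.3,143.9,5.6,10.0,93.0,1.0
1075,2019-20-09 05:00,6.4,100.0,93.6,7.2,10.0,93.0,1.0
1075,2019-20-09 06:00,6.7,99.3,130.3,6.9,10.0,93.0,1.0
1075,2019-20-09 07:00,8.6,95.4,68.5,6.3,10.0,93.0,1.0
1075,2019-20-09 08:00,12.2,76.0,86.4,6.1,10.0,93.0,1.0
"""

df = pd.read_csv(io.StringIO(sample))
df["DateTime"] = pd.to_datetime(df["Date"], format="%Y-%d-%m %H:%M")
df["Month"] = df["DateTime"].dt.month
print(df[["Date", "DateTime", "Month"]])
```

Every sample row falls in September, so only the first condition (`Month` between 8 and 12) is exercised by this data.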
