unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int on an Apache Spark DataFrame


Problem description

I'm getting an error when trying to cast a StringType to an IntegerType on a PySpark DataFrame:

joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
    .select(aggregates.year,'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")

I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')
            .select(aggregates.year,'Production')
            .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))
            .drop("Production")
            .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>

Recommended answer

PySpark SQL data types are no longer singletons (they were before 1.3). You have to create an instance:

from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

col("foo").cast(IntegerType())

Column<b'CAST(foo AS INT)'>

as opposed to:

col("foo").cast(IntegerType)

TypeError  
   ...
TypeError: unexpected type: <class 'type'>

The cast method can also be used with string descriptions:

col("foo").cast("integer")

Column<b'CAST(foo AS INT)'>
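
Applied to the snippet from the question, the fix is simply to pass an instance, IntegerType(), or the string "integer", to cast. A minimal sketch, assuming the aggregates and df_data_3 DataFrames from the question are already defined:

from pyspark.sql.types import IntegerType

# Same chain as in the question; the only change is cast(IntegerType())
# -- an instance -- instead of cast(IntegerType). .cast("integer") works too.
joint = aggregates.join(df_data_3, aggregates.year == df_data_3.year)
joint2 = (joint
    .filter(joint.CountyCode == 999)
    .filter(joint.CropName == 'WOOL')
    .select(aggregates.year, 'Production')
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType()))
    .drop("Production")
    .withColumnRenamed("ProductionTmp", "Production"))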
