Unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int on an Apache Spark DataFrame
Problem description
I get an error when trying to cast a StringType column to IntegerType on a PySpark DataFrame:
joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
.select(aggregates.year,'Production')\
.withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
.drop("Production")\
.withColumnRenamed("ProductionTmp", "Production")
I get:
TypeError Traceback (most recent call last)
in ()
      1 joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')
        .select(aggregates.year,'Production')
        .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))
        .drop("Production")
        .withColumnRenamed("ProductionTmp", "Production")
/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339
TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>
Recommended answer
PySpark SQL data types are no longer singletons (they were before 1.3). Passing the class IntegerType itself, rather than an instance, is what raises the TypeError. You have to create an instance:
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col
col("foo").cast(IntegerType())
Column<b'CAST(foo AS INT)'>
By contrast, passing the class itself fails:
col("foo").cast(IntegerType)
TypeError
...
TypeError: unexpected type: <class 'type'>
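The reason the class is rejected can be illustrated without Spark. The sketch below is a simplified imitation of the argument check that the traceback shows in Column.cast, assuming the check amounts to isinstance tests against str and DataType. The classes DataType and IntegerType here are hypothetical stand-ins, not the real PySpark classes, and check_cast_argument is a name invented for this illustration.

```python
# Hypothetical stand-ins for illustration only; these are NOT the real
# pyspark.sql.types classes.
class DataType:
    """Stand-in for pyspark.sql.types.DataType."""


class IntegerType(DataType):
    """Stand-in for pyspark.sql.types.IntegerType."""


def check_cast_argument(data_type):
    """Mimic the kind of argument check seen in Column.cast (assumed shape)."""
    if isinstance(data_type, str):
        return "string description"
    elif isinstance(data_type, DataType):
        return "DataType instance"
    else:
        # Same message format as the raise statement shown in the traceback.
        raise TypeError("unexpected type: %s" % type(data_type))


print(check_cast_argument(IntegerType()))  # an instance is accepted
print(check_cast_argument("integer"))      # a string is accepted

try:
    check_cast_argument(IntegerType)       # the class itself is rejected
except TypeError as exc:
    print(exc)
```

IntegerType is a class, so it is not an instance of DataType and falls through to the else branch; IntegerType() is an instance and passes.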
The cast method also accepts a string description of the type:
col("foo").cast("integer")
Column<b'CAST(foo AS INT)'>
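Applied to the question, one possible way to fix the Production column is sketched below. It assumes the DataFrame passed in already has a single, unambiguous column with the given name (for example, the result of the select in the question). The helper name cast_column_to_int is hypothetical. Because withColumn replaces an existing column of the same name, the temporary column, drop, and rename steps from the question should not be needed.

```python
def cast_column_to_int(df, column_name):
    """Return a new DataFrame with `column_name` cast to an integer column.

    `df` is expected to be a PySpark DataFrame. The string "integer" is used
    as the type description, so no class from pyspark.sql.types is imported;
    df[column_name].cast(IntegerType()) would be the equivalent instance form.
    """
    # withColumn with an existing name replaces that column in the result.
    return df.withColumn(column_name, df[column_name].cast("integer"))


# Hypothetical usage, mirroring the question (requires the question's
# DataFrames, so it is left commented out):
# selected = joint.filter(joint.CountyCode == 999) \
#     .filter(joint.CropName == 'WOOL') \
#     .select(aggregates.year, 'Production')
# joint2 = cast_column_to_int(selected, "Production")
```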