unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int on an Apache Spark Dataframe
Problem description
I'm getting an error when trying to cast a StringType to an IntegerType on a pyspark dataframe:
joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
.select(aggregates.year,'Production')\
.withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
.drop("Production")\
.withColumnRenamed("ProductionTmp", "Production")
I'm getting:
TypeError                                 Traceback (most recent call last)
 in ()
      1 joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')
      3     .select(aggregates.year,'Production')
      4     .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))
      5     .drop("Production")
      6     .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>
PySpark SQL data types are no longer singletons (as they were before 1.3). You have to create an instance:
from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col
col("foo").cast(IntegerType())
Column<b'CAST(foo AS INT)'>
In contrast to:
col("foo").cast(IntegerType)
TypeError                                 Traceback (most recent call last)
...
TypeError: unexpected type: <class 'type'>
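The contrast above comes down to passing a class versus an instance: IntegerType is the class object itself, while IntegerType() is a DataType instance, and cast type-checks its argument before handing it to the JVM. A minimal pure-Python sketch of that check (the DataType, IntegerType, and cast names here are simplified stand-ins, not Spark's real implementation):

```python
class DataType:
    """Stand-in for pyspark.sql.types.DataType (not the real class)."""

class IntegerType(DataType):
    """Stand-in for pyspark.sql.types.IntegerType."""

def cast(dataType):
    # Mirrors the shape of the check in pyspark/sql/column.py:
    # a DataType *instance* is accepted, anything else raises.
    if isinstance(dataType, DataType):
        return "Column<b'CAST(foo AS INT)'>"
    raise TypeError("unexpected type: %s" % type(dataType))

print(cast(IntegerType()))   # instance -> accepted
try:
    cast(IntegerType)        # class object -> rejected
except TypeError as exc:
    print(exc)               # unexpected type: <class 'type'>
```

Passing the bare class fails with <class 'type'> because in Python the type of a class is type itself; in the original question the same check rejects the real singleton metaclass, hence the DataTypeSingleton in the error message.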
The cast method can also be used with string descriptions:
col("foo").cast("integer")
Column<b'CAST(foo AS INT)'>
For an overview of the supported data types in Spark SQL and DataFrames, see this link.