unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int on an Apache Spark DataFrame

This article explains how to handle the error "unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>" when casting to Int on an Apache Spark DataFrame.

Problem description



I'm getting an error when trying to cast a StringType to an IntegerType on a PySpark DataFrame:

joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
    .select(aggregates.year,'Production')\
    .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
    .drop("Production")\
    .withColumnRenamed("ProductionTmp", "Production")

I'm getting:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 joint = aggregates.join(df_data_3,aggregates.year==df_data_3.year)
----> 2 joint2 = joint.filter(joint.CountyCode==999).filter(joint.CropName=='WOOL')\
      3     .select(aggregates.year,'Production')\
      4     .withColumn("ProductionTmp", df_data_3.Production.cast(IntegerType))\
      5     .drop("Production")\
      6     .withColumnRenamed("ProductionTmp", "Production")

/usr/local/src/spark20master/spark/python/pyspark/sql/column.py in cast(self, dataType)
    335             jc = self._jc.cast(jdt)
    336         else:
--> 337             raise TypeError("unexpected type: %s" % type(dataType))
    338         return Column(jc)
    339

TypeError: unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'>
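The class name in the error message comes from how PySpark defines its types: each DataType class uses a singleton metaclass, so the type of the class object itself is that metaclass. The following is a simplified, self-contained reconstruction of that pattern (the class names mirror PySpark's, but this is a sketch, not PySpark's actual code):

```python
# Simplified, hypothetical reconstruction of pyspark's singleton metaclass;
# the real one lives in pyspark.sql.types.
class DataTypeSingleton(type):
    """Metaclass that caches a single instance per data-type class."""
    _instances = {}

    def __call__(cls):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__()
        return cls._instances[cls]

class IntegerType(metaclass=DataTypeSingleton):
    """Stand-in for pyspark.sql.types.IntegerType."""

# The type of the class object is the metaclass -- exactly the name
# that appears in the error message:
print(type(IntegerType))               # <class '__main__.DataTypeSingleton'>
# Instances are cached, so every call returns the same object:
print(IntegerType() is IntegerType())  # True
```

So passing IntegerType (the bare class) to cast hands Spark a class object whose type is the metaclass, which is why the message names DataTypeSingleton.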

Solution

PySpark SQL data types are no longer singletons (they were before Spark 1.3). You have to create an instance:

from pyspark.sql.types import IntegerType
from pyspark.sql.functions import col

col("foo").cast(IntegerType())

Column<b'CAST(foo AS INT)'>

In contrast to:

col("foo").cast(IntegerType)

TypeError  
   ...
TypeError: unexpected type: <class 'type'>
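Why the bare class is rejected can be seen from the shape of the check that cast performs: it accepts a DataType instance or a string, and anything else raises. Below is a simplified, self-contained sketch of that dispatch (not PySpark's actual implementation; the stand-in classes are hypothetical):

```python
# Simplified, hypothetical sketch of the dispatch inside Column.cast:
# a DataType instance or a string is accepted; the bare class is not.
class DataType:
    """Stand-in for pyspark.sql.types.DataType."""

class IntegerType(DataType):
    """Stand-in for pyspark.sql.types.IntegerType."""

def cast(dataType):
    if isinstance(dataType, str):
        return "CAST(foo AS %s)" % dataType.upper()
    elif isinstance(dataType, DataType):
        return "CAST(foo AS INT)"
    else:
        # A class object is neither a str nor a DataType instance:
        raise TypeError("unexpected type: %s" % type(dataType))

print(cast(IntegerType()))   # CAST(foo AS INT)
print(cast("integer"))       # CAST(foo AS INTEGER)
try:
    cast(IntegerType)        # the class itself, not an instance
except TypeError as e:
    print(e)                 # unexpected type: <class 'type'>
```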

The cast method can also be used with a string description:

col("foo").cast("integer")

Column<b'CAST(foo AS INT)'>

For an overview of the supported data types in Spark SQL and DataFrames, see the Spark SQL data types documentation.
