Casting a column dynamically in a Spark DataFrame


Problem description

I want to be able to create a new column out of an existing column (of type string) and cast it to a type dynamically:

resultDF = resultDF.withColumn(newColumnName, df(oldColumnName).cast(Helper.getCast(currentDataType)))

Ideally, Helper.getCast should return a superclass of all the data types, like IntegralType, StringType, and DoubleType, but I don't see such a superclass. Help?

I tried the below, but it complains that IntegralType doesn't conform to the expected type DataType:

object Helper {
  def cast(datatype: String): DataType = {
    datatype match {
      case "int"    => IntegralType // does not compile: IntegralType is not a concrete DataType
      case "string" => StringType
    }
  }
}

Answer

IntegralType is not among the supported DataTypes. The supported DataTypes are:

StringType  //Gets the StringType object.
BinaryType  //Gets the BinaryType object.
BooleanType //Gets the BooleanType object.
DateType  //Gets the DateType object.
TimestampType //Gets the TimestampType object.
CalendarIntervalType  //Gets the CalendarIntervalType object.
DoubleType  //Gets the DoubleType object.
FloatType //Gets the FloatType object.
ByteType  //Gets the ByteType object.
IntegerType //Gets the IntegerType object.
LongType  //Gets the LongType object.
ShortType //Gets the ShortType object.
NullType  //Gets the NullType object.
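
For context, here is a minimal sketch of casting a string column with one of these concrete type objects; the Spark session setup and the id/label column names are made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._

val df = Seq(("1", "a"), ("2", "b")).toDF("id", "label")  // illustrative data

// cast() accepts any of the concrete DataType objects listed above
val casted = df.withColumn("id_int", col("id").cast(IntegerType))
casted.printSchema()  // id_int: integer (nullable = true)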

In addition to these, you can create ArrayType, MapType, DecimalType, and StructType as well:

public static ArrayType createArrayType(DataType elementType)     //Creates an ArrayType by specifying the data type of elements ({@code elementType}).
public static ArrayType createArrayType(DataType elementType, boolean containsNull)     //Creates an ArrayType by specifying the data type of elements ({@code elementType}) and whether the array contains null values ({@code containsNull}).
public static DecimalType createDecimalType(int precision, int scale)     //Creates a DecimalType by specifying the precision and scale.
public static DecimalType createDecimalType()     //Creates a DecimalType with default precision and scale, which are 10 and 0.
public static MapType createMapType(DataType keyType, DataType valueType)     //Creates a MapType by specifying the data type of keys ({@code keyType}) and values ({@code valueType}).
public static MapType createMapType(DataType keyType, DataType valueType, boolean valueContainsNull)     //Creates a MapType by specifying the data type of keys ({@code keyType}), the data type of values ({@code valueType}), and whether values contain any null value ({@code valueContainsNull}).
public static StructType createStructType(List<StructField> fields)     //Creates a StructType with the given list of StructFields ({@code fields}).
public static StructType createStructType(StructField[] fields)     //Creates a StructType with the given StructField array ({@code fields}).
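
In Scala, the same complex types can also be constructed directly with their apply methods rather than the Java factory methods above; a small sketch (the value names are illustrative):

import org.apache.spark.sql.types._

val arrayOfInts    = ArrayType(IntegerType)           // containsNull defaults to true
val stringToDouble = MapType(StringType, DoubleType)  // valueContainsNull defaults to true
val money          = DecimalType(10, 2)               // precision 10, scale 2
val person = StructType(Seq(
  StructField("name", StringType),                    // nullable defaults to true
  StructField("age", IntegerType)
))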

So the correct Helper object should be:

import org.apache.spark.sql.types._

object Helper {
  def cast(datatype: String): DataType = {
    datatype match {
      case "int"    => IntegerType
      case "string" => StringType
      // guard against unmapped names instead of a runtime MatchError
      case other    => throw new IllegalArgumentException(s"Unsupported type: $other")
    }
  }
}
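
Wired back into the question's withColumn call, usage would look something like this (the column names and data are hypothetical):

import org.apache.spark.sql.functions.col
import spark.implicits._

val df = Seq(("29", "alice")).toDF("age", "name")  // illustrative data

val oldColumnName = "age"
val newColumnName = "age_int"
val currentDataType = "int"

val resultDF = df.withColumn(newColumnName, col(oldColumnName).cast(Helper.cast(currentDataType)))
resultDF.printSchema()  // age_int: integer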

