Spark: cast a column to a SQL type stored in a string

Views: 354 | Published: 2020/9/4 20:47:08 | Tags: scala apache-spark apache-spark-sql spark-dataframe


Question

The simple request: I need help adding a column to a DataFrame. The column has to be empty, its type comes from ...spark.sql.types, and the type has to be defined from a string.

I can probably do this with ifs or a case expression, but I'm looking for something more elegant: something that does not require writing a case for every type in org.apache.spark.sql.types.

For example, if I do this:

df = df.withColumn("col_name", lit(null).cast(org.apache.spark.sql.types.StringType))

it works as intended. However, I have the type stored as a string,

var the_type = "StringType"

or

var the_type = "org.apache.spark.sql.types.StringType"

and I can't get the cast to work when the type is defined from that string.

For those interested, here are some more details: I have a set of tuples (col_name, col_type), both stored as strings, and I need to add columns with the correct types ahead of a future union between two DataFrames.

I currently have this:

for (i <- set_of_col_type_tuples) yield {
    val tip = Class.forName("org.apache.spark.sql.types."+i._2)
    df = df.withColumn(i._1, lit(null).cast(the_type))
    df }

If I use

val the_type = Class.forName("org.apache.spark.sql.types."+i._2)

I get

error: overloaded method value cast with alternatives:
  (to: String)org.apache.spark.sql.Column <and>
  (to: org.apache.spark.sql.types.DataType)org.apache.spark.sql.Column
 cannot be applied to (Class[?0])

If I use

val the_type = Class.forName("org.apache.spark.sql.types."+i._2).getName()

it's a string, so I get:

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '.' expecting {<EOF>, '('}(line 1, pos 3)

== SQL ==
org.apache.spark.sql.types.StringType
---^^^

So, just to be clear, the set contains tuples like ("col1", "IntegerType"), ("col2", "StringType"), not ("col1", "int"), ("col2", "string"). A simple cast(i._2) does not work.

Thanks.

Recommended answer

You can use the overload of cast that takes a String argument:

val stringType : String = ...
column.cast(stringType)

def cast(to: String): Column

Casts the column to a different data type, using the canonical string representation of the type.
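Since the question stores names such as "IntegerType" rather than "int", the string has to be converted before it can be passed to cast(String). A minimal sketch of one way to do that follows. The object SqlTypeNames and the function toSqlTypeName are hypothetical names, not part of Spark. It assumes that, for the simple types (StringType, IntegerType, LongType, DoubleType, BooleanType and similar), the lowercased class name without the "Type" suffix is a name the SQL type parser accepts. Parameterized types such as DecimalType or ArrayType would need separate handling.

```scala
import java.util.Locale

// Hypothetical helper, not part of Spark: turns a DataType class name such as
// "IntegerType" or "org.apache.spark.sql.types.IntegerType" into a lowercase
// name such as "integer".
// Assumption: the result is a name that Column.cast(String) can parse, which
// should hold for the simple types but not for parameterized ones.
object SqlTypeNames {
  def toSqlTypeName(className: String): String = {
    // Drop an optional package prefix: keep what follows the last '.'
    val simpleName = className.substring(className.lastIndexOf('.') + 1)
    // Drop the Scala object marker and the "Type" suffix, then lowercase
    simpleName.stripSuffix("$").stripSuffix("Type").toLowerCase(Locale.ROOT)
  }
}
```

With a helper like this, the loop body from the question could become df.withColumn(i._1, lit(null).cast(SqlTypeNames.toSqlTypeName(i._2))).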

Alternatively, you can collect all the data types by reflection:

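import org.apache.spark.sql.types.{DataType, DataTypes}

// Read every static field of the Java class DataTypes (StringType, IntegerType, ...)
val types = classOf[DataTypes]
    .getDeclaredFields()
    .filter(f => java.lang.reflect.Modifier.isStatic(f.getModifiers()))
    .map(f => f.get(null).asInstanceOf[DataType]) // static fields need no instance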

Now types is an Array[DataType]. You can convert it to a Map keyed by type name:

val typeMap = types.map(t => (t.getClass.getSimpleName.replace("$", ""), t)).toMap

And use it in your code:

column.cast(typeMap(yourType))
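Putting the pieces together, the loop from the question could be written roughly as below. This is a sketch under assumptions, not a definitive implementation: the object AddTypedNullColumns and the method addNullColumns are hypothetical names, the map is keyed by the static field names of DataTypes (a small variation on the getSimpleName approach above), and only the types exposed as static fields on DataTypes are covered.

```scala
import java.lang.reflect.Modifier

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{DataType, DataTypes}

// Hypothetical helper object, not part of Spark.
object AddTypedNullColumns {

  // "StringType" -> StringType, "IntegerType" -> IntegerType, ...
  // Assumption: keys are the static field names declared on DataTypes.
  val typeMap: Map[String, DataType] = classOf[DataTypes]
    .getDeclaredFields
    .filter(f => Modifier.isStatic(f.getModifiers))
    .filter(f => classOf[DataType].isAssignableFrom(f.getType))
    .map(f => f.getName -> f.get(null).asInstanceOf[DataType])
    .toMap

  // Adds one all-null column per (name, typeName) pair, cast to the named type.
  def addNullColumns(df: DataFrame, cols: Iterable[(String, String)]): DataFrame =
    cols.foldLeft(df) { case (acc, (name, typeName)) =>
      val dataType = typeMap.getOrElse(
        typeName,
        throw new IllegalArgumentException(s"Unknown data type name: $typeName"))
      acc.withColumn(name, lit(null).cast(dataType))
    }
}
```

Using foldLeft threads the DataFrame through each withColumn call, so no var or for/yield is needed: val result = AddTypedNullColumns.addNullColumns(df, set_of_col_type_tuples).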
