Got an error when using DataFrame.schema.fields.update

Problem description

I want to cast two columns in my DataFrame. Here is my code:

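import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField}

val session = SparkSession
  .builder
  .master("local")
  .appName("UDTransform")
  .getOrCreate()

var df: DataFrame = session.createDataFrame(Seq(
    (1, "Spark", 111), (2, "Storm", 112), (3, "Hadoop", 113),
    (4, "Kafka", 114), (5, "Flume", 115), (6, "Hbase", 116)))
  .toDF("CID", "Name", "STD")

df.printSchema()
// Attempt to change the column types by overwriting entries of the schema's field array
df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show() // the CompileException shown below is logged when this runs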

I get these logs from my console:

root
 |-- CID: integer (nullable = false)
 |-- Name: string (nullable = true)
 |-- STD: integer (nullable = false)

root
 |-- CID: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- STD: string (nullable = true)

17/06/28 12:44:32 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 36, Column 31: A method named "toString" is not declared in any enclosing class nor any supertype, nor through a static import

All I want to know is why this error happens and how I can solve it. I would appreciate it very much!

Recommended answer

You cannot update the schema of a DataFrame in place, because DataFrames are immutable. You can, however, cast the columns to the types you want and assign the result to a new DataFrame.

You can do it like this:

import org.apache.spark.sql.functions.col

val newDF = df.withColumn("CID", col("CID").cast("string"))
  .withColumn("STD", col("STD").cast("string"))

newDF.printSchema()

The schema of newDF is:

    root
     |-- CID: string (nullable = true)
     |-- Name: string (nullable = true)
     |-- STD: string (nullable = true)
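If more than two columns need the same treatment, the chained withColumn calls can be folded over a list of column names. The following is only a minimal sketch of that idea: the object name CastColumnsSketch and the helper castToString are hypothetical names chosen for illustration, and the sample rows are a shortened copy of the data from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastColumnsSketch {
  // Hypothetical helper: returns a NEW DataFrame in which every named column
  // is cast to string. The input DataFrame is not modified.
  def castToString(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, col(name).cast("string")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("CastColumnsSketch")
      .getOrCreate()
    import spark.implicits._

    // Shortened sample data, same shape as in the question
    val df = Seq((1, "Spark", 111), (2, "Storm", 112)).toDF("CID", "Name", "STD")

    val newDF = castToString(df, Seq("CID", "STD"))
    newDF.printSchema()
    newDF.show()

    spark.stop()
  }
}
```

Because withColumn with an existing column name replaces that column, each step of the fold yields a new DataFrame and the original df keeps its integer columns.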

Your code:

df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show()

In your code,

df.schema.fields returns an array of StructField, that is

Array[StructField]

So if you then try to update it with

df.schema.fields.update(0, StructField("CID", StringType))

This overwrites the element at position 0 of the Array[StructField], which I think is not what you wanted.

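DataFrame.schema.fields.update does not convert the DataFrame's columns; it only overwrites an element of the array of StructField returned by DataFrame.schema.fields. As your logs show, the printed schema then reports string, but the rows themselves still hold integers, so the schema and the data no longer agree, which is presumably why the generated code fails to compile when df.show() runs.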
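If you would rather describe the result as a schema, one option is to build a new StructType from the existing fields without mutating the array, and then cast each column to the type that target schema names. This is a sketch under assumptions, not the only way to do it: TargetSchemaSketch, withStringFields and castTo are hypothetical names introduced here for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructType}

object TargetSchemaSketch {
  // Hypothetical helper: builds a NEW StructType in which the named fields are
  // strings. Array.map returns a fresh array, so schema.fields is left as it was.
  def withStringFields(schema: StructType, names: Set[String]): StructType =
    StructType(schema.fields.map { f =>
      if (names.contains(f.name)) f.copy(dataType = StringType) else f
    })

  // Hypothetical helper: casts every column of df to the type given in target,
  // so that the data and the schema of the returned DataFrame agree.
  def castTo(df: DataFrame, target: StructType): DataFrame =
    df.select(target.fields.map(f => col(f.name).cast(f.dataType).as(f.name)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("TargetSchemaSketch")
      .getOrCreate()
    import spark.implicits._

    // Shortened sample data, same shape as in the question
    val df = Seq((1, "Spark", 111), (2, "Storm", 112)).toDF("CID", "Name", "STD")

    val target = withStringFields(df.schema, Set("CID", "STD"))
    val newDF = castTo(df, target)

    df.printSchema()    // original DataFrame, unchanged
    newDF.printSchema() // new DataFrame with the cast columns
    newDF.show()

    spark.stop()
  }
}
```

The difference from the code in the question is that the values are actually converted by cast, and the schema of newDF is derived by Spark from that conversion instead of being edited by hand.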

Hope this helps!
