Got an error when using DataFrame.schema.fields.update


Question

I want to cast two columns in my DataFrame. Here is my code:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField}

val session = SparkSession
  .builder
  .master("local")
  .appName("UDTransform")
  .getOrCreate()
var df: DataFrame = session.createDataFrame(Seq((1, "Spark", 111), (2, "Storm", 112), (3, "Hadoop", 113), (4, "Kafka", 114), (5, "Flume", 115), (6, "Hbase", 116)))
  .toDF("CID", "Name", "STD")
df.printSchema()
// Attempt to change the column types by overwriting entries of the schema's field array
df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show()

I get these logs from my console:

root
 |-- CID: integer (nullable = false)
 |-- Name: string (nullable = true)
 |-- STD: integer (nullable = false)

root
 |-- CID: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- STD: string (nullable = true)

17/06/28 12:44:32 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 36, Column 31: A method named "toString" is not declared in any enclosing class nor any supertype, nor through a static import

All I want to know is why this error happens and how I can solve it. I would appreciate any help!

Recommended answer

You cannot update the schema of a DataFrame in place, because DataFrames are immutable. You can, however, derive a new DataFrame that has the schema you want and assign it to a new value.

Here is how you can do it:

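import org.apache.spark.sql.functions.col

// Cast each column and keep the result in a new DataFrame; df itself is unchanged
val newDF = df.withColumn("CID", col("CID").cast("string"))
  .withColumn("STD", col("STD").cast("string"))

newDF.printSchema()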

The schema of newDF is:

    root
     |-- CID: string (nullable = true)
     |-- Name: string (nullable = true)
     |-- STD: string (nullable = true)
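
When more than two columns need the same cast, the chained withColumn calls can be folded over a list of column names. The following is a minimal sketch of that idea, not part of the original answer: the object name CastColumnsSketch and the helper castToString are hypothetical, and the sample rows are a shortened copy of the data in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastColumnsSketch {
  // Hypothetical helper: returns a NEW DataFrame in which every named column
  // has been cast to string. The input DataFrame is left untouched.
  def castToString(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, col(name).cast("string")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[1]")
      .appName("CastColumnsSketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "Spark", 111), (2, "Storm", 112)).toDF("CID", "Name", "STD")
    val casted = castToString(df, Seq("CID", "STD"))

    casted.printSchema() // CID and STD are reported as string
    casted.show()        // works, because the values were really converted
    df.printSchema()     // the original DataFrame still has integer columns

    spark.stop()
  }
}
```

Because withColumn with an existing column name replaces that column, the column order of the result stays the same as in the input.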

Your code:

df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show()

In your code, df.schema.fields returns an array of StructField:

Array[StructField]

Then, if you try to update it with

df.schema.fields.update(0, StructField("CID", StringType))

this overwrites the element at position 0 of that Array[StructField], which I think is not what you wanted.

DataFrame.schema.fields.update does not convert the DataFrame; it only overwrites an entry in the array of StructField returned by DataFrame.schema.fields. The column values are still integers, which would explain why the second printSchema() in your log reports string columns while show() fails during code generation.
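
The mismatch between the declared type and the stored values can be illustrated without Spark. The sketch below is a deliberately simplified model: Field, Schema and Table are hypothetical stand-ins, not Spark classes, and the assumption is only that a schema object exposes a mutable array of field descriptions, as in the question.

```scala
object SchemaMutationSketch {
  // Hypothetical stand-ins for illustration only; these are NOT Spark classes.
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: Array[Field])
  final class Table(val schema: Schema, val rows: Seq[Seq[Any]])

  // Returns the declared type of column 0 after the update, and whether the
  // stored value in that column is still an Int.
  def demo(): (String, Boolean) = {
    val table = new Table(
      Schema(Array(Field("CID", "integer"), Field("Name", "string"))),
      Seq(Seq[Any](1, "Spark"))
    )

    // Same kind of call as in the question: overwrite element 0 of the array.
    table.schema.fields.update(0, Field("CID", "string"))

    (table.schema.fields(0).dataType, table.rows.head.head.isInstanceOf[Int])
  }

  def main(args: Array[String]): Unit = {
    val (declared, stillInt) = demo()
    println(s"declared type: $declared")         // declared type: string
    println(s"value is still an Int: $stillInt") // value is still an Int: true
  }
}
```

The description now says string while the data is still an Int, so any code that trusts the description will treat the value as the wrong type. Casting the column, as in the answer above, changes both the data and its declared type together.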

Hope this helps.
