Got an error when using DataFrame.schema.fields.update
Question
I want to cast two columns in my DataFrame. Here is my code:
val session = SparkSession
  .builder
  .master("local")
  .appName("UDTransform")
  .getOrCreate()
var df: DataFrame = session
  .createDataFrame(Seq((1, "Spark", 111), (2, "Storm", 112), (3, "Hadoop", 113), (4, "Kafka", 114), (5, "Flume", 115), (6, "Hbase", 116)))
  .toDF("CID", "Name", "STD")
df.printSchema()
df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show()
I get these logs from my console:
root
|-- CID: integer (nullable = false)
|-- Name: string (nullable = true)
|-- STD: integer (nullable = false)
root
|-- CID: string (nullable = true)
|-- Name: string (nullable = true)
|-- STD: string (nullable = true)
17/06/28 12:44:32 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 36, Column 31: A method named "toString" is not declared in any enclosing class nor any supertype, nor through a static import
All I want to know is why this error happens and how I can solve it. Thanks very much!
Recommended answer
You cannot update the schema of a DataFrame in place, because DataFrames are immutable. You can, however, cast the columns and assign the result to a new DataFrame.
You can do it like this:
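import org.apache.spark.sql.functions.col

// withColumn with an existing column name replaces that column with the cast version
val newDF = df.withColumn("CID", col("CID").cast("string"))
  .withColumn("STD", col("STD").cast("string"))
newDF.printSchema()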
The schema of newDF is:
root
|-- CID: string (nullable = true)
|-- Name: string (nullable = true)
|-- STD: string (nullable = true)
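If more than two columns need the same cast, the chained withColumn calls above can be folded over a list of column names. The following is only a sketch under assumptions: the object name CastColumns and the helper castToString are hypothetical names introduced for illustration, and the sample rows are a shortened copy of the data in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

object CastColumns {
  // Hypothetical helper (not part of Spark): casts each named column to string.
  // Every withColumn call returns a new DataFrame; the input is never modified.
  def castToString(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, col(name).cast(StringType)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("CastColumns").getOrCreate()
    import spark.implicits._

    // Shortened sample data, same column names as in the question
    val df = Seq((1, "Spark", 111), (2, "Storm", 112)).toDF("CID", "Name", "STD")

    val newDF = castToString(df, Seq("CID", "STD"))
    newDF.printSchema()
    newDF.show()

    spark.stop()
  }
}
```

The original df keeps its integer columns. Only newDF carries the string types, which matches the point that DataFrames are immutable.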
Your code:
df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show()
In your code, df.schema.fields returns an Array of StructField, that is, an Array[StructField].
So when you call
df.schema.fields.update(0, StructField("CID", StringType))
you replace the element at position 0 of the Array[StructField]. I think this is not what you wanted.
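DataFrame.schema.fields.update does not cast anything in the DataFrame. It only mutates the array of StructField returned by DataFrame.schema.fields. The second printSchema output in the question reflects the mutated array, but the underlying data still holds integers, which is presumably why show() then fails during code generation.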
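The difference between changing a description of the data and changing the data itself can be shown without Spark. The sketch below uses two hypothetical case classes, Field and Schema, as simplified stand-ins for StructField and StructType. They are assumptions made for illustration and are not Spark classes.

```scala
// Hypothetical stand-ins for StructField and StructType; these are not Spark classes.
final case class Field(name: String, dataType: String)
final case class Schema(fields: Array[Field])

object ArrayUpdateDemo {
  def main(args: Array[String]): Unit = {
    // The "data": the first element of every row is an Int
    val rows: Seq[(Int, String)] = Seq((1, "Spark"), (2, "Storm"))
    val schema = Schema(Array(Field("CID", "integer"), Field("Name", "string")))

    // Array.update mutates the array in place, so the description now says "string"
    schema.fields.update(0, Field("CID", "string"))
    println(schema.fields.mkString(", "))

    // The rows themselves were never touched: the first element is still an Int
    val firstId: Int = rows.head._1
    println(firstId + 1)
  }
}
```

After the update the description and the data disagree, which is the same kind of mismatch the question runs into. A cast has to produce new values, and that is what withColumn with cast does.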
Hope this helps.