Got an error when using DataFrame.schema.fields.update
Question
I want to cast two columns in my DataFrame. Here is my code:
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField}

val session = SparkSession
  .builder
  .master("local")
  .appName("UDTransform")
  .getOrCreate()
val df: DataFrame = session
  .createDataFrame(Seq((1, "Spark", 111), (2, "Storm", 112), (3, "Hadoop", 113), (4, "Kafka", 114), (5, "Flume", 115), (6, "Hbase", 116)))
  .toDF("CID", "Name", "STD")
df.printSchema()
// Attempt to change the column types by editing the schema's field array
df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show()
This is the output I get on the console:
root
|-- CID: integer (nullable = false)
|-- Name: string (nullable = true)
|-- STD: integer (nullable = false)
root
|-- CID: string (nullable = true)
|-- Name: string (nullable = true)
|-- STD: string (nullable = true)
17/06/28 12:44:32 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 36, Column 31: A method named "toString" is not declared in any enclosing class nor any supertype, nor through a static import
What I want to know is why this error happens and how I can fix it. Thanks very much!
Recommended answer
You cannot update the schema of a DataFrame in place, because DataFrames are immutable. You can, however, cast the columns and assign the result to a new DataFrame.
Here is how you can do it:
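import org.apache.spark.sql.functions.col
// Cast both columns and bind the result to a new DataFrame
val newDF = df.withColumn("CID", col("CID").cast("string"))
  .withColumn("STD", col("STD").cast("string"))
newDF.printSchema()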
The schema of newDF is:
root
|-- CID: string (nullable = true)
|-- Name: string (nullable = true)
|-- STD: string (nullable = true)
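If more than a couple of columns need the same cast, the withColumn chain above can be folded over a list of column names. The following is only a sketch: the object name CastColumnsSketch and the helper castToString are hypothetical names introduced for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastColumnsSketch {
  // Hypothetical helper (not a Spark API): casts each named column to string.
  // It returns a new DataFrame; the input DataFrame is left unchanged.
  def castToString(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, col(name).cast("string"))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("CastColumnsSketch")
      .getOrCreate()

    val df = spark
      .createDataFrame(Seq((1, "Spark", 111), (2, "Storm", 112)))
      .toDF("CID", "Name", "STD")

    val newDF = castToString(df, Seq("CID", "STD"))
    newDF.printSchema()
    newDF.show()

    spark.stop()
  }
}
```

Because withColumn replaces a column that already exists under the same name, the column order stays the same as in the original DataFrame.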
Your code:
df.schema.fields.update(0, StructField("CID", StringType))
df.schema.fields.update(2, StructField("STD", StringType))
df.printSchema()
df.show()
In your code, df.schema.fields returns an array of StructField, that is, an Array[StructField].
Then, if you try to update it with
df.schema.fields.update(0, StructField("CID", StringType))
this replaces the element at index 0 of that Array[StructField], which I think is not what you wanted.
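DataFrame.schema.fields.update does not change the DataFrame itself; it only modifies the array of StructField returned by DataFrame.schema.fields. The second printSchema output in the question reflects the modified array, but the underlying data are still integers, which is most likely why code generation fails once show() runs.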
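Reading df.schema.fields is fine; only mutating it causes trouble. As a sketch of the read-only use, the fields can drive a select that casts every integer column to string. The names ReadSchemaSketch and intsToStrings are hypothetical, and the rule "cast all integer columns" is an assumption made for illustration, not something the question asks for.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType}

object ReadSchemaSketch {
  // Hypothetical helper (not a Spark API): inspects the schema without
  // modifying it and builds one select expression per column.
  def intsToStrings(df: DataFrame): DataFrame = {
    val projected = df.schema.fields.map { f =>
      if (f.dataType == IntegerType) col(f.name).cast(StringType).as(f.name)
      else col(f.name)
    }
    // select returns a new DataFrame whose data and schema agree
    df.select(projected: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("ReadSchemaSketch")
      .getOrCreate()

    val df = spark
      .createDataFrame(Seq((1, "Spark", 111), (2, "Storm", 112)))
      .toDF("CID", "Name", "STD")

    val converted = intsToStrings(df)
    converted.printSchema()
    converted.show()

    spark.stop()
  }
}
```

Here the cast is part of the query itself, so the values are converted along with the schema instead of only the description of the columns being edited.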
Hope this helps!