Scala - Spark: How to union all DataFrames in a loop
Question
Is there a way to union DataFrames produced inside a loop into a single DataFrame?
Here is the sample code:
var fruits = List(
  "apple",
  "orange",
  "melon"
)

for (x <- fruits) {
  var df = Seq(("aaa", "bbb", x)).toDF("aCol", "bCol", "name")
}
I would like to get something like this:
aCol | bCol | fruitsName
aaa  | bbb  | apple
aaa  | bbb  | orange
aaa  | bbb  | melon
Thanks in advance.
Accepted answer
Steffen Schmitz's answer is the most concise one, I believe. Below is a more detailed answer if you are looking for more customization (of field types, etc.):
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row
import spark.implicits._ // needed for toDF when not running in the spark-shell

// initialize an empty DataFrame with an explicit schema
val schema = StructType(
  StructField("aCol", StringType, true) ::
  StructField("bCol", StringType, true) ::
  StructField("name", StringType, true) :: Nil)
var initialDF = spark.createDataFrame(sc.emptyRDD[Row], schema)

// list to iterate through
val fruits = List(
  "apple",
  "orange",
  "melon"
)

for (x <- fruits) {
  // union returns a new Dataset, so reassign the result
  initialDF = initialDF.union(Seq(("aaa", "bbb", x)).toDF("aCol", "bCol", "name"))
}

//initialDF.show()
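If the goal is simply to turn the list into rows, the mutable loop can also be avoided. As a sketch, assuming the same `spark` session and `fruits` list as above, the per-element DataFrames can be folded together with `reduce`, or the rows can be built in a single `toDF` call:

```scala
import spark.implicits._ // assumed available, as in the answer above

val fruits = List("apple", "orange", "melon")

// Option 1: build one small DataFrame per fruit, then fold them
// together with union; reduce(_ union _) requires a non-empty list.
val unionedDF = fruits
  .map(x => Seq(("aaa", "bbb", x)).toDF("aCol", "bCol", "name"))
  .reduce(_ union _)

// Option 2: skip the union entirely and create all rows at once
val directDF = fruits
  .map(x => ("aaa", "bbb", x))
  .toDF("aCol", "bCol", "name")
```

Both variants produce the three rows shown in the question; the `reduce(_ union _)` form also avoids the empty starting DataFrame, at the cost of throwing on an empty list.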
References:
- How to create an empty DataFrame with a specified schema?
- https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/Dataset.html
- https://docs.databricks.com/spark/latest/faq/append-a-row-to-rdd-or-dataframe.html