Replicate Spark Row N-times
Question
I want to duplicate a Row in a DataFrame, how can I do that?
For example, I have a DataFrame consisting of 1 Row, and I want to make a DataFrame with 100 identical Rows. I came up with the following solution:
var data: DataFrame = singleRowDF
for (i <- 1 to 100 - 1) {
  data = data.unionAll(singleRowDF)
}
But this introduces many transformations, and my subsequent actions seem to become very slow. Is there another way to do it?
Answer
You can add a column containing a literal Array of size 100, use explode
to turn each array element into its own row, and then drop this "dummy" column:
import org.apache.spark.sql.functions._

val result = singleRowDF
  .withColumn("dummy", explode(array((1 to 100).map(lit): _*))) // 100 elements -> 100 rows
  .selectExpr(singleRowDF.columns: _*)                          // drop the "dummy" column
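On Spark 2.4 and later, the same pattern can be written without building the literal sequence by hand, using array_repeat. This is a sketch under the same assumptions as above (a DataFrame named singleRowDF with a single row):

```scala
import org.apache.spark.sql.functions.{array_repeat, explode, lit}

// Repeat a throwaway literal 100 times, explode the resulting array so
// each element becomes a row, then re-select only the original columns
// to drop the helper column.
val replicated = singleRowDF
  .withColumn("dummy", explode(array_repeat(lit(1), 100)))
  .selectExpr(singleRowDF.columns: _*)
```

The size of the repeated array drives the row count, so replicated should contain exactly 100 copies of the original row.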