SparkSQL and explode on DataFrame in Java
Question
Is there an easy way to use explode on an array column of a SparkSQL DataFrame? It's relatively simple in Scala, but the function seems to be unavailable in Java (as mentioned in the javadoc).
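To make the question concrete: explode turns each element of an array column into its own row and repeats the other columns alongside it. The sketch below models only that row semantics in plain Java (Java 16+ for records and Stream.toList()); it is not Spark code, and the record names InputRow and ExplodedRow are hypothetical. It mirrors Spark's explode in emitting no rows for a null or empty array.

```java
import java.util.List;

// Plain-Java model of what explode does to rows; this is NOT the Spark API.
public class ExplodeSemantics {
    // Hypothetical input row: a scalar column plus an array column.
    record InputRow(int id, List<String> tags) {}

    // Hypothetical output row: the scalar column repeated, one array element per row.
    record ExplodedRow(int id, String tag) {}

    static List<ExplodedRow> explode(List<InputRow> rows) {
        return rows.stream()
                // A null array contributes no output rows (as with Spark's explode).
                .filter(r -> r.tags() != null)
                // Each element becomes its own row; an empty list yields nothing.
                .flatMap(r -> r.tags().stream().map(tag -> new ExplodedRow(r.id(), tag)))
                .toList();
    }

    public static void main(String[] args) {
        List<InputRow> rows = List.of(
                new InputRow(1, List.of("a", "b")),
                new InputRow(2, List.of("c")),
                new InputRow(3, List.of()));
        explode(rows).forEach(System.out::println);
    }
}
```

Running main prints three rows (ids 1, 1 and 2); the row with id 3 disappears because its array is empty.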
One option is to call SQLContext.sql(...) and use the explode function inside the query, but I'm looking for a somewhat better, and especially cleaner, way. The DataFrames are loaded from Parquet files.
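For comparison, here is a minimal sketch of the SQL-string option the question describes, assuming Spark 2.x or later, where a Java DataFrame is a Dataset<Row> and spark.sql(...) is the usual entry point (SQLContext.sql(...) is the older equivalent). The view name "items", the column names and the in-memory sample data are hypothetical; real code would load the data with spark.read().parquet(path) instead.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ExplodeWithSql {
    // Hypothetical sample data standing in for the Parquet-backed DataFrame.
    static Dataset<Row> sampleData(SparkSession spark) {
        StructType schema = new StructType()
                .add("id", DataTypes.IntegerType, false)
                .add("tags", DataTypes.createArrayType(DataTypes.StringType), true);
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, new String[] {"a", "b"}),
                RowFactory.create(2, new String[] {"c"}));
        return spark.createDataFrame(rows, schema);
    }

    // Register the DataFrame as a temporary view and call explode inside the SQL text.
    static Dataset<Row> explodeViaSql(SparkSession spark, Dataset<Row> df) {
        df.createOrReplaceTempView("items");
        return spark.sql("SELECT id, explode(tags) AS tag FROM items");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("explode-sql")
                .master("local[*]")
                .getOrCreate();
        try {
            explodeViaSql(spark, sampleData(spark)).show();
        } finally {
            spark.stop();
        }
    }
}
```

This works, but the column logic lives in a query string and needs a named view, which is the lack of cleanliness the question is trying to avoid.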
Recommended answer
It seems possible to combine org.apache.spark.sql.functions.explode(Column col) with DataFrame.withColumn(String colName, Column col) to replace the column with its exploded version.
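A minimal sketch of that combination, assuming Spark 2.x or later (where the Java DataFrame type is Dataset<Row>, and withColumn is called on it). The column names and in-memory sample data are hypothetical; with Parquet input you would start from spark.read().parquet(path) instead of sampleData.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ExplodeWithColumn {
    // Hypothetical sample data standing in for the Parquet-backed DataFrame.
    static Dataset<Row> sampleData(SparkSession spark) {
        StructType schema = new StructType()
                .add("id", DataTypes.IntegerType, false)
                .add("tags", DataTypes.createArrayType(DataTypes.StringType), true);
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, new String[] {"a", "b"}),
                RowFactory.create(2, new String[] {"c"}));
        return spark.createDataFrame(rows, schema);
    }

    // Replace an array column with its exploded version: one output row per element,
    // with the other columns repeated on each row.
    static Dataset<Row> explodeColumn(Dataset<Row> df, String colName) {
        return df.withColumn(colName, explode(col(colName)));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("explode-withColumn")
                .master("local[*]")
                .getOrCreate();
        try {
            Dataset<Row> exploded = explodeColumn(sampleData(spark), "tags");
            exploded.printSchema(); // "tags" is now a string column instead of array<string>
            exploded.show();
        } finally {
            spark.stop();
        }
    }
}
```

Because withColumn reuses the name "tags", no query string or temporary view is needed. Note that explode drops rows whose array is null or empty; on Spark 2.2 or later, functions.explode_outer keeps such rows with a null value instead.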