SparkSQL and explode on DataFrame in Java
Question
Is there an easy way to use explode on an array column of a SparkSQL DataFrame? It's relatively simple in Scala, but this function seems to be unavailable in Java (as mentioned in the Javadoc).
One option is to call SQLContext.sql(...) and use the explode function inside the query, but I'm looking for a better and, above all, cleaner way. The DataFrames are loaded from Parquet files.
Recommended answer
It seems possible to combine org.apache.spark.sql.functions.explode(Column col) with DataFrame.withColumn(String colName, Column col) to replace the array column with its exploded version.
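As a minimal sketch of that combination, the example below builds a small in-memory table and explodes its array column. It assumes Spark 2.x or later, where the Java API exposes a DataFrame as Dataset<Row> and the entry point is SparkSession (on Spark 1.x the corresponding types would be DataFrame and SQLContext). The class name ExplodeExample, the helper explodeColumn, and the columns id and tags are hypothetical, chosen only for illustration. Real data would come from a Parquet file rather than an in-memory list.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

public class ExplodeExample {

    // Hypothetical helper: overwrites the array column with one row per element.
    // Reusing the same column name in withColumn replaces the original column.
    static Dataset<Row> explodeColumn(Dataset<Row> df, String arrayColumn) {
        return df.withColumn(arrayColumn, explode(col(arrayColumn)));
    }

    public static void main(String[] args) {
        // Assumption: a local single-threaded session is enough for the demo.
        SparkSession spark = SparkSession.builder()
                .appName("explode-example")
                .master("local[1]")
                .getOrCreate();

        // Illustrative schema: an integer id and an array-of-strings column.
        StructType schema = new StructType()
                .add("id", DataTypes.IntegerType)
                .add("tags", DataTypes.createArrayType(DataTypes.StringType));

        List<Row> rows = Arrays.asList(
                RowFactory.create(1, Arrays.asList("a", "b")),
                RowFactory.create(2, Arrays.asList("c")));

        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Each element of "tags" becomes its own row; "id" is repeated.
        explodeColumn(df, "tags").show();

        spark.stop();
    }
}
```

With the two input rows above, the exploded result should contain three rows, one per array element. Note that explode drops rows whose array is null or empty; if those rows need to be kept, explode_outer (available in newer Spark versions) may be the better fit.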