Use collect_list and collect_set in Spark SQL

Views: 8588  Published: 2016/5/22 16:01:39  Tags: apache-spark hive apache-spark-sql

Question

According to the docs, the collect_set and collect_list functions should be available in Spark SQL. However, I cannot get them to work. I'm running Spark 1.6.0 using a Docker image.

I tried this in Scala:

import org.apache.spark.sql.functions._ 

df.groupBy("column1") 
  .agg(collect_set("column2")) 
  .show()

and receive the following error at runtime:

Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function collect_set;

I also tried it using pyspark, but it fails there too. The docs state that these functions are aliases of Hive UDAFs, but I can't figure out how to enable them.

How can I fix this? Thanks!

Recommended answer

Spark 2.0+

You have to enable Hive support for a given SparkSession:

In Scala:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local")
  .appName("testing")
  .enableHiveSupport()  // <- enable Hive support.
  .getOrCreate()

In Python:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .enableHiveSupport()
    .getOrCreate())

Spark < 2.0

To be able to use Hive UDFs you have to use Spark built with Hive support (the pre-built binaries already include it, which seems to be the case here) and create a HiveContext from your SparkContext instead of a plain SQLContext.

In Scala:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.SQLContext

val sqlContext: SQLContext = new HiveContext(sc)

In Python:

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
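
Once the functions resolve, it may help to see what each one returns per group. The following is a plain-Python sketch of the semantics only, not Spark code: the helper names collect_list_by and collect_set_by and the sample rows are made up for illustration. collect_list keeps every value, duplicates included, while collect_set keeps each distinct value once.

```python
from collections import defaultdict

# Hypothetical sample rows of (column1, column2); not taken from the question.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]

def collect_list_by(rows):
    """Group by the first field and keep every value, duplicates included."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

def collect_set_by(rows):
    """Group by the first field and keep each distinct value once."""
    groups = defaultdict(set)
    for key, value in rows:
        groups[key].add(value)
    return dict(groups)

print(collect_list_by(rows))
print(collect_set_by(rows))
```

Note that in Spark itself the order of the collected elements is not guaranteed, since it depends on the order in which rows arrive after a shuffle.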
