Spark support for using window functions

Problem description

I am using Spark version 1.6.0 with Python. I found that window functions are not supported by the version of Spark I am using: when I tried to use a window function in my query (using Spark SQL), it gave me an error saying "you need to build Spark with Hive functionality". Following that I searched around and found that I need to use Spark version 1.4.0, which I tried with no luck. Some posts also suggested building Spark with Hive functionality, but I did not find the right way to do it.

When I used Spark 1.4.0, I got the following error:

raise ValueError("invalid mode %r (only r, w, b allowed)")
ValueError: invalid mode %r (only r, w, b allowed)
16/04/04 14:17:17 WARN PythonRDD: Incomplete task interrupted: Attempting to kill Python Worker
16/04/04 14:17:17 INFO HadoopRDD: Input split: file:/C:/Users/test esktop/spark-1.4.0-bin-hadoop2.4/test:910178+910178
16/04/04 14:17:17 INFO Executor: Executor killed task 1.0 in stage 1.0 (TID 2)
16/04/04 14:17:17 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, localhost): TaskKilled (killed intentionally)
16/04/04 14:17:17 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool

Recommended answer

I think this is the third time that I have answered a similar question:

Window functions are supported with HiveContext, not with the regular SQLContext.
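For reference, here is a minimal PySpark sketch of a window function used through HiveContext. The app name, toy data, and column names are illustrative assumptions, not taken from the question, and it presumes a Spark build that includes Hive support:

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.window import Window
from pyspark.sql import functions as F

sc = SparkContext(appName="window-function-example")  # appName is illustrative
sqlContext = HiveContext(sc)  # HiveContext, not the plain SQLContext

# Toy DataFrame for illustration only
df = sqlContext.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])

# DataFrame API: number the rows within each key
w = Window.partitionBy("key").orderBy("value")
df.select("key", "value", F.row_number().over(w).alias("rn")).show()

# Equivalent Spark SQL form, closer to the query style used in the question
df.registerTempTable("t")
sqlContext.sql("SELECT key, value, "
               "row_number() OVER (PARTITION BY key ORDER BY value) AS rn "
               "FROM t").show()

With a plain SQLContext, or on a Spark build without Hive support, the same window query fails in the way the question describes, which is why the build instructions below matter.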

Concerning how to build Spark with Hive support, the answer is in the official Building Spark documentation:

Building with Hive and JDBC Support: To enable Hive integration for Spark SQL along with its JDBC server and CLI, add the -Phive and -Phive-thriftserver profiles to your existing build options. By default Spark will build with Hive 0.13.1 bindings.

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package

Building for Scala 2.11

To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11 property:

./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
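
Once the build finishes, a quick way to check it from Python is to start the PySpark shell from the built source tree and create a HiveContext (the path below assumes a standard Spark source checkout; adjust as needed):

./bin/pyspark
>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)

If the HiveContext can be created, window functions such as the example above should work.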

There is no magic here; everything is in the documentation.
