Hive 2.1.1 on Spark - Which version of Spark should I use


Question



I'm running Hive 2.1.1 and Hadoop 2.7.3 on Ubuntu 16.04.

According to Hive on Spark: Getting Started, it says:

Install/build a compatible version. Hive root pom.xml's <spark.version> defines what version of Spark it was built/tested with.

I checked pom.xml, and it shows that the Spark version is 1.6.0:

<spark.version>1.6.0</spark.version>
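A quick way to check this is to grep Hive's root pom.xml. The snippet below recreates the relevant fragment under /tmp so the command is self-contained; in a real checkout you would run the grep against the pom.xml at the root of the Hive source tree.

```shell
# Recreate the relevant fragment of Hive's root pom.xml for illustration;
# in a real checkout, grep the pom.xml at the Hive source root instead.
cat > /tmp/hive-pom-snippet.xml <<'EOF'
<properties>
  <spark.version>1.6.0</spark.version>
</properties>
EOF

# Extract the Spark version Hive was built/tested against
grep -o '<spark.version>[^<]*' /tmp/hive-pom-snippet.xml | cut -d'>' -f2
# prints: 1.6.0
```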

But Hive on Spark: Getting Started also says:

Prior to Spark 2.0.0: ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

Since Spark 2.0.0: ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"

So now I'm confused, because I am running Hadoop 2.7.3. Do I have to downgrade my Hadoop to 2.4?

Which version of Spark should I use? 1.6.0 or 2.0.0?

Thank you!

Solution

The current Spark 2.x releases are not compatible with Hive 2.1 and Hadoop 2.7; there is a major bug:

JavaSparkListener is not available, and Hive crashes on execution

https://issues.apache.org/jira/browse/SPARK-17563

You can try building Spark 1.6 against Hadoop 2.7, for use with Hive 2.1, with:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided" 

If you look at the command for Spark 2.0 and later, the only difference is that make-distribution.sh has moved into the dev/ folder.

If it does not work for Hadoop 2.7.x, I can confirm that I have been able to build it successfully with Hadoop 2.6, using:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided" 

and Scala 2.10.5.
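Once the Spark distribution is built, Hive still has to be pointed at it. Per the Hive on Spark: Getting Started guide referenced above, a minimal setup looks roughly like this; the paths below are illustrative assumptions, not taken from the answer:

```shell
# Illustrative paths -- adjust to your own installation
export SPARK_HOME=/opt/spark-1.6.0-bin-hadoop2-without-hive

# Then, in the Hive CLI (or persisted in hive-site.xml),
# switch the execution engine to Spark:
#   set hive.execution.engine=spark;
#   set spark.master=yarn;
```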

