Hive on Spark: Missing <spark-assembly*.jar>
I'm running Hive 2.1.1, Spark 2.1.0 and Hadoop 2.7.3.
I tried to build Spark following the Hive on Spark: Getting Started guide:
./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"
However, I couldn't find any spark-assembly jar files under the Spark directory (find . -name "spark-assembly*.jar" returns nothing). Instead of linking the spark-assembly jar into HIVE_HOME/lib, I tried export SPARK_HOME=/home/user/spark.
I get the following Hive error in beeline:
0: jdbc:hive2://localhost:10000> set hive.execution.engine=spark;
0: jdbc:hive2://localhost:10000> insert into test (id, name) values (1, 'test1');
Error: Error running query: java.lang.NoClassDefFoundError: scala/collection/Iterable (state=,code=0)
I think the error is caused by missing spark-assembly jars.
How can I build, or where can I find, those spark-assembly jar files?
How can I fix the above error?
Thank you!
First of all, since 2.0.0 Spark no longer builds spark-assembly.jar; instead, it places all of the dependency jars in the directory $SPARK_HOME/jars.
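To make this concrete, here is a small shell sketch using a mock directory layout (the /tmp path and jar names are invented for illustration; against a real install you would run the same find and ls on your own $SPARK_HOME):

```shell
# Mock the layout of a Spark 2.x distribution (paths and jar names are examples only).
mkdir -p /tmp/spark-demo/jars
touch /tmp/spark-demo/jars/spark-core_2.11-2.1.0.jar \
      /tmp/spark-demo/jars/scala-library-2.11.8.jar

# On Spark >= 2.0.0 there is no assembly jar anywhere in the tree:
find /tmp/spark-demo -name 'spark-assembly*.jar' | wc -l    # prints 0

# The individual dependency jars all live under jars/ instead:
ls /tmp/spark-demo/jars
```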
Besides, Hive does not support every version of Spark; in fact, Hive on Spark has strict version-compatibility restrictions. Depending on which version of Hive you're using, you can always find the corresponding Spark version in Hive's pom.xml file. For Hive 2.1.1, the Spark version specified in pom.xml is:
<spark.version>1.6.0</spark.version>
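A quick way to pull that value out of the pom is a one-line sed. The fragment below is a stand-in for the real file (in an actual Hive source checkout you would point sed at its pom.xml; the /tmp path is just for this example):

```shell
# Stand-in for the <properties> block of Hive 2.1.1's pom.xml (illustrative only).
cat > /tmp/hive-pom-fragment.xml <<'EOF'
<properties>
  <spark.version>1.6.0</spark.version>
</properties>
EOF

# Extract the version between the <spark.version> tags:
sed -n 's:.*<spark.version>\(.*\)</spark.version>.*:\1:p' /tmp/hive-pom-fragment.xml   # prints 1.6.0
```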
As you already know, you need to build Spark without Hive support. I don't know why, but the command in Hive on Spark - Getting Started did not work for me; I finally succeeded with the following command:
mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package
And a few other troubleshooting tips for problems I ran into before (which I hope you won't meet):
- Starting the Spark Master failed because slf4j or Hadoop-related classes could not be found. Run
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
and try again.
- Failed to load snappy native libs. This happens when there is no snappy dependency on the classpath, or the snappy lib under the hadoop classpath is not the correct version for Spark. Download a correct version of the snappy lib, put it under $SPARK_HOME/lib/, run
export SPARK_DIST_CLASSPATH=$SPARK_HOME/lib/*:$(hadoop classpath)
and try again.
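For completeness, once Spark is built and on the classpath, Hive also has to be told to use it. A minimal sketch of the hive-site.xml properties involved (the path is an assumption for this example; see the Hive on Spark guide for the full list of spark.* settings):

```xml
<!-- Sketch only: spark.home must point at your actual Spark build. -->
<property>
  <name>spark.home</name>
  <value>/home/user/spark</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
```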
Hope this helps and everything goes well for you.