Spark Hive reporting pyspark.sql.utils.AnalysisException: u'Table not found: XXX' when run on yarn cluster

Problem description

I'm attempting to run a pyspark script on BigInsights on Cloud 4.2 Enterprise that accesses a Hive table.

First I create the hive table:

[biadmin@bi4c-xxxxx-mastermanager ~]$ hive
hive> CREATE TABLE pokes (foo INT, bar STRING);
OK
Time taken: 2.147 seconds
hive> LOAD DATA LOCAL INPATH '/usr/iop/4.2.0.0/hive/doc/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Loading data to table default.pokes
Table default.pokes stats: [numFiles=1, numRows=0, totalSize=5812, rawDataSize=0]
OK
Time taken: 0.49 seconds
hive> 

Then I create a simple pyspark script:

[biadmin@bi4c-xxxxxx-mastermanager ~]$ cat test_pokes.py
from pyspark import SparkContext

sc = SparkContext()

from pyspark.sql import HiveContext
hc = HiveContext(sc)

pokesRdd = hc.sql('select * from pokes')
print( pokesRdd.collect() )

I then attempt to execute it:

[biadmin@bi4c-xxxxxx-mastermanager ~]$ spark-submit \
    --master yarn-cluster \
    --deploy-mode cluster \
    --jars /usr/iop/4.2.0.0/hive/lib/datanucleus-api-jdo-3.2.6.jar, \
           /usr/iop/4.2.0.0/hive/lib/datanucleus-core-3.2.10.jar, \
           /usr/iop/4.2.0.0/hive/lib/datanucleus-rdbms-3.2.9.jar \
    test_pokes.py

However, I am hitting the error:

Traceback (most recent call last):
  File "test_pokes.py", line 8, in <module>
    pokesRdd = hc.sql('select * from pokes')
  File "/disk6/local/usercache/biadmin/appcache/application_1477084339086_0481/container_e09_1477084339086_0481_01_000001/pyspark.zip/pyspark/sql/context.py", line 580, in sql
  File "/disk6/local/usercache/biadmin/appcache/application_1477084339086_0481/container_e09_1477084339086_0481_01_000001/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/disk6/local/usercache/biadmin/appcache/application_1477084339086_0481/container_e09_1477084339086_0481_01_000001/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: u'Table not found: pokes; line 1 pos 14'
End of LogType:stdout

If I run spark-submit standalone, I can see the table exists ok:

[biadmin@bi4c-xxxxxx-mastermanager ~]$ spark-submit test_pokes.py
…
…
16/12/21 13:09:13 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 18962 bytes result sent to driver
16/12/21 13:09:13 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 168 ms on localhost (1/1)
16/12/21 13:09:13 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/12/21 13:09:13 INFO DAGScheduler: ResultStage 0 (collect at /home/biadmin/test_pokes.py:9) finished in 0.179 s
16/12/21 13:09:13 INFO DAGScheduler: Job 0 finished: collect at /home/biadmin/test_pokes.py:9, took 0.236558 s
[Row(foo=238, bar=u'val_238'), Row(foo=86, bar=u'val_86'), Row(foo=311, bar=u'val_311')
…
…

See my previous question related to this issue: hive spark yarn-cluster job fails with: "ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory"

This question is similar to this other question: Spark can access Hive table from pyspark but not from spark-submit. However, unlike that question I am using HiveContext.

Update: see here for the final solution https://stackoverflow.com/a/41272260/1033422

Recommended answer

This is because the spark-submit job is unable to find the hive-site.xml, so it cannot connect to the Hive metastore. Please add --files /usr/iop/4.2.0.0/hive/conf/hive-site.xml to your spark-submit command.
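
For reference, here is a minimal sketch of the corrected submit command, assuming the same BigInsights 4.2 file layout used in the question. The only change from the failing command is the added --files flag, which tells YARN to ship hive-site.xml into the driver's container so the HiveContext can locate the metastore:

[biadmin@bi4c-xxxxxx-mastermanager ~]$ spark-submit \
    --master yarn-cluster \
    --deploy-mode cluster \
    --files /usr/iop/4.2.0.0/hive/conf/hive-site.xml \
    --jars /usr/iop/4.2.0.0/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-core-3.2.10.jar,/usr/iop/4.2.0.0/hive/lib/datanucleus-rdbms-3.2.9.jar \
    test_pokes.py

Note that --jars takes a single comma-separated list, so the datanucleus jars are joined without whitespace here. See the link in the update above for the asker's final, complete solution.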
