使用Pyhive插入Hive会引发错误 [英] Insert Into Hive Using Pyhive invoke an error

查看:536
本文介绍了使用Pyhive插入Hive会引发错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用pyhive与蜂巢互动.

I am using pyhive to interact with hive.

使用以下代码, SELECT 语句运行良好.

The SELECT statement going well using this code bellow.

# Import hive module and connect
from pyhive import hive
conn = hive.Connection(host="HOST")
cur = conn.cursor()

# Import pandas
import pandas as pd

# Store select query in dataframe 
all_tables = pd.read_sql("SELECT * FROM table LIMIT 5", conn)
print all_tables

# Using curssor 
cur = conn.cursor()
cur.execute('SELECT * FROM table LIMIT 5')
print cursor.fetchall()

直到这里都没有问题.当我想 INSERT 进入配置单元时.

Until here there is no problem. When I want to INSERT into hive.

假设我要执行以下查询: INSERT INTO table2 SELECT Col1,Col2 FROM table1;

Let's say I want to excute this query : INSERT INTO table2 SELECT Col1, Col2 FROM table1;

我尝试过:

cur.execute('INSERT INTO table2 SELECT Col1, Col2 FROM table1')

我收到此错误

pyhive.exc.OperationalError: TExecuteStatementResp(status=TStatus(errorCode=1, errorMessage=u'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask', sqlState=u'08S01', infoMessages=[u'*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask:28:27', u'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:388', u'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:244', u'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:279', u'org.apache.hive.service.cli.operation.Operation:run:Operation.java:324', u'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:499', u'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:475', u'sun.reflect.GeneratedMethodAccessor81:invoke::-1', u'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', u'java.lang.reflect.Method:invoke:Method.java:498', u'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', u'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', u'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', u'java.security.AccessController:doPrivileged:AccessController.java:-2', u'javax.security.auth.Subject:doAs:Subject.java:422', u'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1698', u'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', u'com.sun.proxy.$Proxy33:executeStatement::-1', u'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:270', u'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507', u'org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437', u'org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422', u'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', u'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', u'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', u'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', u'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', u'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', u'java.lang.Thread:run:Thread.java:748'], statusCode=3), operationHandle=None)

如果我直接在蜂巢中执行相同的查询,则一切运行良好.有什么想法吗?

If I excute the same query in hive directly everything run well. Any thoughts?

注意:我所有的桌子都是外部的

NB: All my tables are external

CREATE EXTERNAL TABLE IF NOT EXISTS table ( col1 String, col2 String) stored as orc LOCATION 's3://somewhere' tblproperties ("orc.compress"="SNAPPY");

推荐答案

解决方案是在连接行中添加用户名. conn = hive.Connection(host ="HOST",username ="USER")

The solution was to add the username in the connection line; conn = hive.Connection(host="HOST", username="USER")

据我了解,配置单元查询分为许多类型的操作(作业).在执行简单查询时(即SELECT * FROM table ),这将从配置单元metastore中读取数据,而无需执行查询的mapReduce作业或tmp表.但是,一旦切换到更复杂的查询(即使用JOIN),您最终就会遇到相同的错误.

From what I understand hive queries divided on many type of operations (jobs). While you are performing a simple query (ie. SELECT * FROM table) This reads data from the hive metastore no mapReduce job or tmp tables needed to perform the query. But as soon as you switch to more complicated queries (ie. using JOINs) you end up having the same error.

文件代码如下:

# Import hive module and connect
from pyhive import hive
conn = hive.Connection(host="HOST", username="USER")
cur = conn.cursor()
query = "INSERT INTO table2 SELECT Col1, Col2 FROM table1"
cur.execute(query)

所以也许它需要许可或其他什么.我将对此行为进行更多搜索并更新答案.

So maybe it needs permission or something.. I will search more about this behavior and update the answer.

这篇关于使用Pyhive插入Hive会引发错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆