使用Pyhive插入Hive会引发错误 [英] Insert Into Hive Using Pyhive invoke an error
问题描述
我正在使用pyhive与蜂巢互动.
I am using pyhive to interact with hive.
使用以下代码, SELECT
语句运行良好.
The SELECT
statement going well using this code bellow.
# Import hive module and connect
from pyhive import hive
conn = hive.Connection(host="HOST")
cur = conn.cursor()
# Import pandas
import pandas as pd
# Store select query in dataframe
all_tables = pd.read_sql("SELECT * FROM table LIMIT 5", conn)
print all_tables
# Using curssor
cur = conn.cursor()
cur.execute('SELECT * FROM table LIMIT 5')
print cursor.fetchall()
直到这里都没有问题.当我想 INSERT
进入配置单元时.
Until here there is no problem. When I want to INSERT
into hive.
假设我要执行以下查询: INSERT INTO table2 SELECT Col1,Col2 FROM table1;
Let's say I want to excute this query : INSERT INTO table2 SELECT Col1, Col2 FROM table1;
我尝试过:
cur.execute('INSERT INTO table2 SELECT Col1, Col2 FROM table1')
我收到此错误
pyhive.exc.OperationalError: TExecuteStatementResp(status=TStatus(errorCode=1, errorMessage=u'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask', sqlState=u'08S01', infoMessages=[u'*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask:28:27', u'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:388', u'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:244', u'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:279', u'org.apache.hive.service.cli.operation.Operation:run:Operation.java:324', u'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:499', u'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:475', u'sun.reflect.GeneratedMethodAccessor81:invoke::-1', u'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', u'java.lang.reflect.Method:invoke:Method.java:498', u'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', u'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', u'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', u'java.security.AccessController:doPrivileged:AccessController.java:-2', u'javax.security.auth.Subject:doAs:Subject.java:422', u'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1698', u'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', u'com.sun.proxy.$Proxy33:executeStatement::-1', u'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:270', u'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507', u'org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437', u'org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422', u'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', u'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', u'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', u'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', u'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', u'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', u'java.lang.Thread:run:Thread.java:748'], statusCode=3), operationHandle=None)
如果我直接在蜂巢中执行相同的查询,则一切运行良好.有什么想法吗?
If I excute the same query in hive directly everything run well. Any thoughts?
注意:我所有的桌子都是外部的
NB: All my tables are external
CREATE EXTERNAL TABLE IF NOT EXISTS table ( col1 String, col2 String) stored as orc LOCATION 's3://somewhere' tblproperties ("orc.compress"="SNAPPY");
推荐答案
解决方案是在连接行中添加用户名. conn = hive.Connection(host ="HOST",username ="USER")
The solution was to add the username in the connection line; conn = hive.Connection(host="HOST", username="USER")
据我了解,配置单元查询分为许多类型的操作(作业).在执行简单查询时(即SELECT * FROM table
),这将从配置单元metastore中读取数据,而无需执行查询的mapReduce作业或tmp表.但是,一旦切换到更复杂的查询(即使用JOIN),您最终就会遇到相同的错误.
From what I understand hive queries divided on many type of operations (jobs). While you are performing a simple query (ie. SELECT * FROM table
) This reads data from the hive metastore no mapReduce job or tmp tables needed to perform the query. But as soon as you switch to more complicated queries (ie. using JOINs) you end up having the same error.
文件代码如下:
# Import hive module and connect
from pyhive import hive
conn = hive.Connection(host="HOST", username="USER")
cur = conn.cursor()
query = "INSERT INTO table2 SELECT Col1, Col2 FROM table1"
cur.execute(query)
所以也许它需要许可或其他什么.我将对此行为进行更多搜索并更新答案.
So maybe it needs permission or something.. I will search more about this behavior and update the answer.
这篇关于使用Pyhive插入Hive会引发错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!