Unsupported language features using PySpark for inserting data into Hive
Question
I am trying to execute this SQL insert statement against Hive:
insert into mydb.mytable (k, v) values (3, 'c'), (4, 'd')
If I use DBeaver, this SQL statement works. When I use the PySpark
REPL, however, I get the following exception.
Traceback (most recent call last):
File "", line 1, in
File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 580, in sql
return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/utils.py", line 51, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"\nUnsupported language features in query: insert into mydb.mytable (k, v) values (3, 'c'), (4, 'd')\nTOK_QUERY 0, 0,30, 0\n TOK_FROM 0, -1,30, 0\n TOK_VIRTUAL_TABLE 0, -1,30, 0\n TOK_VIRTUAL_TABREF 0, -1,-1, 0\n TOK_ANONYMOUS 0, -1,-1, 0\n TOK_VALUES_TABLE 1, 15,30, 39\n TOK_VALUE_ROW 1, 17,22, 39\n 3 1, 18,18, 39\n 'c' 1, 21,21, 42\n TOK_VALUE_ROW 1, 25,30, 49\n 4 1, 26,26, 49\n 'd' 1, 29,29, 52\n TOK_INSERT 1, 0,-1, 12\n TOK_INSERT_INTO 1, 0,13, 12\n TOK_TAB 1, 4,6, 12\n TOK_TABNAME 1, 4,6, 12\n jvang 1, 4,4, 12\n test1 1, 6,6, 18\n TOK_TABCOLNAME 1, 9,12, 25\n k 1, 9,9, 25\n v 1, 12,12, 28\n TOK_SELECT 0, -1,-1, 0\n TOK_SELEXPR 0, -1,-1, 0\n TOK_ALLCOLREF 0, -1,-1, 0\n\nscala.NotImplementedError: No parse rules for:\n TOK_VIRTUAL_TABLE 0, -1,30, 0\n TOK_VIRTUAL_TABREF 0, -1,-1, 0\n TOK_ANONYMOUS 0, -1,-1, 0\n TOK_VALUES_TABLE 1, 15,30, 39\n TOK_VALUE_ROW 1, 17,22, 39\n 3 1, 18,18, 39\n 'c' 1, 21,21, 42\n TOK_VALUE_ROW 1, 25,30, 49\n 4 1, 26,26, 49\n 'd' 1, 29,29, 52\n \norg.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1362)\n ;"
My code is simple.
sql = "insert into mydb.mytable (k, v) values (3, 'c'), (4, 'd')"
sqlContext.sql(sql)
Any idea why this is happening? Is there any way I can append rows to an existing Hive table via PySpark? I've seen some examples that issue multiple single-row SQL insert statements, but that doesn't seem performant; I am essentially trying to do a bulk import (append mode) into an existing Hive table via PySpark.
I am using:
- Spark v1.6.1
- Python v2.7.12
- Hive v1.2.1000.
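One known workaround for the bulk-append concern is to fold all rows into a single statement using `select ... union all`, which Spark 1.6's HiveQL parser does accept, instead of the unsupported multi-row `values` clause. The sketch below is hypothetical: `quote` and `build_insert_sql` are illustrative helpers, not part of PySpark, and `sqlContext` is the HiveContext from the question's REPL session.

```python
def quote(value):
    """Render a Python value as a HiveQL literal (strings single-quoted)."""
    if isinstance(value, str):
        return "'{}'".format(value.replace("'", "\\'"))
    return str(value)

def build_insert_sql(table, rows):
    """Build a single Spark-1.6-compatible bulk INSERT for an existing
    Hive table by chaining one SELECT per row with UNION ALL."""
    selects = " union all ".join(
        "select " + ", ".join(quote(v) for v in row) for row in rows
    )
    return "insert into table {} {}".format(table, selects)

sql = build_insert_sql("mydb.mytable", [(3, "c"), (4, "d")])
# sqlContext.sql(sql)  # run against the HiveContext in the PySpark REPL
print(sql)
```

The whole batch is then appended with one `sqlContext.sql()` call rather than one INSERT per row.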
Answer
I am also getting the error below with Spark 1.6:
java.sql.SQLException: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.hive.HiveQl$.nodeToRelation(HiveQl.scala:1362)
After searching, I found that Spark 1.6 uses a different syntax; once I used it, the error went away.
Example:
insert into table mytable1 select t.* from (select 'Shub',30) t;
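Applied back in the PySpark REPL to the question's original rows, the same select/union form would presumably look like this (a sketch, assuming the `sqlContext` and `mydb.mytable` from the question):

```python
# Rewrite of the failing multi-row VALUES insert using the
# select/union-all syntax accepted by Spark 1.6's parser.
sql = ("insert into table mydb.mytable "
       "select 3, 'c' union all select 4, 'd'")
# sqlContext.sql(sql)  # runs only inside the PySpark REPL from the question
```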
Source: