Spark SQL issue with columns specified

Views: 504 | Published: 2020/9/4 19:55:34 | Tags: sql, apache-spark, apache-spark-sql, apache-spark-2.0

Problem description

We are trying to replicate an Oracle database into Hive. We get the queries from Oracle and run them in Hive, so they arrive in this format:

INSERT INTO schema.table(col1,col2) VALUES ('val','val');

This query works when run directly in Hive, but when I run it with spark.sql I get the following error:

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'emp_id' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 20)
== SQL ==
insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')
--------------------^^^
        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
        at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
        at com.datastream.SparkReplicator.insertIntoHive(SparkReplicator.java:20)
        at com.datastream.App.main(App.java:67)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Recommended answer

Here, I am inserting a record with Spark SQL through PySpark:

df = spark.sql("""insert into UDB.emp_details_table values ('6','VVV','IT','2018-12-18')""");

As you can see below, the record has been inserted into my existing Hive table:

+---------+-----------+-----------+-------------------+--+
| emp_id  | emp_name  | emp_dept  | emp_joining_date  |
+---------+-----------+-----------+-------------------+--+
| 1       | AAA       | HR        | 2018-12-06        |
| 1       | BBB       | HR        | 2017-10-26        |
| 2       | XXX       | ADMIN     | 2018-10-22        |
| 2       | YYY       | ADMIN     | 2015-10-19        |
| 2       | ZZZ       | IT        | 2018-05-14        |
| 3       | GGG       | HR        | 2018-06-30        |
| 6       | VVV       | IT        | 2018-12-18        |
+---------+-----------+-----------+-------------------+--+

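Change your Spark SQL query as follows. Spark 2.x does not accept a column list after the table name in INSERT INTO, so leave it out; the values are then matched to the table's columns by position, so they must cover every column in table order: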

spark.sql("""insert into ss.tab values ('1','demo','demo')""");
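
Without a column list, Spark fills the table's columns strictly by position. In the replication setup from the question, Oracle emits statements that may name only some columns, possibly in a different order, so simply deleting the column list could put values into the wrong columns. Below is a minimal sketch in Python of reordering such a statement first. `to_positional_insert` and `split_values` are hypothetical helper names, not part of Spark. The sketch assumes single-row statements shaped like the one in the question, unquoted identifiers, and string literals in single quotes with '' as the escape; it is not a general SQL parser. Columns the statement does not mention are filled with NULL.

```python
import re

# Hypothetical pattern for a single-row statement of the form
# INSERT INTO schema.table(col1,col2) VALUES ('val','val');
_INSERT_RE = re.compile(
    r"^\s*INSERT\s+INTO\s+([\w.]+)\s*\(([^)]*)\)\s*VALUES\s*\((.*)\)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_values(values_sql):
    """Split a VALUES tuple body on top-level commas, keeping '...' literals intact."""
    items, current, in_string, i = [], [], False, 0
    while i < len(values_sql):
        ch = values_sql[i]
        if ch == "'":
            if in_string and values_sql[i + 1:i + 2] == "'":
                # '' is an escaped single quote inside a string literal
                current.append("''")
                i += 2
                continue
            in_string = not in_string
        if ch == "," and not in_string:
            # Comma outside a literal ends the current value
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    items.append("".join(current).strip())
    return items


def to_positional_insert(statement, table_columns):
    """Rewrite a column-list INSERT as a positional INSERT.

    table_columns must list every column of the target table, in table order.
    Columns absent from the original statement are filled with NULL.
    """
    match = _INSERT_RE.match(statement)
    if match is None:
        raise ValueError("not a single-row INSERT ... (cols) VALUES (...) statement")
    table, cols_sql, values_sql = match.groups()
    # Hive column names are case-insensitive, so compare in lower case
    columns = [c.strip().lower() for c in cols_sql.split(",")]
    values = split_values(values_sql)
    if len(columns) != len(values):
        raise ValueError("column count does not match value count")
    by_column = dict(zip(columns, values))
    unknown = set(by_column) - {c.lower() for c in table_columns}
    if unknown:
        raise ValueError("columns not in target table: %s" % sorted(unknown))
    ordered = [by_column.get(c.lower(), "NULL") for c in table_columns]
    return "INSERT INTO %s VALUES (%s)" % (table, ", ".join(ordered))


if __name__ == "__main__":
    stmt = "insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')"
    print(to_positional_insert(stmt, ["emp_id", "firstname", "lastname"]))
```

Run on the statement from the error message, this prints `INSERT INTO ss.tab VALUES ('1', 'demo', 'demo')`. In PySpark, the column order could be read from the target table itself, for example `spark.table("ss.tab").columns`, and the rewritten string passed to `spark.sql(...)`.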

Note: I am using Spark 2.3. If you are using Spark 1.6, you need to use a HiveContext instead.

Let me know if it works.
