Spark Sql - Insert Into External Hive Table Error

Problem description

I am trying to insert data into an external Hive table through Spark SQL. My Hive table is bucketed by a column. The query to create the external Hive table is this:

    create external table tab1 (col1 type, col2 type, col3 type)
    clustered by (col1, col2) sorted by (col1) into 8 buckets
    stored as parquet

Now I tried to store data from a parquet file (stored in HDFS) into the table. This is my code:

    import org.apache.spark.sql.SparkSession;

    SparkSession session = SparkSession.builder().appName("ParquetReadWrite")
            // Hive settings for dynamic partitions and bucketing
            .config("hive.exec.dynamic.partition", "true")
            .config("hive.exec.dynamic.partition.mode", "nonstrict")
            .config("hive.execution.engine", "tez")
            .config("hive.exec.max.dynamic.partitions", "400")
            .config("hive.exec.max.dynamic.partitions.pernode", "400")
            .config("hive.enforce.bucketing", "true")
            // as written in the question; the actual Hive key is
            // hive.optimize.sort.dynamic.partition
            .config("optimize.sort.dynamic.partitionining", "true")
            .config("hive.vectorized.execution.enabled", "true")
            .config("hive.enforce.sorting", "true")
            .enableHiveSupport()
            .master(args[0])
            .getOrCreate();

    // parquetInput holds an HDFS path, which ends up quoted after FROM --
    // this string is the source of the parse error below
    String insertSql = "insert into tab1 select * from" + "'" + parquetInput + "'";
    session.sql(insertSql);

  1. When I run the code, it throws the below error:

mismatched input ''hdfs://url:port/user/clsadmin/somedata.parquet'' expecting (line 1, pos 50)

== SQL ==
insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet'
--------------------------------------------------^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)

  • What is the difference between using Tez versus Spark as the Hive execution engine?

    Recommended answer

    When creating an external table in Hive, the HDFS location must be specified:

    create external table tab1 (col1 type, col2 type, col3 type)
    clustered by (col1, col2) sorted by (col1) into 8 buckets
    stored as parquet
    LOCATION 'hdfs://url:port/user/clsadmin/tab1'
    

    Hive does not have to populate the data itself: the same application, or any other application, can ingest data into that location, and Hive will access the data through the schema defined on top of the location.
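
    As a minimal sketch of this pattern (assuming the table location above, plus the session and parquetInput variables from the question), any Spark job can write parquet files straight into the table's location; note that writing files directly bypasses Hive's bucketing enforcement:

        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;

        // Read the source parquet and append its files into tab1's LOCATION.
        // Hive sees the new files through the schema defined on the table.
        Dataset<Row> df = session.read().parquet(parquetInput);
        df.write().mode("append").parquet("hdfs://url:port/user/clsadmin/tab1");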

    *== SQL == insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet' --------------------------------------------------^^^*

    parquetInput is a parquet HDFS file path, not a Hive table name; hence the error.

    There are two ways you can solve this issue:

    1. Define an external table for "parquetInput" and give that table name in the insert (see the sketch after this list).
    2. Use LOAD DATA INPATH 'hdfs://url:port/user/clsadmin/somedata.parquet' INTO TABLE tab1
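
    A minimal sketch of both options, assuming the session and path from the question (the staging view name is hypothetical, and the temp-view variant is a Spark-side shortcut for option 1 rather than a literal Hive external table):

        // Option 1 (Spark-side variant): register the parquet data under a
        // table name, then insert from that name instead of from the raw path.
        Dataset<Row> staged = session.read()
                .parquet("hdfs://url:port/user/clsadmin/somedata.parquet");
        staged.createOrReplaceTempView("parquet_staging"); // hypothetical name
        session.sql("insert into tab1 select * from parquet_staging");

        // Option 2: let Hive move the file into the table.
        session.sql("LOAD DATA INPATH 'hdfs://url:port/user/clsadmin/somedata.parquet' INTO TABLE tab1");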
