Spark Sql - Insert Into External Hive Table Error


Problem Description

I am trying to insert data into an external Hive table through Spark SQL. My Hive table is bucketed by a column. The query to create the external Hive table is this:

create external table tab1 (col1 type, col2 type, col3 type) clustered by (col1, col2) sorted by (col1) into 8 buckets stored as parquet

Now I am trying to store data from a Parquet file (stored in HDFS) into the table. This is my code:

    SparkSession session = SparkSession.builder().appName("ParquetReadWrite").
                    config("hive.exec.dynamic.partition", "true").
                    config("hive.exec.dynamic.partition.mode", "nonstrict").
                    config("hive.execution.engine","tez").
                    config("hive.exec.max.dynamic.partitions","400").
                    config("hive.exec.max.dynamic.partitions.pernode","400").
                    config("hive.enforce.bucketing","true").
                    config("optimize.sort.dynamic.partitionining","true").
                    config("hive.vectorized.execution.enabled","true").
                    config("hive.enforce.sorting","true").
                    enableHiveSupport()
                    .master(args[0]).getOrCreate();
    String insertSql = "insert into tab1 select * from" + "'" + parquetInput + "'";

    session.sql(insertSql);

  1. When I run the code, it throws the below error:


mismatched input ''hdfs://url:port/user/clsadmin/somedata.parquet'' expecting (line 1, pos 50)


== SQL == insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet' --------------------------------------------------^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
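The parse failure comes from the string the code builds, not from the table itself: the concatenation omits the space after "from" and wraps the path in single quotes, so the parser sees a string literal where it expects a table identifier. A minimal sketch of what the concatenation produces (the `parquet.`path`` form shown for comparison is Spark SQL's direct-file-query syntax, included here as an assumption about one possible fix):

```java
public class InsertSqlDemo {
    public static void main(String[] args) {
        String parquetInput = "hdfs://url:port/user/clsadmin/somedata.parquet";
        // Original concatenation: no space after "from", and the path ends
        // up as a single-quoted string literal, which the parser rejects.
        String broken = "insert into tab1 select * from" + "'" + parquetInput + "'";
        System.out.println(broken);
        // Spark SQL can query a file directly when the format name is used
        // as a qualifier with a backtick-quoted path (no single quotes):
        String direct = "insert into tab1 select * from parquet.`" + parquetInput + "`";
        System.out.println(direct);
    }
}
```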

  • What is the difference between using Tez versus Spark as the Hive execution engine?

    Recommended Answer

    When creating an external table in Hive, the HDFS location must be specified:

    create external table tab1 (col1 type, col2 type, col3 type) 
    clustered by (col1, col2) sorted by (col1) into 8 buckets 
    stored as parquet 
    LOCATION 'hdfs://url:port/user/clsadmin/tab1'
    

    Hive does not have to populate the data itself: the same application or any other application can ingest data into that location, and Hive accesses the data through the schema defined on top of that location.


    *== SQL == insert into UK_DISTRICT_MONTH_DATA select * from 'hdfs://url:port/user/clsadmin/somedata.parquet' --------------------------------------------------^^^*

    parquetInput is a Parquet HDFS file path, not a Hive table name. Hence the error.


    There are two ways you can solve this issue:

    1. 为"parquetInput"定义外部表并提供该表 名称
    2. 使用LOAD DATA INPATH 'hdfs://url:port/user/clsadmin/somedata.parquet' INTO TABLE tab1
    1. Define the external table for "parquetInput" and give the table name
    2. Use LOAD DATA INPATH 'hdfs://url:port/user/clsadmin/somedata.parquet' INTO TABLE tab1
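    Option 1 can also be sketched from the Spark side without a separate DDL step: register the Parquet file as a temporary view and insert from that view, so the INSERT references a table identifier rather than a quoted file path. This is a sketch under assumptions: the view name "staging_view" is hypothetical, and it requires a running Spark session with Hive support.

    ```java
    import org.apache.spark.sql.SparkSession;

    public class ParquetInsertFix {
        public static void main(String[] args) {
            String parquetInput = "hdfs://url:port/user/clsadmin/somedata.parquet";
            SparkSession session = SparkSession.builder()
                    .appName("ParquetReadWrite")
                    .enableHiveSupport()
                    .master(args[0])
                    .getOrCreate();
            // Register the Parquet file as a temporary view ("staging_view"
            // is a hypothetical name chosen for this sketch).
            session.read().parquet(parquetInput).createOrReplaceTempView("staging_view");
            // The FROM clause now names a view, which the parser accepts.
            session.sql("insert into tab1 select * from staging_view");
            session.stop();
        }
    }
    ```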

