sqoop create impala parquet table


Question

I'm relatively new to the process of sqooping, so pardon any ignorance. I have been trying to sqoop a table from a data source as a parquet file and create an Impala table (also as parquet) into which I will insert the sqooped data. The code runs without an issue, but when I try to select a couple of rows for testing I get the error:

.../EWT_CALL_PROF_DIM_SQOOP/ec2fe2b0-c9fa-4ef9-91f8-46cf0e12e272.parquet' has an incompatible Parquet schema for column 'dru_id.test_ewt_call_prof_dim_parquet.call_prof_sk_id'. Column type: INT, Parquet schema: optional byte_array CALL_PROF_SK_ID [i:0 d:1 r:0]

I was mirroring the process I found in the Cloudera guide here: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_create_table.html, mainly the "Internal and External Tables" section. I've been trying to avoid having to infer the schema from a particular parquet file, since this whole thing will be kicked off every month by a bash script (and I also can't think of a way to point it at just one file if I use more than one mapper).
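
For context, the schema-inference route from that guide that I'm trying to avoid would look roughly like the sketch below; the parquet file name is only a placeholder for one of the sqooped files.

create external table test_EWT_CALL_PROF_DIM_parquet
like parquet '/data/res/warehouse/finance/[dru_userid]/EWT_CALL_PROF_DIM_SQOOP/[one of the sqooped files].parquet'
stored as parquet location '/data/res/warehouse/finance/[dru_userid]/EWT_CALL_PROF_DIM_SQOOP';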

Here's the code I used. I feel like I'm either missing something small and stupid, or I've screwed up everything major without realizing it. Any and all help is appreciated. Thanks!

    sqoop import -Doraoop.import.hint=" " \
    --options-file /home/kemri/pass.txt \
    --verbose \
    --connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/EWSOP000 \
    --username [userid] \
    --num-mappers 1 \
    --target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_userid]/EWT_CALL_PROF_DIM_SQOOP \
    --delete-target-dir \
    --table DMPROD.EWT_CALL_PROF_DIM \
    --direct \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --as-parquetfile 


impala-shell -k -i hrtimpslb.[employer].com


create external table test_EWT_CALL_PROF_DIM_parquet(
CALL_PROF_SK_ID INT,
SRC_SKL_CD_ID STRING,
SPLIT_NM STRING,
SPLIT_DESC STRING,
CLM_SYS_CD STRING,
CLM_SYS_NM STRING,
LOB_CD STRING,
LOB_NM STRING,
CAT_IND STRING,
CALL_TY_CD STRING,
CALL_TY_NM STRING,
CALL_DIR_CD STRING,
CALL_DIR_NM STRING,
LANG_CD STRING,
LANG_NM STRING,
K71_ATOMIC_TS TIMESTAMP)
stored as parquet location '/data/res/warehouse/finance/[dru_userid]/EWT_CALL_PROF_DIM_SQOOP';

Answer

As per the request in the comments, here is an example of how you could achieve the same thing with a single sqoop import using --hive-import. For obvious reasons I haven't tested it against your specific requirements, so it may need some more tuning, which is often the case with these sqoop commands. In my experience, importing as parquet forces you to use the --query option, since it doesn't allow you to use schema.table as the table.

sqoop import -Doraoop.import.hint=" " \
--verbose \
--connect jdbc:oracle:thin:@ldap://oid:389/cn=OracleContext,dc=[employer],dc=com/EWSOP000 \
--username [userid] \
-m 1 \
--password [ifNecessary] \
--hive-import \
--query 'SELECT * FROM DMPROD.EWT_CALL_PROF_DIM WHERE $CONDITIONS' \
--hive-database [database you want to use] \
--hive-table test_EWT_CALL_PROF_DIM_parquet \
--target-dir hdfs://nameservice1/data/res/warehouse/finance/[dru_userid]/EWT_CALL_PROF_DIM_SQOOP \
--null-string '\\N' \
--null-non-string '\\N' \
--as-parquetfile

Basically, what you need for --hive-import is --hive-database, --hive-table and --query. If you don't want all your columns to appear in Hive as strings, you must also include:

--map-column-hive [column_name1=Timestamp,column_name2=Int,...]
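
For example, going by the column types in the CREATE TABLE statement above, a sketch of that mapping for this particular table (the exact Hive type names may need tweaking) would be:

--map-column-hive CALL_PROF_SK_ID=int,K71_ATOMIC_TS=timestamp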

You might need a similar --map-column-java option as well, but I'm never sure when that is required. You will need --split-by if you want multiple mappers.
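
For instance, a sketch using the table's surrogate key as the split column (any reasonably evenly distributed numeric column would do; four mappers is an arbitrary choice here):

--split-by CALL_PROF_SK_ID -m 4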

As discussed in the comments, you will need to run invalidate metadata db.table to make sure Impala sees these changes. You can issue both commands from the command line, or use a single bash script in which you issue the Impala command via impala-shell -q [query].
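
For example, the Impala step of such a bash script could be a single line like the sketch below (same host as in the question; the database is whatever you passed to --hive-database):

impala-shell -k -i hrtimpslb.[employer].com -q "invalidate metadata [database you want to use].test_EWT_CALL_PROF_DIM_parquet"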

