使用sqoop导入，如何将行添加到现有的配置表格中？ [英] Using sqoop import, How to append rows into existing hive table?

查看：201 发布时间：2018/5/31 20:23:16 hadoop hive sqoop

本文介绍了使用sqoop导入，如何将行添加到现有的配置表格中？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在sqlserver中，我使用下面的查询导入并创建了一个配置表。

  sqoop import --connect'jdbc：sqlserver ：//10.1.1.12; database = testdb'--username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --create-hive-table --fields-terminated-by '，'

命令成功，导入数据并创建一个包含10000条记录的表。

我在sqlserver中插入了10条新记录，并尝试使用--where子句将这10条记录追加到现有的配置表格中
sqoop import --connect'jdbc：sqlserver：//10.1.1.12; database = testdb'--username uname --password paswd --table demotable --whereID> 10000 --hive-import -hive-table hivedb.demotable
但sqoop作业失败并出错

$ b

错误tool.ImportTool：导入过程中出错：导入作业失败！

我哪里错了？使用sqoop插入到表中的任何其他选择。

编辑：
稍微更改上面的命令后，我能够追加新行。
sqoop import --connect'jdbc：sqlserver：//10.1.1.12; database = testdb'--username uname --password paswd --table demotable - whereID> 10000--hive-import -hive-table hivedb.demotable --fields-terminated-by'，'-m 1
虽然解决了上述问题，但我不能插入修改过的行。有没有办法插入修改的行而不使用
- 增量lastmodified 参数。
为了将行附加到配置单元表，使用之前使用过的相同查询，只需删除--hive-overwrite。

我将分享我曾经在配置单元中导入2个查询，一个用于覆盖，另一个用于附加，您可以使用相同的方法导入：

覆盖以前的记录
sqoop import -Dmapreduce.job.queuename = default --connect jdbc：teradata：// database_connection_string / DATABASE = database_name，TMODE = ANSI，LOGMECH = LDAP --username z ****** --password ******* --queryselect * from ****** where where $'CONDITIONS--split-byHASHBUCKET（HASHROW （键分割））MOD 4--num-mappers 4 --hive-table hive_table_name --boundary-query从dbc.dbcinfo中选择0,3--target-dir directory_name --delete-target-dir - -hive-import --hive-overwrite --drive r com.teradata.jdbc.TeraDriver
将附加到以前的记录

sqoop import -Dmapreduce.job.queuename = default --connect jdbc：teradata：// connection_string / DATABASE = db_name，TMODE = ANSI，LOGMECH = LDAP - -username ****** --password ****** - queryselect * from **** where \ $ CONDITIONS--split-byHASHBUCKET（HASHROW（key to split））MOD 4--num-mappers 4 --hive-import --hive-table guestblock.prodrptgstrgtn --boundary-query从dbc.dbcinfo中选择0,3--target-dir directory_name --delete-target-dir - -driver com.teradata.jdbc.TeraDriver
请注意，我使用4个映射器，您可以使用更多还有。
From sqlserver I imported and created a hive table using the below query.
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --create-hive-table --fields-terminated-by ','
Command was successful, imported the data and created a table with 10000 records.

I inserted 10 new records in sqlserver and tried to append these 10 records into existing hive table using --where clause
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable
But the sqoop job is getting failed with error

ERROR tool.ImportTool: Error during import: Import job failed!

Where am I going wrong? any other alternatives to insert into table using sqoop.

EDIT: After slightly changing the above command I am able to append the new rows.
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable --fields-terminated-by ',' -m 1
Though it resolve the mentioned problem, I cant insert the modified rows. Is there any way to insert the modified rows without using --incremental lastmodified parameter.
解决方案
in order to append rows to hive table, use the same query you have been using before, just remove the --hive-overwrite.

I will share the 2 queries that I used to import in hive, one for overwriting and one for append, you can use the same for importing:

To OVERWRITE the previous records
sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://database_connection_string/DATABASE=database_name,TMODE=ANSI,LOGMECH=LDAP --username z****** --password ******* --query "select * from ****** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-table hive_table_name --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --hive-import --hive-overwrite --driver com.teradata.jdbc.TeraDriver
TO APPEND to the previous records
sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://connection_string/DATABASE=db_name,TMODE=ANSI,LOGMECH=LDAP --username ****** --password ******--query "select * from **** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-import --hive-table guestblock.prodrptgstrgtn --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --driver com.teradata.jdbc.TeraDriver
Note that I am using 4 mappers, you can use more as well.

这篇关于使用sqoop导入，如何将行添加到现有的配置表格中？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

使用sqoop导入，如何将行添加到现有的配置表格中？ [英] Using sqoop import, How to append rows into existing hive table?

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录关闭

使用sqoop导入，如何将行添加到现有的配置表格中？ [英] Using sqoop import, How to append rows into existing hive table?

问题描述

相关文章

分布式计算/Hadoop最新文章

热门教程

热门工具

登录 关闭

登录关闭