使用sqoop导入,如何将行添加到现有的配置表格中? [英] Using sqoop import, How to append rows into existing hive table?
问题描述
在sqlserver中,我使用下面的查询导入并创建了一个配置表。
sqoop import --connect'jdbc:sqlserver ://10.1.1.12; database = testdb'--username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --create-hive-table --fields-terminated-by ','
命令成功,导入数据并创建一个包含10000条记录的表。
我在sqlserver中插入了10条新记录,并尝试使用--where子句将这10条记录追加到现有的配置表格中
sqoop import --connect'jdbc:sqlserver://10.1.1.12; database = testdb'--username uname --password paswd --table demotable --whereID> 10000 --hive-import -hive-table hivedb.demotable
但sqoop作业失败并出错
$ b
错误tool.ImportTool:导入过程中出错:导入作业失败!
我哪里错了?使用sqoop插入到表中的任何其他选择。
编辑:
稍微更改上面的命令后,我能够追加新行。sqoop import --connect'jdbc:sqlserver://10.1.1.12; database = testdb'--username uname --password paswd --table demotable - whereID> 10000--hive-import -hive-table hivedb.demotable --fields-terminated-by','-m 1
虽然解决了上述问题,但我不能插入修改过的行。有没有办法插入修改的行而不使用
- 增量lastmodified 参数。为了将行附加到配置单元表,使用之前使用过的相同查询,只需删除--hive-overwrite。
我将分享我曾经在配置单元中导入2个查询,一个用于覆盖,另一个用于附加,您可以使用相同的方法导入:
覆盖以前的记录
sqoop import -Dmapreduce.job.queuename = default --connect jdbc:teradata:// database_connection_string / DATABASE = database_name,TMODE = ANSI,LOGMECH = LDAP --username z ****** --password ******* --queryselect * from ****** where where $'CONDITIONS--split-byHASHBUCKET(HASHROW (键分割))MOD 4--num-mappers 4 --hive-table hive_table_name --boundary-query从dbc.dbcinfo中选择0,3--target-dir directory_name --delete-target-dir - -hive-import --hive-overwrite --drive r com.teradata.jdbc.TeraDriver
将附加到以前的记录
sqoop import -Dmapreduce.job.queuename = default --connect jdbc:teradata:// connection_string / DATABASE = db_name,TMODE = ANSI,LOGMECH = LDAP - -username ****** --password ****** - queryselect * from **** where \ $ CONDITIONS--split-byHASHBUCKET(HASHROW(key to split))MOD 4--num-mappers 4 --hive-import --hive-table guestblock.prodrptgstrgtn --boundary-query从dbc.dbcinfo中选择0,3--target-dir directory_name --delete-target-dir - -driver com.teradata.jdbc.TeraDriver
请注意,我使用4个映射器,您可以使用更多还有。
From sqlserver I imported and created a hive table using the below query.
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --create-hive-table --fields-terminated-by ','
Command was successful, imported the data and created a table with 10000 records.
I inserted 10 new records in sqlserver and tried to append these 10 records into existing hive table using --where clause
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable
But the sqoop job is getting failed with error
ERROR tool.ImportTool: Error during import: Import job failed!
Where am I going wrong? any other alternatives to insert into table using sqoop.
EDIT: After slightly changing the above command I am able to append the new rows.
sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable --fields-terminated-by ',' -m 1
Though it resolve the mentioned problem, I cant insert the modified rows. Is there any way to insert the modified rows without using --incremental lastmodified parameter.
解决方案in order to append rows to hive table, use the same query you have been using before, just remove the --hive-overwrite.
I will share the 2 queries that I used to import in hive, one for overwriting and one for append, you can use the same for importing:
To OVERWRITE the previous records
sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://database_connection_string/DATABASE=database_name,TMODE=ANSI,LOGMECH=LDAP --username z****** --password ******* --query "select * from ****** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-table hive_table_name --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --hive-import --hive-overwrite --driver com.teradata.jdbc.TeraDriver
TO APPEND to the previous records
sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://connection_string/DATABASE=db_name,TMODE=ANSI,LOGMECH=LDAP --username ****** --password ******--query "select * from **** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-import --hive-table guestblock.prodrptgstrgtn --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --driver com.teradata.jdbc.TeraDriver
Note that I am using 4 mappers, you can use more as well.
这篇关于使用sqoop导入,如何将行添加到现有的配置表格中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!