Sqoop import newly added column to mysql table to existing hive table
I have a table test in MySQL as below:
id name address
1 Km sky
2 hd heaven
3 Ab null
4 en null
I then ran a Sqoop import as below:
sqoop import --connect jdbc:mysql://XXXXXX/testing --username XXXX --password XXXX --query "select * from testing.test where \$CONDITIONS" --null-string '' --null-non-string '' -m 1 \
--hive-import --hive-database testing --hive-table test --create-hive-table --target-dir /user/hive/warehouse/testing.db/test
I got the desired result.
Then we added a new column (nation) to the MySQL table, along with 2 extra rows:
id name address nation
1 Km sky null
2 hd heaven null
3 Ab null null
4 en null null
5 abc efd USA
6 fge cde UK
Now I want the existing Hive table updated with the new column and rows above. I created the following Sqoop job:
Sqoop job:
sqoop job --create sqoop_test -- import --connect jdbc:mysql:xxxxxxx/testing --username XXXXX --password XXXX --query "SELECT * from testing.test WHERE \$CONDITIONS" --incremental append \
--check-column id --last-value "3" --split-by 'id' --target-dir /user/hive/warehouse/testing.db/test
But when I query the Hive table, the new rows come back as NULL and the new column doesn't show up, as shown below:
id name address
NULL NULL NULL
NULL NULL NULL
1 Km sky
2 hd heaven
3 Ab
4 en
How can we get the new column appended and the new rows added to the existing table in Hive?
Or is the method I am using completely wrong? Please let me know.
Your assumption is wrong. The cause is that you are importing data with a different layout: the first table you created has 3 columns, and the second import brings in 4 columns. For that reason Hive can't parse the new records and simply prints NULL for all the columns. Unless you have a good reason to import the data in text-file format, I suggest you create the table in Avro and use the schema evolution feature to add new columns.
When you import data as Avro, Sqoop autogenerates the schema for you, so all you need to do is create a table that points to the imported data and uses the generated schema. For future imports with new fields, you will need to add those fields to the schema with a valid default value, or make them nullable with a null default, as follows (for example, for a string column):
{ "name": "newcolumnname", "type": [ "null", "string" ], "default": null },
or specify another valid default value:
{ "name": "newcolumnname", "type": [ "string" ], "default": "val1" }, // default value "val1"
{ "name": "newcolumnname", "type": [ "string" ], "default": "" }, // default value is the empty string
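As a rough illustration of the schema edit described above, here is a minimal Python sketch that appends a nullable field to an Avro record schema held as a dictionary. The helper name add_nullable_field and the old_schema contents are assumptions made up for this example; the schema Sqoop actually generates for your table may use different types and extra attributes, so treat this as a sketch rather than the exact file you will see.

```python
import copy
import json


def add_nullable_field(schema, name, avro_type="string"):
    """Return a copy of an Avro record schema with a nullable field appended.

    Hypothetical helper for illustration; it is not part of Sqoop or Avro.
    """
    updated = copy.deepcopy(schema)
    if any(field["name"] == name for field in updated["fields"]):
        return updated  # field already present, nothing to add
    # "null" is listed first in the union so that a JSON null default is valid.
    updated["fields"].append(
        {"name": name, "type": ["null", avro_type], "default": None}
    )
    return updated


# Assumed stand-in for the schema generated by the first (3-column) import.
old_schema = {
    "type": "record",
    "name": "test",
    "fields": [
        {"name": "id", "type": ["null", "int"], "default": None},
        {"name": "name", "type": ["null", "string"], "default": None},
        {"name": "address", "type": ["null", "string"], "default": None},
    ],
}

new_schema = add_nullable_field(old_schema, "nation")
print(json.dumps(new_schema["fields"][-1]))
# prints {"name": "nation", "type": ["null", "string"], "default": null}
```

The resulting JSON could then be saved as the table's schema file. Because the new field has a default, records written before the column existed can still be read, with nation coming back as NULL.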