Sqoop: import a newly added MySQL column into an existing Hive table


Question

I have a table named test in MySQL, as below:

id  name  address
1   Km    sky
2   hd    heaven
3   Ab    null
4   en    null

I then ran a Sqoop import as below:

sqoop import --connect jdbc:mysql://XXXXXX/testing --username XXXX --password XXXX --query "select * from testing.test where \$CONDITIONS" --null-string '' --null-non-string '' -m 1 \
--hive-import --hive-database testing --hive-table test --create-hive-table --target-dir /user/hive/warehouse/testing.db/test

I got the desired result.

Then we added a new column to the MySQL table, along with two extra rows:

id  name  address  nation
1   Km    sky      null
2   hd    heaven   null
3   Ab    null     null
4   en    null     null
5   abc   efd      USA
6   fge   cde      UK

Now I want the existing Hive table to be updated with the new column and rows above. I created the following Sqoop job:

Sqoop job:

sqoop job --create sqoop_test -- import --connect jdbc:mysql:xxxxxxx/testing --username XXXXX --password XXXX --query "SELECT * from testing.test WHERE \$CONDITIONS" --incremental append \
--check-column id --last-value "3" --split-by 'id' --target-dir /user/hive/warehouse/testing.db/test

But when I query the Hive table, the new rows come back as NULL and the new column does not show up, like below:

id    name  address
NULL  NULL  NULL
NULL  NULL  NULL
1     Km    sky
2     hd    heaven
3     Ab
4     en

How can we get the new column appended and the new rows added to the existing table in Hive?

Or is the method I am using completely wrong? Please let me know.

Solution

Your assumption is wrong. The NULLs appear because you are importing data with a different layout: the table you created first has 3 columns, but the second import brings in 4. Hive cannot parse those new records against the 3-column definition and simply prints NULL for every column. Unless you have a good reason to import the data in text file format, I suggest creating the table in Avro and using its schema evolution feature to add new columns.

When you import data as Avro, Sqoop autogenerates the schema for you, so all you need to do is create a table that points to the imported data and uses the generated schema. For future imports with new fields, you need to add those fields to the schema with a valid default value, or make them nullable with a null default, as follows (for example, for a string column):

{ "name": "newcolumnname", "type": [ "null", "string" ], "default": null },

Or specify another valid default value:

{ "name": "newcolumnname", "type": [ "string" ], "default": "val1" }, // default value "val1"
{ "name": "newcolumnname", "type": [ "string" ], "default": "" }, // empty string as default
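
To make the role of the default value concrete, here is a minimal sketch in plain Python. It does not use the Avro library or Sqoop; it only mimics, in a simplified way, how a reader schema with defaults fills in a field that older records do not have. The 3-column schema, the helper names (add_nullable_string_field, read_with_schema) and the sample rows are assumptions made up for illustration, loosely based on the table in the question; the schema Sqoop actually generates may look different.

```python
import copy
import json

# Hypothetical 3-column schema, standing in for the .avsc file Sqoop generates.
OLD_SCHEMA = {
    "type": "record",
    "name": "test",
    "fields": [
        {"name": "id", "type": ["null", "int"], "default": None},
        {"name": "name", "type": ["null", "string"], "default": None},
        {"name": "address", "type": ["null", "string"], "default": None},
    ],
}


def add_nullable_string_field(schema, field_name):
    """Return a copy of `schema` with an optional string field appended."""
    new_schema = copy.deepcopy(schema)
    if any(f["name"] == field_name for f in new_schema["fields"]):
        raise ValueError(f"field {field_name!r} already exists")
    # "null" comes first in the union, so the default must be null (None here).
    new_schema["fields"].append(
        {"name": field_name, "type": ["null", "string"], "default": None}
    )
    return new_schema


def read_with_schema(record, reader_schema):
    """Simplified stand-in for schema resolution: a field missing from the
    stored record takes the default declared in the reader schema."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise KeyError(f"no value and no default for field {name!r}")
    return result


NEW_SCHEMA = add_nullable_string_field(OLD_SCHEMA, "nation")

old_row = {"id": 1, "name": "Km", "address": "sky"}                    # written before the change
new_row = {"id": 5, "name": "abc", "address": "efd", "nation": "USA"}  # written after the change

print(read_with_schema(old_row, NEW_SCHEMA))
print(read_with_schema(new_row, NEW_SCHEMA))
print(json.dumps(NEW_SCHEMA["fields"][-1]))
```

The last line prints the new field as it would appear in the schema file. Note that Python's None is serialized as a bare JSON null, not as the string "null".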
