Delta/Incremental Load in Hive


Question

I have the following use case:

My application has a table in an RDBMS database holding several years of data. We have used Sqoop to bring the data into HDFS and have loaded it into a Hive table partitioned by year and month.

Now, the application also updates and inserts new records into the RDBMS table daily. These updated records can span historical months. Updated and newly inserted records can be identified by an updated-timestamp field (it will carry the current day's timestamp).

The problem is: how do we do a daily delta/incremental load of the Hive table using these updated records?

-> I know there is Sqoop functionality that allows incremental imports. But an incremental import of the new records alone is not enough for us, because:

-> I cannot simply insert these records into the Hive table (using INSERT INTO), because that would produce duplicate rows for the updated records.

-> In the same way, I cannot use an INSERT OVERWRITE statement, since these are just the updated and newly inserted records spanning multiple months; an insert overwrite would wipe out the earlier records.

Of course, the easiest option would be to pull the full data with Sqoop daily, but we don't want to do that because the data volume is large.

So, basically, we want to fully reload only those partitions for which we have received updated/inserted records.

We are open to exploring options at the Hive or Sqoop end. Can you please let us know?

Thanks in advance.

Answer

Updates are a notoriously difficult problem for any Hive-based system.

One typical approach is a two-step process:

  1. Insert any data that has changed into a first table (see the Sqoop sketch below). As you note, this will produce duplicates wherever rows were updated.
  2. Periodically overwrite a second table with the de-duplicated data from the first.
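
For step 1, a daily Sqoop pull in `lastmodified` mode grabs only the rows whose update timestamp moved past the previous run's watermark. A minimal sketch, assuming hypothetical connection details, table, and column names (`source_table`, `updated_at`); the `--last-value` would come from wherever you track the previous run:

```bash
# Hedged sketch: pull only rows changed since the last run.
# Connection string, table, and column names are assumptions.
sqoop import \
  --connect jdbc:mysql://dbhost/appdb \
  --username app_user -P \
  --table source_table \
  --incremental lastmodified \
  --check-column updated_at \
  --last-value "2014-01-01 00:00:00" \
  --target-dir /staging/source_table_delta \
  --append
```

The imported files can then be loaded into a Hive staging table (called `staging_delta` in the sketches below) and appended to the first, duplicate-bearing table with a plain INSERT INTO.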

The second step is potentially painful, but there's really no way around it. At some level you have to be overwriting, since Hive doesn't do in-place updates. Depending on your data, though, you may be able to partition the tables cleverly enough to avoid full overwrites. For example, if step 1 only inserts into a handful of partitions, then only those partitions need to be overwritten into the second table.
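
Here is a sketch of step 2 under assumed, hypothetical names: `raw_table` is the step-1 table that accumulates duplicates, `clean_table` is the de-duplicated copy (both partitioned by year and month), and `staging_delta` holds today's imported rows. A dynamic-partition INSERT OVERWRITE rewrites only the partitions that appear in the query output, so untouched months are left alone:

```sql
-- All table and column names are assumptions for illustration.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE clean_table PARTITION (year, month)
SELECT id, col1, updated_at, year, month
FROM (
  SELECT r.id, r.col1, r.updated_at, r.year, r.month,
         ROW_NUMBER() OVER (PARTITION BY r.id
                            ORDER BY r.updated_at DESC) AS rn
  FROM raw_table r
  -- restrict the rebuild to partitions touched by the latest delta
  LEFT SEMI JOIN (SELECT DISTINCT year, month FROM staging_delta) d
    ON r.year = d.year AND r.month = d.month
) ranked
WHERE rn = 1;  -- keep only the most recent version of each id
```

Note that the windowing functions used here (ROW_NUMBER) require Hive 0.11 or later.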

Also, depending on the access pattern, it can make sense to just have the second, de-duplicated table be a view and not materialize it at all. Usually, though, this just delays the pain to query time.
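
The view variant, under the same hypothetical names, would simply wrap the de-duplication query, trading the periodic rebuild cost for a per-query cost:

```sql
-- Sketch: de-duplicate at query time instead of materializing step 2.
CREATE VIEW clean_view AS
SELECT id, col1, updated_at, year, month
FROM (
  SELECT id, col1, updated_at, year, month,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM raw_table
) ranked
WHERE rn = 1;
```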

The only other way around this I've seen is using a very custom input and output format. Rather than explain it all, you can read about it here: http://pkghosh.wordpress.com/2012/07/08/making-hive-squawk-like-a-real-database/

Owen O'Malley has also been working on adding a version of this idea to standard Hive, but that's still in development: https://issues.apache.org/jira/browse/HIVE-5317
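
For later readers: HIVE-5317 eventually shipped as ACID transactional tables (Hive 0.14 and later). A hedged sketch of the in-place update it enables, using hypothetical names; note that ACID tables must be stored as ORC and marked transactional:

```sql
-- Sketch only: requires Hive 0.14+ with the transaction manager enabled.
CREATE TABLE hist_acid (id INT, col1 STRING, updated_at TIMESTAMP)
PARTITIONED BY (year INT, month INT)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- In-place update; no manual overwrite of the partition is needed.
UPDATE hist_acid SET col1 = 'new value' WHERE id = 42;
```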

