Is it possible to write a Sqoop incremental import with filters on the new file before importing?
My doubt is: say I have a table on a SQL Server database with 2000 records, and I import this data into HDFS. Later that day, 3000 more records are added to the same table. Now I want to run an incremental import for that second chunk of data, but I do not want all 3000 records to be imported. I only need the subset that matches a certain condition, say 1000 records, to be imported as part of the incremental import.
Is there a way to do that using sqoop incremental import command?
Please Help, Thank you.
You need a unique key or a timestamp field to identify the deltas, which in your case are the new 1000 records. Using that field, you have two options to bring the data into Hadoop.
Option 1
Use Sqoop's incremental append mode. Below is an example:
sqoop import \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm2 \
--username wzhou \
--password wzhou \
--table STUDENT \
--incremental append \
--check-column student_id \
-m 4 \
--split-by major
Arguments:
--check-column (col)   Specifies the column to be examined when determining which rows to import.
--incremental (mode)   Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)   Specifies the maximum value of the check column from the previous import.
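On its own, incremental append brings in every row past the last value. To also apply your extra condition, Sqoop's --where argument can be combined with the incremental flags; Sqoop ANDs the row filter with the check-column predicate. A sketch under assumed values (the grade column, its cutoff, and --last-value 2000 are illustrative, not from the original question):

```shell
# Sketch: incremental append plus a row filter.
# --last-value resumes from the previous run's high-water mark (2000
# records already imported); --where applies the business condition.
# Column name and filter value are hypothetical.
sqoop import \
  --connect jdbc:oracle:thin:@enkx3-scan:1521:dbm2 \
  --username wzhou \
  --password wzhou \
  --table STUDENT \
  --where "grade = 'A'" \
  --incremental append \
  --check-column student_id \
  --last-value 2000 \
  -m 4 \
  --split-by major
```

With --incremental append, Sqoop prints the new high-water mark at the end of the run, which you pass as --last-value on the next run (or let a saved job track it for you).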
Option 2
Use the --query argument, which lets you write native SQL for MySQL or whatever database you connect to.
Examples:
sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
-m 1 --target-dir /user/foo/joinresults
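The $CONDITIONS placeholder is where Sqoop injects each mapper's split-range predicate on the --split-by column, so your own filter simply sits beside it in the WHERE clause. A rough shell sketch of that substitution (the table, bounds, and filter are made-up; real Sqoop derives the ranges from MIN/MAX of the split column):

```shell
# Sketch of what Sqoop does with a free-form query: each mapper runs the
# same SQL with $CONDITIONS replaced by a range predicate on the
# --split-by column. Names and bounds here are illustrative.
QUERY="SELECT * FROM STUDENT WHERE student_id > 2000 AND grade = 'A' AND \$CONDITIONS"
LO=2001; HI=5000; MAPPERS=4
STEP=$(( (HI - LO + MAPPERS) / MAPPERS ))   # ceiling division

for i in $(seq 0 $(( MAPPERS - 1 ))); do
  START=$(( LO + i * STEP ))
  END=$(( START + STEP ))
  COND="student_id >= $START AND student_id < $END"
  SQL="${QUERY/\$CONDITIONS/$COND}"        # bash pattern substitution
  echo "mapper $i: $SQL"
done
```

So for your case, the query itself carries both the incremental bound (student_id > last value) and the business filter, and Sqoop only parallelizes it via $CONDITIONS.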