sqoop incremental import append mode with timestamp column

Problem description

Can someone help me with the exact difference between the append and lastmodified modes in sqoop incremental import?

What is the need for lastmodified when the same thing can be done with append mode (using --check-column on a timestamp column)? It seems to work the same and imports the updated as well as the inserted records.

Recommended answer

  1. Mode append can be used when you know the last value of the check column (typically an ever-increasing numeric key).
  2. Mode lastmodified can be used with a timestamp column. It can be hard to remember the last modified timestamp; if you do know it, the first approach works just as well. (A flag-level sketch of both modes follows below.)
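
As a minimal flag-level sketch (the elided connection options and the table/column names are placeholders taken from the walkthroughs below, not a complete command), the two modes differ only in the incremental flags passed to sqoop import:

# append: --check-column is a monotonically increasing column (e.g. a numeric key)
sqoop import ... \
  --incremental append \
  --check-column id \
  --last-value 1210

# lastmodified: --check-column is a timestamp; rows with a newer timestamp are
# re-imported and can be merged on a key column via --merge-key
sqoop import ... \
  --incremental lastmodified \
  --check-column order_date \
  --merge-key order_id \
  --last-value "2017-12-02 00:00:00"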

Mode: append

mysql> describe emp;
+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id     | int(11)      | YES  |     | NULL    |       |
| name   | varchar(100) | YES  |     | NULL    |       |
| deg    | varchar(100) | YES  |     | NULL    |       |
| salary | int(11)      | YES  |     | NULL    |       |
| dep    | varchar(10)  | YES  |     | NULL    |       |
+--------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

Import data using the sqoop command.

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
-m 3 \
--table emp \
--split-by id \
--columns id,name,deg \
--warehouse-dir /user/sqoop/ \
--delete-target-dir \
--as-textfile

HDFS output

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/
drwxr-xr-x   - cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp
-rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002


mysql> select * from emp;
+------+---------+--------------+--------+------+
| id   | name    | deg          | salary | dep  |
+------+---------+--------------+--------+------+
| 1201 | gopal   | manager      |  50000 | tp   |
| 1202 | manisha | Proof reader |  50000 | TP   |
| 1203 | php dev | TECH WRITER  |  50000 | AC   |
| 1204 | Nilesh  | Domino dev   |  70000 | AF   |
| 1205 | Vinayak | Java dev     |  50000 | IT   |
| 1206 | Amish   | Cog dev      |  60000 | IT   |
| 1207 | Jatin   | Oracel dev   |  40001 | IT   |
| 1208 | Viren   | Java dev     |  70004 | IT   |
| 1209 | Ashish  | Oracel dev   |  40001 | IT   |
| 1210 | Satish  | Java dev     |  70004 | IT   |
+------+---------+--------------+--------+------+
10 rows in set (0.00 sec)

Insert a new record into the table.

mysql> insert into emp values(1211,'Jag', 'be', 20000, 'IT');
Query OK, 1 row affected (0.03 sec)

mysql> select * from emp;
+------+---------+--------------+--------+------+
| id   | name    | deg          | salary | dep  |
+------+---------+--------------+--------+------+
| 1201 | gopal   | manager      |  50000 | tp   |
| 1202 | manisha | Proof reader |  50000 | TP   |
| 1203 | php dev | TECH WRITER  |  50000 | AC   |
| 1204 | Nilesh  | Domino dev   |  70000 | AF   |
| 1205 | Vinayak | Java dev     |  50000 | IT   |
| 1206 | Amish   | Cog dev      |  60000 | IT   |
| 1207 | Jatin   | Oracel dev   |  40001 | IT   |
| 1208 | Viren   | Java dev     |  70004 | IT   |
| 1209 | Ashish  | Oracel dev   |  40001 | IT   |
| 1210 | Satish  | Java dev     |  70004 | IT   |
| 1211 | Jag     | be           |  20000 | IT   |
+------+---------+--------------+--------+------+
11 rows in set (0.00 sec)

Incremental import command

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 1210 \
--warehouse-dir /user/sqoop/ \
--as-textfile

After the import

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/emp
-rw-r--r--   1 cloudera supergroup          0 2017-12-02 13:14 /user/sqoop/emp/_SUCCESS
-rw-r--r--   1 cloudera supergroup         70 2017-12-02 13:14 /user/sqoop/emp/part-m-00000
-rw-r--r--   1 cloudera supergroup         64 2017-12-02 13:14 /user/sqoop/emp/part-m-00001
-rw-r--r--   1 cloudera supergroup         86 2017-12-02 13:14 /user/sqoop/emp/part-m-00002
-rw-r--r--   1 cloudera cloudera           21 2017-12-02 13:48 /user/sqoop/emp/part-m-00003
[cloudera@quickstart lib]$ hadoop fs -cat /user/sqoop/emp/part-m-00003
1211,Jag,be,20000,IT
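
A subsequent append run only needs --last-value moved forward to the highest id already imported (1211 after this run). The command below is not part of the original walkthrough; it is just an illustration of that next run:

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table emp \
--split-by id \
--check-column id \
--incremental append \
--last-value 1211 \
--warehouse-dir /user/sqoop/ \
--as-textfile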

Mode: lastmodified

mysql> describe orders;
+-------------------+-------------+------+-----+---------+----------------+
| Field             | Type        | Null | Key | Default | Extra          |
+-------------------+-------------+------+-----+---------+----------------+
| order_id          | int(11)     | NO   | PRI | NULL    | auto_increment |
| order_date        | datetime    | NO   |     | NULL    |                |
| order_customer_id | int(11)     | NO   |     | NULL    |                |
| order_status      | varchar(45) | NO   |     | NULL    |                |
+-------------------+-------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

Import orders into HDFS

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id  \
--target-dir /user/sqoop/orders \
--as-textfile

After the import

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders
-rw-r--r--   1 cloudera supergroup          0 2017-12-02 16:01 /user/sqoop/orders/_SUCCESS
-rw-r--r--   1 cloudera supergroup     741597 2017-12-02 16:01 /user/sqoop/orders/part-m-00000
-rw-r--r--   1 cloudera supergroup     753022 2017-12-02 16:01 /user/sqoop/orders/part-m-00001
-rw-r--r--   1 cloudera supergroup     752368 2017-12-02 16:01 /user/sqoop/orders/part-m-00002
-rw-r--r--   1 cloudera supergroup     752940 2017-12-02 16:01 /user/sqoop/orders/part-m-00003

Update the orders data

mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+-----------------+
| order_id | order_date          | order_customer_id | order_status    |
+----------+---------------------+-------------------+-----------------+
|       10 | 2013-07-25 00:00:00 |              5648 | PENDING_PAYMENT |
+----------+---------------------+-------------------+-----------------+
1 row in set (0.00 sec)

mysql> update orders set order_status='CLOSED', order_date=now() where order_id=10;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from orders where order_id=10;
+----------+---------------------+-------------------+--------------+
| order_id | order_date          | order_customer_id | order_status |
+----------+---------------------+-------------------+--------------+
|       10 | 2017-12-02 16:19:23 |              5648 | CLOSED       |
+----------+---------------------+-------------------+--------------+
1 row in set (0.00 sec)

Import the updated data

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id  \
--check-column order_date \
--merge-key order_id \
--incremental lastmodified \
--target-dir /user/sqoop/orders1 \
--as-textfile

Output

[cloudera@quickstart lib]$ hadoop fs -ls -R /user/sqoop/orders1
-rw-r--r--   1 cloudera cloudera          0 2017-12-02 16:07 /user/sqoop/orders1/_SUCCESS
-rw-r--r--   1 cloudera cloudera    2999918 2017-12-02 16:07 /user/sqoop/orders1/part-r-00000

Note: If we use the same directory (orders) as before, the old files are deleted and a new part file is created.
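
To refresh the original /user/sqoop/orders directory in place instead of writing to orders1, the same lastmodified import can be pointed back at that directory. This is only a sketch: the --last-value shown is illustrative (Sqoop reports the value to use for the next run at the end of each incremental import), and the remaining flags mirror the command above:

sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--split-by order_id \
--check-column order_date \
--merge-key order_id \
--incremental lastmodified \
--last-value "2017-12-02 16:01:00" \
--target-dir /user/sqoop/orders \
--as-textfile

Because --merge-key is given, Sqoop runs a merge job after the import: the old part-m files are replaced by a single merged part-r file that keeps only the latest version of each order_id, which is why the output listed above is named part-r-00000.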
