What are the following commands in Sqoop?


Question


Can anyone tell me what --split-by and --boundary-query are used for in Sqoop?


sqoop import --connect jdbc:mysql://localhost/my --username user --password 1234 --query 'select * from table where id=5 AND $CONDITIONS' --split-by table.id --target-dir /dir

Recommended answer


--split-by : Specifies the table column that Sqoop uses to generate splits for an import, that is, how the rows are divided among parallel tasks when the data is imported into your cluster. Choosing a suitable column can improve import performance by achieving greater parallelism. If --split-by is not given, Sqoop uses the primary key of the input table to create the splits.


Reason to use : Sometimes the primary key (which is used to create the splits when --split-by is not given) does not have values evenly distributed between its min and max. In that situation you can specify another column with a more even distribution of data, so that the splits are balanced and the import is efficient.
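As a rough illustration of why an even distribution matters, the sketch below divides a numeric range into equal-width splits, similar in spirit to what Sqoop does with the min and max of the split column. It is a simplification under stated assumptions, not Sqoop's actual splitter: the function name split_ranges and the column name id are hypothetical, and real Sqoop also deals with remainders and non-integer column types.

```shell
# Illustrative only: divide [min, max] into n contiguous ranges.
# Hypothetical helper; this is not how Sqoop is implemented internally.
split_ranges() {
  min=$1; max=$2; n=$3
  step=$(( (max - min) / n ))
  i=0
  while [ "$i" -lt "$n" ]; do
    lo=$(( min + i * step ))
    if [ "$i" -eq $(( n - 1 )) ]; then
      # The last range is closed so that the maximum value is included.
      echo "id >= $lo AND id <= $max"
    else
      echo "id >= $lo AND id < $(( lo + step ))"
    fi
    i=$(( i + 1 ))
  done
}

# Prints four ranges: 0-250, 250-500, 500-750 and 750-1000.
split_ranges 0 1000 4
```

Each range here covers about 250 ids. If most rows had ids clustered inside one of those ranges, a single task would do most of the work, which is the case where picking a different --split-by column helps.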


--boundary-query : By default Sqoop uses the query select min(<split-by>), max(<split-by>) from <table name> to find the boundaries for creating splits. In some cases this query is not optimal, so you can use the --boundary-query argument to specify any arbitrary query that returns two numeric columns.


Reason to use : If --split-by alone is not giving you optimal performance, you can supply your own boundary query to improve performance further.
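A minimal sketch of how the two options might be combined is shown here. The table name orders and its numeric column id are hypothetical, the connection settings are copied from the question, and the WHERE clause is only an example of narrowing the range that the default min/max query would return. The command is guarded so that the script still runs on a machine without Sqoop installed.

```shell
# Hypothetical table "orders" with a numeric column "id"; connection
# settings are copied from the question.
BOUNDARY_QUERY='SELECT MIN(id), MAX(id) FROM orders WHERE id > 0'

if command -v sqoop >/dev/null 2>&1; then
  sqoop import \
    --connect jdbc:mysql://localhost/my \
    --username user --password 1234 \
    --table orders \
    --split-by id \
    --boundary-query "$BOUNDARY_QUERY" \
    --num-mappers 4 \
    --target-dir /dir
else
  # Sqoop is not installed here, so just show the query that would be used.
  echo "boundary query: $BOUNDARY_QUERY"
fi
```

The boundary query must return exactly two values, the lower and upper bound, which Sqoop then divides among the tasks requested with --num-mappers.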
