How to find the optimal number of mappers when running Sqoop import and export?


Question

I'm using Sqoop version 1.4.2 with an Oracle database.

Consider running a Sqoop command such as this one:

./sqoop import                               \
    -fs <name node>                          \
    -jt <job tracker>                        \
    --connect <JDBC string>                  \
    --username <user> --password <password>  \
    --table <table> --split-by <cool column> \
    --target-dir <where>                     \
    --verbose -m 2

We can specify the number of mappers (-m, or --num-mappers in its long form): how many parallel tasks we want Sqoop to run. These tasks may all be accessing the database at the same time. The same option is available for ./sqoop export <...>.

Is there a heuristic (probably based on the size of the data) that helps estimate the optimal number of tasks to use?

Thanks!

Recommended answer

This is taken from the Apache Sqoop Cookbook (O'Reilly Media), and it seems to be the most logical answer:

The optimal number of mappers depends on many variables: you need to take into account your database type, the hardware that is used for your database server, and the impact to other requests that your database needs to serve. There is no optimal number of mappers that works for all scenarios. Instead, you’re encouraged to experiment to find the optimal degree of parallelism for your environment and use case. It’s a good idea to start with a small number of mappers, slowly ramping up, rather than to start with a large number of mappers, working your way down.
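The "start small and ramp up" advice could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names run_trials and best_mappers are hypothetical, the candidate counts are arbitrary, and the command you pass in must be safe to repeat (for an import, that usually means a fresh --target-dir per trial). It measures wall-clock time only, so the load on the database still has to be watched separately.

```shell
#!/bin/sh
# Sketch only: run_trials and best_mappers are hypothetical helper names,
# not part of Sqoop.

# run_trials "<counts>" <command...>
# Runs the command once per mapper count, in the order given (smallest first),
# appending "--num-mappers <n>" each time.
# Prints one "<mappers> <elapsed seconds>" line per successful trial.
run_trials() {
    counts=$1
    shift
    for m in $counts; do
        start=$(date +%s)
        if "$@" --num-mappers "$m" >/dev/null 2>&1; then
            end=$(date +%s)
            echo "$m $(( end - start ))"
        else
            echo "trial with $m mappers failed; stopping" >&2
            return 1
        fi
    done
}

# best_mappers
# Reads "<mappers> <seconds>" lines on stdin and prints the mapper count with
# the lowest time. On a tie the earlier line wins, so with ascending counts
# the smaller number of mappers is preferred.
best_mappers() {
    awk 'NR == 1 || $2 + 0 < best { best = $2 + 0; m = $1 } END { print m }'
}

# Example usage (placeholders as in the question; left commented out on purpose):
#   run_trials "1 2 4 8" ./sqoop import --connect <JDBC string> \
#       --username <user> --password <password> \
#       --table <table> --split-by <cool column> \
#       --target-dir <where> | best_mappers
```

Preferring the smaller count on a tie follows the spirit of the quoted advice: extra mappers that do not shorten the job only add load on the database.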
