Hadoop and MySQL Integration

Question

We would like to implement Hadoop on our system to improve its performance.

The process works like this: Hadoop will gather data from the MySQL database and then process it. The output will then be exported back to the MySQL database.
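For reference, the flow described above (read from MySQL, process, write the results back) can be sketched on a single machine. This is a minimal illustration only. It uses Python's built-in sqlite3 module as a stand-in for MySQL and a plain loop as a stand-in for the Hadoop job. The `events` and `results` tables and the `transform` step are hypothetical.

```python
import sqlite3


def transform(value: int) -> int:
    # Hypothetical per-row processing step; in the proposed setup this is
    # the work the Hadoop job would run on the cluster.
    return value * value


def round_trip(conn: sqlite3.Connection) -> None:
    # 1. Extract: read the input rows (Hadoop would read these from MySQL).
    rows = conn.execute("SELECT id, value FROM events ORDER BY id").fetchall()
    # 2. Process: apply the transformation to every row.
    results = [(row_id, transform(value)) for row_id, value in rows]
    # 3. Load: write the output back into a results table.
    conn.executemany("INSERT INTO results (id, output) VALUES (?, ?)", results)
    conn.commit()


def demo() -> list[tuple[int, int]]:
    # In-memory database with hypothetical tables, standing in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, output INTEGER)")
    conn.executemany("INSERT INTO events (id, value) VALUES (?, ?)",
                     [(1, 2), (2, 3), (3, 4)])
    round_trip(conn)
    return conn.execute("SELECT id, output FROM results ORDER BY id").fetchall()


print(demo())  # prints [(1, 4), (2, 9), (3, 16)]
```

Note that the extract and load steps run against the database itself in both versions. Only the middle step is what a cluster can spread out.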

Is this a good implementation? Will this improve our system's overall performance? What are the requirements and has this been done before? A good tutorial would really help.

Thanks

Solution

Although this is not a typical Hadoop use case, it might make sense in the following scenario:
a) You have a good way to partition your data into inputs (for example, an existing partitioning scheme).
b) The processing of each partition is relatively heavy. As a rough figure, I would want at least 10 seconds of CPU time per partition.
If both conditions are met, you will be able to apply as much CPU power as you need to your data processing.
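The two conditions above can be turned into a rough planning sketch. It assumes the input table has a numeric primary key and that you can estimate the job's total CPU time. It caps the number of partitions using the 10-second rule of thumb, then splits the key range into contiguous chunks with one query per partition. The table and column names are hypothetical. The approach is similar in spirit to how Sqoop's `--split-by` option divides an import by the minimum and maximum of a column, but it is not Sqoop's actual code.

```python
import math

# Rule of thumb from the answer: aim for at least ~10 s of CPU work per partition.
MIN_CPU_SECONDS_PER_PARTITION = 10.0


def max_useful_partitions(total_cpu_seconds: float) -> int:
    """Largest partition count that still leaves ~10 s of work per partition."""
    return max(1, math.floor(total_cpu_seconds / MIN_CPU_SECONDS_PER_PARTITION))


def key_range_splits(min_id: int, max_id: int, num_splits: int) -> list[tuple[int, int]]:
    """Split the inclusive key range [min_id, max_id] into contiguous half-open ranges [lo, hi)."""
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_splits = max(1, min(num_splits, total))  # never more splits than keys
    return [
        (min_id + total * i // num_splits, min_id + total * (i + 1) // num_splits)
        for i in range(num_splits)
    ]


def split_queries(table: str, key: str, splits: list[tuple[int, int]]) -> list[str]:
    """One SELECT per partition. `table` and `key` must be trusted identifiers, not user input."""
    return [f"SELECT * FROM {table} WHERE {key} >= {lo} AND {key} < {hi}" for lo, hi in splits]
```

For example, consider a job estimated at 45 CPU-seconds over ids 1 to 100. `max_useful_partitions(45.0)` returns 4, and `key_range_splits(1, 100, 4)` yields the ranges [1, 26), [26, 51), [51, 76) and [76, 101). Half-open ranges ensure no row is read twice or skipped at a boundary.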
If you are doing a simple scan or aggregation, I do not think you will gain anything. On the other hand, if you are going to run CPU-intensive algorithms on each partition, your gain can indeed be significant.
I would also mention a separate case: processing that requires massive data sorting. I do not think MySQL will be good at sorting billions of records, whereas Hadoop can handle it.
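To see why sorting scales out, here is a single-process sketch of range-partitioned sorting. This is the idea behind a globally sorted MapReduce job, and Hadoop provides a `TotalOrderPartitioner` for this purpose. The sketch routes each key to the partition that owns its range. It then sorts every partition independently; on a cluster, separate reducers would do this. Concatenating the partitions in order gives a fully sorted result. The split points here are chosen by hand as an assumption; a real job would typically sample the data to pick them.

```python
import bisect


def range_partition_sort(records: list[int], boundaries: list[int]) -> list[int]:
    # `boundaries` are sorted split points: partition i receives keys k with
    # boundaries[i-1] <= k < boundaries[i] (one partition per key range).
    partitions: list[list[int]] = [[] for _ in range(len(boundaries) + 1)]
    for key in records:
        partitions[bisect.bisect_right(boundaries, key)].append(key)
    # Each partition is sorted on its own; on a cluster these sorts run in parallel.
    for part in partitions:
        part.sort()
    # Partitions cover disjoint, ordered key ranges, so concatenation is globally sorted.
    return [key for part in partitions for key in part]


print(range_partition_sort([42, 7, 19, 3, 25, 10, 20, 15], [10, 20]))
# prints [3, 7, 10, 15, 19, 20, 25, 42]
```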
