Hadoop and MySQL Integration
We would like to implement Hadoop on our system to improve its performance.
The process works like this: Hadoop will gather data from a MySQL database, then process it. The output will then be exported back to the MySQL database.
Is this a good implementation? Will this improve our system's overall performance? What are the requirements and has this been done before? A good tutorial would really help.
Thanks
Although this is not typical Hadoop usage, it might make sense in the following scenario:
a) You have a good way to partition your data into the inputs (such as existing partitioning).
b) The processing of each partition is relatively heavy. I would say at least 10 seconds of CPU time per partition.
If both conditions are met, you will be able to apply any desired amount of CPU power to your data processing.
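To make condition (a) concrete, here is a minimal sketch of one way to cut a table into independent inputs: split the range of a numeric primary key into contiguous chunks and issue one query per chunk. The table name `events` and key column `id` are hypothetical, and this only illustrates the idea; it is not how any particular Hadoop input format is implemented.

```python
def split_key_range(min_id, max_id, num_splits):
    """Divide the inclusive key range [min_id, max_id] into contiguous chunks."""
    total = max_id - min_id + 1
    if total <= 0 or num_splits <= 0:
        return []
    # Never create more chunks than there are keys.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)
    ranges = []
    lo = min_id
    for i in range(num_splits):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def split_queries(table, key, ranges):
    """Build one SELECT per range; each query would feed one independent task."""
    return [
        f"SELECT * FROM {table} WHERE {key} BETWEEN {lo} AND {hi}"
        for lo, hi in ranges
    ]


# Hypothetical table and key column, used only for illustration.
ranges = split_key_range(1, 1000, 4)
queries = split_queries("events", "id", ranges)
for q in queries:
    print(q)
```

If the key values are unevenly distributed, equal-width ranges give unequal amounts of work, which is why an existing partitioning scheme is the better starting point when one is available.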
If you are doing a simple scan or aggregation, I think you will not gain anything. On the other hand, if you are going to run some CPU-intensive algorithm on each partition, then your gain can indeed be significant.
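A rough back-of-the-envelope model shows why light per-partition work gains little. It assumes every task pays a fixed startup overhead and that tasks run in waves across the available workers. The overhead of 5 seconds, the worker count, and the partition count below are made-up numbers for illustration, and the model ignores the time spent moving data out of and back into MySQL, which would make the light case look even worse.

```python
import math


def estimated_speedup(num_partitions, cpu_seconds_per_partition,
                      overhead_seconds_per_task, num_workers):
    """Ratio of single-machine time to cluster time under a simple wave model."""
    # One machine processes every partition back to back, with no task overhead.
    serial = num_partitions * cpu_seconds_per_partition
    # The cluster runs tasks in waves of `num_workers`; each task pays the overhead.
    waves = math.ceil(num_partitions / num_workers)
    parallel = waves * (cpu_seconds_per_partition + overhead_seconds_per_task)
    return serial / parallel


# Assumed figures: 100 partitions, 20 workers, 5 s of startup overhead per task.
light = estimated_speedup(100, 0.5, 5.0, 20)   # simple scan or aggregation
heavy = estimated_speedup(100, 60.0, 5.0, 20)  # CPU-intensive algorithm
print(f"light work: {light:.1f}x, heavy work: {heavy:.1f}x")
```

Under these assumptions the heavy workload gets close to the ideal 20x, while the light one stays under 2x because the fixed overhead dominates each task.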
I would also mention a separate case: if your processing requires massive data sorting. I do not think that MySQL will be good at sorting billions of records. Hadoop will do it.
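The reason a cluster copes with very large sorts is that the work can be divided by key range: records are routed to buckets covering disjoint ranges, each bucket is sorted independently, and the sorted buckets are concatenated. The sketch below imitates that idea in a single process. It is an illustration of range-partitioned sorting in general, not Hadoop's actual implementation, and the function names are my own.

```python
import bisect
import random


def choose_split_points(records, num_partitions, sample_size=1000, seed=0):
    """Pick num_partitions - 1 boundary keys from a random sample of the data."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    step = len(sample) / num_partitions
    return [sample[int(i * step)] for i in range(1, num_partitions)]


def partitioned_sort(records, num_partitions):
    """Sort by routing records to key-range buckets and sorting each bucket."""
    if not records:
        return []
    splits = choose_split_points(records, num_partitions)
    buckets = [[] for _ in range(num_partitions)]
    for r in records:
        # bisect_right finds which key range the record belongs to.
        buckets[bisect.bisect_right(splits, r)].append(r)
    result = []
    for bucket in buckets:
        # On a cluster, each bucket would be sorted on a different machine.
        result.extend(sorted(bucket))
    return result


rng = random.Random(42)
data = [rng.randrange(1_000_000) for _ in range(10_000)]
print(partitioned_sort(data, 8) == sorted(data))
```

Because the buckets cover disjoint, ordered key ranges, no merge step is needed at the end; sampling the data first keeps the buckets roughly equal in size.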