Flume to migrate data from MySQL to Hadoop


Problem description

Please share your thoughts.

The requirement is to migrate the data in MySQL db to Hadoop/HBase for analytic purposes.

The data should be migrated in real time or near real time. Can Flume support this?

What would be a better approach?

Solution

The direct answer to your question is yes. Flume is designed as a distributed system for transporting and aggregating event- and log-structured data, and if it is set up correctly it can push data into Hadoop for continuous ingestion. "Set up correctly" here means configuring Flume to collect data from your source, in this case MySQL; Flume does not ship a built-in MySQL or JDBC source, so this takes a custom or third-party source. Once data is available at the source, the Flume sink should deliver it to HDFS with very low latency, although exactly how quickly it becomes visible depends on the sink's batch and file-roll settings. Once the data is in HDFS/HBase, you can run queries on it and process it as your infrastructure allows.
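For the source side, one way to turn a MySQL table into a stream of events is incremental polling: remember the last row id you shipped, periodically select rows with a larger id, emit each as an event, and advance the offset. The sketch below is a minimal illustration under stated assumptions, not the answer author's method: it assumes the table has a monotonically increasing integer `id` column (such as MySQL `AUTO_INCREMENT`), and it uses Python's built-in `sqlite3` as a stand-in for a MySQL connection (a MySQL DB-API driver would use `%s` placeholders instead of `?`). The event shape, a headers dict plus a byte body, mirrors Flume's event model; in a real custom source (written in Java against Flume's API) each event would be put on a channel rather than returned. The function name `poll_new_rows` and the `orders` table are hypothetical.

```python
import json
import sqlite3


def poll_new_rows(conn, table, last_id, batch_size=100):
    """Fetch rows with id > last_id (oldest first) as Flume-style events.

    Hypothetical helper. Assumes `table` has a monotonically increasing
    integer `id` column. Returns (events, new_last_id).
    """
    # The table name is interpolated into the SQL, so it must come from
    # trusted configuration, never from user input.
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    columns = [d[0] for d in cur.description]
    events = []
    for row in cur.fetchall():
        record = dict(zip(columns, row))
        # Flume events carry string headers and an opaque byte body.
        events.append({
            "headers": {"table": table, "id": str(record["id"])},
            "body": json.dumps(record).encode("utf-8"),
        })
        last_id = record["id"]  # advance the offset past this row
    return events, last_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(9.5,), (20.0,)])
    events, offset = poll_new_rows(conn, "orders", last_id=0)
    print(len(events), offset)  # prints: 2 2
    conn.execute("INSERT INTO orders (amount) VALUES (?)", (3.25,))
    events, offset = poll_new_rows(conn, "orders", last_id=offset)
    print(len(events), offset)  # prints: 1 3
```

Polling by id only picks up inserted rows; capturing UPDATEs and DELETEs in near real time generally requires reading MySQL's binary log instead.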

So the Flume configuration is key to pushing data into HDFS in near real time; after that, performance depends on your MapReduce cluster and on how your queries are written for the data being processed.
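To make "the configuration matters" concrete, here is a sketch of a single Flume agent wired as source, memory channel, HDFS sink, rendered from Python so the latency-related knobs are parameters. The property keys (`type`, `channels`, `capacity`, `transactionCapacity`, `hdfs.path`, `hdfs.fileType`, `hdfs.useLocalTimeStamp`, `hdfs.rollInterval`, `hdfs.rollSize`, `hdfs.rollCount`, `hdfs.batchSize`) are standard Flume settings, but the function `flume_agent_config`, the component names `r1`/`c1`/`k1`, the chosen values, and the source class name are assumptions for illustration. Because Flume has no built-in MySQL source, `source_type` must name a custom or third-party source class; the one in the usage example is hypothetical.

```python
def flume_agent_config(agent, source_type, hdfs_path,
                       roll_interval_s=30, batch_size=100):
    """Render a Flume agent config: source -> memory channel -> HDFS sink.

    Hypothetical helper; all values are illustrative defaults.
    """
    transaction_capacity = 1000
    # The sink takes up to batch_size events per channel transaction, so
    # the channel's transactionCapacity must be at least that large.
    if batch_size > transaction_capacity:
        raise ValueError("batch_size must not exceed transactionCapacity")
    props = {
        f"{agent}.sources": "r1",
        f"{agent}.channels": "c1",
        f"{agent}.sinks": "k1",
        # Custom or third-party source class (Flume ships no MySQL source).
        f"{agent}.sources.r1.type": source_type,
        f"{agent}.sources.r1.channels": "c1",
        f"{agent}.channels.c1.type": "memory",
        f"{agent}.channels.c1.capacity": "10000",
        f"{agent}.channels.c1.transactionCapacity": str(transaction_capacity),
        f"{agent}.sinks.k1.type": "hdfs",
        f"{agent}.sinks.k1.channel": "c1",
        f"{agent}.sinks.k1.hdfs.path": hdfs_path,
        f"{agent}.sinks.k1.hdfs.fileType": "DataStream",
        # Lets %Y-%m-%d style escapes in hdfs.path work without a
        # timestamp header on each event.
        f"{agent}.sinks.k1.hdfs.useLocalTimeStamp": "true",
        # Roll files on time only; 0 disables size- and count-based rolling.
        f"{agent}.sinks.k1.hdfs.rollInterval": str(roll_interval_s),
        f"{agent}.sinks.k1.hdfs.rollSize": "0",
        f"{agent}.sinks.k1.hdfs.rollCount": "0",
        f"{agent}.sinks.k1.hdfs.batchSize": str(batch_size),
    }
    return "\n".join(f"{k} = {v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    print(flume_agent_config(
        "a1",
        "com.example.flume.MySqlPollingSource",  # hypothetical class name
        "hdfs://namenode:8020/flume/mysql/%Y-%m-%d",
    ))
```

Shorter `rollInterval` and smaller `batchSize` values bring data to HDFS consumers sooner at the cost of more, smaller files. The rendered text could be saved as, say, `mysql.conf` and started with `flume-ng agent --conf conf --conf-file mysql.conf --name a1`.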

The following resource may also help you understand how to use Flume with HDFS: http://assets.en.oreilly.com/1/event/61/Real-time%20Streaming%20Analysis%20for%20Hadoop%20and%20Flume%20Presentation.pdf
