Piping data into jobs in Hadoop MR/Pig
Problem description
I have three different types of jobs running on data in HDFS. In the current scenario, these three jobs have to be run separately.

Now we want to run the three jobs together, piping the OUTPUT data of one job into the next without writing it to HDFS, in order to improve the architecture and overall performance.

Any suggestions are welcome for this scenario. PS: Oozie does not fit the workflow, and the Cascading framework is also ruled out because of scalability issues.

Thanks

Answer

Hadoop inherently writes to storage (e.g. HDFS) after M/R steps. If you want something kept in memory, you may need to look into something like Spark.
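The in-memory chaining the question asks for can be sketched conceptually in plain Python (no Hadoop or Spark dependency; the `job_a`/`job_b`/`job_c` names and the sample data are hypothetical stand-ins for the three jobs): each stage consumes the previous stage's output directly, the way Spark composes transformations, instead of materializing intermediate results to HDFS between steps.

```python
# Conceptual sketch: three "jobs" chained in memory via generators,
# analogous to how Spark pipelines transformations without writing
# intermediate results to HDFS. The job_* names are hypothetical.

def job_a(records):
    # e.g. a map step: parse raw "key,value" lines into (key, int) pairs
    for line in records:
        key, _, value = line.partition(",")
        yield key, int(value)

def job_b(pairs):
    # e.g. a filter step: keep only positive values
    for key, value in pairs:
        if value > 0:
            yield key, value

def job_c(pairs):
    # e.g. a reduce step: sum values per key
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals

raw = ["a,1", "b,-2", "a,3", "b,4"]
# The output of one stage is piped straight into the next -- nothing
# is written to intermediate storage between stages.
result = job_c(job_b(job_a(raw)))
print(result)  # {'a': 4, 'b': 4}
```

In actual Spark, the same chaining would look like `rdd.map(...).filter(...).reduceByKey(...)`, with the intermediate datasets kept in memory across stages rather than spilled to HDFS after each one.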