Write to multiple outputs by key with Scalding on Hadoop in one MapReduce job

Question

How can I write to multiple outputs, chosen by key, using Scalding (or Cascading) in a single MapReduce job? I could of course use .filter once for each possible key, but that is a horrible hack that fires up many jobs.

Recommended answer

Scalding has TemplatedTsv (from version 0.9.0rc16 onward), which behaves the same as Cascading's TemplateTap.

// Goes inside a Scalding Job body (fields API); needs: import com.twitter.scalding._
Tsv(args("input"), ('COUNTRY, 'GDP))
  .read
  .write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
// In Hadoop mode, this creates one directory per country under the "output" path.
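
To make the role of the "%s" template more concrete, here is a rough plain-Scala model of how a path template could map each key to its own directory. It does not call Scalding or Cascading at all; TemplatePathSketch, outputDir and partition are hypothetical names made up for this illustration, and the real tap may differ in details such as path separators or which fields end up in the files.

```scala
// Plain-Scala model only: it imitates template-based output paths
// and is NOT the Scalding or Cascading implementation.
object TemplatePathSketch {
  // Hypothetical helper: expand the template with the key value
  // and append the result to the base output path.
  def outputDir(basePath: String, template: String, key: String): String =
    s"$basePath/${template.format(key)}"

  // Group (country, gdp) rows by the directory each row would land in.
  def partition(
      basePath: String,
      template: String,
      rows: Seq[(String, Double)]
  ): Map[String, Seq[(String, Double)]] =
    rows.groupBy { case (country, _) => outputDir(basePath, template, country) }
}
```

Under this model, every row that shares a COUNTRY value ends up under the same directory, which is the effect the comment in the answer describes.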
