Why is the number of combiner input records more than the number of outputs of maps?

Problem description


A Combiner runs after the Mapper and before the Reducer. It receives as input all data emitted by the Mapper instances on a given node, and then emits its output to the Reducers. So the number of Combine input records should be no more than the number of Map output records, yet the counters for my job show more Combine input records than Map output records:

12/08/29 13:38:49 INFO mapred.JobClient:   Map-Reduce Framework
12/08/29 13:38:49 INFO mapred.JobClient:     Reduce input groups=8649
12/08/29 13:38:49 INFO mapred.JobClient:     Map output materialized bytes=306210
12/08/29 13:38:49 INFO mapred.JobClient:     Combine output records=859412
12/08/29 13:38:49 INFO mapred.JobClient:     Map input records=457272
12/08/29 13:38:49 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/08/29 13:38:49 INFO mapred.JobClient:     Reduce output records=8649
12/08/29 13:38:49 INFO mapred.JobClient:     Spilled Records=1632334
12/08/29 13:38:49 INFO mapred.JobClient:     Map output bytes=331837344
12/08/29 13:38:49 INFO mapred.JobClient:     **Combine input records=26154506**
12/08/29 13:38:49 INFO mapred.JobClient:     **Map output records=25312392**
12/08/29 13:38:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=218
12/08/29 13:38:49 INFO mapred.JobClient:     Reduce input records=17298

Solution

I think it's because the Combiner can also run on the output of previous Combine steps: your Combiner runs and produces new records, which are then combined again with other records coming out of your Mappers. Each record that goes through the Combiner a second time is counted as a Combine input record again, so the total can exceed the number of Map output records. It may also be that Map output records is calculated after the Combiner runs, meaning that there are fewer records because some have already been combined.
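
The first explanation can be illustrated with a small simulation. This is only a toy model in plain Python, not Hadoop code: the function `simulate_map_task`, the `buffer_size` parameter, and the counter names are hypothetical, and the model assumes that map output records are counted as the mapper emits them, that the combiner runs once on every spill, and that it runs once more when the spills are merged. Hadoop applies its own thresholds for when these steps happen, which the sketch ignores.

```python
from collections import Counter


def simulate_map_task(records, buffer_size):
    """Toy model of one map task (an assumption, not Hadoop's implementation):
    spill when the buffer fills, combine each spill, then combine again
    while merging the spills."""
    counters = Counter()
    spills = []
    buffer = []

    def combine(pairs):
        # Word-count style combiner: sum the values for each key.
        counters["combine_input_records"] += len(pairs)
        totals = Counter()
        for key, value in pairs:
            totals[key] += value
        combined = sorted(totals.items())
        counters["combine_output_records"] += len(combined)
        return combined

    for pair in records:
        # Assumption of this model: counted when the mapper emits the record.
        counters["map_output_records"] += 1
        buffer.append(pair)
        if len(buffer) == buffer_size:
            spills.append(combine(buffer))
            buffer = []
    if buffer:
        spills.append(combine(buffer))

    # Merge phase: records that were already combined pass through the
    # combiner a second time and are counted as combine input again.
    merged_input = [pair for spill in spills for pair in spill]
    final_output = combine(merged_input) if len(spills) > 1 else merged_input
    return counters, final_output


if __name__ == "__main__":
    words = ["a", "b", "c", "a", "b", "c", "a", "b"]
    counters, output = simulate_map_task([(w, 1) for w in words], buffer_size=4)
    print(dict(counters))
    print(output)
```

With these eight records and a buffer of four, the mapper emits 8 records, but the combiner sees 4 + 4 records from the two spills plus 6 already-combined records during the merge, so the combine input count is 14. The same mechanism would account for a Combine input count slightly above the Map output count, as in the counters in the question.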
