How to "sort" the 30 most frequent words in descending order?


Problem Description


My mapper (Hadoop 1.2.1) creates key-value pairs of tokens, which it reads from a simple text file. No rocket science. The reducer finally "bundles" (in Hadoop, do you call that grouping, like in SQL?) the same keys and also sums the values of 1. This is the default Hadoop tutorial.


However, when these values are available in my reducer, I want to sort all of them in descending order and display only the top 30 tokens (strings, words).


It seems like some concepts are not clear to me.


  • First, the reduce method is invoked for every key-value pair, right? Thus, I don't see a place to buffer something like a HashMap, which could hold the top results (most frequent tokens).
  • I was thinking that if I had such a variable, I could easily compare and insert every key whose value is within the top 30. What is the appropriate approach to handle this frequency-ranking task?

public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {

        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // CURRENTLY I SIMPLY OUTPUT THE KEY AND THE SUM.
        // IN THIS PLACE, HOW COULD YOU STORE E.G. A HASHMAP THAT
        // COULD STORE THE TOP 30?
        output.collect(key, new IntWritable(sum));
        LOG.info("REDUCE: added to output:: key: " + key.toString());
    }
}

Recommended Answer


First, the reduce method is invoked for every key-value pair, right? Thus, I don't see a place to buffer something like a HashMap, which could hold the top results (most frequent tokens).


A bit of a nuance: the reduce method is run once per key, not once per key-value pair. Every value with that key is presented in the Iterator. If you want to store a HashMap, you can set it up in the setup function (or make it a private field), interact with it in the reduce function, and then do whatever you like with it in the cleanup function. So it is definitely possible to maintain state across calls to reduce.
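To make the lifecycle concrete, here is a minimal plain-Java sketch of the setup/reduce/cleanup pattern described above, without any Hadoop dependencies (the class name, the field, and the method signatures are illustrative; a real Hadoop reducer would extend org.apache.hadoop.mapreduce.Reducer and override these methods):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the setup/reduce/cleanup lifecycle.
// The map lives in a field, so it survives across every reduce()
// call and can be emitted once, at the end, in cleanup().
public class StatefulReducer {
    private final Map<String, Integer> totals = new HashMap<>();

    // Called once before the first reduce() call.
    public void setup() {
        totals.clear();
    }

    // Called once per key, with all of that key's values.
    public void reduce(String key, int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        totals.put(key, sum);
    }

    // Called once, after the last reduce() call.
    public Map<String, Integer> cleanup() {
        return totals;
    }
}
```

In an actual Hadoop job, cleanup would write the buffered results to the Context instead of returning them.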


However, I think you might be able to solve your problem in a slightly more clever way. I've written about top-ten lists a number of times, simply because I find them interesting and they are very useful tools. I hope it's obvious how top-30 relates to top-10.


  • Here is an example of a top-ten list generator I wrote a while back that can be adapted to your problem. You may be able to change how you are solving your problem a bit to fit this pattern. In my code I use a TreeMap instead of a HashMap, because the TreeMap keeps things in sorted order. Once you get to 31 items, pop off the one with the lowest frequency.
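The TreeMap eviction trick described above can be sketched in plain Java as follows (no Hadoop dependencies; the class and method names are illustrative, and a real job would apply this inside the reducer and emit the survivors in cleanup):

```java
import java.util.Map;
import java.util.TreeMap;

public class TopN {
    // Keep only the n highest-count tokens. The TreeMap is keyed by
    // count, and TreeMap iterates its keys in ascending order, so
    // firstKey() is always the smallest count currently retained.
    static Map<Integer, String> topN(Map<String, Integer> counts, int n) {
        TreeMap<Integer, String> top = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            top.put(e.getValue(), e.getKey());
            if (top.size() > n) {
                top.remove(top.firstKey()); // evict the lowest frequency
            }
        }
        return top;
    }
}
```

One caveat: because the count is the map key, two tokens with the same frequency overwrite each other here; a production version would use a TreeMap<Integer, List<String>> or a composite key to keep ties.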


I also discuss the top-ten pattern in the book MapReduce Design Patterns (sorry for the shameless plug).

