How to select the optimal key in MapReduce?

Problem description

I am working with stock transaction log files. Each line denotes a trade transaction with 20 tab-separated values. I am using Hadoop to process this file and benchmark the trades. Right now I have to perform a separate benchmark calculation for each line, so there is no need for a reduce function. To perform the benchmark calculation for a line, I have to query a Sybase database to obtain some standard values corresponding to that line. The database is indexed on two values from each line (trade ID and stock ID). My question is: should I use tradeId and StockId as the key in my MapReduce program, or should I choose some other value or combination of values for my key?

Solution

So, for each line of input, you're going to query a database and then perform benchmark calculations for each line separately. After you finish the benchmark calculations, you are going to output each line with the benchmark value.

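In this case, you can either not use a reducer at all, or use an identity reducer. With zero reduce tasks, the map output is written straight to the output files without any shuffle or sort, so the key only determines what appears in the output and has no effect on how records are grouped.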

So your map function will read in a line, fire a query to the Sybase database for the standard values, and then perform the benchmark calculations. Since you want to output each line with its benchmark value, you could have the map function output the line as the key and the benchmark value as the value, i.e. <line, benchmark value>.
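Before the map function can query the database, it needs the two indexed fields from the line. The sketch below shows one way to extract them. TradeLookupKey is a hypothetical helper, and the column positions (trade ID in the first column, stock ID in the second) are assumptions, since the question does not say where those fields sit among the 20 values.

```java
import java.util.Objects;

// Hypothetical helper; the column positions below are assumptions, not from the question.
final class TradeLookupKey {
    static final int EXPECTED_FIELDS = 20;  // each line has 20 tab-separated values
    static final int TRADE_ID_COLUMN = 0;   // assumed position of the trade ID
    static final int STOCK_ID_COLUMN = 1;   // assumed position of the stock ID

    final String tradeId;
    final String stockId;

    TradeLookupKey(String tradeId, String stockId) {
        this.tradeId = Objects.requireNonNull(tradeId);
        this.stockId = Objects.requireNonNull(stockId);
    }

    // Splits one log line on tabs and extracts the two fields the database is indexed on.
    static TradeLookupKey parse(String line) {
        String[] fields = line.split("\t", -1);  // -1 keeps trailing empty fields
        if (fields.length != EXPECTED_FIELDS) {
            throw new IllegalArgumentException(
                "Expected " + EXPECTED_FIELDS + " fields but found " + fields.length);
        }
        return new TradeLookupKey(fields[TRADE_ID_COLUMN], fields[STOCK_ID_COLUMN]);
    }
}
```

Inside the mapper, TradeLookupKey.parse(value.toString()) would then supply the two values for the Sybase query, while the full line can still be written out as the output key.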

Your map function would look something like this: (I'm assuming your benchmark value is an integer)

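import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The class name is illustrative. With the default TextInputFormat, the input
// key is the byte offset of the line and the input value is the line itself.
public class BenchmarkMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();   // this will be your key in the final output

        /* Perform operations on the line */

        /* standard values = <return value from sybase query>; */

        /* Perform benchmark calculations and obtain the benchmark value */
        int benchmarkValue = 0;   // placeholder until the calculation above is filled in

        // Hadoop keys and values must be Writable types, not String or int
        context.write(new Text(line), new IntWritable(benchmarkValue));
    }
}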
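The per-line logic can also be tried outside Hadoop by passing the database lookup in as a function. This is only a sketch under stated assumptions: LineBenchmark is a hypothetical class, the first two columns are assumed to hold the trade ID and stock ID, the third column is assumed to hold an integer, and the subtraction is a placeholder for the real benchmark formula.

```java
import java.util.function.Function;

// Hypothetical sketch; the lookup and the benchmark formula are stand-ins.
final class LineBenchmark {
    // Stand-in for the Sybase query: maps "tradeId|stockId" to a standard value.
    private final Function<String, Integer> standardValueLookup;

    LineBenchmark(Function<String, Integer> standardValueLookup) {
        this.standardValueLookup = standardValueLookup;
    }

    // Returns the line followed by a tab and its benchmark value,
    // mirroring the <line, benchmark value> pair described above.
    String process(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length < 3) {
            throw new IllegalArgumentException("Line has too few fields: " + fields.length);
        }
        // Assumed layout: trade ID and stock ID are the first two columns.
        String lookupKey = fields[0] + "|" + fields[1];
        Integer standard = standardValueLookup.apply(lookupKey);
        if (standard == null) {
            throw new IllegalStateException("No standard value for " + lookupKey);
        }
        int tradedValue = Integer.parseInt(fields[2]);  // assumed integer column
        int benchmark = tradedValue - standard;         // placeholder formula
        return line + "\t" + benchmark;
    }
}
```

For a quick check, the lookup can be backed by a HashMap (for example new LineBenchmark(map::get)); in the real job it would wrap the Sybase query instead.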
