Map Reduce: How to partition the records of two data sets into blocks, and how to combine those blocks into pairs

Views: 55 Posted: 2019/6/13 22:09:04 Java

Problem description

I want to create a Map function with the following operations:

Step 1:

I have two data sets, R and S. I want to partition each of them into n equal-sized blocks, which can be done by putting every |R|/n records of R (and every |S|/n records of S) into one block.

After that:

Step 2: Every possible pair of blocks (one from R and one from S) is then assigned to a bucket at the end of the Map phase, so that the Reduce function can take it as input, with some id as the key for each pair of values. For example:

<id:(Sij,Ril)>
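
The two steps can be sketched in plain Java, without Hadoop, to make the intended bucket ids concrete. This is only an illustration under assumptions: it presumes each record's position within its data set and the data set's total size are known (a mapper does not get these for free), and the names `BlockPairs`, `blockOf`, `bucketsForR` and `bucketsForS` are hypothetical. The pair (R block r, S block s) is given the id r * n + s, so every R record is sent to n buckets (one per S block) and every S record likewise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: computes block indexes and bucket ids.
final class BlockPairs {

    // 0-based block index of the record at position recordIndex
    // when `total` records are cut into n blocks of ceil(total / n) records.
    static int blockOf(long recordIndex, long total, int n) {
        long blockSize = (total + n - 1) / n; // ceil(total / n)
        return (int) (recordIndex / blockSize);
    }

    // Bucket ids for a record in R block rBlock: one per S block.
    static List<Integer> bucketsForR(int rBlock, int n) {
        List<Integer> ids = new ArrayList<>();
        for (int sBlock = 0; sBlock < n; sBlock++) {
            ids.add(rBlock * n + sBlock);
        }
        return ids;
    }

    // Bucket ids for a record in S block sBlock: one per R block.
    static List<Integer> bucketsForS(int sBlock, int n) {
        List<Integer> ids = new ArrayList<>();
        for (int rBlock = 0; rBlock < n; rBlock++) {
            ids.add(rBlock * n + sBlock);
        }
        return ids;
    }
}
```

With n = 3, a record in R block 1 goes to buckets 3, 4 and 5, a record in S block 1 goes to buckets 1, 4 and 7, and the two meet only in bucket 4, which is the pair (R block 1, S block 1).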

So my questions are:

1) Is there an existing function that I can use for step 1? How can I implement this operation separately for each data set?

2) How can I refer specifically to these data sets in step 2, so that I can take one block from R and one from S?

Note: In main I define the two data sets like this:

FileInputFormat.setInputPaths(conf, new Path(args[0]), new Path(args[1]));
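
Since both paths are registered for the same job, one possible way to tell the two data sets apart in step 2 is to compare the path of the file a record came from with the two input paths. The sketch below is plain Java with no Hadoop classes; in a real job the file path would have to come from the framework (for example from the `FileSplit` of the current map task) and may be fully qualified, so it would need normalizing first. The class `SourceTag`, the method `tagFor` and the example paths are hypothetical.

```java
// Hypothetical helper, not part of Hadoop: decides which data set a file belongs to.
final class SourceTag {

    // Returns 'R' or 'S' depending on which input path the file lies under.
    static char tagFor(String inputFile, String rPath, String sPath) {
        if (isUnder(inputFile, rPath)) {
            return 'R';
        }
        if (isUnder(inputFile, sPath)) {
            return 'S';
        }
        throw new IllegalArgumentException("Unknown input: " + inputFile);
    }

    // True if file is the path itself or lies inside it as a directory.
    // The trailing slash keeps "/data/RS" from matching "/data/R".
    private static boolean isUnder(String file, String dir) {
        String withSlash = dir.endsWith("/") ? dir : dir + "/";
        return file.equals(dir) || file.startsWith(withSlash);
    }
}
```

For instance, `SourceTag.tagFor("/data/R/part-00000", "/data/R", "/data/S")` yields 'R'. An alternative that avoids path comparison is to register each path with its own mapper class through Hadoop's `MultipleInputs`, so that each mapper knows its data set by construction.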
