How to specify the partitioner for hadoop streaming


Problem description

I have a custom partitioner like below:

import java.util.*;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;

public class SignaturePartitioner extends Partitioner<Text, Text>
{
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks)
    {
        // Partition on the first whitespace-separated token of the key.
        return (key.toString().split(" ")[0].hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

I set the hadoop streaming parameters like below:

 -file SignaturePartitioner.java \
 -partitioner SignaturePartitioner \

Then I get an error: Class Not Found.

Do you know what the problem is?

Best Regards,

Solution

I faced the same issue, but managed to solve it after a lot of research.

The root cause is that hadoop-streaming-2.6.0.jar uses the old mapred API, not the mapreduce API, so you must implement the org.apache.hadoop.mapred.Partitioner interface rather than extend the org.apache.hadoop.mapreduce.Partitioner class. The following worked for me:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class Mypartitioner implements Partitioner<Text, Text> {

    // Nothing to configure; required by the JobConfigurable interface.
    public void configure(JobConf job) {}

    // Keys starting with "a" go to reducer 0, all other keys to reducer 1.
    public int getPartition(Text pkey, Text pvalue, int pnumparts)
    {
        if (pkey.toString().startsWith("a"))
            return 0;
        else
            return 1;
    }
}
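
To sanity-check the routing logic before submitting a job, you can call getPartition directly from a tiny driver. This test class is my own sketch, not part of the original answer:

import org.apache.hadoop.io.Text;

public class MypartitionerTest {
    public static void main(String[] args) {
        Mypartitioner p = new Mypartitioner();
        // Keys starting with "a" land in partition 0, everything else in 1.
        System.out.println(p.getPartition(new Text("apple"), new Text("v"), 2)); // prints 0
        System.out.println(p.getPartition(new Text("beta"), new Text("v"), 2));  // prints 1
    }
}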

Compile Mypartitioner and package it into a jar, then submit the streaming job shown below.
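
A minimal sketch of the compile-and-package step, run from the Hadoop install directory (the classpath invocation is standard, but the exact file names are just the ones used above):

javac -cp "$(bin/hadoop classpath)" Mypartitioner.java
jar cf Mypartitioner.jar Mypartitioner.class

With the jar built, the job submission: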

bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
-libjars /home/sanjiv/hadoop-2.6.0/Mypartitioner.jar \
-D mapreduce.job.reduces=2 \
-files /home/sanjiv/mymapper.sh,/home/sanjiv/myreducer.sh \
-input indir -output outdir -mapper mymapper.sh \
-reducer myreducer.sh -partitioner Mypartitioner
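
With mapreduce.job.reduces=2, outdir will contain one part file per reducer, and keys starting with "a" should all land in the first one. A quick illustrative check (my own addition, not from the original answer):

bin/hadoop fs -cat outdir/part-00000 | head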
