Why are my tasks only done by one worker in my Spark cluster?
Problem description
I built a Spark cluster with a master and 2 slaves (the master and one of the slaves are on the same machine). I modified the wordcount example so that it outputs a message whenever mapToPair() is called, and submitted it to the master. However, only one worker shows output in stdout. Does that mean only one worker is doing the tasks? Each of my workers has one core. I also tried requesting 1000 slices in textFile(), but that still did not work. How can I make both workers do tasks? Am I making a mistake somewhere?
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

private static final Pattern SPACE = Pattern.compile(" ");

SparkConf sparkConf = new SparkConf().setAppName("ORSIFTask").setMaster("spark://192.168.0.110:7077");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
ctx.addJar("/home/hadoop/ont-1.0-SNAPSHOT.jar");
// Request at least 2 partitions when reading the file from HDFS
JavaRDD<String> lines = ctx.textFile("hdfs://192.168.0.110:9000/features4.data", 2).cache();

JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public Iterable<String> call(String s) {
        return Arrays.asList(SPACE.split(s));
    }
});

JavaPairRDD<String, String> ones = words.mapToPair(new PairFunction<String, String, String>() {
    @Override
    public Tuple2<String, String> call(String s) {
        // This prints to the executor's stdout, not the driver's console
        System.out.println("map:" + s);
        return new Tuple2<String, String>(s, "thing");
    }
});

JavaPairRDD<String, String> counts = ones.reduceByKey(new Function2<String, String, String>() {
    @Override
    public String call(String i1, String i2) {
        System.out.println("reduce:" + i1);
        return i1;
    }
});

List<Tuple2<String, String>> output = counts.collect();
1) Check the SPARK_HOME/conf/slaves file, or the master WebUI, to see whether all of the slaves are listed.
2) Which cluster mode are you using? println might produce its output in the driver rather than on the workers.
3) The RDD may not have enough partitions.
4) Check in the worker UI whether executors are started on both workers while the job is executing.
5) Increase the default data parallelism and check again.
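Points 3 and 5 can be sketched as follows. This is a minimal illustration, not code from the original post: the partition count of 4 is an arbitrary assumption (a common rule of thumb is 2-4 partitions per core in the cluster), and it requires a live Spark master to actually run, so treat it as a configuration fragment.

```java
// Sketch: ask for more partitions so tasks can be spread across both workers.
SparkConf sparkConf = new SparkConf()
        .setAppName("ORSIFTask")
        .setMaster("spark://192.168.0.110:7077")
        // Default number of partitions used by shuffle operations such as reduceByKey
        .set("spark.default.parallelism", "4");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);

// The second argument to textFile() is only a lower-bound hint (minPartitions);
// repartition() forces the data to be redistributed into exactly 4 partitions.
JavaRDD<String> lines = ctx.textFile("hdfs://192.168.0.110:9000/features4.data", 2)
        .repartition(4)
        .cache();
```

With more partitions than one worker can hold at once, the scheduler has independent tasks to hand to the second worker, and both should show map output in their stdout.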