Spark: check your cluster UI to ensure that workers are registered
Problem description
I have a simple program in Spark:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://10.250.7.117:7077")
      .setAppName("Simple Application")
      .set("spark.cores.max", "2")
    val sc = new SparkContext(conf)
    val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")
    // First, print the first 10 records
    println("Getting the first 10 records: ")
    ratingsFile.take(10).foreach(println)
    // Then print the number of records in the movie ratings file
    println("The number of records in the movie list is: " + ratingsFile.count())
  }
}
When I try to run this program from the spark-shell, i.e. I log into the name node (Cloudera installation) and run the commands sequentially in the spark-shell:
val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")
println("Getting the first 10 records: ")
ratingsFile.take(10)
println("The number of records in the movie list is: ")
ratingsFile.count()
I get correct results, but if I try to run the program from Eclipse, no resources are assigned to the program, and in the console log all I see is:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Also, in the Spark UI, I see this:
Also, it should be noted that this version of Spark was installed with Cloudera (hence no worker nodes show up).
What should I do to make this work?
EDIT:
I checked the HistoryServer, and these jobs don't show up there (even among incomplete applications).
I have done configuration and performance tuning for many Spark clusters, and this is a very common/normal message to see when you are first prepping/configuring a cluster to handle your workloads.
This is unequivocally due to insufficient resources to launch the job. The job is requesting one of:
- more memory per worker than is allocated to it (1 GB)
- more CPUs than are available on the cluster
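The check behind that warning can be sketched as a plain predicate. This is a hypothetical illustration (the names `ClusterCapacity`, `JobRequest`, and `canSchedule` are not Spark APIs): the standalone scheduler keeps a job pending, and keeps logging the warning above, for as long as the request exceeds what the workers can offer. The snippet can be pasted into any Scala REPL:

```scala
// Hypothetical model of a cluster's capacity and a job's resource request.
case class ClusterCapacity(memoryPerWorkerMb: Int, totalCores: Int)
case class JobRequest(executorMemoryMb: Int, maxCores: Int)

// A job can be scheduled only if each executor fits in a worker's memory
// and the requested core cap does not exceed the cluster's total cores.
def canSchedule(cluster: ClusterCapacity, job: JobRequest): Boolean =
  job.executorMemoryMb <= cluster.memoryPerWorkerMb &&
    job.maxCores <= cluster.totalCores

// Workers with 1 GB each can never satisfy a 2 GB executor request:
val tight = ClusterCapacity(memoryPerWorkerMb = 1024, totalCores = 4)
println(canSchedule(tight, JobRequest(executorMemoryMb = 2048, maxCores = 2))) // false
println(canSchedule(tight, JobRequest(executorMemoryMb = 512, maxCores = 2)))  // true
```

Either mismatch alone is enough to starve the job, which is why lowering `spark.executor.memory` or `spark.cores.max` (or adding worker resources) makes the warning go away.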