Spark: check your cluster UI to ensure that workers are registered


Problem description


I have a simple program in Spark:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("spark://10.250.7.117:7077").setAppName("Simple Application").set("spark.cores.max","2")
    val sc = new SparkContext(conf)    
    val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")

    //print the first 10 records
    println("Getting the first 10 records: ")
    ratingsFile.take(10).foreach(println)

    //print the number of records in the movie ratings file
    println("The number of records in the movie list are : ")
    println(ratingsFile.count())
  }
}

When I try to run this program from the spark-shell, i.e. I log into the name node (Cloudera installation) and run the commands sequentially in the spark-shell:

val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")
println("Getting the first 10 records: ")
ratingsFile.take(10)    
println("The number of records in the movie list are : ")
ratingsFile.count() 

I get correct results, but if I try to run the program from Eclipse, no resources are assigned to the program, and all I see in the console log is:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
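For comparison, the standard way to launch the compiled program on the cluster (rather than from an IDE) is spark-submit. A minimal sketch, assuming the app has been packaged as simple-app.jar (a hypothetical name); --total-executor-cores mirrors the spark.cores.max setting in the code:

spark-submit \
  --class SimpleApp \
  --master spark://10.250.7.117:7077 \
  --total-executor-cores 2 \
  simple-app.jar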

Also, in the Spark UI, I see this:

[Screenshot: Job keeps Running - Spark]

Also, it should be noted that this version of Spark was installed with Cloudera (hence no worker nodes show up).

What should I do to make this work?

EDIT:

I checked the HistoryServer, and these jobs don't show up there (even among incomplete applications).

Solution

I have done configuration and performance tuning for many Spark clusters, and this is a very common message to see when you are first preparing and configuring a cluster to handle your workloads.

This is unequivocally due to insufficient resources to launch the job. The job is requesting one of the following (see the configuration sketch after this list):

  • more memory per worker than has been allocated to it (1GB)
  • more CPUs than are available on the cluster
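If each worker only has 1GB to offer, one straightforward adjustment is to request less memory than that per executor. A minimal sketch in the style of the program above, assuming the same standalone master; the 512m value is an illustrative guess, not a tuned number:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val conf = new SparkConf()
  .setMaster("spark://10.250.7.117:7077")
  .setAppName("Simple Application")
  .set("spark.cores.max", "2")           // cap total cores at what the cluster actually has
  .set("spark.executor.memory", "512m")  // ask for less than the 1GB each worker offers
val sc = new SparkContext(conf)

The same limits can also be set cluster-wide in spark-defaults.conf instead of in application code.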
