Spark: check your cluster UI to ensure that workers are registered
Problem description
I have a simple program in Spark:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://10.250.7.117:7077")
      .setAppName("Simple Application")
      .set("spark.cores.max", "2")
    val sc = new SparkContext(conf)
    val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")
    // First, print the first 10 records
    println("Getting the first 10 records: ")
    ratingsFile.take(10).foreach(println)
    // Then print the number of records in the movie ratings file
    println("The number of records in the movie list is: " + ratingsFile.count())
  }
}
When I try to run this program from the spark-shell, i.e. I log into the name node (Cloudera installation) and run the commands sequentially in the spark-shell:
val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")
println("Getting the first 10 records: ")
ratingsFile.take(10)
println("The number of records in the movie list is: ")
ratingsFile.count()
I get correct results, but if I try to run the program from Eclipse, no resources are assigned to the program, and in the console log all I see is:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Also, in the Spark UI, I see this:
Also, it should be noted that this version of Spark was installed with Cloudera (hence no worker nodes show up).
What should I do to make this work?
EDIT:
I checked the HistoryServer, and these jobs don't show up there (even among incomplete applications).
I have done configuration and performance tuning for many Spark clusters, and this is a very common/normal message to see when you are first prepping/configuring a cluster to handle your workloads.
This is unequivocally due to insufficient resources to launch the job. The job is requesting one of:
- more memory per worker than is allocated to it (1 GB)
- more CPUs than are available on the cluster
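The check behind that warning can be sketched as a plain predicate. This is a hypothetical illustration (the names `ClusterCapacity`, `JobRequest`, and `canSchedule` are not Spark APIs): the standalone scheduler keeps a job pending, and keeps logging the warning above, for as long as the request exceeds what the workers can offer. The snippet can be pasted into any Scala REPL:

```scala
// Hypothetical model of a cluster's capacity and a job's resource request.
case class ClusterCapacity(memoryPerWorkerMb: Int, totalCores: Int)
case class JobRequest(executorMemoryMb: Int, maxCores: Int)

// A job can be scheduled only if each executor fits in a worker's memory
// and the requested core cap does not exceed the cluster's total cores.
def canSchedule(cluster: ClusterCapacity, job: JobRequest): Boolean =
  job.executorMemoryMb <= cluster.memoryPerWorkerMb &&
    job.maxCores <= cluster.totalCores

// Workers with 1 GB each can never satisfy a 2 GB executor request:
val tight = ClusterCapacity(memoryPerWorkerMb = 1024, totalCores = 4)
println(canSchedule(tight, JobRequest(executorMemoryMb = 2048, maxCores = 2))) // false
println(canSchedule(tight, JobRequest(executorMemoryMb = 512, maxCores = 2)))  // true
```

Either mismatch alone is enough to starve the job, which is why lowering `spark.executor.memory` or `spark.cores.max` (or adding worker resources) makes the warning go away.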