Difference between using hdfs:// and YARN in Spark
Question
What is the difference between using hdfs:// and YARN in Spark to save and load files in cluster mode?
Answer
From your question here (http://stackoverflow.com/questions/36257648/using-other-hdfs-server-from-spark-cluster), I suspect your understanding of HDFS and YARN is incorrect.
YARN is a generic resource-management and job-scheduling framework, whereas HDFS is a distributed storage framework.
In a nutshell, YARN has a master (ResourceManager) and workers (NodeManagers). The ResourceManager allocates containers on the workers to execute MapReduce jobs, Spark jobs, etc.
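To make the master/worker split concrete, here is a toy Python model. It is not YARN's code or scheduling policy: the hypothetical ToyResourceManager places containers on NodeManager objects by simple first fit on memory, whereas real YARN uses pluggable schedulers such as the Capacity Scheduler or Fair Scheduler, and the NodeManagers are the ones that actually launch the containers.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy YARN worker: tracks free memory (MB) and the containers placed on it."""
    host: str
    free_mb: int
    containers: list = field(default_factory=list)


class ToyResourceManager:
    """Toy YARN master: decides on which worker each requested container goes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, count, mem_mb):
        placed = []
        for i in range(count):
            # First fit: the first worker with enough free memory gets the container.
            node = next((n for n in self.nodes if n.free_mb >= mem_mb), None)
            if node is None:
                break  # no room left; real YARN would keep the request pending instead
            node.free_mb -= mem_mb
            container_id = f"{app_id}_container_{i}"
            node.containers.append(container_id)
            placed.append((container_id, node.host))
        return placed


rm = ToyResourceManager([NodeManager("worker1", 4096), NodeManager("worker2", 4096)])
for container_id, host in rm.allocate("spark_app_1", count=3, mem_mb=2048):
    print(container_id, "->", host)
```

Note that nothing in this model stores or reads data; that is HDFS's job.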
HDFS, on the other hand, has a master (NameNode) and workers (DataNodes) to persist and retrieve files.
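The HDFS side can be modelled the same way. In this toy sketch (hypothetical class names, a tiny block size, and round-robin placement instead of HDFS's rack-aware policy), the NameNode keeps only metadata, namely which blocks make up a file and where their replicas live, while the DataNodes hold the bytes. A reader asks the NameNode for block locations and then fetches the blocks from DataNodes directly.

```python
class DataNode:
    """Toy HDFS worker: stores raw block bytes keyed by block id."""

    def __init__(self, host):
        self.host = host
        self.blocks = {}


class NameNode:
    """Toy HDFS master: keeps only metadata, i.e. each file's blocks and where their replicas live."""

    def __init__(self, datanodes, block_size=4, replication=2):
        # replication must not exceed the number of DataNodes, or a node would get the same block twice
        self.datanodes = datanodes
        self.block_size = block_size  # real HDFS defaults to 128 MB; 4 bytes keeps the demo small
        self.replication = replication
        self.files = {}  # path -> list of (block_id, [replica hosts])

    def write(self, path, data):
        blocks = []
        for index, offset in enumerate(range(0, len(data), self.block_size)):
            block_id = f"{path}#blk{index}"
            # Round-robin placement; real HDFS uses a rack-aware placement policy.
            targets = [self.datanodes[(index + r) % len(self.datanodes)] for r in range(self.replication)]
            for dn in targets:
                dn.blocks[block_id] = data[offset:offset + self.block_size]
            blocks.append((block_id, [dn.host for dn in targets]))
        self.files[path] = blocks

    def locations(self, path):
        return self.files[path]


def hdfs_read(namenode, path):
    """Ask the NameNode where the blocks are, then fetch the bytes from the DataNodes."""
    by_host = {dn.host: dn for dn in namenode.datanodes}
    return b"".join(by_host[hosts[0]].blocks[block_id] for block_id, hosts in namenode.locations(path))


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode(nodes)
nn.write("/data/a.txt", b"hello world")
print(nn.locations("/data/a.txt"))
print(hdfs_read(nn, "/data/a.txt"))
```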
You don't need YARN to communicate with HDFS; it is an independent entity.
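As an illustration of that independence, HDFS can be reached through its WebHDFS REST interface without any YARN service being involved. The sketch below only builds the URL. The host name is hypothetical, and 9870 is the default NameNode HTTP port in Hadoop 3 (Hadoop 2 used 50070), so check your cluster's dfs.namenode.http-address.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode_host, path, op, http_port=9870, params=None):
    """Build a WebHDFS REST URL; only the NameNode's HTTP endpoint is involved, no YARN service."""
    query = urlencode({"op": op, **(params or {})})
    # quote() keeps "/" as-is, so the HDFS path maps directly onto the URL path
    return f"http://{namenode_host}:{http_port}/webhdfs/v1{quote(path)}?{query}"


# Actually fetching the URL (e.g. with urllib.request.urlopen) needs a reachable cluster; here we only print it.
print(webhdfs_url("namenode", "/user/alice", "LISTSTATUS", params={"user.name": "alice"}))
```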
In a production environment, the HDFS worker (DataNode) and the YARN worker (NodeManager) are typically installed on the same machine, so that the processing framework can consume data from the nearest, local DataNode (data locality).
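The locality preference can be sketched as a small Python function. This is a simplification, not Spark's or HDFS's actual logic: the labels NODE_LOCAL, RACK_LOCAL and ANY are borrowed from Spark's locality levels, and the host and rack names are hypothetical.

```python
def choose_replica(executor_host, replica_hosts, rack_of=None):
    """Pick the DataNode an executor should read a block from, preferring the closest replica."""
    rack_of = rack_of or {}
    # Best case: a replica lives on the same machine as the executor (node-local read).
    if executor_host in replica_hosts:
        return executor_host, "NODE_LOCAL"
    # Next best: a replica in the same rack, if rack information is known.
    my_rack = rack_of.get(executor_host)
    if my_rack is not None:
        for host in replica_hosts:
            if rack_of.get(host) == my_rack:
                return host, "RACK_LOCAL"
    # Otherwise read over the network from any replica.
    return replica_hosts[0], "ANY"


racks = {"worker1": "rack1", "worker2": "rack1", "worker3": "rack2"}
print(choose_replica("worker2", ["worker1", "worker2"], racks))
print(choose_replica("worker2", ["worker3", "worker1"], racks))
print(choose_replica("worker3", ["worker1", "worker2"], racks))
```

Co-locating DataNodes and NodeManagers is what makes the first branch reachable at all: if no YARN worker ran on a DataNode host, every read would be remote.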
Using Spark on a YARN cluster in cluster mode means that the Spark driver runs inside the YARN ApplicationMaster on one of the worker nodes of the cluster, rather than on the machine from which you submitted the job.
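The following sketch assembles a spark-submit command line to show where each piece goes: yarn is the value of --master (the cluster manager), --deploy-mode cluster places the driver in a YARN container, and the hdfs:// paths are ordinary arguments handed to the application. The script name wordcount.py, the NameNode address and the paths are hypothetical, and how the arguments are interpreted is up to the script.

```python
import shlex


def spark_submit_cmd(app, input_path, output_path, deploy_mode="cluster"):
    """Assemble a spark-submit command for a YARN cluster (app name and paths are hypothetical)."""
    return [
        "spark-submit",
        "--master", "yarn",            # cluster manager: YARN allocates the containers the job runs in
        "--deploy-mode", deploy_mode,  # "cluster": driver in YARN's ApplicationMaster; "client": on the submitting machine
        app,                           # the application itself
        input_path,                    # storage location the application reads from
        output_path,                   # storage location the application writes to
    ]


cmd = spark_submit_cmd(
    "wordcount.py",
    "hdfs://namenode:8020/user/alice/input",
    "hdfs://namenode:8020/user/alice/output",
)
print(shlex.join(cmd))
```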
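Hence, using hdfs:// paths benefits the Spark job, because each Spark executor can read its data from the nearest (ideally local) DataNode. Note that hdfs:// and yarn are not alternatives: hdfs:// is the URI scheme of the storage layer you read from and write to, while yarn is the cluster manager you pass to --master, and the two are normally used together.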
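Since hdfs:// is just a URI scheme, its parts can be inspected with Python's standard urllib.parse. The host and port in this sketch are hypothetical; they identify the NameNode (the HDFS master), and nothing in the URI refers to YARN.

```python
from urllib.parse import urlparse


def split_hdfs_uri(uri):
    """Split an hdfs:// URI into the NameNode address and the file path."""
    parts = urlparse(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs:// URI: {uri}")
    return {"namenode_host": parts.hostname, "namenode_port": parts.port, "path": parts.path}


print(split_hdfs_uri("hdfs://namenode:8020/user/alice/data.csv"))
```

For a URI without a host, such as hdfs:///user/alice/data.csv, the host and port come back as None; Hadoop then falls back to the NameNode configured in fs.defaultFS.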
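The YARN and HDFS configurations are read from HADOOP_CONF_DIR on the machine that submits the job, and Spark distributes them to the YARN containers, so the driver (on your local machine in client mode, or on one of the worker nodes in cluster mode) and the executors use the same configuration.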
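To see which cluster a client would talk to, a hypothetical helper can read properties straight from the XML files in HADOOP_CONF_DIR. fs.defaultFS (in core-site.xml) and yarn.resourcemanager.hostname (in yarn-site.xml) are standard Hadoop property names; the helper itself is a simplified sketch that ignores Hadoop's default files, property precedence and ${...} variable expansion.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_hadoop_property(filename, name, conf_dir=None):
    """Return a property's value from a *-site.xml file in HADOOP_CONF_DIR, or None if it is not set there."""
    conf_dir = conf_dir or os.environ.get("HADOOP_CONF_DIR")
    if not conf_dir:
        raise RuntimeError("HADOOP_CONF_DIR is not set")
    path = os.path.join(conf_dir, filename)
    if not os.path.exists(path):
        return None
    # Site files look like <configuration><property><name>..</name><value>..</value></property>...</configuration>
    for prop in ET.parse(path).getroot().findall("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Demo against a throwaway config directory instead of a real cluster's HADOOP_CONF_DIR.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "core-site.xml"), "w") as f:
        f.write("<configuration><property><name>fs.defaultFS</name>"
                "<value>hdfs://namenode:8020</value></property></configuration>")
    print(read_hadoop_property("core-site.xml", "fs.defaultFS", conf_dir=demo_dir))
```

On a real client you would omit conf_dir, so the helper reads the same directory that spark-submit uses.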