Access a secured Hive when running Spark in an unsecured YARN cluster


Question

We have two Cloudera 5.7.1 clusters, one secured with Kerberos and one unsecured.

Is it possible to run Spark on the unsecured YARN cluster while accessing Hive tables stored in the secured cluster? (The Spark version is 1.6.)

If so, can you please explain how I can configure it?

Update:

I want to explain a little the end goal behind my question. Our main secured cluster is heavily utilized, and our jobs can't get enough resources to complete in a reasonable time. To overcome this, we want to use resources from another, unsecured cluster we have, without copying the data between the clusters.

We know this is not the best solution, as the data-locality level might not be optimal, but it's the best we can come up with for now.

Please let me know if you have any other solution, since it seems we can't achieve the above.

Answer

If you run Spark in local mode, you can make it use an arbitrary set of Hadoop conf files, i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml copied from the Kerberized cluster.
That way you can access HDFS on that cluster, provided you have a Kerberos ticket that grants you access to it, of course.

  # Point the Hadoop/Spark clients at the secured cluster's configuration
  export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
  # Obtain a Kerberos ticket for the secured realm
  kinit sylvestre@WORLD.COMPANY
  # Run Spark locally -- only local mode works with this setup
  spark-shell --master local[*]

But in yarn-client or yarn-cluster mode, you cannot launch containers in the local cluster and access HDFS in the other:

  • either you use the local core-site.xml, which says that hadoop.security.authentication is simple, and you can connect to the local YARN/HDFS;
  • or you point to a copy of the remote core-site.xml, which says that hadoop.security.authentication is kerberos, and you can connect to the remote YARN/HDFS;
  • but you cannot use the local, unsecured YARN and access the remote, secured HDFS.
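To make the dichotomy concrete, the property in question looks like this in each cluster's core-site.xml (a sketch showing only the relevant property; the rest of a real cluster config is omitted):

```xml
<!-- Local, unsecured cluster's core-site.xml -->
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>

<!-- Remote, Kerberized cluster's core-site.xml -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```

A client JVM reads a single core-site.xml, hence a single value of this property, which is why one Spark job cannot authenticate as simple to one cluster and as kerberos to the other.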

Note that with unsecure-unsecure or secure-secure combinations, you could access HDFS in another cluster by hacking your own custom hdfs-site.xml to define multiple namespaces. But you are stuck with a single authentication model.
[edit] See the comment by Mighty Steve Loughran about an extra Spark property for accessing a remote, secure HDFS from a local, secure cluster.
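As a rough sketch of that hack (the nameservice IDs and hostnames below are made up for illustration), the client-side hdfs-site.xml declares both clusters' namenodes as nameservices, after which paths like hdfs://clusterB/... resolve from either side:

```xml
<!-- Client-side hdfs-site.xml declaring two (non-HA) nameservices -->
<property>
  <name>dfs.nameservices</name>
  <value>clusterA,clusterB</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.clusterA</name>
  <value>namenode-a.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.clusterB</name>
  <value>namenode-b.example.com:8020</value>
</property>
```

For the secure-secure Spark-on-YARN case, the property the comment refers to may be spark.yarn.access.namenodes (documented in Spark 1.x), which lists extra secure namenodes for which Spark should obtain delegation tokens, but that is an inference, not something stated in the answer.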

Note also that with DistCp you are stuck in the same way, except that there is a "cheat" property that lets you go from secure to unsecure.
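For reference, the "cheat" property alluded to is presumably ipc.client.fallback-to-simple-auth-allowed, which lets a Kerberos-authenticated client fall back to simple authentication when it talks to an unsecured cluster. A sketch of such a copy, run from the secured side (the namenode hostnames are placeholders):

```shell
# Push data from the secured cluster to the unsecured one;
# the fallback flag permits the mixed-security connection.
hadoop distcp \
  -D ipc.client.fallback-to-simple-auth-allowed=true \
  hdfs://secure-nn.example.com:8020/data/src \
  hdfs://insecure-nn.example.com:8020/data/dst
```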
