Access a secured Hive when running Spark in an unsecured YARN cluster


Problem description


We have two Cloudera 5.7.1 clusters, one secured using Kerberos and one unsecured.

Is it possible to run Spark using the unsecured YARN cluster while accessing Hive tables stored in the secured cluster? (The Spark version is 1.6.)

If so, can you please provide some explanation of how I can get it configured?

Update:

I want to explain a little about the end goal behind my question. Our main secured cluster is heavily utilized, and our jobs can't get enough resources to complete in a reasonable time. To overcome this, we want to use resources from another, unsecured cluster we have, without needing to copy the data between the clusters.

We know it's not the best solution, as the data locality might not be optimal, but it's the best solution we can come up with for now.

Please let me know if you have any other solution, as it seems we can't achieve the above.

Solution

If you run Spark in local mode, you can make it use an arbitrary set of Hadoop conf files -- i.e. core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, hive-site.xml copied from the Kerberized cluster.
So you can access HDFS on that cluster -- if you have a Kerberos ticket that grants you access to that cluster, of course.

  export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
  kinit sylvestre@WORLD.COMPANY
  spark-shell --master local[*]
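
For example, a quick sanity check, reusing the hypothetical conf path and principal from above (the home directory is likewise made up): confirm that plain HDFS commands already reach the remote secure cluster before involving Spark at all.

  export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
  kinit sylvestre@WORLD.COMPANY
  klist                            # confirm a valid Kerberos ticket is held
  hdfs dfs -ls /user/sylvestre     # should list a directory on the remote, secure HDFS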

But in yarn-client or yarn-cluster mode, you cannot launch containers in the local cluster and access HDFS in the other.

  • either you use the local core-site.xml that says that hadoop.security.authentication is simple, and you can connect to the local YARN/HDFS
  • or you point to a copy of the remote core-site.xml that says that hadoop.security.authentication is kerberos, and you can connect to the remote YARN/HDFS
  • but you cannot use the local, unsecured YARN and access the remote, secure HDFS (a quick way to check which mode a conf directory resolves to is shown below)
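
To check which mode a set of conf files actually resolves to, hdfs getconf reads whatever configuration HADOOP_CONF_DIR points at (the paths here are illustrative):

  export HADOOP_CONF_DIR=/etc/hadoop/conf    # local cluster conf
  hdfs getconf -confKey hadoop.security.authentication    # prints: simple

  export HADOOP_CONF_DIR=/path/to/conf/of/remote/kerberized/cluster
  hdfs getconf -confKey hadoop.security.authentication    # prints: kerberos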

Note that with unsecured-unsecured or secure-secure combinations, you could access HDFS in another cluster by hacking your own custom hdfs-site.xml to define multiple namespaces. But you are stuck with a single authentication model.
[edit] See the comment by Mighty Steve Loughran about an extra Spark property for accessing remote, secure HDFS from a local, secure cluster.
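
In the unsecured-unsecured case mentioned above, the simplest variant needs no custom hdfs-site.xml at all: with matching (simple) authentication on both sides, a fully qualified URI addresses the other cluster's namenode directly (the hostname and port are made up for illustration):

  # Both clusters use simple authentication, so no ticket is involved.
  hdfs dfs -ls hdfs://other-namenode.example.com:8020/some/path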

Note also that with DistCp you are stuck the same way -- except that there's a "cheat" property that allows you to go from secure to unsecured.
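
A minimal sketch of that DistCp "cheat", assuming the property meant here is ipc.client.fallback-to-simple-auth-allowed (a client-side setting that lets a Kerberos-authenticated client fall back to simple authentication when the other side is unsecured; the hostnames are made up):

  # Run from the secure cluster, holding a valid Kerberos ticket.
  hadoop distcp \
      -D ipc.client.fallback-to-simple-auth-allowed=true \
      hdfs://secure-nn.example.com:8020/source/path \
      hdfs://unsecure-nn.example.com:8020/target/path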
