About Livy sessions for JupyterHub on AWS EMR Spark

Problem description

My customer has an AD connector configured on JupyterHub, installed on AWS EMR, so that different users are authenticated on JupyterHub via AD. My current understanding is that when different users submit Spark jobs through Jupyter notebooks on JupyterHub to the shared underlying EMR Spark engine, the jobs are submitted to the Spark engine via Livy, and each Livy session has a related Spark session mapped to it (that is my current understanding; correct me if I am wrong).
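
The Livy flow described above can be sketched against Livy's REST API: a client first creates an interactive session (`POST /sessions`), for which Livy starts a Spark driver, and then sends code to that session as statements (`POST /sessions/{id}/statements`). The sketch below only builds the HTTP requests with the Python standard library and does not send them; the host name `emr-master` is hypothetical, and port 8998 is assumed because it is Livy's default.

```python
import json
import urllib.request

# Assumption: Livy runs on the EMR primary node ("emr-master" is hypothetical)
# on Livy's default port 8998.
LIVY_URL = "http://emr-master:8998"


def create_session_request(livy_url: str, kind: str = "pyspark") -> urllib.request.Request:
    """Build the POST /sessions request that asks Livy for a new interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{livy_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_statement_request(livy_url: str, session_id: int, code: str) -> urllib.request.Request:
    """Build the POST /sessions/{id}/statements request that runs code in an existing session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{livy_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The requests are only built here; sending them would need
    # urllib.request.urlopen(...) against a reachable Livy server.
    create = create_session_request(LIVY_URL)
    stmt = submit_statement_request(LIVY_URL, 0, "spark.range(10).count()")
    print(create.get_method(), create.full_url)
    print(stmt.get_method(), stmt.full_url)
```

Every statement sent to the same session id runs in the same Spark driver, which is why the session-to-user mapping matters for the question below.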

The question is: will different JupyterHub users share the same Livy session (and therefore the same Spark session), or will each user get a different Livy session (and therefore a different Spark session)?
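
One way to check this empirically is to list the sessions on the Livy server (`GET /sessions`), whose JSON response contains a `sessions` array with fields such as `id`, `owner` and `proxyUser`. The sketch below groups session ids by user, taking `proxyUser` when it is set and otherwise `owner`; the `"<unknown>"` label for sessions with neither field is an assumption of this sketch, not something Livy returns.

```python
import json
from collections import defaultdict
from typing import Dict, List, Union


def sessions_by_user(livy_sessions_json: Union[str, bytes]) -> Dict[str, List[int]]:
    """Group Livy session ids by the user each session is associated with.

    Uses a session's proxyUser when it is set (impersonation), otherwise its owner;
    sessions with neither are grouped under the placeholder "<unknown>".
    """
    payload = json.loads(livy_sessions_json)
    grouped: Dict[str, List[int]] = defaultdict(list)
    for session in payload.get("sessions", []):
        user = session.get("proxyUser") or session.get("owner") or "<unknown>"
        grouped[user].append(session["id"])
    return dict(grouped)
```

Against a live cluster, the input would be the body returned by `urllib.request.urlopen(f"{livy_url}/sessions").read()`. If every notebook user shows up with their own session id, the users are not sharing a Livy session.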

The only (limited) material I could find is:

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub.html

See the architecture diagram on that page.

Many thanks!

Recommended answer

As far as I know (tested on an HDP distribution), by default the Livy server creates a separate Spark driver, and therefore a separate session, for each user. The server is reachable through a Kerberized HTTP interface, so each user has to present a valid ticket, and the corresponding session runs under that user's name. This seems to be the way to go, since each user then has access to their own resources (data, YARN queues, and so on). In this setup the Livy server impersonates the user: it runs the Spark job as if it were that user (see "Granting Livy the Ability to Impersonate").
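
At the REST level, impersonation is requested per session through the `proxyUser` field of `POST /sessions`. As far as we understand, Livy honours this field only when impersonation is enabled on the Livy server and Hadoop allows the `livy` service user to act as a proxy. The sketch below builds one session request per notebook user without sending it; the host name is hypothetical, and Kerberos/SPNEGO authentication (needed on a Kerberized Livy endpoint) is deliberately left out.

```python
import json
import urllib.request
from typing import Optional


def create_session_request(
    livy_url: str, kind: str = "pyspark", proxy_user: Optional[str] = None
) -> urllib.request.Request:
    """Build POST /sessions; proxyUser asks Livy to run the session's Spark job as that user."""
    payload = {"kind": kind}
    if proxy_user is not None:
        # Assumed to take effect only if impersonation is enabled on the Livy server
        # and Hadoop's proxy-user settings allow the livy user to impersonate this user.
        payload["proxyUser"] = proxy_user
    return urllib.request.Request(
        f"{livy_url}/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host; this sketch sends nothing and does no Kerberos/SPNEGO auth.
    for user in ("shirley", "diego"):
        req = create_session_request("http://emr-master:8998", proxy_user=user)
        print(req.full_url, req.data.decode("utf-8"))
```

One request per user yields one session, and therefore one Spark driver, per user, matching the per-user behaviour described above.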

Checking the documentation, I've seen that the Livy server on EMR can be configured in exactly the same way:

By default, YARN jobs submitted this way run as user livy, regardless of the user who initiated the job. By setting up user impersonation you can have the user ID of the notebook user also be the user associated with the YARN job. Rather than having jobs initiated by both shirley and diego associated with the user livy, jobs that each user initiates are associated with shirley and diego respectively. This helps you to audit Jupyter usage and manage applications within your organization.
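
On EMR, settings like these are usually supplied as configuration classifications when the cluster is created. The sketch below builds such a list in Python and prints it as JSON. The property names are the standard Livy (`livy.impersonation.enabled`) and Hadoop proxy-user (`hadoop.proxyuser.livy.groups`, `hadoop.proxyuser.livy.hosts`) settings, but the exact set EMR expects should be checked against the JupyterHub page linked in the question; the permissive `*` defaults are an assumption for illustration.

```python
import json
from typing import Dict, List


def impersonation_configurations(proxy_groups: str = "*", proxy_hosts: str = "*") -> List[Dict]:
    """Build EMR configuration classifications intended to enable Livy user impersonation."""
    return [
        {
            # Livy server setting: run each session's Spark job as the requesting user.
            "Classification": "livy-conf",
            "Properties": {"livy.impersonation.enabled": "true"},
        },
        {
            # Hadoop proxy-user settings: allow the 'livy' service user to impersonate
            # members of these groups from these hosts ('*' = any; tighten in production).
            "Classification": "core-site",
            "Properties": {
                "hadoop.proxyuser.livy.groups": proxy_groups,
                "hadoop.proxyuser.livy.hosts": proxy_hosts,
            },
        },
    ]


if __name__ == "__main__":
    # Saved to a file, this JSON could be passed to
    # `aws emr create-cluster --configurations file://config.json`.
    print(json.dumps(impersonation_configurations(), indent=2))
```

Leaving the `livy-conf` entry out corresponds to the default behaviour quoted above, where every job runs as the single `livy` user.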

So you can choose between using impersonation (jobs run as distinct users) or not (jobs run as the single livy user).