Spark submit (2.3) on kubernetes cluster from Python


Question

So now that k8s is integrated directly with Spark in 2.3, my spark-submit from the console executes correctly against a Kubernetes master without any Spark master pods running; Spark handles all the k8s details:

spark-submit \
  --deploy-mode cluster \
  --class com.app.myApp \
  --master k8s://https://myCluster.com \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.app.name=myApp \
  --conf spark.executor.instances=10 \
  --conf spark.kubernetes.container.image=myImage \
  local:///myJar.jar

What I am trying to do is a spark-submit via AWS Lambda to my k8s cluster. Previously I used the command via the Spark master REST API directly (without Kubernetes):

import json
import requests

# POST the submission parameters to the Spark master's REST endpoint
request = requests.Request(
    'POST',
    "http://<master-ip>:6066/v1/submissions/create",
    data=json.dumps(parameters))
prepared = request.prepare()
session = requests.Session()
response = session.send(prepared)

And it worked. Now I want to integrate Kubernetes and do it similarly, where I submit an API request to my Kubernetes cluster from Python and have Spark handle all the k8s details, ideally something like:

request = requests.Request(
    'POST',
    "k8s://https://myK8scluster.com:443",
    data=json.dumps(parameters))

Is it possible in the Spark 2.3/Kubernetes integration?

Answer

I'm afraid that is impossible for Spark 2.3 if you are using native Kubernetes support.

Based on the description in the deployment instructions, the submission process consists of several steps:

  1. Spark creates a Spark driver running within a Kubernetes pod.
  2. The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
  3. When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in "completed" state in the Kubernetes API until it is eventually garbage collected or manually cleaned up.

So, in fact, you have nowhere to submit a job to until you start the submission process, which launches the first Spark pod (the driver) for you. And after the application completes, everything is terminated.

Because running a fat container on AWS Lambda is not the best solution, and because there is no straightforward way to run arbitrary commands in the Lambda container itself (it is possible, but only with a hack; there is a blueprint about executing Bash inside an AWS Lambda), the simplest way is to write a small custom service that runs on a machine outside of AWS Lambda and provides a REST interface between your application and the spark-submit utility. I don't see any other way to do it without pain.
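For illustration, here is a minimal sketch of the kind of wrapper service described above. It assumes Flask is installed and that spark-submit is on the PATH of the host it runs on; the /submit endpoint and the payload field names (master, main_class, app_name, executors, image, jar) are hypothetical choices for this example, not part of any Spark or Kubernetes API.

# Minimal sketch of a REST wrapper around spark-submit (assumptions: Flask is
# installed, spark-submit is on the PATH; endpoint and field names are made up
# for this example).
import subprocess

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/submit", methods=["POST"])
def submit():
    params = request.get_json(force=True)
    cmd = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", params["master"],          # e.g. k8s://https://myCluster.com
        "--class", params["main_class"],
        "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=spark",
        "--conf", f"spark.app.name={params['app_name']}",
        "--conf", f"spark.executor.instances={params.get('executors', 1)}",
        "--conf", f"spark.kubernetes.container.image={params['image']}",
        params["jar"],                         # e.g. local:///myJar.jar
    ]
    # In cluster mode spark-submit blocks until the application finishes by
    # default, so a real service would likely run this asynchronously and
    # expose a separate status endpoint.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return jsonify({
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Your Lambda function could then POST its JSON parameters to this service in much the same way it previously posted to the Spark master REST API.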
