Can I run dataproc jobs in cluster mode
Question
Just starting to get familiar with GCP Dataproc. I've noticed that when I use gcloud dataproc jobs submit pyspark, jobs are submitted with spark.submit.deployMode=client. Is spark.submit.deployMode=cluster an option for us?
Answer
Yes, you can, by specifying --properties spark.submit.deployMode=cluster. Just note that the driver output will be in the YARN userlogs (you can access them via Stackdriver Logging from the Console). We run in client mode by default in order to stream driver output to you.
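As a sketch, a submission in cluster mode might look like the following. The cluster name, region, and script path are placeholders for illustration; substitute your own values.

```shell
# Submit a PySpark job to Dataproc with the Spark driver running
# inside the YARN cluster instead of on the client.
# "my-cluster", "us-central1", and "my_job.py" are example values.
gcloud dataproc jobs submit pyspark my_job.py \
  --cluster=my-cluster \
  --region=us-central1 \
  --properties=spark.submit.deployMode=cluster
```

Because the driver no longer runs client-side, `print` and logging output from the driver will not stream back to your terminal; look for it in the YARN container logs or in Stackdriver Logging instead.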