Google Cloud Dataproc failing to create new cluster with initialization scripts

Question
I am using the command below to create a Dataproc cluster:
gcloud dataproc clusters create informetis-dev --initialization-actions "gs://dataproc-initialization-actions/jupyter/jupyter.sh,gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,gs://dataproc-initialization-actions/hue/hue.sh,gs://dataproc-initialization-actions/ipython-notebook/ipython.sh,gs://dataproc-initialization-actions/tez/tez.sh,gs://dataproc-initialization-actions/oozie/oozie.sh,gs://dataproc-initialization-actions/zeppelin/zeppelin.sh,gs://dataproc-initialization-actions/user-environment/user-environment.sh,gs://dataproc-initialization-actions/list-consistency-cache/shared-list-consistency-cache.sh,gs://dataproc-initialization-actions/kafka/kafka.sh,gs://dataproc-initialization-actions/ganglia/ganglia.sh,gs://dataproc-initialization-actions/flink/flink.sh" --image-version 1.1 --master-boot-disk-size 100GB --master-machine-type n1-standard-1 --metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance" --num-preemptible-workers 2 --num-workers 2 --preemptible-worker-boot-disk-size 1TB --properties hive:hive.metastore.warehouse.dir=gs://informetis-dev/hive-warehouse --worker-machine-type n1-standard-2 --zone asia-east1-b --bucket info-dev
But Dataproc failed to create the cluster, with the following errors in the failure file:
+ mysql -u hive -phive-password -e ''
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
+ mysql -e 'CREATE USER '\''hive'\'' IDENTIFIED BY '\''hive-password'\'';'
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (111)
Does anyone have any idea behind this failure?
Answer
It looks like you're missing the --scopes sql-admin flag, as described in the initialization action's documentation; without it, the Cloud SQL proxy cannot authorize its tunnel into your Cloud SQL instance.

Additionally, aside from the scopes, you need to make sure the default Compute Engine service account has the right project-level permissions in whichever project holds your Cloud SQL instance. Normally the default service account is a project editor in the GCE project, so that should be sufficient, combined with the sql-admin scopes, to access a Cloud SQL instance in the same project. But if you're accessing a Cloud SQL instance in a separate project, you'll also have to add that service account as a project editor in the project which owns the Cloud SQL instance.
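Concretely, the fix is to rerun the create command with the scope added. The sketch below keeps only the flags relevant to the Cloud SQL proxy (cluster name, metadata, and bucket values are taken from the question; re-add the remaining initialization actions and flags from the original command as needed):

```shell
# Recreate the cluster with the sql-admin scope so the cloud-sql-proxy
# initialization action can authorize its tunnel to the Cloud SQL instance.
gcloud dataproc clusters create informetis-dev \
  --scopes sql-admin \
  --initialization-actions "gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh" \
  --metadata "hive-metastore-instance=g-test-1022:asia-east1:db_instance" \
  --image-version 1.1 \
  --zone asia-east1-b \
  --bucket info-dev
```

Note that --scopes must be set at cluster creation time; it cannot be added to an existing cluster.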
You can find the email address of your default compute service account on the IAM page of the project deploying Dataproc clusters, listed under the name "Compute Engine default service account"; it should look something like <number>@project.gserviceaccount.com.
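The same email address can also be looked up from the command line (a sketch, assuming the gcloud SDK is installed and authenticated against the project; the project ID g-test-1022 is taken from the question's metadata):

```shell
# List the project's service accounts; the default Compute Engine account
# appears with the display name "Compute Engine default service account".
gcloud iam service-accounts list --project g-test-1022
```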