Cannot launch SparkPi example on Kubernetes Spark 2.4.0

Problem description

I've been trying to simply run the SparkPi example on Kubernetes with Spark 2.4.0 and it doesn't seem to behave at all like in the documentation.

I followed the guide. I built a vanilla Docker image with the docker-image-tool.sh script and pushed it to my registry.
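
For reference, the build and push step looks roughly like this (a sketch only; the registry prefix my-repo:10001 and the latest tag are assumptions based on the image name in the pod spec below):

# Run from the unpacked Spark 2.4.0 distribution; -r sets the registry/repo prefix, -t the image tag.
./bin/docker-image-tool.sh -r my-repo:10001 -t latest build
./bin/docker-image-tool.sh -r my-repo:10001 -t latest push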

I launch the job from my spark folder with a command like this:

bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.namespace=mynamespace \
    --conf spark.kubernetes.container.image.pullSecrets=myPullSecret \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar

This is virtually the same as in the documentation except for the namespace and pullSecrets options. I need these options because of constraints in a multi-user Kubernetes environment. Even so, I tried using the default namespace and got the same outcome.
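
For context, the pull secret referenced by spark.kubernetes.container.image.pullSecrets would typically be created along these lines (a sketch only; the registry address and credentials are placeholders, and only the secret name and namespace come from the submission above):

# Create the image pull secret in the job's namespace; <registry>, <user> and
# <password> are placeholders, not values from the question.
kubectl create secret docker-registry myPullSecret \
    --docker-server=<registry> \
    --docker-username=<user> \
    --docker-password=<password> \
    --namespace=mynamespace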

What happens is that the pod gets stuck in the failed state and two abnormal conditions occur:

  • There's an error: MountVolume.SetUp failed for volume "spark-conf-volume" : configmaps "spark-pi-1547643379283-driver-conf-map" not found, indicating that k8s could not mount the config map to /opt/spark/conf, which should contain a properties file. The config map (with the same name) exists, so I don't understand why k8s cannot mount it (see the kubectl checks after this list).
  • In the container logs there are several essential environment variables in the launch command that are empty.
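
A minimal sketch of the kubectl checks referred to above, assuming access to the namespace used for the submission; the ConfigMap name comes from the error message, and the driver pod name is inferred from Spark's <app-name>-<timestamp>-driver naming seen in the pod spec below:

# Verify that the ConfigMap named in the error actually exists, then look at the
# pod's events for the MountVolume.SetUp failure.
kubectl -n mynamespace get configmap spark-pi-1547643379283-driver-conf-map
kubectl -n mynamespace describe pod spark-pi-1547643379283-driver
kubectl -n mynamespace get events --sort-by=.lastTimestamp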

Container logs:

CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -cp ':/opt/spark/jars/*' -Xms -Xmx -Dspark.driver.bindAddress=10.11.12.13

You can control some of these variables directly with properties such as spark.kubernetes.driverEnv.SPARK_DRIVER_CLASS, but this should not be necessary (in this example the class is already specified with --class).
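
As an illustration only, the documented spark.kubernetes.driverEnv.[EnvironmentVariableName] properties set environment variables on the driver pod, so the empty variables could in principle be forced like this (a workaround sketch, not the intended fix; the values simply repeat what the submission already specifies):

# Workaround sketch: explicitly set the env vars that come up empty in the launch command.
bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.driverEnv.SPARK_DRIVER_MEMORY=1g \
    --conf spark.kubernetes.driverEnv.SPARK_DRIVER_CLASS=org.apache.spark.examples.SparkPi \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar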

For clarity the following environment variables are empty:

  • SPARK_DRIVER_MEMORY
  • SPARK_DRIVER_CLASS
  • SPARK_DRIVER_ARGS

The SPARK_CLASSPATH is also missing the container-local jar I specified on the command line (spark-examples_2.11-2.4.0.jar).

It seems that even if we resolve the problem with mounting the configmap, it won't help populate SPARK_DRIVER_MEMORY, because the config map does not contain an equivalent configuration parameter.

How do I resolve the problem of mounting the config map, and how do I get these environment variables populated?

The Kubernetes YAML configuration is created by Spark, but I am posting it here in case it helps:

pod-spec.yaml

    {
      "kind": "Pod",
      "apiVersion": "v1",
      "metadata": {
        "name": "spark-pi-1547644451461-driver",
        "namespace": "frank",
        "selfLink": "/api/v1/namespaces/frank/pods/spark-pi-1547644451461-driver",
        "uid": "90c9577c-1990-11e9-8237-00155df6cf35",
        "resourceVersion": "19241392",
        "creationTimestamp": "2019-01-16T13:13:50Z",
        "labels": {
          "spark-app-selector": "spark-6eafcf5825e94637974f39e5b8512028",
          "spark-role": "driver"
        }
      },
      "spec": {
        "volumes": [
          {
            "name": "spark-local-dir-1",
            "emptyDir": {}
          },
          {
            "name": "spark-conf-volume",
            "configMap": {
              "name": "spark-pi-1547644451461-driver-conf-map",
              "defaultMode": 420
            }
          },
          {
            "name": "default-token-rfz9m",
            "secret": {
              "secretName": "default-token-rfz9m",
              "defaultMode": 420
            }
          }
        ],
        "containers": [
          {
            "name": "spark-kubernetes-driver",
            "image": "my-repo:10001/spark:latest",
            "args": [
              "driver",
              "--properties-file",
              "/opt/spark/conf/spark.properties",
              "--class",
              "org.apache.spark.examples.SparkPi",
              "spark-internal"
            ],
            "ports": [
              {
                "name": "driver-rpc-port",
                "containerPort": 7078,
                "protocol": "TCP"
              },
              {
                "name": "blockmanager",
                "containerPort": 7079,
                "protocol": "TCP"
              },
              {
                "name": "spark-ui",
                "containerPort": 4040,
                "protocol": "TCP"
              }
            ],
            "env": [
              {
                "name": "SPARK_DRIVER_BIND_ADDRESS",
                "valueFrom": {
                  "fieldRef": {
                    "apiVersion": "v1",
                    "fieldPath": "status.podIP"
                  }
                }
              },
              {
                "name": "SPARK_LOCAL_DIRS",
                "value": "/var/data/spark-368106fd-09e1-46c5-a443-eec0b64b5cd9"
              },
              {
                "name": "SPARK_CONF_DIR",
                "value": "/opt/spark/conf"
              }
            ],
            "resources": {
              "limits": {
                "memory": "1408Mi"
              },
              "requests": {
                "cpu": "1",
                "memory": "1408Mi"
              }
            },
            "volumeMounts": [
              {
                "name": "spark-local-dir-1",
                "mountPath": "/var/data/spark-368106fd-09e1-46c5-a443-eec0b64b5cd9"
              },
              {
                "name": "spark-conf-volume",
                "mountPath": "/opt/spark/conf"
              },
              {
                "name": "default-token-rfz9m",
                "readOnly": true,
                "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
              }
            ],
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          }
        ],
        "restartPolicy": "Never",
        "terminationGracePeriodSeconds": 30,
        "dnsPolicy": "ClusterFirst",
        "serviceAccountName": "default",
        "serviceAccount": "default",
        "nodeName": "kube-worker16",
        "securityContext": {},
        "imagePullSecrets": [
          {
            "name": "mypullsecret"
          }
        ],
        "schedulerName": "default-scheduler",
        "tolerations": [
          {
            "key": "node.kubernetes.io/not-ready",
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": 300
          },
          {
            "key": "node.kubernetes.io/unreachable",
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": 300
          }
        ]
      },
      "status": {
        "phase": "Failed",
        "conditions": [
          {
            "type": "Initialized",
            "status": "True",
            "lastProbeTime": null,
            "lastTransitionTime": "2019-01-16T13:15:11Z"
          },
          {
            "type": "Ready",
            "status": "False",
            "lastProbeTime": null,
            "lastTransitionTime": "2019-01-16T13:15:11Z",
            "reason": "ContainersNotReady",
            "message": "containers with unready status: [spark-kubernetes-driver]"
          },
          {
            "type": "ContainersReady",
            "status": "False",
            "lastProbeTime": null,
            "lastTransitionTime": null,
            "reason": "ContainersNotReady",
            "message": "containers with unready status: [spark-kubernetes-driver]"
          },
          {
            "type": "PodScheduled",
            "status": "True",
            "lastProbeTime": null,
            "lastTransitionTime": "2019-01-16T13:13:50Z"
          }
        ],
        "hostIP": "10.1.2.3",
        "podIP": "10.11.12.13",
        "startTime": "2019-01-16T13:15:11Z",
        "containerStatuses": [
          {
            "name": "spark-kubernetes-driver",
            "state": {
              "terminated": {
                "exitCode": 1,
                "reason": "Error",
                "startedAt": "2019-01-16T13:15:23Z",
                "finishedAt": "2019-01-16T13:15:23Z",
                "containerID": "docker://931908c3cfa6c2607c9d493c990b392f1e0a8efceff0835a16aa12afd03ec275"
              }
            },
            "lastState": {},
            "ready": false,
            "restartCount": 0,
            "image": "my-repo:10001/spark:latest",
            "imageID": "docker-pullable://my-repo:10001/spark@sha256:58e319143187d3a0df14ceb29a874b35756c4581265f0e1de475390a2d3e6ed7",
            "containerID": "docker://931908c3cfa6c2607c9d493c990b392f1e0a8efceff0835a16aa12afd03ec275"
          }
        ],
        "qosClass": "Burstable"
      }
    }

config-map.yml

{
  "kind": "ConfigMap",
  "apiVersion": "v1",
  "metadata": {
    "name": "spark-pi-1547644451461-driver-conf-map",
    "namespace": "frank",
    "selfLink": "/api/v1/namespaces/frank/configmaps/spark-pi-1547644451461-driver-conf-map",
    "uid": "90eda9e3-1990-11e9-8237-00155df6cf35",
    "resourceVersion": "19241350",
    "creationTimestamp": "2019-01-16T13:13:50Z",
    "ownerReferences": [
      {
        "apiVersion": "v1",
        "kind": "Pod",
        "name": "spark-pi-1547644451461-driver",
        "uid": "90c9577c-1990-11e9-8237-00155df6cf35",
        "controller": true
      }
    ]
  },
  "data": {
    "spark.properties": "#Java properties built from Kubernetes config map with name: spark-pi-1547644451461-driver-conf-map\r\n#Wed Jan 16 13:14:12 GMT 2019\r\nspark.kubernetes.driver.pod.name=spark-pi-1547644451461-driver\r\nspark.driver.host=spark-pi-1547644451461-driver-svc.frank.svc\r\nspark.kubernetes.container.image=aow-repo\\:10001/spark\\:latest\r\nspark.kubernetes.container.image.pullSecrets=mypullsecret\r\nspark.executor.instances=5\r\nspark.app.id=spark-6eafcf5825e94637974f39e5b8512028\r\nspark.app.name=spark-pi\r\nspark.driver.port=7078\r\nspark.kubernetes.resource.type=java\r\nspark.master=k8s\\://https\\://10.1.2.2\\:6443\r\nspark.kubernetes.python.pyFiles=\r\nspark.kubernetes.executor.podNamePrefix=spark-pi-1547644451461\r\nspark.kubernetes.namespace=frank\r\nspark.driver.blockManager.port=7079\r\nspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar\r\nspark.submit.deployMode=cluster\r\nspark.kubernetes.submitInDriver=true\r\n"
  }
}

Answer

Spark on Kubernetes has a bug.

During Spark job submission to the Kubernetes cluster, we first create the Spark Driver Pod: https://github.com/apache/spark/blob/02c5b4f76337cc3901b8741887292bb4478931f3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L130

Only after that do we create all other resources (e.g. the Spark Driver Service), including the ConfigMap: https://github.com/apache/spark/blob/02c5b4f76337cc3901b8741887292bb4478931f3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L135

We do that to be able to set the Spark Driver Pod as the ownerReference of all of those resources (which cannot be done before the owner Pod is created): https://github.com/apache/spark/blob/02c5b4f76337cc3901b8741887292bb4478931f3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L134
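
That ownership is visible on the ConfigMap Spark created for this job (a sketch using the names from the config-map.yml above):

# Print the owner of the driver ConfigMap; it names the driver pod, so deleting
# the pod garbage-collects the ConfigMap as well.
kubectl -n frank get configmap spark-pi-1547644451461-driver-conf-map \
    -o jsonpath='{.metadata.ownerReferences[*].name}'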

Setting the owner reference this way lets us delegate the deletion of all those resources to Kubernetes, which makes it easier to garbage-collect unused resources in the cluster: all we need to do to clean up is delete the Spark Driver Pod. But there is a risk that Kubernetes will start the Spark Driver Pod before the ConfigMap is ready, which causes your issue.
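
One way to observe this ordering from the outside is to watch the namespace while spark-submit runs in another terminal (a sketch; the namespace matches the posted specs):

# If the driver pod shows up before its -driver-conf-map, the volume mount
# fails exactly as in the error above.
kubectl -n frank get pods,configmaps --watch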

This is still true for 2.4.4.
