Kubernetes WatchConnectionManager: Exec Failure: HTTP 403

Problem description

I'm getting the error Expected HTTP 101 response but was '403 Forbidden'. After setting up a new Kubernetes cluster using kubeadm, with a single master and two workers, I submitted a PySpark sample app and ran into the ERROR message below:

Spark-submit command

spark-submit --master k8s://master-host:port \
--deploy-mode cluster --name test-pyspark \
--conf spark.kubernetes.container.image=mm45/pyspark-k8s-example:2.4.1 \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.executor.instances=1 \
--conf spark.executor.memory=1000m \
--conf spark.driver.memory=1000m \
--conf spark.executor.cores=1 \
--conf spark.driver.cores=1 \
--conf spark.driver.maxResultSize=10g /usr/bin/run.py

Error details:

19/08/24 19:38:06 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'

Cluster details:

  1. Kubernetes version: 1.13.0
  2. Spark version: 2.4.1
  3. Cloud platform: AWS, running on EC2

Cluster role binding:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fabric8-rbac
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
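
With that binding in place, RBAC itself should already allow the watch call that fails below; you can verify this with kubectl auth can-i (a quick check, assuming the default namespace and service account from the manifest above):

# check the exact verbs the Spark driver needs against the API server
kubectl auth can-i watch pods --as=system:serviceaccount:default:default -n default
kubectl auth can-i create pods --as=system:serviceaccount:default:default -n default

Both should print yes, which suggests the 403 below is not an RBAC problem.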

Full pod log and error stack trace

++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=driver-py
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.6.8'
+ export PYTHON_VERSION=3.6.8
+ PYTHON_VERSION=3.6.8
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.32.0.3 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner file:/usr/bin/run.py
19/08/24 19:38:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/08/24 19:38:04 INFO SparkContext: Running Spark version 2.4.1
19/08/24 19:38:04 INFO SparkContext: Submitted application: calculate_pyspark_example
19/08/24 19:38:04 INFO SecurityManager: Changing view acls to: root
19/08/24 19:38:04 INFO SecurityManager: Changing modify acls to: root
19/08/24 19:38:04 INFO SecurityManager: Changing view acls groups to:
19/08/24 19:38:04 INFO SecurityManager: Changing modify acls groups to:
19/08/24 19:38:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
19/08/24 19:38:04 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
19/08/24 19:38:04 INFO SparkEnv: Registering MapOutputTracker
19/08/24 19:38:04 INFO SparkEnv: Registering BlockManagerMaster
19/08/24 19:38:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/08/24 19:38:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/08/24 19:38:04 INFO DiskBlockManager: Created local directory at /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/blockmgr-718b703d-3587-44a6-8014-02162ae3a48c
19/08/24 19:38:04 INFO MemoryStore: MemoryStore started with capacity 400.0 MB
19/08/24 19:38:04 INFO SparkEnv: Registering OutputCommitCoordinator
19/08/24 19:38:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/08/24 19:38:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://test-pyspark-1566675457745-driver-svc.default.svc:4040
19/08/24 19:38:04 INFO SparkContext: Added file file:///usr/bin/run.py at spark://test-pyspark-1566675457745-driver-svc.default.svc:7078/files/run.py with timestamp 1566675484977
19/08/24 19:38:04 INFO Utils: Copying /usr/bin/run.py to /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/spark-0ee3145c-e088-494f-8da1-5b8f075d3bc8/userFiles-5cfd25bf-4775-404d-86c8-5a392deb1e18/run.py
19/08/24 19:38:06 INFO ExecutorPodsAllocator: Going to request 2 executors from Kubernetes.
19/08/24 19:38:06 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
    at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/08/24 19:38:06 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/08/24 19:38:06 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/08/24 19:38:06 INFO SparkUI: Stopped Spark web UI at http://test-pyspark-1566675457745-driver-svc.default.svc:4040
19/08/24 19:38:06 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/08/24 19:38:06 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/08/24 19:38:06 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/08/24 19:38:06 INFO MemoryStore: MemoryStore cleared
19/08/24 19:38:06 INFO BlockManager: BlockManager stopped
19/08/24 19:38:06 INFO BlockManagerMaster: BlockManagerMaster stopped
19/08/24 19:38:06 WARN MetricsSystem: Stopping a MetricsSystem that is not running
19/08/24 19:38:06 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/08/24 19:38:06 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/usr/bin/run.py", line 8, in <module>
    with SparkContext(conf=conf) as sc:
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: io.fabric8.kubernetes.client.KubernetesClientException:
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

19/08/24 19:38:06 INFO ShutdownHookManager: Shutdown hook called
19/08/24 19:38:06 INFO ShutdownHookManager: Deleting directory /tmp/spark-178fc09d-353c-4906-8899-bdac338c804a
19/08/24 19:38:06 INFO ShutdownHookManager: Deleting directory /var/data/spark-e431c2ef-42ea-4de9-904e-72ab83c70cdf/spark-0ee3145c-e088-494f-8da1-5b8f075d3bc8

Can you please help me make sense of this?

Recommended answer

This happens on Kubernetes v1.15.3, v1.14.6, and v1.13.10. The Spark Operator project has a workaround: add the latest kubernetes-client version (kubernetes-client-4.4.2.jar) and delete the version actually present in your image. You can add the following lines to your Dockerfile:

RUN rm $SPARK_HOME/jars/kubernetes-client-3.0.0.jar

ADD https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.4.2/kubernetes-client-4.4.2.jar $SPARK_HOME/jars
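
If you're not sure which jar name to delete, you can first list what the image actually bundles (a quick check, assuming Docker on your workstation and the image from the spark-submit command above):

# list the fabric8 jars shipped in the image's /opt/spark/jars
docker run --rm --entrypoint ls mm45/pyspark-k8s-example:2.4.1 /opt/spark/jars | grep kubernetes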

If you get an Invocation error after applying this fix, you may want to upgrade the kubernetes-model-*.jar to 4.4.2 as well.
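
In Dockerfile terms that is the same pattern as above (a sketch; the exact version of the bundled model jar depends on your image, so list $SPARK_HOME/jars first):

# remove whichever kubernetes-model version ships in the image, then add 4.4.2
RUN rm $SPARK_HOME/jars/kubernetes-model-*.jar
ADD https://repo1.maven.org/maven2/io/fabric8/kubernetes-model/4.4.2/kubernetes-model-4.4.2.jar $SPARK_HOME/jars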

But if you can't or don't want to upgrade your k8s-client from 3.0.0 to 4.4.2, since that's quite a big jump and could lead to legacy issues, here is a more in-depth (and more technical) solution, along with an explanation of what happened (ref: #SPARK-28921):

When the Kubernetes URL used doesn't specify a port (e.g., https://example.com/api/v1/...), the Origin header for watch requests ends up with a port of -1 (e.g., https://example.com:-1). This happens because calling getPort() on a java.net.URL object that has no explicitly specified port always returns -1, and that return value was simply inserted into the Origin header.
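
You can reproduce the getPort() behavior directly (a small demonstration; assumes a JDK 9+ workstation for jshell, nothing Spark-specific):

jshell -q <<'EOF'
// no explicit port: getPort() returns -1, which ended up in the Origin header
System.out.println(new java.net.URL("https://example.com").getPort());
// explicit port: getPort() returns it as expected
System.out.println(new java.net.URL("https://example.com:6443").getPort());
EOF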

https://github.com/fabric8io/kubernetes-client/pull/1669

As you can see in the PR above, the fix wasn't applied until kubernetes-client-4.4.x. What I did was patch the current .jar and build a customized .jar (a command-line sketch follows the list):

  1. Clone / download the kubernetes-client source code from the repo.
  2. Apply the changes from that commit.
  3. Build the .jar file using the Maven plugin.
  4. Replace /opt/spark/jars/kubernetes-client-3.0.0.jar with the customized .jar.
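
A rough command-line sketch of those steps (assumptions: the v3.0.0 tag matches the jar bundled in your image, and <commit-sha> is a placeholder for the commit from fabric8io/kubernetes-client PR #1669, which you have to look up yourself):

git clone https://github.com/fabric8io/kubernetes-client.git
cd kubernetes-client
git checkout v3.0.0                  # match the client version bundled with Spark
git cherry-pick <commit-sha>         # the Origin-header fix from PR #1669
mvn -pl kubernetes-client -am clean package -DskipTests
# then copy kubernetes-client/target/kubernetes-client-3.0.0.jar into
# $SPARK_HOME/jars in your image, replacing the original jar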
