How to troubleshoot deployment of Inception serving running in Kubernetes


Problem description

I'm following the Serving Inception Model with TensorFlow Serving and Kubernetes workflow, and everything works well up to the final step of serving the Inception model via k8s, when I try to do inference from a local host.

The pods are running, and the output of $kubectl describe service inception-service is consistent with what is suggested by the Serving Inception Model with TensorFlow Serving and Kubernetes workflow.

However, when running inference, things don't work. Here is the trace:

$bazel-bin/tensorflow_serving/example/inception_client --server=104.155.175.138:9000 --image=cat.jpg

Traceback (most recent call last):
  File "/home/dimlyus/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 56, in <module>
    tf.app.run()
  File "/home/dimlyus/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/dimlyus/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 51, in main
    result = stub.Predict(request, 60.0)  # 10 secs timeout
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 324, in __call__
    self._request_serializer, self._response_deserializer)
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 210, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.UNAVAILABLE, details="Connect Failed")

I am running everything on Google Cloud. The setup is done from a GCE instance, and k8s runs inside Google Container Engine. The k8s setup follows the instructions from the workflow linked above and uses the inception_k8s.yaml file.

The service is set up as follows:

apiVersion: v1
kind: Service
metadata:
  labels:
    run: inception-service
  name: inception-service
spec:
  ports:
  - port: 9000
    targetPort: 9000
  selector:
    run: inception-service
  type: LoadBalancer

Any advice on how to troubleshoot this would be greatly appreciated!

Answer

The error message seems to indicate that your client cannot connect to the server. Without some additional information it is hard to troubleshoot. If you post your deployment and service configuration, as well as some information about the environment (is it running on a cloud? which one? what are your security rules? load balancers?), we may be able to help better.

But here are some things you can check right away:

  1. If you are running in some kind of cloud environment (Amazon, Google, Azure, etc.), they all have security rules where you need to explicitly open the ports on the nodes running your Kubernetes cluster. So every port that your TensorFlow deployment/service is using should be opened on the controller and worker nodes.
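On GCE/GKE, point 1 amounts to adding a firewall rule for the serving port. A minimal sketch follows; the rule name and target tag are assumptions (not taken from the question), and the command is only printed here so it can be reviewed before actually running it:

```shell
# Hedged sketch: open TF Serving's gRPC port (9000) on the cluster nodes.
# "allow-tf-serving" and the target tag are hypothetical names.
PORT=9000
CMD="gcloud compute firewall-rules create allow-tf-serving --allow tcp:${PORT} --target-tags gke-inception-node"

# Print the command for review instead of executing it directly:
echo "$CMD"
```

Once reviewed, the printed command can be run as-is against the project that hosts the cluster.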

  2. Did you deploy only a Deployment for the app, or also a Service? If you run a Service, how is it exposed? Did you forget to enable a NodePort?
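If NodePort exposure were the intent, the service spec would look roughly like this sketch (the nodePort value 30900 is an assumption for illustration; your own config above uses type LoadBalancer instead):

```yaml
# Hypothetical alternative: expose the service on a fixed node port
# instead of a cloud load balancer. nodePort 30900 is an assumption.
apiVersion: v1
kind: Service
metadata:
  name: inception-service
spec:
  type: NodePort
  ports:
  - port: 9000
    targetPort: 9000
    nodePort: 30900
  selector:
    run: inception-service
```

With NodePort, the client would connect to any node's external IP at port 30900, and the firewall rules from point 1 would need to open that port.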

Update: Your service type is LoadBalancer, so a separate load balancer should be created in GCE. You need to get the IP of the load balancer and access the service through that IP. Please see the 'Finding Your IP' section in this link: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/
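To find that IP from the command line, one can read the EXTERNAL-IP column of kubectl get service. A minimal sketch, using sample output in place of a live cluster (the cluster IP and node port shown are made up):

```shell
# In a real cluster this would be: kubectl get service inception-service
# Sample output is used here so the parsing can be shown without a cluster:
SAMPLE='NAME                TYPE           CLUSTER-IP    EXTERNAL-IP       PORT(S)
inception-service   LoadBalancer   10.3.244.10   104.155.175.138   9000:31234/TCP'

# EXTERNAL-IP is the 4th column of the service row:
EXTERNAL_IP=$(echo "$SAMPLE" | awk '/^inception-service/ {print $4}')
echo "$EXTERNAL_IP"

# The client would then point at the load balancer's IP:
echo "inception_client --server=${EXTERNAL_IP}:9000 --image=cat.jpg"
```

If EXTERNAL-IP still shows `<pending>`, the GCE load balancer has not finished provisioning yet and the connection will fail exactly as in the trace above.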
