Undocumented Managed VM task queue RPCFailedError


Problem description

I'm running into a very peculiar, undocumented issue with GAE Managed VMs and Task Queues. I understand that the Managed VM service is in beta, so this question may not stay relevant forever, but it's definitely causing me a lot of headaches right now.

The main symptom is that, under certain circumstances I haven't fully pinned down, I'm seeing the following error and traceback:

  File "/home/vmagent/my_app/some_file.py", line 265, in some_ndb_tasklet
    res = yield some_task.add_async('some-task-queue-name')
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 472, in _on_rpc_completion
    result = rpc.get_result()
  File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/home/vmagent/python_vm_runtime/google/appengine/api/taskqueue/taskqueue.py", line 1948, in ResultHook
    rpc.check_success()
  File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
    self.__rpc.CheckSuccess()
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/vmruntime/vmstub.py", line 312, in _WaitImpl
    raise self._ErrorException(*_DEFAULT_EXCEPTION)
RPCFailedError: The remote RPC to the application server failed for call taskqueue.BulkAdd().

I've traced this through my local App Engine SDK and can follow every frame up to the last one, but google/appengine/ext/vmruntime/ doesn't exist on my machine at all, so I have no idea what's happening in vmstub.py. From the local code, some_task.add_async('the-queue') starts an RPC and waits for it to finish, but this RPCFailedError is not the exception that the except apiproxy_errors.ApplicationError, e: clause at line 1949 of taskqueue.py is written to catch.
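A small stand-alone sketch can show why that except clause never fires. It assumes, based on the SDK's google.appengine.runtime.apiproxy_errors module, that RPCFailedError and ApplicationError are sibling subclasses of a common Error base rather than one inheriting from the other. The classes and result_hook below are simplified local stand-ins, not the real SDK code.

```python
# Local stand-ins mirroring the assumed shape of
# google.appengine.runtime.apiproxy_errors; these are NOT the SDK classes.
class Error(Exception):
    """Common base for API proxy errors."""


class RPCFailedError(Error):
    """Transport-level failure: the call never produced a service result."""


class ApplicationError(Error):
    """The service ran and returned an application-level error code."""

    def __init__(self, application_error, error_detail=''):
        super(ApplicationError, self).__init__(application_error, error_detail)
        self.application_error = application_error
        self.error_detail = error_detail


def result_hook(check_success):
    """Simplified result hook: like the SDK's, it only handles ApplicationError."""
    try:
        check_success()
    except ApplicationError as e:
        # Roughly where the SDK maps the error code to a taskqueue exception.
        return 'translated application error %d' % e.application_error
    return 'ok'


def failing_rpc():
    raise RPCFailedError('The remote RPC to the application server failed '
                         'for call taskqueue.BulkAdd().')


try:
    result_hook(failing_rpc)
except RPCFailedError as e:
    # The sibling exception escapes the hook untranslated.
    print('escaped: %s' % type(e).__name__)
```

Because neither class derives from the other, an except clause naming one of them lets the other propagate, which matches the traceback ending in the raw RPCFailedError.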

The code that's generating the error looks something like this:

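from google.appengine.api import taskqueue
from google.appengine.ext import ndb


@ndb.tasklet
def kickoff_tasks(batch_of_payloads):
    for task_payload in batch_of_payloads:
        # task_payload is a dict of POST parameters for the handler
        task = taskqueue.Task(
            url='/the/handler/url',
            params=task_payload)
        # Each add_async() sends its own taskqueue.BulkAdd RPC and the
        # tasklet waits for it before moving on to the next payload.
        res = yield task.add_async('some-valid-task-queue-name')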
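One variation worth trying, offered as an untested suggestion rather than a confirmed fix: the loop above yields on each add_async() in turn, so a batch of 100 payloads becomes 100 sequential BulkAdd RPCs. Queue.add() and Queue.add_async() also accept a list of Task objects (up to taskqueue.MAX_TASKS_PER_ADD, which is 100), so a batch can go out in far fewer calls. The sketch keeps the queue call injectable so the chunking logic runs without the SDK; chunks, enqueue_in_chunks, and add_batch are hypothetical names.

```python
def chunks(seq, size):
    """Yield consecutive slices of seq holding at most size elements."""
    if size < 1:
        raise ValueError('size must be positive')
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def enqueue_in_chunks(tasks, add_batch, max_per_call=100):
    """Send tasks through add_batch in groups of at most max_per_call.

    add_batch is a stand-in (an assumption of this sketch) for a call such
    as taskqueue.Queue('some-valid-task-queue-name').add, which accepts a
    list of Task objects. Returns the number of calls made.
    """
    calls = 0
    for batch in chunks(list(tasks), max_per_call):
        add_batch(batch)
        calls += 1
    return calls
```

Inside the tasklet, the equivalent loop would yield queue.add_async(batch) once per chunk. With payloads close to 90,000 bytes each, a full chunk of 100 tasks is a request of several megabytes, and the request-size limit the Managed VM stub enforces is not stated here, so a smaller max_per_call may be the safer starting point.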

Other things worth noting:

  • This code itself runs in a task handler that was kicked off by another task.
  • I first saw this error before implementing this sort of batching, and assumed the issue was that I was adding too many tasks from within a task handler.
  • Depending on the data in the payloads, this sometimes runs successfully with a batch size of 100; with other data it fails consistently at 100 and only sometimes succeeds at a batch size of 50.
  • The task payloads themselves contain batches of items and are tuned to be just small enough to fit in a task. App Engine advertises a maximum task size of 100KB, so I'm currently keeping each payload under 90,000 bytes. Lowering the size further doesn't seem to help.
  • I've also tried wrapping kickoff_tasks in an exponential backoff that retries when this error appears, but once the error is raised I can't seem to add any other tasks from within the same handler (i.e. I can't even kick off a "continue from where you left off" task; I just have to let this one fail and restart itself).
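Since the 100KB cap applies to what the task actually carries, it can help to measure payloads after encoding rather than as raw data. The sketch below assumes the params dict is sent as an application/x-www-form-urlencoded POST body (the default for a Task built with params) and that a batch of items is serialized as JSON under a single hypothetical key named items; encoded_size, pack_items, and the key name are illustrative, and 90,000 is the budget from the question. Form encoding can inflate JSON noticeably, because quotes, brackets, braces, colons, and commas each become a three-byte %XX escape.

```python
import json

try:  # Python 3
    from urllib.parse import urlencode
except ImportError:  # Python 2.7, as on the App Engine runtime
    from urllib import urlencode

PAYLOAD_BUDGET = 90000  # the question's self-imposed limit, below the 100KB cap


def encoded_size(params):
    """Length of params once form-encoded; the output is pure ASCII, so chars == bytes."""
    return len(urlencode(params, True))


def pack_items(items, budget=PAYLOAD_BUDGET):
    """Greedily pack items into params dicts whose encoded size fits the budget."""
    batches, current = [], []
    for item in items:
        if encoded_size({'items': json.dumps([item])}) > budget:
            raise ValueError('a single item exceeds the payload budget')
        # Re-encodes the growing batch each time: simple, though quadratic.
        if current and encoded_size({'items': json.dumps(current + [item])}) > budget:
            batches.append({'items': json.dumps(current)})
            current = []
        current.append(item)
    if current:
        batches.append({'items': json.dumps(current)})
    return batches
```

Each returned dict could then be passed as params to taskqueue.Task. If payloads that measure well under the budget after encoding still fail, that would point away from task size as the cause.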

So, my question is, what is actually causing this error? How can I avoid it, or fix this so that I'm handling it correctly?

Solution

This is a known issue that is being worked on. There are actually two issues: the RPC failure itself, and the SDK's lack of handling for the RPCFailedError exception.
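Until the SDK handles the exception, one workaround consistent with what the question observes (further adds from the same handler keep failing) is to stop retrying in-process and let the Task Queue retry the whole task: a push queue re-runs a task whose handler responds with a status outside the 200-299 range. Because google/appengine/ext/vmruntime/ is not available locally, this sketch matches the exception by class name instead of importing it; is_rpc_failure and run_task_body are hypothetical names.

```python
import logging

RETRYABLE_ERROR_NAMES = frozenset(['RPCFailedError'])


def is_rpc_failure(exc):
    """True if exc's class, or any of its bases, is named RPCFailedError.

    Matching by name avoids importing the Managed VM runtime module,
    which is missing from the local SDK.
    """
    return any(cls.__name__ in RETRYABLE_ERROR_NAMES for cls in type(exc).__mro__)


def run_task_body(body):
    """Run one task's work and return the HTTP status the handler should send.

    A non-2xx status makes the push queue schedule the task again later,
    so the whole task restarts instead of retrying inside this request.
    """
    try:
        body()
    except Exception as exc:
        if is_rpc_failure(exc):
            logging.warning('Task Queue RPC failed; letting the queue retry: %s', exc)
            return 500
        raise  # anything else is a real bug and should surface as-is
    return 200
```

In a handler, body could be something like lambda: kickoff_tasks(batch).get_result(). Because some tasks in the batch may already have been enqueued before the failure, the re-run should tolerate duplicates, for example by giving each task a deterministic name so that a repeated add raises taskqueue.TaskAlreadyExistsError, which can be ignored.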

There is some public discussion of the issue here.
