Ansible Runner consecutive calls mess up when done too fast


Problem description

I have built a piece of software on the official Ansible Runner library that receives several remote calls, each of which runs one or more playbooks one or more times. The Ansible run configuration is sequential, although that should not matter across different calls (if I understand correctly, it only orders the tasks inside a single playbook run).

So, I run the playbooks using Ansible Runner's run_async():

runner_async_thread, runner_object = ansible_runner.run_async(
                **{k: v for k, v in kwargs.items() if v is not None})

and keep looping on the asynchronous thread's is_alive() method while checking for other conditions:

while runner_async_thread.is_alive():
    ...

If an exception is raised, or once the thread finishes, I just check the status result and return.

The issue is that when the system receives many calls at once, things get mixed up and I get errors such as this one:

The offending line appears to be:


{"username": "operator", "password": "!", "target": "sever_003_linux"}05_linux"}
                                                                      ^ here
We could be wrong, but this one looks like it might be an issue with
unbalanced quotes. If starting a value with a quote, make sure the
line ends with the same set of quotes. For instance this arbitrary
example:

    foo: "bad" "wolf"

Could be written as:

    foo: '"bad" "wolf"'

Evidently the error is here:

    {"username": "new_user", "target": "sever_003_linux"}05_linux"}

I double-checked (logs and env/extravars files), and the commands sent are correct:

{"username": "new_user", "target": "sever_003_linux"}

So it seems a memory area is being overwritten without being cleared first. Could two runners be running at the same time (which appears to be possible) without thread safety? Do you have any idea how to fix this, or how to prevent it from happening?
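To make the symptom concrete, here is a minimal sketch of one way a trailing fragment such as `05_linux"}` could appear: a shorter payload written over a longer one without truncating first. This is only an illustration of the hypothesis, not a confirmed diagnosis of what Ansible Runner does internally. The longer `OLD` payload and the helper names are hypothetical, chosen so that the leftover tail matches the one in the error.

```python
import json
import os
import tempfile

# Hypothetical earlier payload: 10 characters longer than the new one.
OLD = '{"username": "new_user", "target": "sever_staging_00005_linux"}'
NEW = '{"username": "new_user", "target": "sever_003_linux"}'


def overwrite_without_truncate(path, text):
    """Write text at offset 0 but keep whatever bytes lie beyond it."""
    with open(path, "r+") as handle:  # "r+" does not truncate the file
        handle.write(text)


def is_valid_json(text):
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def demo():
    """Overwrite OLD with NEW in a temp file and return the file content."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as handle:
            handle.write(OLD)
        overwrite_without_truncate(path, NEW)
        with open(path) as handle:
            return handle.read()
    finally:
        os.remove(path)


corrupted = demo()
print(corrupted)
print("valid JSON:", is_valid_json(corrupted))
```

The new payload is intact at the start of the buffer, and the stale tail of the previous, longer payload follows it, so the parser rejects the whole thing.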

The code normally works, and the same calls succeed when I add some delays between them, but I don't think that is an ideal solution.

I tried adjusting the Ansible configuration, but without success.

ansible 2.9.6
python version = 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 9.4.0]

Recommended answer

I found other people reporting this issue in this Jira story: https://jira.opencord.org/browse/CORD-922

Ansible, when used via its API, is not thread-safe.

They also propose a way to overcome the problem:

To be safe and avoid such issues, we will wrap invocations of Ansible in a process by invoking a fork() before using it.

In my case, however, I have to return the result of the operation in order to report it. Therefore, I declare a shared queue so the processes can communicate, and I fork the main process.

import ansible_runner
from multiprocessing import Queue
import os

#...

def playbook_run(self, playbook_name, parameters):
    #...
    runner_async_thread, runner_object = ansible_runner.run_async(
        **{k: v for k, v in kwargs.items() if v is not None})
    while runner_async_thread.is_alive():
        ...  # check for other conditions while the playbook runs
    return run_result


# Inside the method that handles an incoming call:
shared_queue = Queue()
process_pid = os.fork()
if process_pid == 0:  # the forked child process will independently run & report
    run_result = self.playbook_run(playbook_name,
                                   parameters)
    shared_queue.put(run_result)
    shared_queue.close()
    shared_queue.join_thread()
    os._exit(0)
else:  # the parent process will wait until it gets the report
    run_result = shared_queue.get()
    os.waitpid(process_pid, 0)  # reap the child so it does not linger as a zombie
    return run_result

And, assuming that the lack of thread safety was indeed the cause, the problem is solved.
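The fork-and-queue approach above can also be written with the standard `multiprocessing` module instead of a raw `os.fork()`. The following is a minimal sketch of that variant, not the code actually used in the answer. The helper names `run_in_child` and `_child_main` are hypothetical, the "fork" start method is assumed (POSIX only), and the value returned by the callable must be picklable because it travels through the queue.

```python
import multiprocessing as mp


def _child_main(queue, func, args, kwargs):
    """Runs in the child process: call func and ship the outcome back."""
    try:
        queue.put(("ok", func(*args, **kwargs)))
    except Exception as exc:  # report failures instead of dying silently
        queue.put(("error", repr(exc)))


def run_in_child(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a forked child and return its result."""
    ctx = mp.get_context("fork")  # assumption: POSIX, mirrors os.fork()
    queue = ctx.Queue()
    proc = ctx.Process(target=_child_main, args=(queue, func, args, kwargs))
    proc.start()
    status, payload = queue.get()  # read before join() to avoid a deadlock
    proc.join()                    # reap the child process
    if status == "error":
        raise RuntimeError(f"child process failed: {payload}")
    return payload


if __name__ == "__main__":
    # Stand-in workload; a real caller would pass a function that wraps
    # the playbook run and returns something simple such as its status.
    print(run_in_child(pow, 2, 5))
```

With this helper, each incoming call would pass a function that performs the playbook run and returns plain data (for example the runner's status string and return code) rather than the runner object itself, which may not survive pickling. Each run then gets its own process, so no Ansible state is shared between concurrent calls.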

Since I don't think it had been reported, I opened an issue on the Ansible Runner GitHub repository: https://github.com/ansible/ansible-runner/issues/808
