Allow a supervisor's child to take advantage of restarts, but not kill the supervisor once it passes max restarts?


Question

I have a simple_one_for_one supervisor that manages a fairly volatile set of children; they often die due to external causes, e.g. their network connection being terminated. Erlang's supervision system is brilliant for this: it just restarts them and everything rolls on.

The problem occurs when one of the children has a serious problem with its connection and hits the supervisor's max restart limit, at which point the supervisor kills all children and then kills itself. Fine, this is specified in the documentation. However, my ideal behavior would be for the supervisor to give up restarting that particular child and carry on.

I know I can implement this using a separate process that monitors the supervisor, but this seems like overkill.

Thanks for any thoughts!

Recommended answer

I didn't try it, but I suggest that the supervisor launch another supervisor (one per process), using the simple_one_for_one restart strategy and a transient restart type in the child spec.

This intermediate supervisor then launches the process itself, using the one_for_one restart strategy and a permanent restart type in the child spec, with maxrestarts and maxtime set to fit your needs.

There is something strange in your question: you say that the supervisor kills all the children that were started when it reaches the max restart limit for one faulty child. I thought that the simple_one_for_one strategy let the workers die by themselves.

[edit] As I was curious to test this idea, I wrote a small set of modules to test it.

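Here is the code of the top supervisor (note that it uses temporary rather than transient for the child spec):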

-module (factory).

-behaviour(supervisor).

-export([start_link/0]).
-export([init/1, start_process/1]).


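%% Child spec for the per-process supervisor. The restart type is temporary,
%% so the factory never restarts a proc_sup that has given up.
-define(CHILD(I, Arglist), {I, {I, start_link, [Arglist]}, temporary, 5000, supervisor, [I]}).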

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, { {simple_one_for_one, 0, 10}, [?CHILD(proc_sup, [])]} }.

start_process(Arglist)->
    supervisor:start_child(?MODULE, [Arglist]). 

Then the code of the intermediate supervisor, in charge of restarting a process a few times in case of problems:

-module (proc_sup).

-behaviour(supervisor).

-export([start_link/2]).
-export([init/1]).

%% Child spec for the actual worker: permanent, so it is restarted after any exit.
-define(CHILD(Mod, Start, Arglist), {Mod, {Mod, Start, Arglist}, permanent, 5000, worker, [Mod]}).

start_link(_,Arglist) ->
    io:format("proc_sup arg = ~p~n",[Arglist]),
    supervisor:start_link(?MODULE, [Arglist]).

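%% Allow at most 5 restarts within 10 seconds; beyond that this supervisor
%% terminates, taking only its own worker with it.
init([[Mod,Start|[Arglist]]]) ->
    {ok, { {one_for_one, 5, 10}, [?CHILD(Mod,Start,Arglist)]} }.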

And then the code of a small module that can be stopped, can receive a message, or can be programmed to die after a certain time, in order to test the mechanism:

-module(dumb).
-export([start_link/1,loop/2]).

start_link(Arg) ->
    io:format("dumb start param = ~p~n",[Arg]),
    {ok,spawn_link(?MODULE,loop,[Arg,init])}.


loop({die,T},_) ->
    receive
    after T -> ok
    end;
loop(Arg,init) ->
    io:format("loop pid ~p with arg ~p~n",[self(),Arg]),
    loop(Arg,0);
loop(Arg,N) ->
    io:format("loop ~p (~p) cycle ~p~n",[Arg,self(),N]),
    receive
        stop -> 'restart_:o)';
        _ -> loop(Arg,N+1)
    end.

Finally, a copy of the shell session:

1> factory:start_link().
{ok,<0.37.0>}
2> 
2> factory:start_process([dumb,start_link,[loop_1]]).
proc_sup arg = [dumb,start_link,[loop_1]]
dumb start param = loop_1
loop pid <0.40.0> with arg loop_1
loop loop_1 (<0.40.0>) cycle 0
{ok,<0.39.0>}
3> 
3> factory:start_process([dumb,start_link,[loop_1]]).
proc_sup arg = [dumb,start_link,[loop_1]]
dumb start param = loop_1
loop pid <0.43.0> with arg loop_1
loop loop_1 (<0.43.0>) cycle 0
{ok,<0.42.0>}
4> 
4> factory:start_process([dumb,start_link,[loop_2]]).
proc_sup arg = [dumb,start_link,[loop_2]]
dumb start param = loop_2
loop pid <0.46.0> with arg loop_2
loop loop_2 (<0.46.0>) cycle 0
{ok,<0.45.0>}
5> 
5> pid(0, 2310, 0) ! hello.                          
hello
6> 
6> pid(0, 40, 0) ! hello.  
loop loop_1 (<0.40.0>) cycle 1
hello
7> pid(0, 40, 0) ! hello.
loop loop_1 (<0.40.0>) cycle 2
hello
8> pid(0, 40, 0) ! hello.
loop loop_1 (<0.40.0>) cycle 3
hello
9> pid(0, 43, 0) ! hello.
loop loop_1 (<0.43.0>) cycle 1
hello
10> pid(0, 43, 0) ! hello.
loop loop_1 (<0.43.0>) cycle 2
hello
11> pid(0, 40, 0) ! stop. 
dumb start param = loop_1
stop
loop pid <0.54.0> with arg loop_1
loop loop_1 (<0.54.0>) cycle 0
12> pid(0, 40, 0) ! stop.
stop
13> pid(0, 54, 0) ! stop.
dumb start param = loop_1
stop
loop pid <0.57.0> with arg loop_1
loop loop_1 (<0.57.0>) cycle 0
14> pid(0, 57, 0) ! hello.
loop loop_1 (<0.57.0>) cycle 1
hello
15> factory:start_process([dumb,start_link,[{die,5}]]).
proc_sup arg = [dumb,start_link,[{die,5}]]
dumb start param = {die,5}
{ok,<0.60.0>}
16> 
dumb start param = {die,5}
dumb start param = {die,5}
dumb start param = {die,5}
dumb start param = {die,5}
dumb start param = {die,5}
16> factory:start_process([dumb,start_link,[{die,50000}]]).
proc_sup arg = [dumb,start_link,[{die,50000}]]
dumb start param = {die,50000}
{ok,<0.68.0>}
17> 
dumb start param = {die,50000}
17>
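
The session above shows the restarts but does not explicitly check that the top supervisor is still alive once a faulty child has exhausted its restarts. A minimal self-contained sketch of such a check follows. It is not the answer's code: the module name giveup_demo and the functions start_guard/1, crash_soon/0 and run/0 are hypothetical, it packs both supervisor levels into one module for brevity, and it assumes OTP 18 or later for the map-based supervisor flags and child specs.

```erlang
-module(giveup_demo).
-behaviour(supervisor).
-export([run/0, init/1, start_guard/1, crash_soon/0]).

%% Top-level supervisor: each dynamic child is a temporary supervisor,
%% so a child that gives up is simply dropped and never restarted.
init(top) ->
    SupFlags = #{strategy => simple_one_for_one, intensity => 0, period => 10},
    Guard = #{id => guard,
              start => {?MODULE, start_guard, []},
              restart => temporary,
              shutdown => infinity,
              type => supervisor},
    {ok, {SupFlags, [Guard]}};
%% Per-worker supervisor: restarts its single worker at most 3 times
%% within 10 seconds, then terminates itself.
init({guard, MFA}) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    Worker = #{id => worker,
               start => MFA,
               restart => permanent,
               shutdown => 5000,
               type => worker},
    {ok, {SupFlags, [Worker]}}.

%% Called by the top supervisor; MFA is appended by supervisor:start_child/2.
start_guard(MFA) ->
    supervisor:start_link(?MODULE, {guard, MFA}).

%% Hypothetical faulty worker: exits abnormally 10 ms after starting.
crash_soon() ->
    {ok, spawn_link(fun() -> timer:sleep(10), exit(boom) end)}.

%% Start one faulty worker, wait for its guard to give up, then report
%% whether the top supervisor is alive and how many children remain active.
run() ->
    {ok, Top} = supervisor:start_link(?MODULE, top),
    {ok, _Guard} = supervisor:start_child(Top, [{?MODULE, crash_soon, []}]),
    timer:sleep(500),
    Alive = is_process_alive(Top),
    Active = proplists:get_value(active, supervisor:count_children(Top)),
    {Alive, Active}.
```

Calling giveup_demo:run() from the shell prints several crash reports while the worker is restarted; the first element of the returned tuple tells you whether the top supervisor survived the intermediate supervisor giving up.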
