How to keep track of child processes in Erlang?


Problem description


I have a static list of "hosts" with their info, and a dynamic list of "host agents". Each host has one and only one agent for as long as it connects to the server by a TCP connection. As the host may or may not be connected, its agent process may or may not be started. When a TCP packet arrives with the host ID, I need to find out if the "agent" of this host is started or not.

Connection is responsible for receiving and sending data on the TCP socket, parsing the data to find out which host it should be sent to, and delivering it to that host's agent to handle.

Host kept host informations. Host agent handles incoming data, saves host information to host, and decides what to send and in what format (e.g. an ack to the client with host ID and response code).

And in the data packet, it specified source host and target host, which means it sent by source host and should received by target host. In this case the target host could be connected on another connection. That's why I need a global map of all connections, so that it is convenient to get the target host agent's PID.

I have a supervision tree in which host_supervisor monitors all the hosts, connection_supervisor monitors each connection, and host_agent_supervisor monitors the agents. host_supervisor and connection_supervisor are both supervised by the application supervisor, which means they are first-level children in the supervision tree. But host_agent_supervisor is under connection_supervisor.

Questions:

  1. Is it a good idea to store a map of host_id and host_agent_pid pairs in a db?
  2. If 1. is true, how do I update the host_agent_pid when something goes wrong and the agent is restarted?
  3. Is there any better idea to implement this case? It seems my solution does not follow "the erlang way".

Solution

The quick answers to your questions are:

  1. It's fine, though besides a map you could also use gb_trees, dict or an ETS table (maps is the least mature of these, of course). That notwithstanding, a key/ID to PID lookup table is fine in principle. ETS might give a performance benefit over the others because you can create an ETS table that can be accessed from other processes, eliminating the need for a single process to do all the reading and writing. That might or might not be important and/or appropriate.

  2. One simple way to do this is, every time a "host agent" starts, to have it spawn another process that does nothing but link to the "host agent" and remove the host ID to agent PID mapping from whatever store you have when the "host agent" dies. Another way is to have the mapping store process itself link to your host agent PIDs, which might leave you with less concern about possible race conditions.

  3. Possibly. When I read your question I was left with certain questions and a general feeling that the solution I would choose wouldn't lead me to the precise lookup issue you are asking about (i.e. lookup of the PID of a "host agent" upon receipt of a TCP packet), but I can't be sure this isn't because you've worked to minimise your question for Stack Overflow. It's a little unclear to me exactly what the roles, responsibilities and interactions of your "host", "host_agent" and "connection" processes really are, and if they should all exist and/or have separate supervision trees.
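
To make points 1 and 2 concrete, here is a minimal sketch of an ETS-backed lookup table with a small watcher process per agent. The module name host_registry, the table name and the function names are all made up for illustration, and the watcher uses a monitor rather than the link described above (a monitor is one-directional, so the watcher dying cannot take the agent with it).

```erlang
-module(host_registry).
-export([init/0, register_agent/2, lookup_agent/1]).

%% Hypothetical table name; any atom would do.
-define(TABLE, host_agents).

%% Create the table once, from a long-lived process: an ETS table is
%% destroyed when its owner exits. 'public' lets any process read and write.
init() ->
    ?TABLE = ets:new(?TABLE, [named_table, public, set, {read_concurrency, true}]),
    ok.

%% Store HostId -> Pid and spawn a watcher that removes the entry
%% when the agent goes down.
register_agent(HostId, Pid) when is_pid(Pid) ->
    true = ets:insert(?TABLE, {HostId, Pid}),
    spawn(fun() ->
        Ref = erlang:monitor(process, Pid),
        receive
            {'DOWN', Ref, process, Pid, _Reason} ->
                %% delete_object removes only this exact {HostId, Pid} pair,
                %% so a replacement agent that has already re-registered
                %% under the same HostId is left alone.
                ets:delete_object(?TABLE, {HostId, Pid})
        end
    end),
    ok.

%% Any process can look up an agent without going through a server process.
lookup_agent(HostId) ->
    case ets:lookup(?TABLE, HostId) of
        [{HostId, Pid}] -> {ok, Pid};
        []              -> not_started
    end.
```

A restarted agent would simply call host_registry:register_agent(HostId, self()) again when it starts, which overwrites the old entry. There is still a short window between an agent dying and its entry being removed, so callers should be prepared for a stale PID.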

So, looking at possible alternatives... When you say "when a TCP packet arrives" I assume you mean when a foreign host connects to a listening socket or sends some data on an existing, already accepted socket, and that the host ID is either the hostname (and/or port) or some other arbitrary ID that the foreign host sends to you after connecting.

Either way... Generally in this sort of scenario, I'd expect that a new process (the "host agent" by the sounds of it in your case) would be spawned to handle the newly established TCP connection (via a dynamic, e.g. simple_one_for_one, supervisor), taking ownership of the socket that is the server-side end point of that connection, reading and writing the socket as appropriate, and terminating when the connection is closed.

With that model your "host agent" should always be started if there is a connection already and always be NOT started if there is not a connection, and any incoming TCP packet will end up automatically in the hands of the correct agent, because it will be delivered to the socket that the agent is handling, or if it's a new connection, the agent will be started.

The need to look up the PID of an agent upon receipt of a TCP packet now never arises.
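
As a rough sketch of that model: an acceptor loop hands each accepted socket to a freshly spawned agent with gen_tcp:controlling_process/2, after which incoming data for that connection arrives in the agent's own mailbox. The module name, the plain spawn (rather than a simple_one_for_one supervisor) and the echo behaviour are placeholders, not something taken from your code.

```erlang
-module(svc_listener_sketch).
-export([start/1]).

%% Start an acceptor process. Port 0 lets the OS pick a free port.
%% Returns {ok, AcceptorPid, ActualPort}.
start(Port) ->
    Parent = self(),
    Acceptor = spawn(fun() ->
        {ok, LSock} = gen_tcp:listen(Port, [binary, {active, false}, {reuseaddr, true}]),
        {ok, BoundPort} = inet:port(LSock),
        Parent ! {self(), listening, BoundPort},
        accept_loop(LSock)
    end),
    receive
        {Acceptor, listening, ListenPort} -> {ok, Acceptor, ListenPort}
    end.

accept_loop(LSock) ->
    {ok, Sock} = gen_tcp:accept(LSock),
    %% One agent per connection. A real system would start it under a
    %% supervisor; a bare spawn keeps the sketch short.
    Agent = spawn(fun agent_init/0),
    %% Transfer socket ownership so that {tcp, ...} messages go to the agent.
    ok = gen_tcp:controlling_process(Sock, Agent),
    Agent ! {socket, Sock},
    accept_loop(LSock).

agent_init() ->
    receive
        {socket, Sock} ->
            ok = inet:setopts(Sock, [{active, once}]),
            agent_loop(Sock)
    end.

%% The agent lives exactly as long as its connection.
agent_loop(Sock) ->
    receive
        {tcp, Sock, Data} ->
            %% Echo stands in for the real parsing and handling.
            case gen_tcp:send(Sock, Data) of
                ok ->
                    inet:setopts(Sock, [{active, once}]),
                    agent_loop(Sock);
                {error, _Reason} ->
                    gen_tcp:close(Sock)
            end;
        {tcp_closed, Sock} ->
            ok;
        {tcp_error, Sock, _Reason} ->
            ok
    end.
```

Because the socket stays passive until the agent owns it and sets {active, once}, no data can be delivered to the acceptor by mistake during the hand-over.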

If you need to look up the PID of an agent for other reasons though, say because your server sometimes needs to proactively send data to a possibly connected "host", then you either have to get a list of all the supervised "host agents" and pick out the right one (for this you would use supervisor:which_children/1, as per Hamidreza's answer) OR you would maintain a map of host IDs to PIDs, using map, gb_trees, dict, ets, etc. Which is correct depends on how many "hosts" you could have: if it's more than a handful then you should probably maintain a map of some sort so that the lookup time doesn't become too big.
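
For the first option, a sketch of picking an agent out of supervisor:which_children/1 might look like the following. It assumes a one_for_one supervisor whose child ids embed the host ID; the module name, the {host_agent, HostId} id shape and the stand-in agent process are all invented for the example. Note that under a simple_one_for_one supervisor which_children/1 reports every child id as undefined, so there you would have to ask each PID for its host ID instead.

```erlang
-module(agents_sup_sketch).
-behaviour(supervisor).
-export([start_link/0, start_agent/1, find_agent/1]).
-export([init/1, agent_start_link/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one agent per host. The child id carries the host ID so the
%% agent can be found again later.
start_agent(HostId) ->
    Spec = #{id => {host_agent, HostId},
             start => {?MODULE, agent_start_link, [HostId]},
             restart => temporary},
    supervisor:start_child(?MODULE, Spec).

%% Linear scan over the children: fine for a handful of hosts,
%% increasingly slow as the number of hosts grows.
find_agent(HostId) ->
    Children = supervisor:which_children(?MODULE),
    case lists:keyfind({host_agent, HostId}, 1, Children) of
        {_Id, Pid, _Type, _Modules} when is_pid(Pid) -> {ok, Pid};
        _ -> not_started
    end.

init([]) ->
    {ok, {#{strategy => one_for_one, intensity => 5, period => 10}, []}}.

%% Stand-in for a real agent: a linked process that waits to be stopped.
agent_start_link(_HostId) ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```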

Final comment, you might consider looking at gproc if you haven't already, in case you consider it of use for your case. It does this sort of thing.

Edit/addition (following question edit):

Your connection process sounds redundant to me; as suggested above, if you give the socket to the host agent then most of the responsibility of the connection is gone. There's no reason the host agent can't parse the data it receives. As far as I can see there's no value in having another process parse it just to pass it on to yet another process. The parsing itself is probably a deterministic function, so it is sensible to have a separate module for it, but I see no point in a separate process.
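
To illustrate "a separate module, not a separate process": the parser can be a pure function that the host agent calls directly on the data it receives. Your wire format isn't shown in the question, so the frame layout below (32-bit source host ID, 32-bit target host ID, then the payload) is purely hypothetical, as is the module name.

```erlang
-module(packet_parser_sketch).
-export([parse/1]).

%% Hypothetical frame layout, NOT the asker's real protocol:
%%   <<SourceHostId:32, TargetHostId:32, Payload/binary>>
-spec parse(binary()) ->
    {ok, #{source := non_neg_integer(),
           target := non_neg_integer(),
           payload := binary()}}
    | {error, malformed}.
parse(<<Source:32, Target:32, Payload/binary>>) ->
    {ok, #{source => Source, target => Target, payload => Payload}};
parse(_Other) ->
    %% Anything shorter than the 8-byte header is rejected.
    {error, malformed}.
```

Since parse/1 has no state and no side effects, it is easy to test on its own and costs nothing in message passing.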

I don't see the point of your 'host' process, you say "Host kept host informations" which makes it sound like it's just a process that holds a hostname or host ID, something like that?

You also say "it specified source host and target host, which means it sent by source host and should received by target host" which is beginning to make this sound a bit like a chat server, or at least some sort of hub-and-spoke / star network (https://en.wikipedia.org/wiki/Star_network) style communication protocol. I can't see why you wouldn't be able to do everything you want by creating a supervisor tree like this:

        top_sup
           |
     .------------------------------.
     |             |                |
map_server    svc_listener      hosts_sup (simple_one_for_one)
                                    |
                        .----------------------------->
                        |    |    |    |   |    |

Here the map_server just maintains a map of host IDs to PIDs of hosts. The svc_listener has the listening socket; it just accepts connections and asks hosts_sup to spawn a new host when a new client connects. The host processes (under hosts_sup) take responsibility for the accepted socket, and register the host ID and their PID with map_server when they start.
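
A sketch of what top_sup's callback module could look like for that tree. It assumes that map_server, hosts_sup and svc_listener each export a start_link/0, which is a guess at your module layout; the restart strategy and intensity values are placeholders too.

```erlang
-module(top_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one is only a placeholder. Whether a map_server crash should
    %% also restart the hosts (e.g. rest_for_one) depends on whether the
    %% hosts are able to re-register themselves.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children = [
        %% Children are started in list order: the map first, then the host
        %% supervisor, and the listener last so that nothing can connect
        %% before the other two are ready.
        #{id => map_server,
          start => {map_server, start_link, []},
          type => worker},
        #{id => hosts_sup,
          start => {hosts_sup, start_link, []},
          type => supervisor},
        #{id => svc_listener,
          start => {svc_listener, start_link, []},
          type => worker}
    ],
    {ok, {SupFlags, Children}}.
```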

If map_server links to the host PIDs it can automatically clean up when a host dies, and it can provide a suitable API for any process to look up a host PID by host ID.
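
A minimal sketch of such a map_server as a gen_server follows. It uses monitors instead of links, which is a swapped-in variation on what I described: the clean-up works the same way, but a crash of map_server doesn't propagate to the hosts. The function names register_host/2 and lookup/1 are made up for the example.

```erlang
-module(map_server).
-behaviour(gen_server).
-export([start_link/0, register_host/2, lookup/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% A host would typically call register_host(HostId, self()) when it starts.
register_host(HostId, Pid) when is_pid(Pid) ->
    gen_server:call(?MODULE, {register, HostId, Pid}).

lookup(HostId) ->
    gen_server:call(?MODULE, {lookup, HostId}).

%% State: by_id maps HostId -> Pid, by_ref maps MonitorRef -> HostId.
init([]) ->
    {ok, #{by_id => #{}, by_ref => #{}}}.

handle_call({register, HostId, Pid}, _From,
            #{by_id := ById, by_ref := ByRef} = State) ->
    case maps:is_key(HostId, ById) of
        true ->
            {reply, {error, already_registered}, State};
        false ->
            Ref = erlang:monitor(process, Pid),
            {reply, ok, State#{by_id := ById#{HostId => Pid},
                               by_ref := ByRef#{Ref => HostId}}}
    end;
handle_call({lookup, HostId}, _From, #{by_id := ById} = State) ->
    Reply = case maps:find(HostId, ById) of
                {ok, Pid} -> {ok, Pid};
                error     -> not_started
            end,
    {reply, Reply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% A monitored host died: drop its entry from both maps.
handle_info({'DOWN', Ref, process, _Pid, _Reason},
            #{by_id := ById, by_ref := ByRef} = State) ->
    case maps:take(Ref, ByRef) of
        {HostId, ByRef1} ->
            {noreply, State#{by_id := maps:remove(HostId, ById),
                             by_ref := ByRef1}};
        error ->
            {noreply, State}
    end;
handle_info(_Other, State) ->
    {noreply, State}.
```

Because registration and clean-up both run inside the one server process, they are serialised, which is what removes most of the race conditions mentioned in point 2. The price is that every lookup is a call to a single process; if that ever becomes a bottleneck, the map could be moved into a public ETS table.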
