How does Erlang hot code swapping work in the middle of activity?

Question

I am currently working on a live media server that will allow general consumers to send live video to us. In our current environment we've seen broadcasts that last for days, so the idea of being able to fix a bug (or add a feature) without disconnecting users is extremely compelling.

However, as I was writing code I realized that hot code swapping doesn't make any sense unless I write every process so that all state is always held inside a gen_server, and all external modules that the gen_server calls are as simple as possible.

Let's take the following example:

-module(server_template).
-behaviour(gen_server).

-export([start/0, stop/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2, code_change/3]).

start() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

stop() -> gen_server:stop(?MODULE).

%% The server state is a pair of opaque states owned by module1 and module2.
init([]) -> {ok, {module1:new(), module2:new()}}.

handle_call(_Message, _From, State) -> {reply, ok, State}.

handle_cast(any_message, {State1, State2}) ->
    NewState1 = module1:do_something(State1),
    %% A hot swap of module2 could land here, between the two calls.
    NewState2 = module2:do_something(State2),
    {noreply, {NewState1, NewState2}}.

handle_info(_Message, State) -> {noreply, State}.

terminate(_Reason, _State) -> ok.

%% Let each dependent module migrate the state it handed out earlier.
code_change(_OldVersion, {State1, State2}, _Extra) ->
    NewState1 = module1:code_change(State1),
    NewState2 = module2:code_change(State2),
    {ok, {NewState1, NewState2}}.

According to what I could find, when a new version of a module is loaded into the running system without using OTP, a process moves to the new version by calling into its module with a fully qualified (external) function call, e.g. my_module:loop(State).
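The plain, non-OTP mechanism described above can be sketched roughly as follows. The module name counter_loop and its messages are made up for illustration; the only point is the difference between the local call loop(Count) and the fully qualified call ?MODULE:loop(Count).

```erlang
-module(counter_loop).
-export([start/0, loop/1]).

%% Spawn a plain (non-OTP) process that keeps its state in the loop argument.
start() ->
    spawn(?MODULE, loop, [0]).

loop(Count) ->
    receive
        {get, From} ->
            From ! {count, Count},
            %% Local call: the process stays in the version it is already running.
            loop(Count);
        increment ->
            loop(Count + 1);
        upgrade ->
            %% Fully qualified call: continues in the newest loaded version
            %% of this module, carrying the state along.
            ?MODULE:loop(Count);
        stop ->
            ok
    end.
```

After loading a new version of the module, sending the process the upgrade message makes it continue in the new code. Erlang keeps at most two versions of a module (current and old), and purging the old version kills any process still running it, so a long-lived loop needs to make such a call before the module is loaded a second time.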

What I also see is that when a hot swap is performed the code_change/3 function is called and upgrades the state, so I can use that to make sure each of my dependent modules migrates the last state they gave me into state for the current code version. It does this because the supervisor knows about the running process, which allows the process to be suspended so it can call the code change function. All good.

However, if calling an external module always calls the current version of that module, then this would seem to break if a hot swap is done mid-function. For example, say my gen_server is currently in the process of handling the any_message cast, in between running module1:do_something() and module2:do_something().

If I am understanding things correctly, module2:do_something() would now call the newly current version of the do_something function, which could mean I'm passing unmigrated data into the new version of module2:do_something(). This would easily cause issues if it's a record that has changed, an array with an unexpected number of elements, or a map that is missing a value the code expects.

Am I misunderstanding how this situation works? If this is right, it seems to indicate that I must track some kind of version details for any data structure that may cross module boundaries, and every public function must check that version number and perform an on-demand migration if necessary.

That seems to be an extremely tall order that seems crazily error prone, so I am wondering if I am missing something.

Recommended answer

Yes, you are exactly right. No one said hot code swapping is easy. I worked for a telecommunication company where all code upgrades were performed on a live system (so that users aren't disconnected in the middle of their calls). Doing it right means carefully considering all those scenarios that you mentioned and preparing the code for every failure, then testing, then fixing issues, testing, and so on. To test it properly you would need a system running the old version under load (e.g. in a testing environment), then deploying the new code and checking for any crashes.

In the particular example mentioned in your question, the simplest way of dealing with this issue is to write two versions of module2:do_something/1, one accepting the old state and one accepting the new state. The clause that receives the old state then handles it accordingly, e.g. by converting it to the new state.
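A minimal sketch of that idea, assuming purely hypothetical state formats (the question does not show what module2 actually stores): the old version kept {state, Items} and the new version keeps {state_v2, Items, Count}.

```erlang
-module(module2).
-export([new/0, do_something/1, code_change/1]).

%% Hypothetical formats, invented for this sketch:
%%   old version: {state, Items}
%%   new version: {state_v2, Items, Count}
new() ->
    {state_v2, [], 0}.

%% Old-format state that arrives before the caller's code_change/3 has run:
%% convert it first, then continue as usual.
do_something({state, _Items} = Old) ->
    do_something(code_change(Old));
do_something({state_v2, Items, Count}) ->
    {state_v2, [Count | Items], Count + 1}.

%% Migration shared by the on-demand path above and the gen_server's code_change/3.
code_change({state, Items}) ->
    {state_v2, Items, length(Items)};
code_change({state_v2, _, _} = Current) ->
    Current.
```

Tagging the state with a different atom per format lets ordinary pattern matching pick the right clause, so no separate version number has to be checked by hand. Once no old-format state can exist any more, the extra clause can be dropped in a later release.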

For this to work you will also need to ensure that the new version of module2 is deployed before any module has a chance to call it with the new state:

  1. If the application containing module2 is a dependency of the other application, release_handler will upgrade that module first.

  2. Otherwise, you may need to split the deployment into two parts: first upgrade the common functions so that they can handle the new state, then deploy the new versions of the gen_servers and other modules that call module2.

  3. If you are not using the release handler, you can manually specify the order in which the modules are loaded.
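Without the release handler, the load order can be controlled by hand. The helper below is a sketch only; the module name manual_upgrade and the function reload_in_order/1 are made up here, while code:purge/1 and code:load_file/1 are the standard code server calls.

```erlang
-module(manual_upgrade).
-export([reload_in_order/1]).

%% Reload modules one by one, in the order given; list dependencies first.
%% Stops at the first failure and reports which module could not be loaded.
reload_in_order([]) ->
    ok;
reload_in_order([Mod | Rest]) ->
    %% purge/1 removes the old version and kills processes still running it;
    %% code:soft_purge/1 is the gentler alternative that refuses instead.
    _ = code:purge(Mod),
    case code:load_file(Mod) of
        {module, Mod} -> reload_in_order(Rest);
        {error, Reason} -> {error, {Mod, Reason}}
    end.
```

For the example in the question, a call such as manual_upgrade:reload_in_order([module2, module1, server_template]) would load the shared modules before the gen_server that depends on them.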

This is also why it is advised in Erlang to avoid circular dependencies in function calls between modules, e.g. when modA calls a function in modB, which in turn calls another function in modA.

For upgrades performed with the help of the release handler, you can verify the order in which release_handler will upgrade modules on the old system by reading the relup file, which is generated from the old and new releases (for example with systools:make_relup). It is a text file containing all the instructions for the upgrade, e.g. remove (to remove modules), load_object_code (read the new module's object code), load (make the new version current), purge, etc.
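The relup is derived from each application's .appup file. As a rough illustration, an .appup for the application holding the modules from the question might look like the fragment below; the version strings "1.0" and "2.0" are invented, and the generated relup should still be inspected to confirm the actual instruction order.

```erlang
%% Hypothetical .appup file: upgrade from "1.0" to "2.0" and back.
{"2.0",
 %% Upgrade instructions, applied when coming from "1.0".
 [{"1.0", [{load_module, module2},
           {load_module, module1},
           %% {advanced, Extra}: suspend the processes using the module and
           %% run code_change/3, passing Extra ([] here) as the last argument.
           {update, server_template, {advanced, []}}]}],
 %% Downgrade instructions, applied when going back to "1.0".
 [{"1.0", [{update, server_template, {advanced, []}},
           {load_module, module1},
           {load_module, module2}]}]}.
```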

Please note that there is no strict requirement that all applications follow OTP principles for hot code swapping to work. However, using gen_servers and a proper supervisor stack makes this task much easier to handle for both the developer and the release handler.

If you are not using an OTP release you can't upgrade using the release handler, but you can still forcefully reload modules on your system and upgrade them to the new version. This works fine as long as you don't need to add or remove Erlang applications, because for that the release definition would need to change, and that can't be done on a live system without support from the release handler.
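For a single gen_server, such a forced reload can still run code_change/3 by going through the sys module. This is a sketch under assumptions: the wrapper module manual_code_change and its upgrade/4 function are invented for illustration, while sys:suspend/1, sys:change_code/4 and sys:resume/1 are the standard calls.

```erlang
-module(manual_code_change).
-export([upgrade/4]).

%% Name is the registered name or pid of a process built on an OTP behaviour
%% (such as a gen_server); Mod is its callback module. OldVsn and Extra are
%% passed through to Mod:code_change/3.
upgrade(Name, Mod, OldVsn, Extra) ->
    %% While suspended, the process only handles system messages.
    ok = sys:suspend(Name),
    try
        _ = code:purge(Mod),                  % drop any previous old version
        {module, Mod} = code:load_file(Mod),  % make the new version current
        sys:change_code(Name, Mod, OldVsn, Extra)
    after
        %% Resume even if loading failed, so the server is not left suspended.
        ok = sys:resume(Name)
    end.
```

With the server from the question this would be called as manual_code_change:upgrade(server_template, server_template, undefined, []), where undefined stands in for an old version that is not being tracked. Modules the server depends on still need to be reloaded separately, and in a sensible order.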
