How do you design the architecture of an Erlang/OTP-based distributed fault-tolerant multicore system?


Question

I would like to build an Erlang/OTP-based system that solves an "embarrassingly parallel" problem.

I have already read/skimmed:

  • Learn You Some Erlang;
  • Programming Erlang (Armstrong);
  • Erlang Programming (Cesarini);
  • Erlang/OTP in Action.

I have got the gist of processes, messaging, supervisors, gen_servers, logging, etc.

I do understand that certain architecture choices depend on the application in question, but I would still like to know some general principles of Erlang/OTP system design.

Should I just start with a few gen_servers with a supervisor and incrementally build on that?

How many supervisors should I have? How do I decide which parts of the system should be process-based? How should I avoid bottlenecks?

Should I add logging later?

What is the general approach to architecting Erlang/OTP distributed fault-tolerant multiprocessor systems?

Recommended answer

Should I just start with a few gen_servers with a supervisor and incrementally build on that?

You're missing one key component in Erlang architectures here: applications! (That is, the concept of OTP applications, not software applications.)

Think of applications as components. A component in your system solves a particular problem, is responsible for a coherent set of resources, or abstracts something important or complex away from the rest of the system.

The first step when designing an Erlang system is to decide which applications are needed. Some can be pulled from the web as they are; we can refer to these as libraries. Others you'll need to write yourself (otherwise you wouldn't need this particular system). We usually refer to these applications as the business logic. Often you need to write some libraries yourself as well, but it is useful to keep the distinction between the libraries and the core business applications that tie everything together.
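To make the notion of an application concrete, here is a minimal sketch of an application resource file. The application name `cruncher`, its modules, and the library `some_json_lib` are hypothetical and only stand in for your own business-logic application and a library it depends on:

```erlang
%% cruncher.app.src (hypothetical business-logic application)
{application, cruncher,
 [{description, "Splits a large job into independent work items"},
  {vsn, "0.1.0"},
  %% Names of processes this application registers.
  {registered, [cruncher_sup]},
  %% Callback module implementing the application behaviour;
  %% its start/2 is expected to start the top-level supervisor.
  {mod, {cruncher_app, []}},
  %% Applications that must be started before this one.
  %% some_json_lib is a made-up library name used for illustration.
  {applications, [kernel, stdlib, some_json_lib]},
  {env, []}
 ]}.
```

The `applications` key is where the split between libraries and business logic becomes visible: each entry is a separate component with its own supervision tree and life cycle.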

How many supervisors should I have?

You should have one supervisor for each kind of process you want to monitor.

A bunch of identical temporary workers? One supervisor to rule them all.

Different processes with different responsibilities and restart strategies? A supervisor for each type of process, in a correct hierarchy (depending on when things should restart and which other processes need to go down with them).
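For the "bunch of identical temporary workers" case, one possible shape is a `simple_one_for_one` supervisor. This is only a sketch: the module name `job_sup` and the idea of passing each job as a zero-arity fun are assumptions made for illustration, not something prescribed by OTP.

```erlang
-module(job_sup).
-behaviour(supervisor).

-export([start_link/0, start_job/1]).
-export([init/1, run_job/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one more worker. The list [Fun] is appended to the
%% argument list given in the child spec's start tuple.
start_job(Fun) when is_function(Fun, 0) ->
    supervisor:start_child(?MODULE, [Fun]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 5,
                 period => 10},
    Worker = #{id => job,
               start => {?MODULE, run_job, []},
               restart => temporary,      % never restarted when it exits
               shutdown => brutal_kill,
               type => worker},
    {ok, {SupFlags, [Worker]}}.

%% Child start function: runs in the supervisor process, so the new
%% process ends up linked to the supervisor, as required.
run_job(Fun) ->
    {ok, proc_lib:spawn_link(Fun)}.
```

A caller would use it as `job_sup:start_job(fun() -> do_work() end)`, where `do_work/0` is whatever one unit of your parallel problem is.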

Sometimes it is okay to put a bunch of different process types under the same supervisor. This is usually the case when you have a few singleton processes (e.g. one HTTP server supervisor, one ETS table owner process, one statistics collector) that will always run. In that case, it might be too much cruft to have one supervisor for each, so it is common to add them under one supervisor. Just be aware of the implications of the restart strategy you choose, so that you don't, for example, take down your statistics process when your web server crashes (one_for_one is the most common strategy in cases like this). Be careful not to have any dependencies between processes in a one_for_one supervisor. If a process depends on another process that has crashed, it can crash as well, exceeding the supervisor's restart intensity and crashing the supervisor itself too soon. This can be avoided by using two different supervisors, so that restarts are controlled entirely by each supervisor's configured intensity and period.
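A sketch of such a shared supervisor might look like the following. The module name `singletons_sup`, the child ids, and the idle stand-in worker are all hypothetical; in a real system each child would be its own module. The point is the `one_for_one` strategy together with explicit `intensity` and `period` values.

```erlang
-module(singletons_sup).
-behaviour(supervisor).

-export([start_link/0]).
-export([init/1, start_idle/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: a crash restarts only the crashed child.
    %% More than 3 restarts within 60 seconds terminates the supervisor.
    SupFlags = #{strategy => one_for_one,
                 intensity => 3,
                 period => 60},
    Children = [child(stats_collector), child(table_owner)],
    {ok, {SupFlags, Children}}.

child(Id) ->
    #{id => Id,
      start => {?MODULE, start_idle, []},
      restart => permanent,
      shutdown => brutal_kill,
      type => worker}.

%% Stand-in for a real singleton process: it just waits for a message.
start_idle() ->
    {ok, proc_lib:spawn_link(fun idle/0)}.

idle() ->
    receive stop -> ok end.
```

With this layout, killing `stats_collector` leaves `table_owner` untouched, which is exactly the property you want for independent singletons.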

How do I decide which parts of the system should be process-based?

Every concurrent activity in your system should be in its own process. Having the wrong abstraction of concurrency is the most common mistake Erlang system designers make in the beginning.

Some people are not used to dealing with concurrency; their systems tend to have too little of it: one process, or a few gigantic ones, that run everything in sequence. These systems are usually full of code smells, and the code is very rigid and hard to refactor. It also makes them slower, because they may not use all the cores available to Erlang.

Other people immediately grasp the concurrency concepts but fail to apply them optimally; their systems tend to overuse the process concept, making many processes sit idle waiting for others that are doing the work. These systems tend to be unnecessarily complex and hard to debug.

In essence, both variants have the same problem: you don't use all the concurrency available to you, and you don't get the maximum performance out of the system.

If you stick to the single responsibility principle and abide by the rule of having a process for every truly concurrent activity in your system, you should be okay.

There are valid reasons to have idle processes. Sometimes they keep important state, sometimes you want to keep some data temporarily and later discard the process, and sometimes they wait on external events. The bigger pitfall is passing important messages through a long chain of largely inactive processes, as this slows down your system with lots of copying and uses more memory.

How should I avoid bottlenecks?

Hard to say; it depends very much on your system and what it's doing. Generally though, if you have a good division of responsibility between applications, you should be able to scale the application that appears to be the bottleneck separately from the rest of the system.

The golden rule here is to measure, measure, measure! Don't assume you have something to improve until you've measured.

Erlang is great in that it allows you to hide concurrency behind interfaces (known as implicit concurrency). For example, you use a functional module API, a normal module:function(Arguments) interface, that could in turn spawn thousands of processes without the caller having to know. If you get your abstractions and your API right, you can always parallelize or optimize a library after you've started using it.
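As an illustration of concurrency hidden behind a plain function call, here is a minimal parallel map. The module name `pmap` is made up for this sketch, and it deliberately ignores concerns such as limiting the number of simultaneous processes or handling timeouts.

```erlang
-module(pmap).
-export([map/2]).

%% Looks like lists:map/2 to the caller, but evaluates F(X) for each
%% element in its own process and collects the results in order.
map(F, List) when is_function(F, 1), is_list(List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                %% Linked, so a crashing worker also takes down the
                %% caller (unless the caller traps exits).
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Ref is already bound in each iteration, so every receive
    %% selects exactly the reply of the matching worker.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

The caller writes `pmap:map(fun(X) -> X * X end, [1, 2, 3])` exactly as it would call `lists:map/2`, so a sequential implementation can be swapped for this one later without touching the calling code.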

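That being said, here are some general guidelines:

  • Try to send messages directly to the recipient; avoid channeling or routing messages through intermediary processes. Otherwise the system just spends time moving messages (data) around without doing any real work.
  • Don't overuse the OTP design patterns, such as gen_servers. In many cases you only need to start a process, run some piece of code, and then exit. A gen_server is overkill for that.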
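For the second guideline, a start-run-exit process can be as small as the sketch below. The module name `oneshot` and the `{done, Result}` exit-reason convention are assumptions chosen for this example; a monitor is used so the caller learns about both success and failure.

```erlang
-module(oneshot).
-export([run/1]).

%% Run Fun in a fresh process, wait for it to finish, and return
%% {ok, Result} or {error, Reason}. No gen_server involved.
run(Fun) when is_function(Fun, 0) ->
    {Pid, MRef} = spawn_monitor(fun() -> exit({done, Fun()}) end),
    receive
        {'DOWN', MRef, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

For example, `oneshot:run(fun() -> 1 + 2 end)` evaluates the fun in its own process and hands back the result through the process's exit reason.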

And one bonus piece of advice: don't reuse processes. Spawning a process in Erlang is so cheap and quick that it doesn't make sense to reuse a process once its lifetime is over. In some cases it might make sense to reuse state (e.g. the result of a complex parse of a file), but that state is better stored canonically somewhere else (in an ETS table, a database, etc.).
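One way to keep reusable state outside of any worker process is a small ETS-backed cache, sketched below. The module name `parse_cache` and the `ParseFun` argument are hypothetical. Note that the table is owned by whichever process calls `init/0` and disappears when that process exits, which is why a long-lived table owner process is usually put under a supervisor.

```erlang
-module(parse_cache).
-export([init/0, fetch/2]).

%% Create the table. Call this from a long-lived owner process.
init() ->
    ets:new(?MODULE, [named_table, public, set,
                      {read_concurrency, true}]),
    ok.

%% Return the cached value for Key, computing and storing it with
%% ParseFun(Key) on the first request. Any short-lived worker process
%% can call this; none of them has to be kept alive for its state.
fetch(Key, ParseFun) when is_function(ParseFun, 1) ->
    case ets:lookup(?MODULE, Key) of
        [{Key, Parsed}] ->
            Parsed;
        [] ->
            Parsed = ParseFun(Key),
            true = ets:insert(?MODULE, {Key, Parsed}),
            Parsed
    end.
```

Two workers asking for the same uncached key at the same time may both run `ParseFun`; that is harmless for idempotent parsing but worth knowing about.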

Should I add logging later?

You should add logging now! There's a great built-in API called Logger that ships with Erlang/OTP from version 21 onwards:

logger:error("The file does not exist: ~ts", [Filename]),
logger:notice("Something strange happened!"),
%% Structured report; report_cb tells handlers how to format it.
logger:debug(#{got => connection_request, id => Id, state => State},
             #{report_cb => fun(R) -> {"~p", [R]} end}).

This new API has several advanced features and should cover most cases where you need logging. There's also the older but still widely used third-party library Lager.
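As a small taste of those features, the sketch below sets the primary log level, attaches per-process metadata, and uses the logging macros from the Kernel application's `logger.hrl`. The module name `log_demo` and the `job_id` metadata key are made up for the example.

```erlang
-module(log_demo).
-export([run/1]).

%% Provides ?LOG_INFO, ?LOG_DEBUG and friends, which also record the
%% calling module, function, and line in the event metadata.
-include_lib("kernel/include/logger.hrl").

run(JobId) ->
    %% Drop everything below info before it reaches any handler.
    ok = logger:set_primary_config(level, info),
    %% Metadata attached to every event logged by this process.
    ok = logger:set_process_metadata(#{job_id => JobId}),
    ?LOG_INFO("job ~p started", [JobId]),
    %% Filtered out by the primary level set above.
    ?LOG_DEBUG("details for job ~p", [JobId]),
    ok.
```

In a real system the level would normally come from the Kernel application's configuration rather than being set in code.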

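To summarize what was said above:

  • Divide your system into applications
  • Put your processes in the correct supervision hierarchy, depending on their needs and dependencies
  • Have a process for every truly concurrent activity in your system
  • Maintain a functional API towards the other components in the system. This lets you:
    • Refactor your code without changing the code that uses it
    • Optimize code afterwards
    • Distribute your system when needed (just make a call to another node behind the API! The caller won't notice!)
    • Test the code more easily (less work setting up test harnesses, easier to understand how to use it)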
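The distribution point can be sketched as follows. The module name `cruncher_api` and the node-selection rule are assumptions for illustration only; the sketch also assumes the module is loaded on whichever node is chosen. With no connected nodes it simply runs on the local node.

```erlang
-module(cruncher_api).
-export([square/1]).
-export([do_square/1]).   % exported so it can be called via rpc

%% Public API: callers cannot tell where the work is done.
square(X) when is_number(X) ->
    case rpc:call(pick_node(), ?MODULE, do_square, [X]) of
        {badrpc, Reason} -> error({remote_call_failed, Reason});
        Result -> Result
    end.

%% The actual work, executed on the chosen node.
do_square(X) ->
    X * X.

%% Hypothetical placement rule: use the first connected node if there
%% is one, otherwise stay on the local node.
pick_node() ->
    case nodes() of
        [] -> node();
        [Remote | _] -> Remote
    end.
```

Because callers only ever see `cruncher_api:square/1`, the placement rule can later be replaced by something smarter without changing any calling code.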

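Common pitfalls:

  • Too many processes
  • Too few processes
  • Too much routing (forwarded messages, chained processes)
  • Too few applications (I've actually never seen the opposite case)
  • Not enough abstraction (makes the code hard to refactor and reason about, and also hard to test!)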
