How do you design the architecture of an Erlang/OTP-based distributed fault-tolerant multicore system?


Question



I would like to build an Erlang/OTP-based system which solves an 'embarrassingly parallel' problem.

I have already read/skimmed through:

  • Learn You Some Erlang;
  • Programming Erlang (Armstrong);
  • Erlang Programming (Cesarini);
  • Erlang/OTP in Action.

I have got the gist of Processes, Messaging, Supervisors, gen_servers, Logging, etc.

I do understand that certain architecture choices depend on the application concerned, but I would still like to know some general principles of Erlang/OTP system design.

Should I just start with a few gen_servers with a supervisor and incrementally build on that?

How many supervisors should I have? How do I decide which parts of the system should be process-based? How should I avoid bottlenecks?

Should I add logging later?

What is the general approach to the architecture of distributed, fault-tolerant, multiprocessor Erlang/OTP systems?

Solution

Should I just start with a few gen_servers with a supervisor and incrementally build on that?

You're missing one key component in Erlang architectures here: applications! (That is, the concept of OTP applications, not software applications).

Think of applications as components. A component in your system solves a particular problem, is responsible for a coherent set of resources, or abstracts away something important or complex in the system.

The first step when designing an Erlang system is to decide which applications are needed. Some can be pulled from the web as they are; we can refer to these as libraries. Others you'll need to write yourself (otherwise you wouldn't need this particular system). We usually refer to these applications as the business logic. (Often you need to write some libraries yourself as well, but it is useful to keep the distinction between the libraries and the core business applications that tie everything together.)
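As a rough illustration of what one such application looks like, here is a minimal sketch of an application callback module. The application name `crunch`, the module name `crunch_app`, and the resource file shown in the comment are all hypothetical. For brevity the module doubles as its own top-level supervisor; in a real project the supervisor would normally live in a separate module.

```erlang
%% Hypothetical resource file crunch.app.src describing the application:
%%
%% {application, crunch,
%%  [{description, "Business-logic application (illustrative)"},
%%   {vsn, "0.1.0"},
%%   {registered, [crunch_app]},
%%   {mod, {crunch_app, []}},
%%   {applications, [kernel, stdlib]}]}.

-module(crunch_app).
-behaviour(application).
-behaviour(supervisor).

%% application callbacks
-export([start/2, stop/1]).
%% supervisor callback
-export([init/1]).

%% Called when the application is started; must return the pid of
%% the top-level supervisor.
start(_StartType, _StartArgs) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Called after the application has stopped; nothing to clean up here.
stop(_State) ->
    ok.

%% Top-level supervisor with no children yet; children are added as
%% the application grows.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    {ok, {SupFlags, []}}.
```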

How many supervisors should I have?

You should have one supervisor for each kind of process you want to monitor.

A bunch of identical temporary workers? One supervisor to rule them all.
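One possible way to express this is a `simple_one_for_one` supervisor, which starts any number of children from a single child specification. The sketch below assumes a hypothetical module `worker_sup` whose workers each run one fun and then exit; the names and restart limits are illustrative, not a recommendation.

```erlang
-module(worker_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker/1]).
-export([init/1, run/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Start one temporary worker that runs Fun and then exits.
start_worker(Fun) when is_function(Fun, 0) ->
    supervisor:start_child(?MODULE, [Fun]).

init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 10,
                 period => 10},
    %% restart => temporary: a worker is never restarted, whether it
    %% finishes normally or crashes.
    Worker = #{id => worker,
               start => {?MODULE, run, []},
               restart => temporary,
               shutdown => brutal_kill,
               type => worker},
    {ok, {SupFlags, [Worker]}}.

%% Invoked inside the supervisor as run(Fun); the extra argument comes
%% from start_child/2. Must return {ok, Pid} of a linked process.
run(Fun) ->
    {ok, spawn_link(Fun)}.
```

A caller would start the supervisor once and then call `worker_sup:start_worker(fun() -> ... end)` for each unit of work.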

Different processes with different responsibilities and restart strategies? A supervisor for each type of process, arranged in the correct hierarchy (depending on when things should restart and which other processes need to go down with them).

Sometimes it is okay to put a bunch of different process types under the same supervisor. This is usually the case when you have a few singleton processes (e.g. one HTTP server supervisor, one ETS table owner process, one statistics collector) that will always run. In that case, it might be too much cruft to have one supervisor for each, so it is common to add them all under one supervisor. Just be aware of the implications of using a particular restart strategy when doing this, so that you don't take down your statistics process, for example, when your web server crashes (one_for_one is the most common strategy to use in cases like this).
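A minimal sketch of such a shared supervisor follows. The module name `singleton_sup` and the two child names are hypothetical, and the children are trivial placeholder loops standing in for real workers. With `one_for_one`, killing one child leaves the other untouched.

```erlang
-module(singleton_sup).
-behaviour(supervisor).

-export([start_link/0, init/1, start_loop/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: a crash in one child restarts only that child.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children = [child(table_owner), child(stats_collector)],
    {ok, {SupFlags, Children}}.

child(Name) ->
    #{id => Name,
      start => {?MODULE, start_loop, [Name]},
      restart => permanent,
      shutdown => 5000,
      type => worker}.

%% Placeholder worker: a registered process that waits for messages.
%% Runs in the supervisor, so the new process is linked to it.
start_loop(Name) ->
    Pid = spawn_link(fun loop/0),
    true = register(Name, Pid),
    {ok, Pid}.

loop() ->
    receive
        stop -> ok;
        _Other -> loop()
    end.
```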

How do I decide which parts of the system should be process-based?

Every concurrent activity in your system should be in its own process. Having the wrong abstraction of concurrency is the most common mistake Erlang system designers make in the beginning.

Some people are not used to dealing with concurrency; their systems tend to have too little of it: one process, or a few gigantic ones, that run everything in sequence. These systems are usually full of code smells, and the code is very rigid and hard to refactor. It also makes them slower, because they may not use all the cores available to Erlang.

Other people immediately grasp the concurrency concepts but fail to apply them optimally; their systems tend to overuse the process concept, making many processes stay idle waiting for others that are doing work. These systems tend to be unnecessarily complex and hard to debug.

In essence, both variants give you the same problem: you don't use all the concurrency available to you, and you don't get the maximum performance out of the system.

If you stick to the single responsibility principle and abide by the rule to have a process for every truly concurrent activity in your system, you should be okay.

How should I avoid bottlenecks?

Hard to say; it depends very much on your system and what it's doing. Generally though, if you have a good division of responsibility between applications, you should be able to scale the application that appears to be the bottleneck separately from the rest of the system.

The golden rule here is to measure, measure, measure! Don't think you have something to improve until you've measured.
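As a starting point for measuring, `timer:tc/1` from the standard library returns the elapsed time of a fun in microseconds together with its result. The helper module `measure` below is a hypothetical wrapper around it, and a simple average like this is only a first approximation, not a substitute for proper profiling or load testing.

```erlang
-module(measure).
-export([timed/1, avg_micros/2]).

%% Run Fun once; return {Microseconds, Result}.
timed(Fun) when is_function(Fun, 0) ->
    timer:tc(Fun).

%% Average elapsed time of N runs of Fun, in microseconds.
avg_micros(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    Total = lists:sum([element(1, timer:tc(Fun)) || _ <- lists:seq(1, N)]),
    Total / N.
```

Comparing `avg_micros/2` before and after a change gives a rough indication of whether the change helped at all.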

Erlang is great in that it allows you to hide concurrency behind interfaces (known as implicit concurrency). For example, you use a functional module API, a normal module:function(Arguments) interface, that could in turn spawn thousands of processes without the caller having to know that. If you get your abstractions and your API right, you can always parallelize or optimize a library after you've started using it.

That being said, here are some general guidelines:
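To make the idea concrete, here is a minimal sketch of a parallel map hidden behind an ordinary function call. The module name `pmap` is hypothetical. The caller uses it exactly like `lists:map/2` and never sees the processes. It assumes the list is small enough that one process per element is reasonable, and because the workers are linked, a crash in any of them also takes down the caller.

```erlang
-module(pmap).
-export([map/2]).

%% Same shape as lists:map/2, but each element is handled in its own
%% process. Results are returned in the original order.
map(Fun, List) when is_function(Fun, 1), is_list(List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, Fun(X)} end),
                Ref
            end || X <- List],
    %% Selective receive on each reference preserves the input order,
    %% regardless of which worker finishes first.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```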

  • Try to send messages to the recipient directly, avoid channeling or routing messages through intermediary processes. Otherwise the system just spends time moving messages (data) around without really working.
  • Don't overuse the OTP design patterns, such as gen_servers. In many cases, you only need to start a process, run some piece of code, and then exit. For this, a gen_server is overkill.
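Both guidelines can be sketched with a plain monitored process that runs one piece of code, sends its result straight to the caller, and exits. The module name `oneshot` and its two functions are hypothetical; this is one way to do it under those assumptions, not the only one.

```erlang
-module(oneshot).
-export([run/1, await/2]).

%% Start a monitored process that runs Fun once and sends the result
%% directly to the caller, with no intermediary process.
run(Fun) when is_function(Fun, 0) ->
    Caller = self(),
    spawn_monitor(fun() -> Caller ! {result, self(), Fun()} end).

%% Wait for the result, a crash, or a timeout (in milliseconds).
await({Pid, MonRef}, Timeout) ->
    receive
        {result, Pid, Value} ->
            erlang:demonitor(MonRef, [flush]),
            {ok, Value};
        {'DOWN', MonRef, process, Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        erlang:demonitor(MonRef, [flush]),
        {error, timeout}
    end.
```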

And one bonus piece of advice: don't reuse processes. Spawning a process in Erlang is so cheap and quick that it doesn't make sense to re-use a process once its lifetime is over. In some cases it might make sense to re-use state (e.g. the result of complex parsing of a file), but that state is better stored canonically somewhere else (in an ETS table, a database, etc.).
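A minimal sketch of keeping reusable state in an ETS table, so that short-lived processes can come and go, might look like this. The module name `parse_cache` is hypothetical. Note that the table belongs to whichever process calls `init/0` and disappears when that process dies, which is why a long-lived table owner process is a common companion to this pattern.

```erlang
-module(parse_cache).
-export([init/0, lookup_or_parse/2]).

%% Create a public named table; call this from a long-lived process.
init() ->
    ets:new(?MODULE, [named_table, public, set, {read_concurrency, true}]),
    ok.

%% Return the cached value for Key, or compute it with ParseFun(Key)
%% and store it for later callers. Concurrent callers may both compute
%% the value for a cold key; the last insert wins.
lookup_or_parse(Key, ParseFun) when is_function(ParseFun, 1) ->
    case ets:lookup(?MODULE, Key) of
        [{Key, Cached}] ->
            Cached;
        [] ->
            Parsed = ParseFun(Key),
            true = ets:insert(?MODULE, {Key, Parsed}),
            Parsed
    end.
```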

Should I add logging later?

There's some basic logging functionality in Erlang/OTP already: the error logger. Together with SASL (System Architecture Support Libraries) you can get up and running with logging in no time.

When the time comes (and if you've abstracted the logging API from the beginning) you can exchange this for something that better fits your needs. The de facto third-party logging library today is Basho's Lager.
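Abstracting the logging API can be as small as one wrapper module that the rest of the system calls. The sketch below uses a hypothetical module `app_log` and assumes OTP 21 or later, where the `logger` module is part of the kernel application; on older releases the body would call `error_logger` instead. Swapping the backend later means changing only this module.

```erlang
-module(app_log).
-export([info/2, warning/2]).

%% The rest of the system calls app_log:info/2 and app_log:warning/2
%% and never refers to the logging backend directly.
%% Assumption: OTP 21+ (logger is available in kernel).

info(Format, Args) ->
    logger:info(Format, Args).

warning(Format, Args) ->
    logger:warning(Format, Args).
```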

What is the general approach to the architecture of distributed, fault-tolerant, multiprocessor Erlang/OTP systems?

To summarize what's been said above:

  • Divide your system into applications
  • Put your processes in the correct supervision hierarchy, depending on their needs and dependencies
  • Have a process for every truly concurrent activity in your system
  • Maintain a functional API towards the other components in the system. This lets you:
    • Refactor your code without changing the code that's using it
    • Optimize code afterwards
    • Distribute your system when needed (just make a call to another node behind the API! The caller won't notice!)
    • Test the code more easily (less work setting up test harnesses, easier to understand how to use it)
  • Start using the libraries available to you in OTP until you need something different (you'll know, when the time comes)
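The point about distributing behind the API can be sketched as follows. The module `crunch` and its functions are hypothetical. `square/1` runs on the local node, while `square/2` runs the same work on whichever node is given, using `rpc:call/4` from the standard library. Callers of `square/1` would not need to change if its body later picked a remote node.

```erlang
-module(crunch).
-export([square/1, square/2]).
%% Exported only so that rpc:call/4 can invoke it on the target node.
-export([do_square/1]).

%% Public API: the caller does not know where the work happens.
square(X) ->
    square(node(), X).

%% Same API with an explicit node, e.g. one chosen by a scheduler.
square(Node, X) when is_atom(Node) ->
    case rpc:call(Node, ?MODULE, do_square, [X]) of
        {badrpc, Reason} -> {error, Reason};
        Result -> {ok, Result}
    end.

%% The actual work; the module must be loaded on the target node.
do_square(X) ->
    X * X.
```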

Common pitfalls:

  • Too many processes
  • Too few processes
  • Too much routing (forwarded messages, chained processes)
  • Too few applications (I've never seen the opposite case, actually)
  • Not enough abstraction (makes it hard to refactor and reason about. It also makes it hard to test!)
