Anatomy of a Distributed System in PHP


Problem description


I have a problem and I'm having a hard time figuring out the ideal solution. To explain it better, I'll lay out my scenario here.

I have a server that will receive orders from several clients. Each client will submit a set of recurring tasks that should be executed at specified intervals, e.g. client A submits task AA, which should be executed every minute between 2009-12-31 and 2010-12-31. If my math is right, that's about 525,600 operations in a year. With more clients and tasks it would be infeasible to let the server process all of these itself, so I came up with the idea of worker machines. The server will be developed in PHP.
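As a quick sanity check of that figure, the number of one-minute runs in the example window can be computed directly. This is only a sketch; `count_runs` is a made-up helper name, and it assumes runs start at the beginning of the window and the end date is exclusive.

```python
from datetime import datetime, timedelta


def count_runs(start: datetime, end: datetime, interval: timedelta) -> int:
    """Count runs at start, start + interval, ... that fall strictly before end."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    span = end - start
    # Ceiling division: a partial final interval still contains one run at its start.
    return max(0, -(-span // interval))


runs = count_runs(datetime(2009, 12, 31), datetime(2010, 12, 31), timedelta(minutes=1))
print(runs)  # 525600
```

That is per task, so the total load grows linearly with the number of clients and tasks.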

Worker machines are just regular cheap Windows-based computers that I'll host at home or at my workplace. Each worker will have a dedicated Internet connection (with a dynamic IP) and a UPS to ride out power outages. Each worker will query the server every 30 seconds or so via web service calls, fetch the next pending job and process it. Once the job is completed, the worker will submit the output to the server and request a new job, and so on ad infinitum. If there is a need to scale the system, I should just be able to set up a new worker and the whole thing should run seamlessly. The worker client will be developed in PHP or Python.
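The polling cycle described above could look roughly like the following sketch. Everything here is an assumption for illustration: the function names, the `id`/`input` job fields, and the in-memory list standing in for the real web service calls. Since the worker initiates every request, its dynamic IP should not matter to the server.

```python
import json
import time
from typing import Callable, Optional


def run_worker(
    fetch_job: Callable[[], Optional[str]],   # hypothetical: returns a JSON job or None
    process: Callable[[dict], dict],          # hypothetical: does the actual work
    submit: Callable[[str, str], None],       # hypothetical: posts (job id, JSON output)
    poll_interval: float = 30.0,
    max_polls: Optional[int] = None,          # None means loop forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for jobs, process each one, submit its output; return jobs completed."""
    completed = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        raw = fetch_job()
        if raw is None:
            # Nothing pending: wait before asking again.
            sleep(poll_interval)
            continue
        job = json.loads(raw)
        result = process(job["input"])
        submit(job["id"], json.dumps(result))
        completed += 1  # ask for the next job immediately, without sleeping
    return completed


# Demo with an in-memory queue in place of the server.
pending = [
    json.dumps({"id": "AA-1", "input": {"n": 2}}),
    json.dumps({"id": "AA-2", "input": {"n": 5}}),
]
results = {}
completed = run_worker(
    fetch_job=lambda: pending.pop(0) if pending else None,
    process=lambda payload: {"square": payload["n"] ** 2},
    submit=lambda job_id, output: results.update({job_id: output}),
    max_polls=3,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(completed, results)
```

In a real worker, `fetch_job` and `submit` would wrap HTTP requests to the server, and the loop would also need error handling for failed requests and failed jobs.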

At any given time my clients should be able to log on to the server and check the status of the tasks they ordered.

Now here is where the tricky part kicks in:

  • I must be able to reconstruct the already processed tasks if for some reason the server goes down.
  • The workers are not client-specific; any worker should be able to process jobs for any number of clients.

I have some doubts regarding the general database design and which technologies to use.

Originally I thought of using several SQLite databases and joining them all on the server, but I can't figure out how I would group by client to generate the job reports.
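For what it's worth, if all results end up in one database on the server, grouping by client is a plain JOIN plus GROUP BY. The sketch below uses a single in-memory SQLite database; the table and column names (`tasks`, `runs`, `client_id`, `status`) are invented for illustration and are not a recommended final schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id        INTEGER PRIMARY KEY,
    client_id TEXT NOT NULL,
    name      TEXT NOT NULL
);
CREATE TABLE runs (
    id           INTEGER PRIMARY KEY,
    task_id      INTEGER NOT NULL REFERENCES tasks(id),
    scheduled_at TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending',
    output       TEXT
);
""")

conn.executemany(
    "INSERT INTO tasks (id, client_id, name) VALUES (?, ?, ?)",
    [(1, "A", "AA"), (2, "A", "AB"), (3, "B", "BA")],
)
conn.executemany(
    "INSERT INTO runs (task_id, scheduled_at, status) VALUES (?, ?, ?)",
    [
        (1, "2009-12-31T00:00:00", "done"),
        (1, "2009-12-31T00:01:00", "pending"),
        (2, "2009-12-31T00:00:00", "done"),
        (3, "2009-12-31T00:00:00", "failed"),
    ],
)

# One row per client: total runs and how many of them finished.
report = conn.execute("""
    SELECT t.client_id,
           COUNT(*)                AS total_runs,
           SUM(r.status = 'done')  AS done_runs
    FROM runs AS r
    JOIN tasks AS t ON t.id = r.task_id
    GROUP BY t.client_id
    ORDER BY t.client_id
""").fetchall()
print(report)  # [('A', 3, 2), ('B', 1, 0)]
```

Because each run row points to a task and each task to a client, the workers never need to know which client a job belongs to.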

I've never actually worked with any of the following technologies: memcached, CouchDB, Hadoop and the like, but I would like to know whether any of them is suitable for my problem and, if so, which you would recommend for a newbie in "distributed computing" (or is this parallel computing?) like me. Please keep in mind that the workers have dynamic IPs.

As I said before, I'm also having trouble with the general database design, partly because I still haven't chosen a particular R(D)DBMS. One issue that I think is independent of the DBMS I choose concerns the queuing system: should I precalculate all the absolute timestamps for a specific job, store that large set of timestamps, and execute and flag them as complete in ascending order? Or should I have a more clever system like "when timestamp modulus 60 == 0 -> execute"? The problem with this "clever" system is that some jobs will not be executed in the order they should be, because some workers could be idle while others are overloaded. What do you suggest?
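One possible middle ground between the two options in the question, sketched under assumptions: store only the next due timestamp per recurring task and generate the due runs on demand, oldest first. This is not something from the original post, and `RecurringTask` and `enqueue_due` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RecurringTask:
    task_id: str
    interval: timedelta
    end: datetime        # exclusive end of the scheduling window
    next_run: datetime   # first run that has not been enqueued yet


def enqueue_due(task: RecurringTask, now: datetime) -> List[datetime]:
    """Return every run due by `now`, in ascending order, and advance next_run."""
    if task.interval <= timedelta(0):
        raise ValueError("interval must be positive")
    due = []
    while task.next_run <= now and task.next_run < task.end:
        due.append(task.next_run)
        task.next_run += task.interval
    return due


task = RecurringTask(
    task_id="AA",
    interval=timedelta(minutes=1),
    end=datetime(2010, 12, 31),
    next_run=datetime(2009, 12, 31),
)
first = enqueue_due(task, datetime(2009, 12, 31, 0, 2, 30))
second = enqueue_due(task, datetime(2009, 12, 31, 0, 3, 0))
print(len(first), second)
```

The server would call something like this periodically and push the returned timestamps onto the pending queue, so runs are handed out in order without storing a full year of rows up front. Runs that were missed while the server was down are picked up on the next call because `next_run` only moves forward as runs are enqueued.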

PS: I'm not sure if the title and tags of this question properly reflect my problem and what I'm trying to do; if not please edit accordingly.

Thanks for your input!

@timdev:

  1. The input will be a very small JSON-encoded string; the output will also be a JSON-encoded string, but a bit larger (on the order of 1-5 KB).
  2. The output will be computed using several available resources from the Web so the main bottleneck will probably be the bandwidth. Database writes may also be one - depending on the R(D)DBMS.

Solution

It looks like you're on the verge of recreating Gearman. Here's the introduction for Gearman:

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.

You can write both your client and the back-end worker code in PHP.


Re your question about a Gearman Server compiled for Windows: I don't think it's available in a neat package pre-built for Windows. Gearman is still a fairly young project and they may not have matured to the point of producing ready-to-run distributions for Windows.

Sun/MySQL employees Eric Day and Brian Aker gave a tutorial for Gearman at OSCON in July 2009, but their slides mention only Linux packages.

Here's a link to the Perl CPAN Testers project, which indicates that Gearman-Server can be built on Win32 using the Microsoft C compiler (cl.exe) and that it passes its tests: http://www.nntp.perl.org/group/perl.cpan.testers/2009/10/msg5521569.html But I'd guess you have to download the source code and build it yourself.
