Create a cluster of co-workers' Windows 7 PCs for parallel processing in R?


Problem description

I am running the termstrc yield curve analysis package in R across 10 years of daily bond price data for 5 different countries. This is highly compute-intensive: it takes 3200 seconds per country with a standard lapply, and if I use foreach and %dopar% (with doSNOW) on my 2009 i7 Mac, using all 4 cores (8 with hyperthreading), I get this down to 850 seconds. I need to re-run this analysis every time I add a country (to compute inter-country spreads), and I have 19 countries to go, with many more credit yield curves to come in the future. The time taken is starting to look like a major issue. By the way, the termstrc analysis function in question is called from R but is written in C.

Now, we're a small company of 12 people (read: limited budget), all equipped with i7 PCs with 8 GB of RAM, of which at least half are used for mundane word processing / email / browsing tasks, that is, using at most 5% of their performance. They are all networked using gigabit (but not 10-gigabit) Ethernet.

Could I cluster some of these underused PCs using MPI and run my R analysis across them? Would the network be affected? Each iteration of the yield curve analysis function takes about 1.2 seconds so I'm assuming that if the granularity of parallel processing is to pass a whole function iteration to each cluster node, 1.2 seconds should be quite large compared with the gigabit ethernet lag?

Can this be done? How? And what would the impact be on my co-workers? Can they continue to read their emails while I'm taxing their machines?

I note that Open MPI seems not to support Windows anymore, while MPICH seems to. Which would you use, if any?

Perhaps run an Ubuntu virtual machine on each PC?

Solution

Yes, you can. There are a number of ways. One of the easiest is to use redis as a backend (as easy as calling sudo apt-get install redis-server on an Ubuntu machine; rumor has it that you can run a redis backend on a Windows machine too).

By using the doRedis package, you can very easily enqueue jobs on a task queue in redis, and then have one, two, ... idle workers query the queue. Best of all, you can easily mix operating systems, so yes, your co-workers' Windows machines qualify. Moreover, you can use one, two, three, ... clients as you see fit and scale up or down as needed. The queue does not know or care; it simply supplies jobs.
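As a minimal sketch of that setup, assuming the foreach and doRedis packages are installed on every machine and a Redis server is reachable from all of them: the helper names submit_jobs and start_worker, the queue name "yieldcurve", and the default host are hypothetical placeholders, not something from the original post.

```r
# Master side: register the Redis-backed foreach backend, queue one task
# per element of task_ids, and collect the results.
# `queue` and `host` are placeholder values (assumptions).
submit_jobs <- function(task_ids, task_fun,
                        queue = "yieldcurve", host = "localhost") {
  # Bind the operator locally so the function works without attaching foreach.
  `%dopar%` <- foreach::`%dopar%`
  doRedis::registerDoRedis(queue, host = host)
  # Clean up the queue in Redis when the function exits.
  on.exit(doRedis::removeQueue(queue))
  foreach::foreach(i = task_ids, .combine = c) %dopar% task_fun(i)
}

# Worker side: run this in an R session on each idle PC. The call blocks
# and keeps pulling tasks from the named queue.
start_worker <- function(queue = "yieldcurve", host = "localhost") {
  doRedis::redisWorker(queue = queue, host = host)
}
```

On the master, a call might look like submit_jobs(1:100, function(i) i^2, host = "192.168.1.10"), where the IP address and the toy function stand in for the Redis server and for one termstrc iteration. Workers can be started before or after the jobs are queued, and on any mix of operating systems.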

Better still, the vignette in the doRedis package has working examples of a mix of Linux and Windows clients used to make a bootstrapping example go faster.
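On the granularity point raised in the question, a back-of-envelope estimate may help. The sketch below uses the 3200-second and 1.2-second figures from the question. The per-task overhead, the number of PCs, and the cores per PC are assumptions chosen only for illustration, and the model assumes ideal scaling, which real runs will not quite reach.

```r
# Figures taken from the question.
serial_secs_per_country <- 3200   # plain lapply, one country
secs_per_iteration      <- 1.2    # one yield curve fit

# Assumptions for illustration only (not measured):
overhead_secs_per_task <- 0.005   # network + serialization cost per task
n_pcs                  <- 6       # "at least half" of the 12 PCs
cores_per_pc           <- 4       # physical cores per i7

n_iterations <- round(serial_secs_per_country / secs_per_iteration)
n_workers    <- n_pcs * cores_per_pc

# Idealized wall-clock time: each worker processes its share of tasks
# one after another, paying the overhead once per task.
estimate_wall_secs <- function(n_tasks, task_secs, overhead_secs, workers) {
  ceiling(n_tasks / workers) * (task_secs + overhead_secs)
}

wall_secs <- estimate_wall_secs(n_iterations, secs_per_iteration,
                                overhead_secs_per_task, n_workers)

# Fraction of each task's time that goes to overhead rather than compute.
overhead_share <- overhead_secs_per_task /
  (secs_per_iteration + overhead_secs_per_task)

cat("Estimated seconds per country:", wall_secs, "\n")
cat("Overhead share per task:", overhead_share, "\n")
```

Under these assumptions, the per-task overhead is well under 1% of the task time, which supports the intuition that whole-iteration tasks are coarse enough for gigabit Ethernet. The real overhead depends on how much data each task ships, so it is worth measuring before relying on the estimate.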
