For loops in R and computational speed

Question

In the past I have written R code that requires for loops inside of for loops. Generally this code is rather time-consuming to execute. I have read online that this is a result of how for loops work in R. I have also read that writing the loops in another language called from R, e.g. C++ or Java, can reduce the computation time.

Does anyone have experience with this and can point me to some simple examples that I can read?

Also, can you call another language for the for loop but still have everything inside the loop written in standard R code?

Solution

I do have some experience of this as I worked on a project in which it was necessary to write some loops in C in order to speed up the code.

First, it's useful to note that there is a lot of information about R's for loops on the main Stack Overflow site. For example, the question Speed up the Loop Operation in R has at least two excellent answers which I found very helpful. Also, the R Inferno, as suggested by Roman Luštrik above, has a lot of good advice.

Assuming that you've vectorised everything that can be vectorised, removed as much as possible from inside the loops, worried about the fact that ( is a function call, and so on, you are asking: what to do next?

(Aside: as I understand it, from asking questions on various sites, R is written in C, and almost everything you write in R is a function call at the C level. This means that if you are doing things over and over again, you should make sure that your code makes as few function calls as possible, as these can really add up, particularly in a double for loop. That's why it's interesting that innocent-looking things like brackets are actually function calls.)

The first place you will be told to look when trying to extend R is the Writing R Extensions manual. This hasn't worked out very well for me as it's not written with the casual R user in mind. Instead, I have found Matloff's book The Art of R Programming to be a lot more helpful. The link is to a PDF draft of the book; the book itself contains more detailed examples. In fact, I now see that the example is not included in the above PDF; sorry.

Anyway, it turns out that there are two ways to call C from R, called .C and .Call. Many people don't recommend using .C, but it has the advantage of being much easier to use, and the people who tend to discourage it also tend to be hard-core programmers.

There are numerous online tutorials on how to use the .C interface with examples, for example this one from Simon Fraser University. Basically, you have to write the function you want to call in C; it has to have return type void, and it has to accept pointers as arguments. I hadn't tried using C when I first started trying to learn this, and I learned what I needed to know from a book called C Programming in Easy Steps. Another good reference, which is available online for free, is the book Modeling with Data by Ben Klemens, a statistics textbook that uses C as the language of choice and assumes no prior knowledge. I found it very helpful for learning about pointers.
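To make the "void return type, pointer arguments" rule concrete, here is a minimal sketch of a function written in that style. The function name `sum_squares`, the file name, and the R calls in the trailing comment are my own illustration, not taken from any of the tutorials mentioned above.

```c
/* Hypothetical example: sum of squares of a numeric vector, written in the
   shape the .C interface expects: the return type is void and every
   argument, including the length, arrives as a pointer. */
void sum_squares(double *x, int *n, double *result)
{
    int i;
    double total = 0.0;

    for (i = 0; i < *n; i++) {
        total += x[i] * x[i];
    }

    /* Nothing is returned; the answer is written back through a pointer. */
    *result = total;
}

/* Assumed usage from R, after building with `R CMD SHLIB sum_squares.c`:
 *
 *   dyn.load("sum_squares.so")    # the file is a .dll on Windows
 *   out <- .C("sum_squares", as.double(x), as.integer(length(x)),
 *             result = double(1))
 *   out$result
 */
```

The `as.double()` and `as.integer()` coercions matter because .C passes the raw memory to C, so the R types have to match the C pointer types.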

If you are trying to make examples of .C run "out of the box" then it helps to be working in a UNIX environment. I haven't been using one of those, and it is much harder to get things to work on Windows; I have a feeling that many people feel that using Windows is somehow evil and are reluctant to help those who use it, which is a pain if you are a data analyst who happens to have no other choice. Or possibly this is unfair, and people who use Windows are simply expected to be familiar with the command line.

I don't want to go into the details of getting .C to work on Windows, just in case you aren't using it. What I can say is that I know nothing about computers, but I did manage to do it, and so it can be done.

A newer alternative to using .C or .Call is the Rcpp package. Dirk Eddelbuettel, one of the package authors, is very active on Stack Exchange and is very likely to help you if you have questions about this package, or any other R/C interface. As recommended above by Roman Luštrik, this package is likely to be a very good option. I haven't used it myself as I have not yet been able to install it under Windows.

As for the final part of your question, asking whether you can write a for loop in a foreign language and then just use R code inside the loop, I am pretty sure, unfortunately, that there is no good way of doing this. It would be great if you could just skip over having to use R's for altogether, but I don't think it works this way. You can, however, use various R functions in the C code if you include the R.h header file. Again, it's hard to get this to work on Windows. In particular, you have to install a thing called Rtools. But once it's working, writing a small piece of C code is almost as easy as writing the corresponding R code.
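As a rough illustration of how close the C version of a double for loop can be to the R one, here is a sketch that counts pairs of values lying within a tolerance of each other. The function name `count_close_pairs` and the task itself are hypothetical, chosen only to show a nested loop. It deliberately uses plain C with no R.h, and the R loop in the comment assumes the vector has at least two elements.

```c
/* Hypothetical example: count pairs (i, j) with i < j whose values differ
   by less than a tolerance. A roughly equivalent nested loop in R:

     count <- 0
     for (i in seq_len(n - 1))
       for (j in (i + 1):n)
         if (abs(x[i] - x[j]) < tol) count <- count + 1
*/
void count_close_pairs(double *x, int *n, double *tol, int *count)
{
    int i, j;
    int found = 0;

    for (i = 0; i < *n - 1; i++) {          /* C indices start at 0, not 1 */
        for (j = i + 1; j < *n; j++) {
            double diff = x[i] - x[j];
            if (diff < 0.0) {
                diff = -diff;               /* absolute value without math.h */
            }
            if (diff < *tol) {
                found++;
            }
        }
    }

    *count = found;                         /* result goes back via a pointer */
}
```

From R this would be called along the lines of `.C("count_close_pairs", as.double(x), as.integer(length(x)), as.double(tol), count = integer(1))$count`, assuming the compiled library has already been loaded with `dyn.load()`.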

Anyway, I hope some of these references are somewhat helpful. The best option to try first is to write the R code as efficiently as possible. Next, try Rcpp. If this doesn't work, or if you are doing something fairly minor, then I recommend .C. I am sure some experts will turn up with better advice, but I hope it's at least somewhat useful to have an answer from a non-programmer who has struggled with these issues.
