Thinking in Vectors with R

Question

I know that R works most efficiently with vectors and that looping should be avoided. I am having a hard time teaching myself to actually write code this way, and I would like some ideas on how to "vectorize" my code. Here's an example that creates 10 years of sample data for 10,000 non-unique combinations of state (st), plan1 (p1) and plan2 (p2):

st<-NULL
p1<-NULL
p2<-NULL
year<-NULL
i<-0
starttime <- Sys.time()

while (i<10000) {
    for (years in seq(1991,2000)) {
        st<-c(st,sample(c(12,17,24),1,prob=c(20,30,50)))
        p1<-c(p1,sample(c(12,17,24),1,prob=c(20,30,50)))
        p2<-c(p2,sample(c(12,17,24),1,prob=c(20,30,50)))    
        year <-c(year,years)
    }
    i<-i+1
}
Sys.time() - starttime

This takes about 8 minutes to run on my laptop. As expected, I end up with 4 vectors, each with 100,000 values. How can I do this faster using vectorized functions?

As a side note, if I limit the code above to 1,000 iterations of i, it takes only 2 seconds, but 10,000 iterations take 8 minutes. Any idea why?

Recommended answer

Clearly I should have worked on this for another hour before I posted my question. It's so obvious in retrospect. :)

To use R's vector logic, I took out the loop and replaced it with this:

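# 10,000 combinations x 10 years = 100,000 values per vector, matching the loop
st   <- sample(c(12, 17, 24), 100000, prob = c(20, 30, 50), replace = TRUE)
p1   <- sample(c(12, 17, 24), 100000, prob = c(20, 30, 50), replace = TRUE)
p2   <- sample(c(12, 17, 24), 100000, prob = c(20, 30, 50), replace = TRUE)
# 1991..2000 repeated once per combination, in the same order the loop produced
year <- rep(1991:2000, 10000)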
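
As a rough sketch of the same idea, the four vectors can be built by one small function so the size is easy to vary. The name `make_sample` and its arguments are hypothetical and not part of the original post; the only behaviour relied on is that `sample()` accepts weights in `prob` that do not sum to one.

```r
# Hypothetical helper (not from the original post): build the four
# columns in one vectorized step and return them as a data frame.
make_sample <- function(n_combos, years = 1991:2000) {
  n     <- n_combos * length(years)  # total number of rows
  codes <- c(12, 17, 24)
  wts   <- c(20, 30, 50)             # relative weights; sample() rescales them
  data.frame(
    st   = sample(codes, n, replace = TRUE, prob = wts),
    p1   = sample(codes, n, replace = TRUE, prob = wts),
    p2   = sample(codes, n, replace = TRUE, prob = wts),
    year = rep(years, times = n_combos)
  )
}

set.seed(1)                          # only to make the draw repeatable
d <- make_sample(10000)
nrow(d)                              # 100000
round(prop.table(table(d$st)), 2)    # shares should be close to 0.2, 0.3, 0.5
```

The observed shares are random, so they will be near the requested weights rather than exactly equal to them.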

I can now generate 100,000 samples almost instantaneously. I knew that vectors were faster, but dang. I presume 100,000 iterations would have taken over an hour with the loop, whereas the vector approach takes less than 1 second. Just for kicks I made the vectors a million long; that took about 2 seconds. Since I must test to failure, I tried 10 million but ran out of memory on my 2 GB laptop. I switched over to my Vista 64 desktop with 6 GB of RAM and created vectors of length 10 million in 17 seconds. 100 million made things fall apart, as one of the vectors was over 763 MB, which resulted in an allocation issue with R.
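
The 763 MB figure can be checked with a little arithmetic. The sketch below assumes the sampled values are stored as doubles (8 bytes each), which is what `sample()` returns when it is given `c(12, 17, 24)`; the variable names are illustrative only.

```r
n <- 100e6                 # 100 million elements

n * 8 / 1024^2             # doubles: about 762.9 MB per vector
n * 4 / 1024^2             # integers: about 381.5 MB per vector

# Check the storage type on a smaller vector
x <- sample(c(12, 17, 24), 1e6, replace = TRUE, prob = c(20, 30, 50))
typeof(x)                  # "double"

# Integer codes (note the L suffix) take half the space per element
xi <- sample(c(12L, 17L, 24L), 1e6, replace = TRUE, prob = c(20, 30, 50))
typeof(xi)                 # "integer"
object.size(x)
object.size(xi)
```

Storing the codes as integers would roughly halve the memory needed per vector, though whether that is enough to avoid the allocation failure depends on the machine and the R build.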

Vectors in R are amazingly fast to me. I guess that's why I am an economist and not a computer scientist.
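
On the side question of why 10,000 iterations are so much slower than 1,000: every `c(st, ...)` call allocates a new vector and copies the old contents into it, so the total copying grows roughly with the square of the final length. Ten times as many elements therefore means on the order of a hundred times as much copying, before any extra memory-management cost. The sketch below compares growing a vector, preallocating it, and the vectorized call; the function names are hypothetical, and the timings will vary by machine and R version.

```r
codes <- c(12, 17, 24)
wts   <- c(20, 30, 50)

# Grow the result one element at a time, as in the question
grow <- function(n) {
  x <- NULL
  for (i in seq_len(n)) x <- c(x, sample(codes, 1, prob = wts))
  x
}

# Same loop, but write into a vector allocated once up front
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- sample(codes, 1, prob = wts)
  x
}

# One vectorized call, as in the answer
vectorized <- function(n) sample(codes, n, replace = TRUE, prob = wts)

n <- 20000
system.time(a <- grow(n))
system.time(b <- prealloc(n))
system.time(v <- vectorized(n))
```

Preallocating removes the repeated copying but still pays for one `sample()` call per element, so the vectorized version should remain the fastest of the three.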
