How to avoid multiple loops with multiple variables in R

Question

I have two datasets stored in tables: one is a set of [a, b] and the other is a set of [x, Sx, y, Sy, rho]. I have a probability function f that takes (a, b, x, Sx, y, Sy, rho). In the end, I want the sum of the probability results over all [x, Sx, y, Sy, rho] rows for the first [a, b], then the same sum for the second [a, b], and so on.

The [x, Sx, y, Sy, rho] file will have a few hundred rows, and the [a, b] file will have a few hundred thousand rows.

Is there a way to do this without using two loops? I've tried the following; it doesn't quite work the way I want it to, and I know it would be far too slow in any case.

I don't know if it will help, but I've included the function in the code. Sorry that the function itself is a mess and not formatted properly.

# data  file with (a, b)
data            <- matrix( c(1, 0, 1, 1, 0.5, 0), nrow=3, ncol=2) 
colnames(data)  <- c("a", "b") 
Ndat            <- dim(data)
Ndata           <- Ndat[1]

# data2 file with (x, Sx, y, Sy, rho); byrow = TRUE so that each group of
# five values fills one row
data2           <- matrix( c(1, 0.1, 1, 0.1, 0.002, 2, 0.1, 2, 0.1, 0.000001, 
                             2, 0.1, 1, 0.1, 0.002), nrow=3, ncol=5, byrow=TRUE) 
colnames(data2) <- c("x", "Sx", "y", "Sy", "rho") 
Ndat2           <- dim(data2)
Ndata2          <- Ndat2[1]

# function requires variables (a, b, x, Sx, y, Sy, rho) 
Prob <- function(a, b, Xi, sX, Yi, sY, rho) {
  # Variance term shared by the exponent and the normalising factor
  v <- a ^ 2 * sX ^ 2 - 2 * a * rho * sX * sY + sY ^ 2
  sqrt(1 + a ^ 2) *
    exp(-(b + a * Xi - Yi) ^ 2 / (2 * v)) *
    sqrt((1 - rho ^ 2) / v) /
    (sqrt(2 * pi) * sqrt(1 - rho ^ 2))
}

# Here is my weak attempt
Table <- NULL
Table <- for (j in 1:Ndata) { 
   sum (for (i in 1:Ndata2) {
   Datatable[i] = Prob(data[j, a], data[j, b], data2[i, x], 
                 data2[i, Sx], data2[i, y], data2[i, Sy], 
                 data2[i, rho])
   })
}

I am having a very hard time wrapping my head around the apply functions and when they can or should be used. I know I've probably not given enough information, so any suggestions that can help me out would be great. I'm pretty new to programming as well as to R, so please forgive any inappropriate vocabulary or formatting.

There is probably a better way to get the number of rows in data into Ndata as a global, but this is the first one I stumbled across.

The function should not be recursive, but I see now that it is as I've written it. I have spent many hours on intro tutorials for R and am still having a very hard time understanding how the apply suite of functions is best used.

I would like one iteration to apply this function to each row in data2 using a, b from the first row of data, then sum the probabilities for all of those. The next iteration should then sum all of the probabilities for row 2 of data, using its a, b applied to every row of data2, and so on.

Recommended answer

I have a feeling there's an easier way to do this, but something like this will probably work.

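# Toy stand-in for the real function
f <- function(a, b, x, y, z) a + b + x + y + z
# Two-argument version: p1 supplies a, b; p2 supplies x, y, z
f.new <- function(p1, p2) {
  p1 <- as.list(p1); p2 <- as.list(p2)
  f(p1$a, p1$b, p2$x, p2$y, p2$z)
}

data1 <- data.frame(a = 1:10, b = 11:20)
data2 <- data.frame(x = 1:5, y = 21:25, z = 31:35)
# Every combination of row numbers; indx2 varies fastest
indx   <- expand.grid(indx2 = seq(nrow(data2)), indx1 = seq(nrow(data1)))
result <- with(indx, f.new(data1[indx1, ], data2[indx2, ]))
# Sum over the data2 rows within each data1 row
sums   <- aggregate(result, by = list(rep(seq(nrow(data1)), each = nrow(data2))), sum)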

You seem to want to evaluate a function for every combination of two sets of variables, the set of (a, b) and the set of (x, Sx, y, Sy, rho), and then, for every instance of the first set, sum over the second set.
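To pin down the target quantity, here is a minimal reference sketch that computes it with an explicit double loop, using the same toy f and toy data as the code above. It is only meant as something to check a loop-free version against; the name sums.loop is made up for this illustration.

```r
# Toy function and data, same values as in the answer's code
f <- function(a, b, x, y, z) a + b + x + y + z
data1 <- data.frame(a = 1:10, b = 11:20)
data2 <- data.frame(x = 1:5, y = 21:25, z = 31:35)

# Reference implementation: one total per row of data1,
# accumulated over every row of data2
sums.loop <- numeric(nrow(data1))
for (j in seq_len(nrow(data1))) {
  total <- 0
  for (i in seq_len(nrow(data2))) {
    total <- total + f(data1$a[j], data1$b[j],
                       data2$x[i], data2$y[i], data2$z[i])
  }
  sums.loop[j] <- total
}
sums.loop
```

For this toy f, each total equals 5 * (a + b) plus the sum of x + y + z over data2, which makes the result easy to verify by hand.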

So first, this redefines the function f(...) to take two arguments representing the two sets. This is f.new(...). You should probably define your original function that way; it will run faster.
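As a rough sketch of what that redefinition might look like for the Prob function from the question: the wrapper name Prob.new, the helper variable v, and the example objects ab and obs are all hypothetical, and the wrapper assumes its inputs are data frames (or lists) with named columns, not the matrices used in the question.

```r
# Prob as posted in the question, reformatted; v is a helper introduced here
Prob <- function(a, b, Xi, sX, Yi, sY, rho) {
  v <- a ^ 2 * sX ^ 2 - 2 * a * rho * sX * sY + sY ^ 2
  sqrt(1 + a ^ 2) * exp(-(b + a * Xi - Yi) ^ 2 / (2 * v)) *
    sqrt((1 - rho ^ 2) / v) / (sqrt(2 * pi) * sqrt(1 - rho ^ 2))
}

# Hypothetical two-argument wrapper: p1 holds columns a, b;
# p2 holds columns x, Sx, y, Sy, rho. Rows are paired element-wise.
Prob.new <- function(p1, p2) {
  p1 <- as.list(p1); p2 <- as.list(p2)
  Prob(p1$a, p1$b, p2$x, p2$Sx, p2$y, p2$Sy, p2$rho)
}

# One (a, b) row against one (x, Sx, y, Sy, rho) row
ab  <- data.frame(a = 1, b = 0)
obs <- data.frame(x = 1, Sx = 0.1, y = 1, Sy = 0.1, rho = 0.002)
Prob.new(ab, obs)
```

Because Prob uses only vectorised arithmetic, the same wrapper also accepts multi-row inputs of equal length and returns one value per paired row.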

Then we create a data frame, indx, with two columns representing every combination of the row numbers in data1 and data2, and call f.new(...) with data1 and data2 indexed by indx. This produces result, which holds the function evaluated at every combination of (a, b) and (x, y, z). Finally, we aggregate that to get the sums you specified.
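The grouping vector passed to aggregate relies on the row order that expand.grid produces. A small sketch, with 3 and 2 standing in for nrow(data2) and nrow(data1) (sizes chosen only for illustration, and grp a made-up name), makes that order visible:

```r
# Stand-ins: 3 plays the role of nrow(data2), 2 of nrow(data1)
indx <- expand.grid(indx2 = seq(3), indx1 = seq(2))
indx

# expand.grid varies its first factor (indx2) fastest, so each run of
# 3 consecutive rows shares one indx1 value. That is why a grouping
# vector built with rep(..., each = 3) lines up with the rows of indx.
grp <- rep(seq(2), each = 3)
all(indx$indx1 == grp)
```

If the two arguments to expand.grid were swapped, the grouping vector would need `times` instead of `each`, so the argument order matters here.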

This approach is memory intensive: for the data sizes you describe, result will have roughly 10 million elements. It will still run faster than loops.
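If memory becomes the limiting factor, one possible alternative (a different technique from the expand.grid approach, sketched here under assumptions) is to iterate over the (a, b) rows only and let vectorised arithmetic handle all rows of the second table at once. The data frames ab and obs are hypothetical; their values come from the question, assuming each group of five numbers there is one row.

```r
# Prob from the question, reformatted; v is a helper introduced here
Prob <- function(a, b, Xi, sX, Yi, sY, rho) {
  v <- a ^ 2 * sX ^ 2 - 2 * a * rho * sX * sY + sY ^ 2
  sqrt(1 + a ^ 2) * exp(-(b + a * Xi - Yi) ^ 2 / (2 * v)) *
    sqrt((1 - rho ^ 2) / v) / (sqrt(2 * pi) * sqrt(1 - rho ^ 2))
}

# Hypothetical data frames holding the question's example values
ab  <- data.frame(a = c(1, 0, 1), b = c(1, 0.5, 0))
obs <- data.frame(x = c(1, 2, 2), Sx = 0.1, y = c(1, 2, 1), Sy = 0.1,
                  rho = c(0.002, 0.000001, 0.002))

# For each (a, b) pair, evaluate Prob on every row of obs in one
# vectorised call and sum; only nrow(obs) values exist at a time
sums <- mapply(
  function(a, b) sum(Prob(a, b, obs$x, obs$Sx, obs$y, obs$Sy, obs$rho)),
  ab$a, ab$b
)
sums
```

This still loops over the (a, b) rows internally, so it trades some speed for a much smaller memory footprint; whether that trade is worthwhile depends on the actual table sizes.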
