如何在R中构造易于扩展的蒙特卡洛模型 [英] How to structure an easily extensible Monte Carlo model in R

查看:116
本文介绍了如何在R中构造易于扩展的蒙特卡洛模型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于一个有两个农场的公司,我有一个简单的模型,每个农场都种植两种农作物(苹果和梨).第一步只是将树的数量乘以每棵树上的果实数量.

I have a simple model for a company with two farms, growing two crops (apples and pears) on each farm. The first step is just to multiply the number of trees by the amount of fruit on each tree.

模拟每棵树上的水果数量(在农场和农作物中).

The amount of fruit on each tree is simulated (across farms and crops).

在R中建模时,我至少要面对三个决定:

I face at least three decisions when modeling this in R:

  • 如何构造变量
  • 如何模拟
  • 如何将模拟变量与非模拟变量相乘

即使添加其他作物和/或农场,我希望它也能正常工作-理想情况下,即使添加其他尺寸(例如作物品种(格兰尼·史密斯等).我也想按名称而不是索引号来指代农场和农作物.

I want it to work even if I add another crop and/or farm - and ideally even if I add another dimension, e.g. crop variety (Granny Smith etc). I also want to refer to farms and crops by name, not by an index number.

这是我想出的方法.它可以工作,但是很难添加另一个维度,并且代码很多.有没有更整洁的方法?

Here is the approach I have come up with. It works, but it is hard to add another dimension, and it is a lot of code. Is there a neater way?

构建变量:

farms <- c('Farm 1', 'Farm 2');
crops <- c('Pear', 'Apple');
params <- c('mean','sd');

numTrees <- array(0, dim=c(length(farms), length(crops)), dimnames=list(farms,crops));
fruitPerTree <- array(0, dim=c(length(farms), length(varieties), length(params)), 
                      dimnames=list(farms,varieties,params));

# input data e.g.
numTrees['Farm 1', 'Pear'] = 100
# and
fruitPerTree['Farm 1', 'Pear', 'mean'] = 50

模拟:

simNormal2D <- function(dataVar, numSims) {
#
# Generate possible outcomes for dataVar (e.g. fruitPerTree).
# It generates them for each value of the first two dimensions. 
#
# Args:
#   dataVar:    3-dimensional array, 
#               with 3rd dim being the normal params
#   numSims:    integer, e.g. 10000
#
# e.g. sims <- simNormal2D(fruitPerTree, 10000)
    #
# Returns:
#   a 3D array with 3rd dimension indexing the simulated results
#
dims <- dimnames(dataVar);

sims <- array(dim=c(length(dims[[1]]), length(dims[[2]]), 0), 
              dimnames=list(dims[[1]], dims[[2]], NULL));
    for(x in dims[[1]]) {
    for(y in dims[[2]]) {
        sim <- rnorm(numSims, dataVar[x, y, 'mean'], 
                              dataVar[x, y, 'sd'] );
        sims <- append(sims, sim);
    }
}
# R fills array from first arg columnwise, so dims are reversed
sims <- array(sims, c(numSims, length(dims[[2]]), 
              length(dims[[1]])), 
              dimnames=list(c(1:numSims), dims[[2]], dims[[1]]));
# reverse them back again
sims <- aperm(sims, c(3,2,1));
return(sims);
}
simFruitPerTree <- simNormal2D(fruitPerTree, numSims);

要乘以simFruitPerTreenumTrees,我发现我首先必须执行手动广播:

To multiply simFruitPerTree and numTrees, I find I first have to perform a manual broadcast:

simNumTrees <- array(numTrees, dim=c(length(dims[[1]]), length(dims[[2]]), numSims), 
                     dimnames=list(dims[[1]], dims[[2]], c(1:numSims)));
simTotalFruit <- simFruitPerTree * simNumTrees;

为了进行比较,在Python(我比R更了解)中,我可以通过使用元组为字典建立索引并结合两个字典理解来在几行中执行所有这些步骤,例如:

For comparison, in Python (which I know better than R), I can perform all these steps in a few lines by using tuples to index a dictionary, along with two dictionary comprehensions, e.g.:

fruit_per_tree = {}
fruit_per_tree[('Farm 1', 'Pear')]  = (50, 15) # normal params
sim_fruit_per_tree = {key: random.normal(*params, size=num_sims) 
                      for key, params in fruit_per_tree.items() }
sim_total_fruit = {key: sim_fruit_per_tree[key]*num_trees[key] for key in num_trees }

R中也有简单的方法吗?谢谢!

Is there an easy way in R too? Thanks!

推荐答案

这里是解决我的问题的一般方法.我从roland的方法开始,然后对其进行了更新,以便可以轻松更改分布,参数和尺寸.

Here is a general solution to my problem. I started with roland's approach, and updated it so that the distribution, parameters, and dimensions can all be changed easily.

distSim <- function(nSims, simName, distFn, dimList, paramList, constList) {
    #
    # Simulate from a distribution across all the dimensions.
    #
    # Args:
    #   nSims:     integer, e.g. 10000
    #   simName:   name of the output column, e.g. 'harvestPerTree'
    #   distFn:    distribution function, e.g. rnorm
    #   dimList:   list of dimensions, 
    #              e.g. list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'))
    #   paramList: list of parameters, each of length = product(length(d) for d in dimList),
    #              to be passed to the distribution function,
    #              e.g. list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3))
    #   constList: optional vector of length = product(length(d) for d in dimList)
    #              these are included in the output dataframe
    #              e.g. list(nTrees=c(10,20,30,1,2,3))
    #
    # Returns:
    #   a dataframe with one row per iteration x product(d in dimList)
    #

    # expand out the dimensions into a dataframe grid - one row per combination
    df <- do.call(expand.grid, dimList);
    nRows <- nrow(df);
    # add the parameters, and constants, if present
    df <- cbind(df, paramList);
    if (!missing(constList)) {
        df <- cbind(df, constList);
    }
    # repeat this dataframe for each iteration of the simulation
    df <- do.call("rbind",replicate(nSims, df, simplify=FALSE));
    # add a new column giving the iteration number ('simId')
    simId <- sort(rep(seq(1:nSims),nRows));
    df <- cbind(simId, df);
    # simulate from the distribution
    df[simName] <- do.call(distFn, c(list(n=nrow(df)), df[names(paramList)]))
    return(df);
}

示例用法(仅出于简单起见使用正态分布):

Sample usage (using a normal distribution for simplicity only):

dimList <- list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'));
constList <- list(numTrees=c(10,20,30,1,2,3));
paramList <- list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3));
distSim(nSims=3, simName='harvestPerTree', distFn=rnorm, dimList=dimList, 
        paramList=paramList, constList=constList);

示例输出:

   simId  farms  crops mean sd numTrees harvestPerTree
1      1 farm A  apple   10  2       10       9.602849
2      1 farm B  apple   20  4       20      20.153225
3      1 farm A   pear   30  6       30      25.839825
4      1 farm B   pear    5  1        1       1.733120
5      1 farm A durian   10  2        2      13.506135
6      1 farm B durian   15  3        3      11.690910
7      2 farm A  apple   10  2       10       7.678611
8      2 farm B  apple   20  4       20      22.119560
9      2 farm A   pear   30  6       30      31.488002
10     2 farm B   pear    5  1        1       5.366725
11     2 farm A durian   10  2        2       6.333747
12     2 farm B durian   15  3        3      13.918085
13     3 farm A  apple   10  2       10      10.387194
14     3 farm B  apple   20  4       20      21.086388
15     3 farm A   pear   30  6       30      34.076926
16     3 farm B   pear    5  1        1       6.159415
17     3 farm A durian   10  2        2       8.322902
18     3 farm B durian   15  3        3       9.458085

还请注意,您还可以用一种很好的索引方式来定义输入值.例如如果您定义

Note also that you can also define the input values in a nicely indexed way; e.g. if you define

numTrees2 <- array(0, dim=c(length(farms), length(crops)), dimnames=tree_dimList);
numTrees2['Farm A','apple'] = 200; 
# etc

然后c(numTrees)对其输出进行排序的方式与expand.grid相匹配,因此您可以将constList = list(numTrees=c(numTrees2)传递给distSim.

Then the way that c(numTrees) orders its outputs matches expand.grid, so you can pass constList = list(numTrees=c(numTrees2) to distSim.

这篇关于如何在R中构造易于扩展的蒙特卡洛模型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆