如何在R中构造易于扩展的蒙特卡洛模型 [英] How to structure an easily extensible Monte Carlo model in R
问题描述
对于一个有两个农场的公司,我有一个简单的模型,每个农场都种植两种农作物(苹果和梨).第一步只是将树的数量乘以每棵树上的果实数量.
I have a simple model for a company with two farms, growing two crops (apples and pears) on each farm. The first step is just to multiply the number of trees by the amount of fruit on each tree.
模拟每棵树上的水果数量(在农场和农作物中).
The amount of fruit on each tree is simulated (across farms and crops).
在R中建模时,我至少要面对三个决定:
I face at least three decisions when modeling this in R:
- 如何构造变量
- 如何模拟
- 如何将模拟变量与非模拟变量相乘
即使添加其他作物和/或农场,我希望它也能正常工作-理想情况下,即使添加其他尺寸(例如作物品种(格兰尼·史密斯等).我也想按名称而不是索引号来指代农场和农作物.
I want it to work even if I add another crop and/or farm - and ideally even if I add another dimension, e.g. crop variety (Granny Smith etc). I also want to refer to farms and crops by name, not by an index number.
这是我想出的方法.它可以工作,但是很难添加另一个维度,并且代码很多.有没有更整洁的方法?
Here is the approach I have come up with. It works, but it is hard to add another dimension, and it is a lot of code. Is there a neater way?
构建变量:
farms <- c('Farm 1', 'Farm 2');
crops <- c('Pear', 'Apple');
params <- c('mean','sd');
numTrees <- array(0, dim=c(length(farms), length(crops)), dimnames=list(farms,crops));
fruitPerTree <- array(0, dim=c(length(farms), length(varieties), length(params)),
dimnames=list(farms,varieties,params));
# input data e.g.
numTrees['Farm 1', 'Pear'] = 100
# and
fruitPerTree['Farm 1', 'Pear', 'mean'] = 50
模拟:
simNormal2D <- function(dataVar, numSims) {
#
# Generate possible outcomes for dataVar (e.g. fruitPerTree).
# It generates them for each value of the first two dimensions.
#
# Args:
# dataVar: 3-dimensional array,
# with 3rd dim being the normal params
# numSims: integer, e.g. 10000
#
# e.g. sims <- simNormal2D(fruitPerTree, 10000)
#
# Returns:
# a 3D array with 3rd dimension indexing the simulated results
#
dims <- dimnames(dataVar);
sims <- array(dim=c(length(dims[[1]]), length(dims[[2]]), 0),
dimnames=list(dims[[1]], dims[[2]], NULL));
for(x in dims[[1]]) {
for(y in dims[[2]]) {
sim <- rnorm(numSims, dataVar[x, y, 'mean'],
dataVar[x, y, 'sd'] );
sims <- append(sims, sim);
}
}
# R fills array from first arg columnwise, so dims are reversed
sims <- array(sims, c(numSims, length(dims[[2]]),
length(dims[[1]])),
dimnames=list(c(1:numSims), dims[[2]], dims[[1]]));
# reverse them back again
sims <- aperm(sims, c(3,2,1));
return(sims);
}
simFruitPerTree <- simNormal2D(fruitPerTree, numSims);
要乘以simFruitPerTree
和numTrees
,我发现我首先必须执行手动广播:
To multiply simFruitPerTree
and numTrees
, I find I first have to perform a manual broadcast:
simNumTrees <- array(numTrees, dim=c(length(dims[[1]]), length(dims[[2]]), numSims),
dimnames=list(dims[[1]], dims[[2]], c(1:numSims)));
simTotalFruit <- simFruitPerTree * simNumTrees;
为了进行比较,在Python(我比R更了解)中,我可以通过使用元组为字典建立索引并结合两个字典理解来在几行中执行所有这些步骤,例如:
For comparison, in Python (which I know better than R), I can perform all these steps in a few lines by using tuples to index a dictionary, along with two dictionary comprehensions, e.g.:
fruit_per_tree = {}
fruit_per_tree[('Farm 1', 'Pear')] = (50, 15) # normal params
sim_fruit_per_tree = {key: random.normal(*params, size=num_sims)
for key, params in fruit_per_tree.items() }
sim_total_fruit = {key: sim_fruit_per_tree[key]*num_trees[key] for key in num_trees }
R中也有简单的方法吗?谢谢!
Is there an easy way in R too? Thanks!
推荐答案
这里是解决我的问题的一般方法.我从roland的方法开始,然后对其进行了更新,以便可以轻松更改分布,参数和尺寸.
Here is a general solution to my problem. I started with roland's approach, and updated it so that the distribution, parameters, and dimensions can all be changed easily.
distSim <- function(nSims, simName, distFn, dimList, paramList, constList) {
#
# Simulate from a distribution across all the dimensions.
#
# Args:
# nSims: integer, e.g. 10000
# simName: name of the output column, e.g. 'harvestPerTree'
# distFn: distribution function, e.g. rnorm
# dimList: list of dimensions,
# e.g. list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'))
# paramList: list of parameters, each of length = product(length(d) for d in dimList),
# to be passed to the distribution function,
# e.g. list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3))
# constList: optional vector of length = product(length(d) for d in dimList)
# these are included in the output dataframe
# e.g. list(nTrees=c(10,20,30,1,2,3))
#
# Returns:
# a dataframe with one row per iteration x product(d in dimList)
#
# expand out the dimensions into a dataframe grid - one row per combination
df <- do.call(expand.grid, dimList);
nRows <- nrow(df);
# add the parameters, and constants, if present
df <- cbind(df, paramList);
if (!missing(constList)) {
df <- cbind(df, constList);
}
# repeat this dataframe for each iteration of the simulation
df <- do.call("rbind",replicate(nSims, df, simplify=FALSE));
# add a new column giving the iteration number ('simId')
simId <- sort(rep(seq(1:nSims),nRows));
df <- cbind(simId, df);
# simulate from the distribution
df[simName] <- do.call(distFn, c(list(n=nrow(df)), df[names(paramList)]))
return(df);
}
示例用法(仅出于简单起见使用正态分布):
Sample usage (using a normal distribution for simplicity only):
dimList <- list(farms=c('farm A', 'farm B'), crops=c('apple', 'pear', 'durian'));
constList <- list(numTrees=c(10,20,30,1,2,3));
paramList <- list(mean=c(10,20,30,5,10,15), sd=c(2,4,6,1,2,3));
distSim(nSims=3, simName='harvestPerTree', distFn=rnorm, dimList=dimList,
paramList=paramList, constList=constList);
示例输出:
simId farms crops mean sd numTrees harvestPerTree
1 1 farm A apple 10 2 10 9.602849
2 1 farm B apple 20 4 20 20.153225
3 1 farm A pear 30 6 30 25.839825
4 1 farm B pear 5 1 1 1.733120
5 1 farm A durian 10 2 2 13.506135
6 1 farm B durian 15 3 3 11.690910
7 2 farm A apple 10 2 10 7.678611
8 2 farm B apple 20 4 20 22.119560
9 2 farm A pear 30 6 30 31.488002
10 2 farm B pear 5 1 1 5.366725
11 2 farm A durian 10 2 2 6.333747
12 2 farm B durian 15 3 3 13.918085
13 3 farm A apple 10 2 10 10.387194
14 3 farm B apple 20 4 20 21.086388
15 3 farm A pear 30 6 30 34.076926
16 3 farm B pear 5 1 1 6.159415
17 3 farm A durian 10 2 2 8.322902
18 3 farm B durian 15 3 3 9.458085
还请注意,您还可以用一种很好的索引方式来定义输入值.例如如果您定义
Note also that you can also define the input values in a nicely indexed way; e.g. if you define
numTrees2 <- array(0, dim=c(length(farms), length(crops)), dimnames=tree_dimList);
numTrees2['Farm A','apple'] = 200;
# etc
然后c(numTrees)
对其输出进行排序的方式与expand.grid
相匹配,因此您可以将constList = list(numTrees=c(numTrees2)
传递给distSim
.
Then the way that c(numTrees)
orders its outputs matches expand.grid
, so you can pass constList = list(numTrees=c(numTrees2)
to distSim
.
这篇关于如何在R中构造易于扩展的蒙特卡洛模型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!