R - random distribution with predefined min, max, mean, and sd values

Question

I want to generate a random sample of, say, 10,000 numbers with predefined min, max, mean, and sd values. I followed this link (setting upper and lower limits in rnorm) to get a random distribution with fixed min and max values. However, doing so changes the mean.

For example,

#Function to generate values between a lower limit and an upper limit.
mysamp <- function(n, m, s, lwr, upr, nnorm) {
  set.seed(1)
  samp <- rnorm(nnorm, m, s)
  samp <- samp[samp >= lwr & samp <= upr]
  if (length(samp) >= n) {
    return(sample(samp, n))
  }
  stop(simpleError("Not enough values to sample from. Try increasing nnorm."))
}
Account_Value <- mysamp(n=10000, m=1250000, s=4500000, lwr=50000, upr=5000000, nnorm=1000000)
summary(Account_Value)

# Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 50060 1231000 2334000 2410000 3582000 5000000
#Note - though min and max values are good, mean value is very skewed for an obvious reason.
# sd(Account_Value) # 1397349

I am not sure whether we can generate a random normal distribution that meets all conditions. If there is any other sort of random distribution that can meet all conditions, please do share too.

Looking forward to your input.

Thanks.

Recommended answer

Discussion:

Hi. This is a very interesting problem. Solving it properly takes quite some effort, and a solution cannot always be found.

First, when you truncate a distribution (set a min and max for it), its standard deviation is bounded: the largest attainable value depends on the min and max. If you ask for a standard deviation that is too large, you cannot get it.

Second, the mean is restricted. Obviously a mean below the minimum or above the maximum will not work, but even a mean inside the range may be unattainable if it lies too close to one of the limits.

Third, the combination of these parameters is restricted. I'm not sure exactly how this works, but I am pretty sure that not all combinations can be satisfied.
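One restriction on the combination can be stated precisely. For any distribution confined to [a, b], not only a truncated normal, the Bhatia-Davis inequality gives var <= (b - mean) * (mean - a). The sketch below turns that into a quick necessary check; the helper names are hypothetical, and passing the check does not guarantee that a truncated normal with those moments exists, since that family is more restrictive.

```r
# Hypothetical helpers (not part of the original answer).
# Necessary condition for ANY distribution supported on [a, b]:
# Bhatia-Davis inequality, var <= (b - mean) * (mean - a).
max_feasible_sd <- function(target_mean, a, b) {
  stopifnot(a < b, target_mean >= a, target_mean <= b)
  sqrt((b - target_mean) * (target_mean - a))
}

is_feasible <- function(target_mean, target_sd, a, b) {
  target_mean > a && target_mean < b &&
    target_sd <= max_feasible_sd(target_mean, a, b)
}

# Values used in the answer's example
max_feasible_sd(3, a = 1, b = 6)
is_feasible(3, 1, a = 1, b = 6)

# Values from the question, reading s = 4500000 as the target sd (an assumption)
max_feasible_sd(1250000, a = 50000, b = 5000000)
is_feasible(1250000, 4500000, a = 50000, b = 5000000)
```

Under that reading of the question, the requested sd exceeds the bound, so no distribution on that range could deliver it, whatever its shape.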

Still, some combinations do work, and the matching parameters can be found.

The problem is: what values of the parameters mean and sd should a truncated (cut) distribution with given limits a and b have, so that in the end its mean equals desired_mean and its standard deviation equals desired_sd?

It is important that the mean and sd parameters describe the distribution before truncation. That is why the mean and standard deviation of the truncated result are different.
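A small base-R illustration of this effect, using crude truncation by rejection and made-up parameter values (mu = 1, sigma = 2, limits 1 and 6 are chosen here only for demonstration):

```r
set.seed(42)
a <- 1
b <- 6
mu <- 1      # mean BEFORE truncation (illustrative value)
sigma <- 2   # sd BEFORE truncation (illustrative value)

# Draw from the untruncated normal, then keep only values inside [a, b]
x <- rnorm(100000, mean = mu, sd = sigma)
x <- x[x >= a & x <= b]

mean(x)  # larger than mu, because everything below a = mu was cut off
sd(x)    # smaller than sigma, because both tails were removed
```

The parameters passed to the generator and the moments of what survives the cut are different quantities, which is why the pre-truncation parameters have to be searched for.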

Below is code that solves the problem using the function optim(). It may not be the best solution for this problem, but it generally works:

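library(truncnorm)

# Relative error between the desired moments and those of a truncated normal
# sample drawn with pre-truncation parameters mean_sd = c(mean, sd).
# Relies on the globals n, a, b, desired_mean, and desired_sd.
eval_function <- function(mean_sd){
    mu <- mean_sd[1]
    sigma <- mean_sd[2]
    x <- rtruncnorm(n = n, a = a, b = b, mean = mu, sd = sigma)
    mean_diff <- abs((desired_mean - mean(x))/desired_mean)
    sd_diff <- abs((desired_sd - sd(x))/desired_sd)
    mean_diff + sd_diff
}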

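n <- 1000          # sample size used inside eval_function
a <- 1             # lower limit
b <- 6             # upper limit
desired_mean <- 3
desired_sd <- 1

set.seed(1)
# Search for the pre-truncation mean and sd, starting from the desired values
o <- optim(c(desired_mean, desired_sd), eval_function)

new_n <- 10000
your_sample <- rtruncnorm(n = new_n, a = a, b = b, mean = o$par[1], sd = o$par[2])
mean(your_sample)
sd(your_sample)
min(your_sample)
max(your_sample)
eval_function(o$par)  # remaining relative error (random, since it resamples)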

I would be very interested in other solutions to this problem, so please post them if you find any.

@Mikko Marttila: Thanks to your comment and the Wikipedia link, I implemented the formulas for the mean and sd of a truncated distribution. The solution is now much more elegant, and it should find the parameters that give the desired mean and sd quite accurately, if they exist. It is also much faster.
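For reference, the standard closed forms for a normal distribution with pre-truncation parameters $\mu$ and $\sigma$, truncated to $[a, b]$, are given here in the usual notation, where $\varphi$ and $\Phi$ are the standard normal density and distribution function. The symbol $Z$ is shorthand introduced here.

```latex
\alpha = \frac{a - \mu}{\sigma}, \qquad
\beta = \frac{b - \mu}{\sigma}, \qquad
Z = \Phi(\beta) - \Phi(\alpha)

\operatorname{E}[X \mid a \le X \le b]
  = \mu + \sigma \, \frac{\varphi(\alpha) - \varphi(\beta)}{Z}

\operatorname{Var}[X \mid a \le X \le b]
  = \sigma^{2} \left[ 1
    + \frac{\alpha \varphi(\alpha) - \beta \varphi(\beta)}{Z}
    - \left( \frac{\varphi(\alpha) - \varphi(\beta)}{Z} \right)^{2} \right]
```

Note that the last term of the variance is squared.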

I implemented eval_function2, which should be passed to optim() in place of the previous function:

eval_function2 <- function(mean_sd){
    mu <- mean_sd[1]
    sigma <- mean_sd[2]

    # Standardized truncation limits
    alpha <- (a - mu)/sigma
    beta <- (b - mu)/sigma

    # Probability mass of the untruncated normal inside [a, b]
    z <- pnorm(beta) - pnorm(alpha)

    trunc_mean <- mu + sigma * (dnorm(alpha) - dnorm(beta)) / z

    # The last term is squared in the truncated normal variance formula
    trunc_var <- (sigma ^ 2) *
                 (1 +
                  (alpha * dnorm(alpha) - beta * dnorm(beta)) / z -
                  ((dnorm(alpha) - dnorm(beta)) / z) ^ 2)

    trunc_sd <- sqrt(trunc_var)

    mean_diff <- abs((desired_mean - trunc_mean)/desired_mean)
    sd_diff <- abs((desired_sd - trunc_sd)/desired_sd)
    # Return the combined relative error so optim() minimizes both terms
    mean_diff + sd_diff
}
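One way to wire the closed-form moments into a complete workflow is sketched here. It is self-contained and uses base R only. The helper names trunc_moments, fit_truncnorm, and sample_truncnorm are hypothetical, the squared relative error is a choice made for this sketch rather than the objective used in the answer, and the inverse-CDF sampler stands in for rtruncnorm().

```r
# Mean and sd of a normal(mu, sigma) truncated to [a, b] (standard closed forms)
trunc_moments <- function(mu, sigma, a, b) {
  alpha <- (a - mu) / sigma
  beta  <- (b - mu) / sigma
  z     <- pnorm(beta) - pnorm(alpha)
  ratio <- (dnorm(alpha) - dnorm(beta)) / z
  m <- mu + sigma * ratio
  v <- sigma^2 * (1 + (alpha * dnorm(alpha) - beta * dnorm(beta)) / z - ratio^2)
  c(mean = m, sd = sqrt(v))
}

# Hypothetical wrapper: search for the pre-truncation (mu, sigma) with optim()
fit_truncnorm <- function(desired_mean, desired_sd, a, b) {
  target <- c(desired_mean, desired_sd)
  objective <- function(par) {
    if (par[2] <= 0) return(1e10)              # sd must stay positive
    mom <- trunc_moments(par[1], par[2], a, b)
    if (any(!is.finite(mom))) return(1e10)     # guard against numerical breakdown
    sum(((mom - target) / target)^2)           # squared relative error
  }
  optim(target, objective)$par
}

# Inverse-CDF sampling from the truncated normal
sample_truncnorm <- function(n, mu, sigma, a, b) {
  u <- runif(n, pnorm((a - mu) / sigma), pnorm((b - mu) / sigma))
  mu + sigma * qnorm(u)
}

set.seed(1)
par_hat <- fit_truncnorm(desired_mean = 3, desired_sd = 1, a = 1, b = 6)
trunc_moments(par_hat[1], par_hat[2], a = 1, b = 6)  # should be close to 3 and 1

x <- sample_truncnorm(10000, par_hat[1], par_hat[2], a = 1, b = 6)
c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
```

Because the objective is deterministic, the fit does not depend on the random seed; only the final sample does. The sample min and max will lie inside [a, b] but will generally not hit the limits exactly.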
