Efficient number generator for correlation studies


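Question

My goal is to generate 7 numbers within a minimum and maximum range whose Pearson correlation coefficient is greater than 0.95 (in the code below, the correlation is measured against the numbers' index positions). I have been successful with 3 numbers, obviously because this isn't very computationally demanding. For 4 numbers, however, the computation required seems very large (on the order of 10k iterations). 7 numbers would be almost impossible with the current code.

Current code: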

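import math
import numpy as np
from numpy import average  # assumed source of average(); the import is not shown in the original post

def pearson_def(x, y):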
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = average(x)
    avg_y = average(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff

    return diffprod / math.sqrt(xdiff2 * ydiff2)

c1_high = 98
c1_low = 75

def corr_gen():
    container =[]
    x=0
    while True:
        c1 = c1_low
        c2 = np.random.uniform(c1_low, c1_high)
        c3 = c1_high
        container.append(c1)
        container.append(c2)
        container.append(c3)
        y = np.arange(len(container))

        if pearson_def(container,y) >0.95:
            c4 = np.random.uniform(c1_low, c1_high)
            container.append(c4)
            y = np.arange(len(container))
            if pearson_def(container,y) >0.95:
                return container
            else:
                continue
        else:
            x+=1
            print(x)
            continue

corrcheck = corr_gen()
print(corrcheck)

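Final objective:

*To have 4 columns with a linear distribution (with evenly spaced points)

*Each row corresponds to a group of items (C1,C2,C3,C4) and their sum must equal 100.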

       C1      C2    C3    C4   sum   range 
 1     70      10    5     1    100    ^
 2     ..                              |  
 3     ..                              |
 4     ..                              | 
 5     ..                              |
 6     ..                              |
 7     90      20    15    3           _

Example spread for two theoretical components:

Answer

You can use np.random.multivariate_normal as follows:

import numpy as np

_corr = 0.95
n = 2
size = 7

corr = np.full((n, n), _corr)
np.fill_diagonal(corr, 1.)  # inplace op

# Change as you see fit; you can scale distr. later too
mu, sigma = 0., 1.
mu = np.repeat(mu, n)
sigma = np.repeat(sigma, n)

def corr2cov(corr, s):
    d = np.diag(s)
    return d.dot(corr).dot(d)

cov = corr2cov(corr, sigma)

# While we specified parameters, our draws are still pseudorandom.
# Loop until we meet the specified threshold for correlation.
res = 0.
while res < _corr:
    dist = np.random.multivariate_normal(mean=mu, cov=cov, size=size)
    res = np.corrcoef(dist[:, 0], dist[:, 1])[0, 1]

The result you're interested in is dist, in this case returned as a 2-D array of shape (7, 2): 7 samples (rows) of 2 features (columns).
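The question asks about more than two columns, so here is a minimal sketch of how the same redraw loop might be extended, assuming that "correlated" means every pair of columns must meet the threshold. The function name draw_correlated and its parameters (n_features, seed, max_tries) are hypothetical and not part of the original answer.

```python
import numpy as np

def draw_correlated(n_features=4, size=7, target=0.95, seed=0, max_tries=100_000):
    """Redraw until every pairwise sample correlation is >= target (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    corr = np.full((n_features, n_features), target)
    np.fill_diagonal(corr, 1.0)
    mean = np.zeros(n_features)
    # With unit standard deviations the covariance equals the correlation matrix.
    for _ in range(max_tries):
        dist = rng.multivariate_normal(mean=mean, cov=corr, size=size)
        # rowvar=False treats columns as variables, giving an (n_features, n_features) matrix.
        sample_corr = np.corrcoef(dist, rowvar=False)
        off_diag = sample_corr[~np.eye(n_features, dtype=bool)]
        if off_diag.min() >= target:
            return dist
    raise RuntimeError("no draw met the threshold within max_tries")

dist4 = draw_correlated()
print(dist4.shape)  # (7, 4)
```

The max_tries guard is a design choice for this sketch: the more pairs that must pass, the more redraws are typically needed, so an explicit cap avoids an unbounded loop.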

Walkthrough:

  • Create a correlation matrix with your specified correlation.
  • Specify a mean and standard deviation, ~N(0, 1) in this case, which you can scale later if wanted.
  • Convert the correlation to covariance using the standard deviation. (They are the same in this particular case, because every standard deviation is 1.)
  • Draw random samples from a multivariate normal distribution.
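The walkthrough mentions that the draws can be scaled later. As a rough sketch, assuming a simple min-max mapping onto the 75 to 98 range from the question is acceptable, the example below rescales each column and checks that the Pearson correlation is unchanged, since a linear map with positive slope preserves it. The helper rescale is hypothetical and not part of the original answer.

```python
import numpy as np

def rescale(col, low, high):
    """Linearly map a 1-D array so its min is `low` and its max is `high` (hypothetical helper)."""
    col = np.asarray(col, dtype=float)
    return (col - col.min()) / (col.max() - col.min()) * (high - low) + low

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
sample = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=7)

before = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]
scaled = np.column_stack([rescale(sample[:, 0], 75, 98),
                          rescale(sample[:, 1], 75, 98)])
after = np.corrcoef(scaled[:, 0], scaled[:, 1])[0, 1]

# The correlation before and after rescaling should agree up to rounding.
print(np.isclose(before, after))
```

Note that min-max scaling pins the smallest and largest draw of each column to the range endpoints; a different mapping may suit other requirements.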
