Generate a NumPy 1D array with a pre-specified correlation with an existing 1D array?


Question

I have a 1D NumPy array of real (non-generated) data. For the purposes of this question, a generated one stands in for it.

import numpy as np

arr1 = np.random.uniform(0, 100, 1_000)

I need a second array whose correlation with it is 0.3:

arr2 = '?'
print(np.corrcoef(arr1, arr2)[0, 1])

Out[1]: 0.3

Recommended answer

I've adapted this answer by whuber on stats.SE to NumPy. The idea is to generate a second array, noise, at random and then compute the residuals of a least-squares linear regression of noise on arr1. The residuals necessarily have a correlation of 0 with arr1, and of course arr1 has a correlation of 1 with itself, so a suitable linear combination a*arr1 + b*residuals can be given any desired correlation with arr1.
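A short derivation of why such a combination exists may help. The symbols are introduced here for illustration only: x stands for arr1, r for the residuals, y for the combination, ρ for the target correlation, and σ_x, σ_r for the sample standard deviations of x and r.

```latex
\begin{aligned}
y &= a\,x + b\,r, \qquad \operatorname{cov}(x, r) = 0, \\
\operatorname{corr}(x, y)
  &= \frac{\operatorname{cov}(x, y)}{\sigma_x\,\sigma_y}
   = \frac{a\,\sigma_x^2}{\sigma_x \sqrt{a^2\sigma_x^2 + b^2\sigma_r^2}}
   = \frac{a\,\sigma_x}{\sqrt{a^2\sigma_x^2 + b^2\sigma_r^2}}, \\
a = \rho\,\sigma_r,\; b = \sqrt{1-\rho^2}\,\sigma_x
  &\;\Longrightarrow\;
  \operatorname{corr}(x, y)
   = \frac{\rho\,\sigma_r\sigma_x}{\sqrt{\rho^2\sigma_r^2\sigma_x^2 + (1-\rho^2)\,\sigma_x^2\sigma_r^2}}
   = \rho .
\end{aligned}
```

Any positive rescaling of y, or adding a constant to it, leaves the correlation unchanged, so this choice of a and b is one of many that work.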

import numpy as np

def generate_with_corrcoef(arr1, p):
    n = len(arr1)

    # generate noise
    noise = np.random.uniform(0, 1, n)

    # least squares linear regression for noise = m*arr1 + c
    # (rcond=None selects the current default and avoids a FutureWarning on older NumPy)
    m, c = np.linalg.lstsq(np.vstack([arr1, np.ones(n)]).T, noise, rcond=None)[0]

    # residuals have 0 correlation with arr1
    residuals = noise - (m*arr1 + c)

    # the right linear combination a*arr1 + b*residuals
    a = p * np.std(residuals)
    b = (1 - p**2)**0.5 * np.std(arr1)

    arr2 = a*arr1 + b*residuals

    # return a scaled/shifted result to have the same mean/sd as arr1
    # this doesn't change the correlation coefficient
    return np.mean(arr1) + (arr2 - np.mean(arr2)) * np.std(arr1) / np.std(arr2)

The last line shifts and scales the result so that its mean and standard deviation match those of arr1. However, arr1 and arr2 will not be identically distributed.
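The difference in distribution can be seen in a small experiment. The following is only a sketch under stated assumptions: it uses a seeded `np.random.default_rng` for reproducibility, computes the residuals by projecting centered arrays rather than calling `lstsq`, and the names `x`, `y`, `resid`, and `p` are illustrative and not part of the answer's code.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator (an assumption, for repeatability)
x = rng.uniform(0, 100, 1_000)          # plays the role of arr1
noise = rng.uniform(0, 1, x.size)

# Residuals of regressing noise on x (with intercept), via centered projection
xc = x - x.mean()
nc = noise - noise.mean()
resid = nc - (nc @ xc) / (xc @ xc) * xc

p = 0.3
y = p * resid.std() * xc + np.sqrt(1 - p**2) * x.std() * resid
y = x.mean() + y * x.std() / y.std()    # match the mean and sd of x

print("corr     :", np.corrcoef(x, y)[0, 1])
print("mean, sd :", (x.mean(), x.std()), (y.mean(), y.std()))
print("range x  :", (x.min(), x.max()))
print("range y  :", (y.min(), y.max()))
```

The means and standard deviations agree, but the range of `y` extends beyond the interval [0, 100] that bounds `x`, because `y` is a weighted sum of two roughly uniform components rather than a single uniform sample.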

Usage:

>>> arr1 = np.random.uniform(0, 100, 1000)
>>> arr2 = generate_with_corrcoef(arr1, 0.3)
>>> np.corrcoef(arr1, arr2)
array([[1. , 0.3],
       [0.3, 1. ]])
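For repeatable results, or for targets other than 0.3, a variant that takes a random generator and validates the target may be convenient. This is a sketch, not the answer's code: the function name `correlated_copy`, its `rng` parameter, and the range check are assumptions added for illustration.

```python
import numpy as np

def correlated_copy(arr1, p, rng=None):
    """Return an array whose sample Pearson correlation with arr1 is p.

    Hypothetical helper: `rng` is an optional np.random.Generator.
    """
    if not -1.0 <= p <= 1.0:
        raise ValueError("p must lie in [-1, 1]")
    rng = np.random.default_rng() if rng is None else rng
    arr1 = np.asarray(arr1, dtype=float)

    noise = rng.uniform(0, 1, arr1.size)

    # Regress noise on arr1 (with intercept) and keep the residuals,
    # which are uncorrelated with arr1 by construction
    design = np.column_stack([arr1, np.ones(arr1.size)])
    coef, *_ = np.linalg.lstsq(design, noise, rcond=None)
    residuals = noise - design @ coef

    arr2 = p * residuals.std() * arr1 + np.sqrt(1 - p**2) * arr1.std() * residuals

    # Shift and scale to the mean and sd of arr1; the correlation is unaffected
    return arr1.mean() + (arr2 - arr2.mean()) * arr1.std() / arr2.std()

rng = np.random.default_rng(42)
arr1 = rng.uniform(0, 100, 1_000)
for p in (-0.8, 0.0, 0.3, 0.95):
    arr2 = correlated_copy(arr1, p, rng)
    print(p, np.corrcoef(arr1, arr2)[0, 1])
```

The printed correlations should match each target up to floating-point error, including negative targets and a target of zero.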
