How to use Python to separate two Gaussian curves?


Question


I measured the fluorescence intensity of thousands of particles and made a histogram, which showed two adjacent Gaussian curves. How can I use Python or one of its packages to separate them into two Gaussian curves and make two new plots?

Thank you.

Solution

Basically, you need to infer the parameters of your Gaussian mixture. I will generate a similar dataset for illustration.

Generating a mixture with known parameters

from itertools import starmap

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# matplotlib.mlab.normpdf was removed in newer matplotlib releases;
# scipy's norm gives the equivalent Gaussian pdf
from scipy.stats import norm
sns.set(color_codes=True)
# inline plots in jupyter notebook
%matplotlib inline


# generate synthetic data from a mixture of two Gaussians with equal weights
# the solution below readily generalises to more components 
nsamples = 10000
means = [30, 120]
sds = [10, 50]
weights = [0.5, 0.5]
# decide how many samples to draw from each component
draws = np.random.multinomial(nsamples, weights)
samples = np.concatenate(
    list(starmap(np.random.normal, zip(means, sds, draws)))
)
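As a side note, the draws above are random, so the exact numbers printed further down will differ from run to run; seeding NumPy before the sampling step is one way to make the example reproducible:

# optional: run this before the data-generation code above so that
# the synthetic samples (and hence the fit) come out the same every time
np.random.seed(42)  # any fixed integer works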

Plot the distribution

# distplot is deprecated in recent seaborn releases; histplot is the replacement
sns.histplot(samples, kde=True)

Infer parameters

from sklearn.mixture import GaussianMixture

# sklearn expects a 2-D array of shape (n_samples, n_features), hence the reshape
mixture = GaussianMixture(n_components=2).fit(samples.reshape(-1, 1))
means_hat = mixture.means_.flatten()
weights_hat = mixture.weights_.flatten()
sds_hat = np.sqrt(mixture.covariances_).flatten()

print(mixture.converged_)
print(means_hat)
print(sds_hat)
print(weights_hat)

We get:

True
[ 122.57524745   29.97741112]
[ 48.18013893  10.44561398]
[ 0.48559771  0.51440229]

You can tweak GaussianMixture's hyper-parameters (for example n_init, covariance_type, or max_iter) to improve the fit, but this already looks good enough. Now we can plot each component (here I'm only plotting the first one):

mu1_h, sd1_h = means_hat[0], sds_hat[0]
x_axis = np.linspace(mu1_h-3*sd1_h, mu1_h+3*sd1_h, 1000)
# mlab.normpdf no longer exists in current matplotlib; scipy's norm.pdf is equivalent
plt.plot(x_axis, norm.pdf(x_axis, mu1_h, sd1_h))
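One straightforward way to get the two separate plots the question asks for is to hard-assign each particle to its most likely component with GaussianMixture.predict and plot the two groups side by side. The sketch below reuses samples, mixture, means_hat and sds_hat from above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# hard-assign every sample to the component with the highest posterior probability
labels = mixture.predict(samples.reshape(-1, 1))

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for k, ax in enumerate(axes):
    component = samples[labels == k]
    # histogram of the particles assigned to component k
    ax.hist(component, bins=50, density=True, alpha=0.5)
    # overlay the fitted Gaussian pdf of this component
    grid = np.linspace(component.min(), component.max(), 500)
    ax.plot(grid, norm.pdf(grid, means_hat[k], sds_hat[k]))
    ax.set_title("component {}: mean={:.1f}, sd={:.1f}".format(k, means_hat[k], sds_hat[k]))
plt.tight_layout()
plt.show()

If hard labels are too crude for your purposes, mixture.predict_proba(samples.reshape(-1, 1)) returns per-component membership probabilities instead.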

P.S.

On a side note: it seems like you are dealing with constrained data, and your observations sit quite close to the left constraint (zero). While Gaussians might approximate your data well enough, you should tread carefully, because a Gaussian assumes unbounded support.
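If that boundary at zero turns out to matter, one common workaround is to fit the mixture on log-transformed intensities, which maps the positive half-line onto the whole real line. A rough sketch, assuming every measured intensity is strictly positive:

import numpy as np
from sklearn.mixture import GaussianMixture

# fit on log-intensities instead of raw intensities
# (this assumes every value in `samples` is strictly positive)
log_samples = np.log(samples)
log_mixture = GaussianMixture(n_components=2, random_state=0).fit(log_samples.reshape(-1, 1))

# exponentiating the fitted means gives the medians of the corresponding
# lognormal components back on the original intensity scale
print(np.exp(log_mixture.means_.flatten()))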
