在R?中等效于Matlab对高斯混合模型的“拟合". [英] Equivalent of Matlab's 'fit' for Gaussian mixture models in R?

Question

I have some time series data that looks like this:

x <- c(0.5833, 0.95041, 1.722, 3.1928, 3.941, 5.1202, 6.2125, 5.8828, 
4.3406, 5.1353, 3.8468, 4.233, 5.8468, 6.1872, 6.1245, 7.6262, 
8.6887, 7.7549, 6.9805, 4.3217, 3.0347, 2.4026, 1.9317, 1.7305, 
1.665, 1.5655, 1.3758, 1.5472, 1.7839, 1.951, 1.864, 1.6638, 
1.5624, 1.4922, 0.9406, 0.84512, 0.48423, 0.3919, 0.30773, 0.29264, 
0.19015, 0.13312, 0.25226, 0.29403, 0.23901, 0.000213074755156413, 
5.96565965097398e-05, 0.086874, 0.000926808687858284, 0.000904641782399267, 
0.000513042259030044, 0.40736, 4.53928073402494e-05, 0.000765719624469057, 
0.000717419263673946)

I would like to fit a curve to this data, using mixtures of one to five Gaussians. In Matlab, I could do the following:

fits{1} = fit(1:length(x),x,fittype('gauss1'));
fits{2} = fit(1:length(x),x,fittype('gauss2'));
fits{3} = fit(1:length(x),x,fittype('gauss3'));

...and so on.

In R, I am having difficulty identifying a similar method.

library(mclust)   # Mclust() comes from the mclust package, not base R
fits <- list()
dat <- data.frame(time = 1:length(x), x = x)
fits[[1]] <- Mclust(dat, G = 1)
fits[[2]] <- Mclust(dat, G = 2)
fits[[3]] <- Mclust(dat, G = 3)

... but this does not really seem to be doing quite the same thing. For example, I am not sure how to calculate the R^2 between the fitted curve and the original data using the Mclust solution.

Is there a simpler alternative in base R to fitting a curve using a mixture of Gaussians?

Recommended answer

Function

With the code given below, and with a bit of luck in finding good initial parameters, you should be able to fit Gaussians to your data.

In the function fit_gauss, the aim is to fit y as a sum of Gaussians in x. The number of Gaussians is determined by the length of the initial values passed for the parameters a, b and d, which must all be of equal length.
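For reference, here is the model that fit_gauss fits with k = length(a) components, written out from the code. As far as I know it has the same form as Matlab's 'gauss1' ... 'gauss8' library models (a1*exp(-((x-b1)/c1)^2) + ...), with d playing the role of Matlab's c:

$$
\hat y(x) = \sum_{i=1}^{k} a_i \exp\!\left(-\left(\frac{x - b_i}{d_i}\right)^{2}\right),
\qquad
(\hat a, \hat b, \hat d) = \arg\min_{a,\,b,\,d} \sum_{j=1}^{n} \bigl(y_j - \hat y(x_j)\bigr)^{2}
$$

Matching each term with a normal curve proportional to $\exp\!\left(-(x-\mu_i)^2 / (2\sigma_i^2)\right)$ gives $b_i = \mu_i$ and $d_i = \sqrt{2}\,\sigma_i$, and each bump encloses area $a_i d_i \sqrt{\pi}$.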

I have demonstrated curve fitting of the OP's data with up to three Gaussians.

This is pretty much the most work I have done with nls (thanks to the OP for that), so I am not quite sure what the best method for selecting the initial values is. Naturally, they depend on the heights of the peaks (a) and on the mean and standard deviation of x around them (b and d).
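One possible heuristic for a single peak (my own suggestion, not something the fits below rely on) is to treat the non-negative y values as weights: a is the peak height, b is the y-weighted mean of x, and d is sqrt(2) times the y-weighted standard deviation, because in a * exp(-((x - b)/d)^2) the width d equals sqrt(2) times the standard deviation of the matching normal curve. The helper name moment_start() is hypothetical.

```r
# Hypothetical helper: moment-based starting values for one Gaussian peak,
# a * exp(-((x - b) / d)^2), assuming y >= 0 so it can be used as weights.
moment_start <- function(x, y) {
  w <- y / sum(y)                   # normalised weights
  b <- sum(w * x)                   # weighted mean -> peak centre
  s <- sqrt(sum(w * (x - b)^2))     # weighted standard deviation
  list(a = max(y),                  # peak height
       b = b,
       d = sqrt(2) * s)             # d = sqrt(2) * sigma
}

# Example: recover the parameters of a noise-free bump
xs <- seq(0, 40, by = 0.1)
ys <- 3 * exp(-((xs - 20) / 5)^2)
moment_start(xs, ys)
```

For multi-peak data such as the OP's, this only gives one broad component over the whole range; applying it to a window around each visible peak is one way to get per-component starts.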

One option, for a given number of Gaussians, is to try a number of starting values and keep the fit with the smallest residual standard error, summary(fit)$sigma.
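The multi-start idea can be sketched as follows. This is a minimal sketch, assuming uniformly drawn random starts inside the data range (the ranges are my own choice); gauss_sum() and multi_start_gauss() are hypothetical names, and fits that fail or stop before convergence are simply skipped.

```r
# Hypothetical helpers: gauss_sum() and multi_start_gauss() are illustrative names.
gauss_sum <- function(x, a, b, d) {
  # one column per Gaussian component, summed row-wise
  rowSums(sapply(seq_along(a), function(i) a[i] * exp(-((x - b[i]) / d[i])^2)))
}

multi_start_gauss <- function(y, x, k, n_starts = 20, seed = 1) {
  set.seed(seed)                                   # reproducible random starts
  best <- NULL
  best_rse <- Inf
  for (s in seq_len(n_starts)) {
    # assumed ranges for the random starting values
    start <- list(a = runif(k, 0.2, 1) * max(y),         # peak heights
                  b = runif(k, min(x), max(x)),          # peak centres
                  d = runif(k, 1, diff(range(x)) / 2))   # peak widths
    fit <- tryCatch(
      suppressWarnings(
        nls(y ~ gauss_sum(x, a, b, d), start = start,
            control = list(warnOnly = TRUE, minFactor = 1/2048))),
      error = function(e) NULL)                    # e.g. singular gradient at the start
    # skip failed fits and fits that stopped before convergence
    if (is.null(fit) || !isTRUE(fit$convInfo$isConv)) next
    # residual standard error, the same value as summary(fit)$sigma
    rse <- sqrt(deviance(fit) / df.residual(fit))
    if (rse < best_rse) {
      best <- fit
      best_rse <- rse
    }
  }
  best
}
```

With the data in the question this would be called as multi_start_gauss(y = x, x = seq_along(x), k = 3). More starts raise the chance of reaching the best local minimum at the cost of run time.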

I had to fiddle a bit to find the initial parameters, but I dare say the parameters and the plot of the three-Gaussian model look solid.

ind <- 1 : length(x)

# plot original data
plot(ind, x, pch = 21, bg = "blue")

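# Gaussian fit: nls() fit of y as a sum of Gaussians in x; the number of
# components is length(a), and a, b and d must all have the same length
fit_gauss <- function(y, x, a, b, d) {

  # sum of Gaussian components: one column per component, summed row-wise
  p_model <- function(x, a, b, d) {
      rowSums(sapply(seq_along(a), 
                 function(i) a[i] * exp(-((x - b[i])/d[i])^2)))
  }

  # warnOnly = TRUE: return the fit (with a warning) instead of an error
  # when nls() stops before convergence; minFactor halves the default
  # minimum step-size factor (1/1024)
  fit <- nls(y ~ p_model(x, a, b, d), 
             start = list(a = a, b = b, d = d), 
             trace = FALSE,  
             control = list(warnOnly = TRUE, minFactor = 1/2048))
  fit
}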

Single Gaussian

g1 <- fit_gauss(y = x, x = ind, a=1, b = mean(ind), d = sd(ind))
lines(ind, predict(g1), lwd = 2, col = "green")

Two Gaussians

# seed the first component with the single-Gaussian estimates of a, b and d
g2 <- fit_gauss(y = x, x = ind, a = c(coef(g1)[1], 1), 
                                b = c(coef(g1)[2], 30), 
                                d = c(coef(g1)[3], 2))
lines(ind, predict(g2), lwd = 2, col = "red")

Three Gaussians

g3 <- fit_gauss(y = x, x = ind, a=c(5, 4, 4), 
                b = c(12, 17, 11), d = c(13, 2, 2))

lines(ind, predict(g3), lwd = 2, col = "black")

Summary of the three-Gaussian fit

summary(g3)

# Formula: x ~ p_model(ind, a, b, d)
# 
# Parameters:
#   Estimate Std. Error t value Pr(>|t|)    
#   a1   5.9307     0.5588  10.613 5.93e-14 ***
#   a2   3.5689     0.7098   5.028 8.00e-06 ***
#   a3  -2.2066     0.8901  -2.479 0.016894 *  
#   b1  12.9545     0.5289  24.495  < 2e-16 ***
#   b2  17.4709     0.2708  64.516  < 2e-16 ***
#   b3  11.3839     0.3116  36.538  < 2e-16 ***
#   d1  11.4351     0.8568  13.347  < 2e-16 ***
#   d2   1.8893     0.4897   3.858 0.000355 ***
#   d3   1.0848     0.6309   1.719 0.092285 .  
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.7476 on 46 degrees of freedom
# 
# Number of iterations to convergence: 34 
# Achieved convergence tolerance: 8.116e-06
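The question also asked for the R^2 between the fitted curve and the data. Since nls() fits by least squares, one common (purely descriptive) choice is 1 - SS_res / SS_tot computed from the residuals. This is a minimal sketch; r_squared() is a hypothetical helper, and residuals() works on both nls and lm fits.

```r
# Hypothetical helper: R^2 = 1 - SS_res / SS_tot for a fitted curve
r_squared <- function(fit, y) {
  ss_res <- sum(residuals(fit)^2)   # residual sum of squares
  ss_tot <- sum((y - mean(y))^2)    # total sum of squares about the mean
  1 - ss_res / ss_tot
}

# Example on simulated data with a single Gaussian bump
set.seed(1)
tt <- 1:40
yy <- 4 * exp(-((tt - 20) / 5)^2) + rnorm(40, sd = 0.1)
bump <- nls(yy ~ a * exp(-((tt - b) / d)^2),
            start = list(a = 3, b = 18, d = 4))
r_squared(bump, yy)
```

On the fits above this would be r_squared(g3, x). Because logLik() has an nls method, AIC(g1, g2, g3) is one way to weigh the better fit of each added Gaussian against its three extra parameters.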

这篇关于在R?中等效于Matlab对高斯混合模型的“拟合".的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆