Any suggestions for how I can plot mixEM type data using ggplot2
Question
I have a sample of 1 million records drawn from my original data. (For reference, you can use the following dummy data, which should produce an approximately similar distribution:
b <- data.frame(matrix(rnorm(2000000, mean=c(8,17), sd=2)))
c <- b[sample(nrow(b), 1000000), ]
) I believed the histogram to be a mixture of two log-normal distributions, and I tried to fit the summed distributions with the EM algorithm using the following code:
install.packages("mixtools")
library(mixtools)
#line below returns EM output of type mixEM[] for mixture of normal distributions
c1 <- normalmixEM(c, lambda=NULL, mu=NULL, sigma=NULL)
plot(c1, density=TRUE)
The first plot is a log-likelihood plot, and the second (shown if you hit Return again) gives the fitted density curves.
As I mentioned, c1 is of class mixEM, and the plot() function can handle that. I want to fill the density curves with colors. This is easy to do with ggplot2, but ggplot() does not support data of class mixEM and throws this message:
ggplot doesn't know how to deal with data of class mixEM
Is there any other approach I can take for this problem?
Recommended answer
Look at the structure of the returned object (this should be documented in the help):
> # simple mixture of normals:
> x=c(rnorm(10000,8,2),rnorm(10000,17,4))
> xMix = normalmixEM(x, lambda=NULL, mu=NULL, sigma=NULL)
Now look at what it contains:
> str(xMix)
List of 9
$ x : num [1:20000] 6.18 9.92 9.07 8.84 9.93 ...
$ lambda : num [1:2] 0.502 0.498
$ mu : num [1:2] 7.99 17.05
$ sigma : num [1:2] 2.03 4.02
$ loglik : num -59877
The lambda, mu, and sigma components define the fitted normal densities. You can plot these in ggplot using qplot and stat_function. But first make a function that returns scaled normal densities:
sdnorm = function(x, mean=0, sd=1, lambda=1){lambda*dnorm(x, mean=mean, sd=sd)}
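As a quick sanity check on the scaling, each scaled component should integrate to its mixing weight, so the components together integrate to about 1 and can share a y-axis with a density histogram. A minimal sketch, using hard-coded illustrative parameters (assumed values close to the str(xMix) output above, not a real fit):

```r
# Scaled normal density, as defined above: lambda * N(mean, sd).
sdnorm <- function(x, mean = 0, sd = 1, lambda = 1) {
  lambda * dnorm(x, mean = mean, sd = sd)
}

# Illustrative values only (assumed, roughly matching the str() output above).
lambda <- c(0.502, 0.498)
mu     <- c(7.99, 17.05)
sigma  <- c(2.03, 4.02)

# Area under each scaled component over mu +/- 10 sigma,
# which covers essentially all of the probability mass.
areas <- sapply(seq_along(lambda), function(k) {
  integrate(sdnorm,
            lower = mu[k] - 10 * sigma[k],
            upper = mu[k] + 10 * sigma[k],
            mean = mu[k], sd = sigma[k], lambda = lambda[k])$value
})

print(areas)       # each area should be close to the matching lambda
print(sum(areas))  # the total should be close to 1
```

With a real fit, the three vectors would come from xMix$lambda, xMix$mu and xMix$sigma instead.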
Then:
qplot(x,geom="density") +
stat_function(fun=sdnorm,args=list(mean=xMix$mu[1],sd=xMix$sigma[1], lambda=xMix$lambda[1]),fill="blue",geom="polygon") +
stat_function(fun=sdnorm,args=list(mean=xMix$mu[2],sd=xMix$sigma[2], lambda=xMix$lambda[2]),fill="#FF0000",geom="polygon")
Or use whatever ggplot skills you have. Transparent colours on the densities might be nice:
ggplot(data.frame(x=x)) +
geom_histogram(aes(x=x,y=..density..),fill="white",color="black") +
stat_function(fun=sdnorm,
args=list(mean=xMix$mu[2],
sd=xMix$sigma[2],
lambda=xMix$lambda[2]),
fill="#FF000080",geom="polygon") +
stat_function(fun=sdnorm,
args=list(mean=xMix$mu[1],
sd=xMix$sigma[1],
lambda=xMix$lambda[1]),
fill="#00FF0080",geom="polygon")
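This produces a density histogram with the two scaled component densities overlaid as semi-transparent filled curves.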
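If the mixture has more than two components, repeating stat_function for each one gets tedious. One possible alternative, sketched here under assumptions, is to evaluate the scaled densities on a grid, stack them into a long data frame, and let ggplot map the component to the fill colour. The helper component_densities() is hypothetical (it is not part of mixtools or ggplot2), and the fit list is a stand-in with assumed values rather than a real normalmixEM() result:

```r
# Hypothetical helper: builds a long data frame of scaled component densities.
# `fit` only needs the lambda, mu and sigma fields shown by str(xMix) above.
component_densities <- function(fit, from, to, n = 512) {
  grid <- seq(from, to, length.out = n)
  do.call(rbind, lapply(seq_along(fit$mu), function(k) {
    data.frame(
      x = grid,
      density = fit$lambda[k] * dnorm(grid, fit$mu[k], fit$sigma[k]),
      component = factor(k)
    )
  }))
}

# Stand-in for a normalmixEM() result (assumed values, not a real fit).
fit <- list(lambda = c(0.502, 0.498),
            mu     = c(7.99, 17.05),
            sigma  = c(2.03, 4.02))
dens <- component_densities(fit, from = 0, to = 35)

# Plot only if ggplot2 is installed; position = "identity" makes the
# areas overlap instead of stacking on top of each other.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dens, aes(x = x, y = density, fill = component)) +
    geom_area(alpha = 0.5, position = "identity")
  print(p)
}
```

With a real fit, xMix could be passed in place of fit, and the plotting range could be taken from range(xMix$x).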