Scikit-Learn's DPGMM fitting: number of components?

Question

I'm trying to fit a mixed normal model to some data using scikit-learn's DPGMM algorithm. One of the advantages advertised on [0] is that I don't need to specify the number of components; which is good, because I do not know the number of components in my data. The documentation states that I only need to specify an upper bound. However, it looks very much like that is not true:

>>> import numpy
>>> data = numpy.random.normal(loc=0.0, scale=1.0, size=1000)
>>> from sklearn.mixture import DPGMM
>>> d = DPGMM(n_components=5)
>>> d.fit(data.reshape(-1,1))
DPGMM(alpha=1.0, covariance_type='diag', init_params='wmc', min_covar=None,
   n_components=5, n_iter=10, params='wmc', random_state=None, thresh=None,
   tol=0.001, verbose=0)
>>> d.n_components
5
>>> d.means_
array([[-0.02283383],
       [ 0.06259168],
       [ 0.00390097],
       [ 0.02934676],
       [-0.05533165]])

As you can see, the fitting reports five components (the upper bound) even for data clearly sampled from just one normal distribution.

Am I doing something wrong? Did I misunderstand something?

Thanks a lot in advance,

Lukas

[0] http://scikit-learn.org/stable/modules/mixture.html#dpgmm

Solution

I recently had similar doubts about the results of this DPGMM implementation. If you check the provided example, you will notice that DPGMM always returns a model with n_components components; the trick is to remove the redundant ones. This can be done with the predict function: a component to which predict assigns no samples is unused and can be dropped.

Unfortunately, this important piece is hidden in a comment in the code example.

# as the DP will not use every component it has access to
# unless it needs it, we shouldn't plot the redundant components
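
The filtering step can be sketched as follows. This is only an illustration, not code from the scikit-learn example: the helper name `used_components` is hypothetical, and it assumes `labels` is whatever the fitted model's predict returns for your data (one component index per sample).

```python
from collections import Counter


def used_components(labels, n_components):
    """Return the indices of components that were assigned at least one sample.

    `labels` is assumed to be the per-sample component index, e.g. the
    output of a fitted mixture model's predict(); `n_components` is the
    upper bound the model was created with.
    """
    counts = Counter(int(label) for label in labels)
    return [k for k in range(n_components) if counts[k] > 0]


# Hypothetical usage with a fitted model `d` and data `X`:
#     labels = d.predict(X)
#     kept = used_components(labels, d.n_components)
#     effective_means = d.means_[kept]

# Small stand-in for predict() output: only components 0 and 2 are used.
labels = [2, 2, 2, 0, 2, 2]
print(used_components(labels, 5))  # prints [0, 2]
```

The length of the returned list is then the effective number of components, which may be well below the upper bound passed as n_components.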
