MeanShift `fit` vs `fit_predict` in scikit-learn

Question

Suppose X is an array of the usual shape (n_samples, n_features), and consider the following code:

from sklearn.cluster import MeanShift
ms = MeanShift(bin_seeding=True,cluster_all=False)
ms.fit(X)

Once I do this, ms has two attributes: labels_ and cluster_centers_. So my first question is: what is the point of ms.fit_predict(X) or ms.predict(X), since we already have a classification of X that we can read from labels_?

Recommended answer

The main difference is in what each call gives back. ms.fit(X) performs the clustering on X and stores the result on the estimator as labels_ and cluster_centers_. ms.fit_predict(X) performs the same clustering and returns the cluster labels directly, so it is a convenience for fitting and then reading labels_. Mean shift is unsupervised, so neither call needs a labeled dataset. In the scikit-learn version shown here, the sklearn.cluster.mean_shift_.MeanShift object has no ms.predict(X) method (see the output of dir(ms) below); later releases do provide MeanShift.predict, which assigns new samples to the nearest learned cluster center.
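As a rough illustration of the point above, the following sketch compares the labels read from labels_ after fit with the array returned by fit_predict, and then calls predict on unseen points. The synthetic data, the bandwidth value, and the variable names are assumptions made for this example and are not part of the original question; predict is assumed to be available, which holds for recent scikit-learn releases but not for the old version whose dir(ms) output is shown in this answer.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Synthetic data (assumption for illustration): two well-separated blobs.
X, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [10, 10]], cluster_std=0.5, random_state=0
)

# Option A: fit, then read the labels from the estimator.
ms_a = MeanShift(bandwidth=2.0)
ms_a.fit(X)
labels_from_fit = ms_a.labels_

# Option B: fit_predict clusters X and returns the labels in one call.
ms_b = MeanShift(bandwidth=2.0)
labels_from_fit_predict = ms_b.fit_predict(X)

# Both routes run the same clustering on the same data.
same_labels = np.array_equal(labels_from_fit, labels_from_fit_predict)

# predict (recent releases only) assigns unseen points to the nearest center.
X_new = np.array([[0.2, -0.1], [9.8, 10.3]])
new_labels = ms_a.predict(X_new)

print(same_labels, new_labels)
```

In other words, fit_predict is mainly useful when only the label array is wanted, while predict is the call to reach for when new samples arrive after fitting.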

>>> help(ms.fit)
Help on method fit in module sklearn.cluster.mean_shift_:

fit(self, X) method of sklearn.cluster.mean_shift_.MeanShift instance
    Perform clustering.

    Parameters
    -----------
    X : array-like, shape=[n_samples, n_features]
        Samples to cluster.

>>> help(ms.fit_predict)
Help on method fit_predict in module sklearn.base:

fit_predict(self, X, y=None) method of sklearn.cluster.mean_shift_.MeanShift instance
    Performs clustering on X and returns cluster labels.

    Parameters
    ----------
    X : ndarray, shape (n_samples, n_features)
        Input data.

    Returns
    -------
    y : ndarray, shape (n_samples,)
        cluster labels


>>> dir(ms)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_param_names', 'bandwidth', 'bin_seeding', 'cluster_all', 'fit', 'fit_predict', 'get_params', 'min_bin_freq', 'seeds', 'set_params']

After fitting, ms exposes the attributes labels_ and cluster_centers_ (note the trailing underscore). Together with X, these let you estimate how good the clustering is under a criterion of your choosing, for example a penalty on the distance from each point to its assigned center, or a misclassification rate if ground-truth labels happen to be available. fit_predict returns only the label array, not the cluster centers, so the return value alone is not enough for a center-based criterion; the centers remain available on the estimator as ms.cluster_centers_. Which criterion to use is up to you and depends on what you consider a good clustering.
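A minimal sketch of such a goodness estimate follows. It is not the answerer's method: the two criteria (mean distance to the assigned center, and the silhouette score from sklearn.metrics), the synthetic data, and the bandwidth are all assumptions chosen for illustration. Because the question uses cluster_all=False, points that fall outside every cluster get the label -1, so the sketch excludes them before scoring.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumption for illustration): two well-separated blobs.
X, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [10, 10]], cluster_std=0.5, random_state=0
)

ms = MeanShift(bandwidth=2.0, cluster_all=False)
labels = ms.fit_predict(X)     # the return value holds the labels only
centers = ms.cluster_centers_  # the centers are still stored on the estimator

# With cluster_all=False, orphan points are labeled -1; leave them out.
assigned = labels != -1

# Criterion 1: mean distance from each assigned point to its cluster center.
diffs = X[assigned] - centers[labels[assigned]]
mean_dist = np.linalg.norm(diffs, axis=1).mean()

# Criterion 2: silhouette score (requires at least two clusters).
score = silhouette_score(X[assigned], labels[assigned])

print(mean_dist, score)
```

A lower mean distance and a silhouette score closer to 1 both suggest tighter, better-separated clusters, but which one matters is a judgment call tied to the application.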
