MeanShift `fit` vs `fit_predict` in scikit-learn
Question
Suppose `X` is an array of the typical form, and consider the following code:
from sklearn.cluster import MeanShift
ms = MeanShift(bin_seeding=True, cluster_all=False)
ms.fit(X)
Once I do this, `ms` has two attributes: `labels_` and `cluster_centers_`. So my first question is: what is the point of `ms.fit_predict(X)` or `ms.predict(X)`, since we already have a classification of `X` which we can read from `labels_`?
Recommended answer
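The main difference is in what each call gives back. `ms.fit(X)` performs the clustering on `X` and stores the result on the estimator, while `ms.fit_predict(X)` performs the same clustering and returns the cluster labels directly. Mean shift is unsupervised, so neither call needs a labeled dataset. `fit_predict` also refits on whatever array it receives, so `ms.fit_predict(X')` clusters `X'` from scratch rather than assigning it to the clusters found for `X`.
There is no `ms.predict(X)` on the `sklearn.cluster.mean_shift_.MeanShift` object shown here (later scikit-learn releases do add a `predict` method to `MeanShift`). See the `dir(ms)` output below.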
>>> help(ms.fit)
Help on method fit in module sklearn.cluster.mean_shift_:

fit(self, X) method of sklearn.cluster.mean_shift_.MeanShift instance
    Perform clustering.

    Parameters
    -----------
    X : array-like, shape=[n_samples, n_features]
        Samples to cluster.

>>> help(ms.fit_predict)
Help on method fit_predict in module sklearn.base:

fit_predict(self, X, y=None) method of sklearn.cluster.mean_shift_.MeanShift instance
    Performs clustering on X and returns cluster labels.

    Parameters
    ----------
    X : ndarray, shape (n_samples, n_features)
        Input data.

    Returns
    -------
    y : ndarray, shape (n_samples,)
        cluster labels
>>> dir(ms)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_param_names', 'bandwidth', 'bin_seeding', 'cluster_all', 'fit', 'fit_predict', 'get_params', 'min_bin_freq', 'seeds', 'set_params']
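The `fit_predict` docstring quoted above says it clusters `X` and returns the labels. A minimal sketch of what that means in practice follows. The toy data, the random seed, and the variable names are assumptions made for illustration and are not part of the original question.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical toy data: two well-separated 2-D blobs (an assumption).
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])

# Option 1: fit, then read the labels from the fitted estimator.
ms_a = MeanShift(bin_seeding=True, cluster_all=False)
ms_a.fit(X)
labels_from_fit = ms_a.labels_

# Option 2: fit_predict fits the estimator and returns the labels in one call.
ms_b = MeanShift(bin_seeding=True, cluster_all=False)
labels_from_fit_predict = ms_b.fit_predict(X)

# Both routes run the same clustering on the same data.
same_labels = np.array_equal(labels_from_fit, labels_from_fit_predict)
# The fitted attributes are set on the estimator after fit_predict as well.
centers_available = hasattr(ms_b, "cluster_centers_")
print(same_labels, centers_available)
```

On the same data and with the same parameters, the two routes should agree, so `fit_predict` is mainly a convenience when only the labels are needed.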
With `ms` exposing the attributes `labels_` and `cluster_centers_` after fitting, you can combine them with the data `X` to estimate the goodness of the model using standard misclassification-penalty techniques. The return value of `fit_predict` alone is not enough for this, since it contains only the labels and not the cluster centers; read the centers from `ms.cluster_centers_` instead. Which goodness criterion to apply is up to you.
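As one possible illustration of a goodness criterion, the sketch below measures how far each assigned point lies from its cluster center. The criterion, the toy data, and the variable names are assumptions chosen for the example; the original answer does not prescribe any particular measure.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical toy data: two well-separated 2-D blobs (an assumption).
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(50, 2)),
])

ms = MeanShift(bin_seeding=True, cluster_all=False)
labels = ms.fit_predict(X)      # return value: labels only
centers = ms.cluster_centers_   # centers are read from the estimator

# With cluster_all=False, orphan points are labeled -1; leave them out.
assigned = labels >= 0

# Distance from every assigned point to the center of its own cluster.
distances = np.linalg.norm(X[assigned] - centers[labels[assigned]], axis=1)
mean_distance = distances.mean()
print(centers.shape, mean_distance)
```

A smaller mean distance indicates tighter clusters, but it says nothing about whether the number of clusters is appropriate, so treat it as one signal among several.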