Use sklearn DBSCAN model to classify new entries
Question
I have a huge "dynamic" dataset and I'm trying to find interesting clusters in it.
After running a lot of different unsupervised clustering algorithms I have found a configuration of DBSCAN which gives coherent results.
I would like to extrapolate the model that DBSCAN creates from my test data and apply it to other datasets without re-running the algorithm. I cannot run the algorithm over the whole dataset because it would run out of memory, and because the data is dynamic, a model fitted now might no longer make sense at a later time.
Using sklearn, I have found that other clustering algorithms - like MiniBatchKMeans - have a predict method, but DBSCAN does not.
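The difference can be checked directly on the estimator classes. This is only a quick sketch; the dictionary name has_predict is made up for illustration.

```python
from sklearn.cluster import DBSCAN, MiniBatchKMeans

# Record which estimator classes expose a predict method.
has_predict = {
    "MiniBatchKMeans": hasattr(MiniBatchKMeans, "predict"),
    "DBSCAN": hasattr(DBSCAN, "predict"),
}

# DBSCAN only offers fit and fit_predict, both of which re-run the clustering.
dbscan_has_fit_predict = hasattr(DBSCAN, "fit_predict")

print(has_predict, dbscan_has_fit_predict)
```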
I understand that for MiniBatchKMeans the centroids uniquely define the model, but such a thing might not exist for DBSCAN.
So my question is: what is the proper way to extrapolate the DBSCAN model? Should I train a supervised learning algorithm using the output that DBSCAN gave on my test dataset? Or is there something intrinsic to the DBSCAN model that can be used to classify new data without re-running the algorithm?
Recommended answer
Train a classifier based on your model.
DBSCAN is not easy to adapt to new objects, because you would eventually need to adjust minPts (min_samples in sklearn). Adding points to DBSCAN can cause clusters to merge, which you probably do not want to happen.
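The merging effect can be reproduced on a toy example. The sketch below uses made-up one-dimensional data and arbitrary eps and min_samples values (all assumptions, not taken from the question): two groups form separate clusters until a chain of new points fills the gap between them.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def count_clusters(points, eps=0.25, min_samples=3):
    """Run DBSCAN and count the clusters, ignoring the noise label -1."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(points).labels_
    return len(set(labels) - {-1})


# Two well-separated groups of points (illustrative data).
group_a = [0.0, 0.1, 0.2, 0.3, 0.4]
group_b = [2.0, 2.1, 2.2, 2.3, 2.4]
# New points arriving later that happen to fill the gap between the groups.
bridge = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]

original = np.array(group_a + group_b).reshape(-1, 1)
extended = np.array(group_a + group_b + bridge).reshape(-1, 1)

n_before = count_clusters(original)
n_after = count_clusters(extended)
print(n_before, n_after)
```

With the same parameters, the extended data yields fewer clusters than the original, so labels assigned earlier would no longer match.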
If you consider the clusters found by DBSCAN to be useful, train a classifier to put new instances into the same classes. You now want to perform classification, not rediscover structure.
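A minimal sketch of this approach follows. The answer does not name a classifier; KNeighborsClassifier is used here only as one plausible choice, and the synthetic data, eps, min_samples, and n_neighbors values are all placeholders rather than recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the sample that fits in memory (assumption).
X_sample, _ = make_blobs(
    n_samples=600,
    centers=[[0, 0], [8, 8], [0, 8]],
    cluster_std=0.5,
    random_state=0,
)

# Step 1: cluster the sample once with the configuration that gave good results.
db = DBSCAN(eps=0.5, min_samples=5).fit(X_sample)
labels = db.labels_

# Step 2: treat the cluster labels as class labels and train a classifier.
# Noise (-1) is kept as its own class here so new outliers can still be
# flagged; dropping noise points before training is an equally valid choice.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_sample, labels)

# Step 3: assign new entries to the existing clusters without re-clustering.
X_new = np.array([[0.1, -0.2], [7.9, 8.3], [0.2, 7.7]])
new_labels = clf.predict(X_new)
print(new_labels)
```

Because the data is dynamic, the clustering and the classifier would presumably need to be refitted on a fresh sample from time to time.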