使用sklearn DBSCAN模型对新条目进行分类 [英] Use sklearn DBSCAN model to classify new entries

查看:476
本文介绍了使用sklearn DBSCAN模型对新条目进行分类的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个庞大的动态"数据集,我正在尝试在其中找到有趣的聚类.

I have a huge "dynamic" dataset and I'm trying to find interesting clusters on it.

在运行了许多不同的无监督群集算法之后,我发现了

After running a lot of different unsupervised clustering algorithms I have found a configuration of DBSCAN which gives coherent results.

我想根据我的测试数据推断DBSCAN创建的模型,以将其应用于其他数据集,但无需重新运行该算法.我无法在整个数据集上运行该算法,因为它会耗尽内存,并且由于数据是动态的,因此该模型在其他时间对我来说可能没有意义.

I would like to extrapolate the model that DBSCAN creates according to my test data to apply it to other datasets, but without re-running the algorithm. I cannot run the algorithm over the whole dataset cause it would run out of memory, and the model might not make sense to me at a different time as the data is dynamic.

使用 sklearn ,我发现了其他聚类算法-例如

Using sklearn, I have found that other clustering algorithms - like MiniBatchKMeans - have a predict method, but DBSCAN does not.

我知道对于MiniBatchKMeans,质心唯一定义模型.但是DBSCAN可能不存在这种情况.

I understand that for MiniBatchKMeans the centroids uniquely define the model. But such a thing might not exist for DBSCAN.

所以我的问题是:推断DBSCAN模型的正确方法是什么?我应该使用DBSCAN在我的测试数据集上给出的输出来训练一种监督学习算法吗?还是在本质上属于DBSCAN模型的东西可以用于分类新数据而无需重新运行算法?

So my question is: What is the proper way to extrapolate the DBSCAN model? should I train a supervised learning algorithm using the output that DBSCAN gave on my test dataset? or is there something intrinsically belonging to DBSCAN model that can be used to classify new data without re-running the algorithm?

推荐答案

根据您的模型训练分类器.

Train a classificator based on your model.

DBSCAN不容易适应新对象,因为您最终需要调整minPts.将点添加到DBSCAN可能导致集群合并,您可能不想发生这种情况.

DBSCAN is not easy to adapt to new objects, because you would need to eventually adjust minPts. Adding points to DBSCAN can cause clusters to merge, which you probably do not want to happen.

如果您认为DBSCAN发现的集群很有用,请训练分类器以将新实例放入相同的类中.现在,您要执行分类,而不是重新发现结构.

If you consider the clusters found by DBSCAN to be useful, train a classifier to put new instances into the same classes. You now want to perform classification, not rediscover structure.

这篇关于使用sklearn DBSCAN模型对新条目进行分类的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆