DBSCAN algorithm and clustering algorithm for data mining

Question

How do you implement the DBSCAN algorithm on categorical data (the mushroom data set)?

What is a one-pass clustering algorithm?

Could you provide pseudo code for a one pass clustering algorithm?

Recommended answer

You can run DBSCAN with an arbitrary distance function without any changes to it. The indexing part will be more difficult, so you will likely only get O(n^2) complexity.

But if you look closely at DBSCAN, all it does is compute distances, compare them to a threshold, and count objects. This is a key strength: it can easily be applied to various kinds of data, and all you need to define is a distance function and the thresholds.
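As a rough illustration of the point above, here is a minimal sketch of DBSCAN with a pluggable distance function and no index, so every neighborhood query is a linear scan and the whole run is O(n^2). The Hamming distance (number of differing attributes), the toy records, and the values eps=1 and min_pts=3 are assumptions chosen for the example; they are not taken from the mushroom data set or from the original answer.

```python
from collections import deque

NOISE = -1
UNVISITED = None


def hamming(a, b):
    """Number of attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def region_query(data, i, eps, dist):
    """Indices of all records within eps of record i (linear scan, no index)."""
    return [j for j in range(len(data)) if dist(data[i], data[j]) <= eps]


def dbscan(data, eps, min_pts, dist=hamming):
    """Label each record with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(data)
    cluster = -1
    for i in range(len(data)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(data, i, eps, dist)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(data, j, eps, dist)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical toy records (cap shape, cap color, odor), not real data.
records = [
    ("convex", "brown", "almond"),
    ("convex", "brown", "anise"),
    ("convex", "yellow", "almond"),
    ("flat", "white", "foul"),
    ("flat", "white", "fishy"),
    ("flat", "gray", "foul"),
    ("bell", "red", "none"),
]
labels = dbscan(records, eps=1, min_pts=3)
print(labels)
```

Swapping in a different `dist` function is the only change needed to apply the same code to another kind of data.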

I doubt there is a one-pass version of DBSCAN, as it relies on pairwise distances. You can prune some of these computations (this is where the index comes into play), but essentially you need to compare every object to every other object. So it is O(n log n) at best, when an index is available, and O(n^2) otherwise; either way it is not one-pass.

One-pass: I believe the original k-means was a one-pass algorithm. The first k objects are your initial means. For every new object, you choose the closest mean and update it (incrementally) with the new object. As long as you don't do another iteration over your data set, this is "one-pass". (The result will be even worse than Lloyd-style k-means, though.)
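The one-pass procedure described above could be sketched roughly as follows. This is a minimal illustration, assuming numeric vectors and squared Euclidean distance; the function name `one_pass_kmeans` and the sample points are hypothetical. Categorical data such as the mushroom set would need a different notion of "mean" and distance.

```python
def one_pass_kmeans(stream, k):
    """Sequential k-means: each object is seen exactly once.

    The first k objects become the initial means. Every later object is
    assigned to its closest mean, and that mean is updated incrementally.
    Returns the final means and the number of objects absorbed by each.
    """
    means = []
    counts = []
    for point in stream:
        if len(means) < k:
            means.append([float(x) for x in point])
            counts.append(1)
            continue
        # Closest mean by squared Euclidean distance (an assumption here).
        best = min(
            range(k),
            key=lambda c: sum((m - x) ** 2 for m, x in zip(means[c], point)),
        )
        counts[best] += 1
        n = counts[best]
        # Incremental mean update: new_mean = old_mean + (x - old_mean) / n
        means[best] = [m + (x - m) / n for m, x in zip(means[best], point)]
    return means, counts


# Hypothetical sample points, consumed in a single pass.
points = [(0.0, 0.0), (10.0, 10.0), (2.0, 0.0), (10.0, 12.0), (1.0, 3.0)]
means, counts = one_pass_kmeans(iter(points), k=2)
print(means, counts)
```

Because the input is consumed as an iterator and never revisited, the result depends on the order of the objects, which is one reason the quality tends to be lower than with repeated Lloyd-style iterations.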
