eps estimation for DBSCAN without using the algorithm suggested in the original research paper


Problem description


I have to implement DBSCAN in Python, and estimating epsilon has been a problem. The method suggested in the original research paper assumes a blob-like distribution of the dataset, whereas my data is more curve-fittable, with jumps at some intervals. The jumps cause DBSCAN to form different clusters for the various datasets in the intervals between jumps (which is good enough for me), but computing epsilon dynamically for different datasets does not produce the desired results, because the points tend to lie on a straight line over many intervals, and changing the value of k causes a considerable change in the eps value.
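For reference, the heuristic the question refers to is usually described as a sorted k-distance plot: compute each point's distance to its k-th nearest neighbour, sort those distances, and read eps off the knee. The sketch below is one way to reproduce that with NumPy alone. The function names `k_distances` and `knee_eps` are hypothetical, the chord-based knee detection is an assumption rather than the paper's procedure, and the synthetic data only imitates "line segments with jumps".

```python
import numpy as np

def k_distances(points, k):
    """Distance from every point to its k-th nearest neighbour, sorted descending."""
    points = np.asarray(points, dtype=float)
    # Full pairwise distance matrix: O(n^2) memory, fine for small datasets.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the k-th nearest neighbour.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)[::-1]

def knee_eps(sorted_kdist):
    """Assumed knee rule: the curve point farthest from the chord joining its ends."""
    y = np.asarray(sorted_kdist, dtype=float)
    x = np.arange(len(y), dtype=float)
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    # Perpendicular distance of each (x, y) from the line through the end points.
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return y[int(np.argmax(dist))]

# Synthetic stand-in for the data in the question: a sloped line with a jump
# every 2 units along x.
x = np.arange(60) * 0.1
y = 0.5 * x + 3.0 * (x // 2)
points = np.column_stack([x, y])

for k in (2, 4, 8):
    print(k, knee_eps(k_distances(points, k)))
```

Running the loop with several values of k is a quick way to see how strongly the estimated eps depends on k for a given dataset.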

Recommended answer


Try the OPTICS algorithm; you won't need to estimate eps with it. I would also suggest recursive regression: use scipy.optimize.curve_fit to fit the best curve, then compute the RMS error of all the points with respect to that curve. Remove n percent of the points and repeat recursively until the RMS error is below your threshold.
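A minimal sketch of that regression loop is shown here, written iteratively rather than recursively. Several details are assumptions, since the answer does not specify them: the model is a straight line, the points removed each round are the ones with the largest absolute residuals, and the names `model`, `trimmed_fit`, `drop_percent`, `rms_threshold` and `min_points` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Assumed model: a straight line. Swap in whatever form suits the data.
    return a * x + b

def trimmed_fit(x, y, drop_percent=5.0, rms_threshold=0.5, min_points=5):
    """Fit, measure RMS error, drop the worst-fitting points, and repeat."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    while True:
        params, _ = curve_fit(model, x, y)
        residuals = y - model(x, *params)
        rms = np.sqrt(np.mean(residuals ** 2))
        # Drop at least one point per round so the loop always makes progress.
        n_drop = max(1, int(len(x) * drop_percent / 100.0))
        # Stop when the fit is good enough or too few points would remain.
        if rms < rms_threshold or len(x) - n_drop < min_points:
            return params, rms, x, y
        # Keep the points with the smallest absolute residuals (assumption:
        # the answer does not say which n percent to remove).
        keep = np.sort(np.argsort(np.abs(residuals))[: len(x) - n_drop])
        x, y = x[keep], y[keep]

# Example: a noisy line with a few large outliers.
xs = np.linspace(0.0, 10.0, 50)
ys = 2.0 * xs + 1.0 + 0.05 * np.sin(5.0 * xs)
ys[::10] += 20.0
params, rms, kept_x, kept_y = trimmed_fit(xs, ys)
print(params, rms, len(kept_x))
```

The `min_points` guard is there so the loop cannot trim the dataset down to nothing when the threshold is never reached.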
