ELKI: Running DBSCAN on custom objects in Java


Question

I'm trying to use ELKI from within Java to run DBSCAN. For testing I used a FileBasedDatabaseConnection. Now I would like to run DBSCAN on my own custom objects instead.

My objects have the following structure:

public class MyObject {
  private Long id;
  private Float param1;
  private Float param2;
  // ... and more parameters as well as getters and setters
}

I'd like to run DBSCAN within ELKI using a List<MyObject> as the database, but only some of the parameters should be taken into account (e.g. running DBSCAN on the objects using param1, param2 and param4). Ideally, the resulting clusters would contain the whole objects.

Is there any way to achieve this behaviour?

If not, how can I convert the objects into a format that ELKI understands and that still lets me match the resulting cluster objects to my custom objects (i.e. is there an easy way to set a label programmatically)?

The following question mentions feature vectors: Using ELKI on custom objects and making sense of results
Could this be a solution to my problem? And how would a feature vector be created from my List<MyObject>?

Recommended answer

ELKI has a modular architecture.

If you want your own data source, look at the datasource package and implement the DatabaseConnection (JavaDoc) interface.

If you want to process MyObject objects directly, that is not particularly hard, although the class you shared above will likely come at a substantial performance cost. You need a SimpleTypeInformation<MyObject> (JavaDoc) to identify your data type, and you need to implement a PrimitiveDistanceFunction (JavaDoc) for your data type.
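As a rough illustration of the arithmetic such a distance function would need, here is a plain-Java sketch that uses no ELKI classes. The MyObject stand-in, its param3 and param4 fields, and the MyObjectDistance helper are assumptions made for this example. In ELKI, the same computation would go inside your own PrimitiveDistanceFunction implementation.

```java
// Stand-in for the class from the question; param3 and param4 are assumed fields.
final class MyObject {
  private final Long id;
  private final Float param1;
  private final Float param2;
  private final Float param3;
  private final Float param4;

  MyObject(Long id, Float param1, Float param2, Float param3, Float param4) {
    this.id = id;
    this.param1 = param1;
    this.param2 = param2;
    this.param3 = param3;
    this.param4 = param4;
  }

  Long getId() { return id; }
  Float getParam1() { return param1; }
  Float getParam2() { return param2; }
  Float getParam3() { return param3; }
  Float getParam4() { return param4; }
}

// Hypothetical helper, not an ELKI class.
final class MyObjectDistance {
  private MyObjectDistance() {}

  // Euclidean distance that looks only at param1, param2 and param4.
  static double distance(MyObject a, MyObject b) {
    // Each getter call unboxes a Float, which adds overhead compared to primitives.
    double d1 = a.getParam1() - b.getParam1();
    double d2 = a.getParam2() - b.getParam2();
    double d4 = a.getParam4() - b.getParam4();
    return Math.sqrt(d1 * d1 + d2 * d2 + d4 * d4);
  }

  public static void main(String[] args) {
    MyObject a = new MyObject(1L, 0f, 0f, 100f, 0f);
    MyObject b = new MyObject(2L, 3f, 0f, -50f, 4f);
    // param3 differs a lot between a and b but is ignored by design.
    System.out.println(distance(a, b));
  }
}
```

Because the distance receives the whole objects, the clustering never needs a separate feature-vector representation. The trade-off is that index structures built for numeric vectors cannot be used with it.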

If your actual data are floats, I suggest using DoubleVector or FloatVector instead, together with e.g. SubspaceEuclideanDistanceFunction to handle only those attributes you want to use.
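To make the vector route more concrete, the following plain-Java sketch turns a list of objects into a double[][] and computes a Euclidean distance restricted to selected columns. It does not call ELKI. FeatureArrays, toArray and subspaceEuclidean are hypothetical names, and the second method only mimics what a subspace distance does conceptually.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of ELKI.
final class FeatureArrays {
  private FeatureArrays() {}

  // Builds one row per object and one column per extractor, preserving list order.
  static <T> double[][] toArray(List<T> objects, List<ToDoubleFunction<T>> extractors) {
    double[][] data = new double[objects.size()][extractors.size()];
    for (int i = 0; i < objects.size(); i++) {
      for (int j = 0; j < extractors.size(); j++) {
        data[i][j] = extractors.get(j).applyAsDouble(objects.get(i));
      }
    }
    return data;
  }

  // Euclidean distance computed only over the given column indices.
  static double subspaceEuclidean(double[] a, double[] b, int[] dims) {
    double sum = 0.0;
    for (int d : dims) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    // Raw rows stand in for custom objects here.
    List<double[]> rows = List.of(new double[] {1, 2, 3}, new double[] {4, 5, 6});
    List<ToDoubleFunction<double[]>> extractors = new ArrayList<>();
    extractors.add(r -> r[0]);
    extractors.add(r -> r[2]);
    double[][] data = toArray(rows, extractors);
    System.out.println(data.length + " rows, " + data[0].length + " columns");
    System.out.println(subspaceEuclidean(rows.get(0), rows.get(1), new int[] {0, 2}));
  }
}
```

With the class from the question, the extractors could be method references such as MyObject::getParam1. Since rows keep the list order, row i still corresponds to objects.get(i). As far as I know, ELKI's ArrayAdapterDatabaseConnection accepts a double[][] like this, but check the JavaDoc of your ELKI version before relying on it.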

For these data types and many distance functions, R*-tree indexes can be used to substantially speed up DBSCAN execution time.

A Cluster (JavaDoc) in ELKI never stores the point data. It only stores point DBIDs (Wiki). You can get the point data from the Database relation, or, for static databases, use e.g. offsets (Wiki) to map the DBIDs back to positions in your list.
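The offset mapping can be sketched without ELKI. In the example below, plain int IDs from a contiguous range stand in for DBIDs, and ClusterMapping, offset, resolve and firstId are hypothetical names. In ELKI itself, as far as I know, the offset would come from the DBIDRange of a static database rather than from subtraction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ELKI; int IDs stand in for DBIDs.
final class ClusterMapping {
  private ClusterMapping() {}

  // Position of an ID inside a contiguous ID range that starts at firstId.
  static int offset(int id, int firstId, int size) {
    int off = id - firstId;
    if (off < 0 || off >= size) {
      throw new IllegalArgumentException("ID " + id + " is outside the range");
    }
    return off;
  }

  // Replaces every ID in every cluster by the object at the matching list position.
  static <T> List<List<T>> resolve(List<int[]> clusters, int firstId, List<T> objects) {
    List<List<T>> result = new ArrayList<>();
    for (int[] cluster : clusters) {
      List<T> members = new ArrayList<>(cluster.length);
      for (int id : cluster) {
        members.add(objects.get(offset(id, firstId, objects.size())));
      }
      result.add(members);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> objects = List.of("a", "b", "c", "d");
    List<int[]> clusters = List.of(new int[] {100, 102}, new int[] {101});
    System.out.println(resolve(clusters, 100, objects));
  }
}
```

This only works if the objects were loaded in list order and the list is not modified afterwards, which is what the restriction to static databases amounts to.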
