How to use existing data in ELKI

Question

I kept stumbling upon ELKI over the last couple of days while searching for the most suitable density clustering tool, and decided to try it. For DBSCAN, I've successfully reproduced the test that clusters the file "3clusters-and-noise-2d.csv", and I've also managed to print the cluster metadata and the points in each cluster, all via the ELKI code from GitHub (latest version) in Java (I'm not really interested in the CLI or UI tools).

Now I want to create the database from some kind of internal Java structure instead of importing a file, to reduce the write and read overhead.

In the example provided I'm able to do this, but only for the first column of the file.

My question is basically: how do I create the same database that was created from a file when the same data already exists in Java?

Got it!

After some tweaking, what you basically do is use a 2D array of doubles in which each row represents a point and there are as many columns as dimensions. To create the database without reading a file, use an ArrayAdapterDatabaseConnection as follows:

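    // Package names as in ELKI 0.7.x (an assumption; adjust to your ELKI version)
    import de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN;
    import de.lmu.ifi.dbs.elki.data.Clustering;
    import de.lmu.ifi.dbs.elki.data.DoubleVector;
    import de.lmu.ifi.dbs.elki.data.model.Model;
    import de.lmu.ifi.dbs.elki.database.Database;
    import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase;
    import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection;
    import de.lmu.ifi.dbs.elki.datasource.DatabaseConnection;
    import de.lmu.ifi.dbs.elki.utilities.ClassGenericsUtil;
    import de.lmu.ifi.dbs.elki.utilities.optionhandling.parameterization.ListParameterization;

    // One row per point, one column per dimension
    double[][] data = new double[NUM_OF_POINTS][NUM_OF_DIMENSIONS];
    // Populate data according to your app
    DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(data);
    Database db = new StaticArrayDatabase(dbc, null);
    db.initialize();

    // DBSCAN algorithm setup (params was undeclared in the original snippet)
    ListParameterization params = new ListParameterization();
    params.addParameter(DBSCAN.Parameterizer.EPSILON_ID, 0.04);
    params.addParameter(DBSCAN.Parameterizer.MINPTS_ID, 20);
    DBSCAN<DoubleVector> dbscan = ClassGenericsUtil.parameterizeOrAbort(DBSCAN.class, params);

    // Run DBSCAN on the database
    Clustering<Model> result = dbscan.run(db);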

I've tested this with the "3clusters-and-noise-2d.csv" dataset and can confirm that I get the same results whether I pass the data via the file or via the array adapter.
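How the "populate data" step in the snippet above looks depends on where the points already live in your application. As a minimal sketch, assuming the points are held in a `List<double[]>`, the following plain-Java helper copies them into the rectangular `double[][]` that the array adapter takes. The class name `PointArrays` and the method `toMatrix` are hypothetical names chosen for illustration and are not part of ELKI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ELKI: builds the row-per-point matrix.
final class PointArrays {
  /** Copies a list of equal-length points into a rectangular double[][]. */
  static double[][] toMatrix(List<double[]> points) {
    if (points.isEmpty()) {
      throw new IllegalArgumentException("no points given");
    }
    final int dim = points.get(0).length;
    double[][] data = new double[points.size()][];
    for (int i = 0; i < points.size(); i++) {
      double[] p = points.get(i);
      // Every point must have the same number of dimensions.
      if (p.length != dim) {
        throw new IllegalArgumentException(
            "point " + i + " has " + p.length + " dimensions, expected " + dim);
      }
      data[i] = p.clone(); // defensive copy, so later changes to the list do not leak in
    }
    return data;
  }

  public static void main(String[] args) {
    List<double[]> points = new ArrayList<>();
    points.add(new double[] { 0.1, 0.2 });
    points.add(new double[] { 0.3, 0.4 });
    double[][] data = toMatrix(points);
    System.out.println(data.length + " points, " + data[0].length + " dimensions");
  }
}
```

The resulting array can then be passed to `new ArrayAdapterDatabaseConnection(data)` as in the snippet above. Row `i` of the array corresponds to element `i` of the list, which is what makes it possible to map results back later.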

Recommended answer

A complete example can be found in the ELKI sources:

http://elki.dbs.ifi.lmu.de/browser/elki/elki/src/main/java/tutorial/javaapi/PassingDataToELKI.java

It generates random data and runs k-means on it. It also shows how to reliably map DBIDs back to your data points.
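Mapping back is essentially positional bookkeeping: if row `i` of the array you passed in came from the `i`-th object in your own collection, then a row offset recovered from a cluster member identifies that object. The sketch below shows only this plain-Java side and makes no ELKI calls. `OffsetLookup` and `resolve` are hypothetical names, and the offsets are assumed to have been obtained from the clustering result already (the linked tutorial covers the ELKI side of that step).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ELKI: resolves row offsets to your own objects.
final class OffsetLookup {
  /**
   * Returns the original objects at the given row offsets.
   * Assumes originals is in the same order as the rows passed to the database.
   */
  static <T> List<T> resolve(List<T> originals, int[] offsets) {
    List<T> members = new ArrayList<>(offsets.length);
    for (int offset : offsets) {
      members.add(originals.get(offset)); // throws if an offset is out of range
    }
    return members;
  }
}
```

Keeping the original collection unmodified between building the array and reading the result is what keeps this lookup valid.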
