Too many attributes for ARFF format in Weka

Question

I am working with a data set that has more than 10,000 dimensions. To use Weka I need to convert a text file to ARFF format, but because there are so many attributes, the file is still too large even with the sparse ARFF format. Sparse ARFF shortens the data section; is there a similar way to avoid writing so many attribute declarations in the header of the ARFF file?

For example:
@attribute A1 NUMERIC
@attribute A2 NUMERIC
...
@attribute A10000 NUMERIC

Recommended answer

I wrote an AWK script that converts lines like the following (in a TXT file) to ARFF.

example.txt source:

Att_0 | Att_1 | Att_2 | ... | Att_n
1 | 2 | 3 | ... | 999

My script (to_arff). Change the FS value to match the separator used in the TXT file:

#!/usr/bin/awk -f
# ./<script>.awk data.txt > data.arff

BEGIN {
    # Input separator: "|" plus any surrounding spaces or tabs,
    # so that fields do not keep the padding around the separator
    FS = "[ \t]*[|][ \t]*";
    # WEKA separator
    separator = ",";
}

# The first line holds the attribute names
NR == 1 {
    # WEKA headers
    # the relation's name is the source file's name (text before the first ".")
    split(FILENAME, relation, ".");
    print "@RELATION " relation[1] "\n";
    # attributes are "numeric" by default
    # types available: numeric, <nominal> {n1, n2, ..., nN}, string and date [<date-format>]
    for (i = 1; i <= NF; i++) {
        print "@ATTRIBUTE " $i " NUMERIC";
    }
    print "\n@DATA";
}

# Every following line is one data row
NR > 1 {
    s = $1;
    for (i = 2; i <= NF; i++) {
        s = s separator $i;
    }
    print s;
}

Output:

@RELATION example

@ATTRIBUTE Att_0 NUMERIC
@ATTRIBUTE Att_1 NUMERIC
@ATTRIBUTE Att_2 NUMERIC
...
@ATTRIBUTE Att_n NUMERIC

@DATA
1,2,3,...,999
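
Since the question mentions the sparse ARFF format, here is a rough Python sketch of the same conversion that writes sparse data rows instead of dense ones. It is not part of the original answer. The function name `to_sparse_arff` and its parameters are made up for illustration, and it assumes every value is numeric. Note that it still emits one `@ATTRIBUTE` line per column: as far as I know, sparse ARFF only shortens the data section, not the header.

```python
def to_sparse_arff(lines, relation="example", sep="|"):
    """Convert delimited text lines (header line first) to ARFF text
    with sparse data rows. Hypothetical helper, assumes numeric values."""
    # Split each non-empty line on the separator and trim the padding.
    rows = [[cell.strip() for cell in line.split(sep)]
            for line in lines if line.strip()]
    header, data = rows[0], rows[1:]

    out = ["@RELATION " + relation, ""]
    # The header still declares every attribute, one per line.
    out += ["@ATTRIBUTE " + name + " NUMERIC" for name in header]
    out += ["", "@DATA"]

    for row in data:
        # Sparse row: only non-zero values, written as
        # "<0-based attribute index> <value>" pairs inside braces.
        pairs = ["%d %s" % (i, v) for i, v in enumerate(row) if float(v) != 0.0]
        out.append("{" + ", ".join(pairs) + "}")
    return "\n".join(out)


if __name__ == "__main__":
    sample = ["Att_0 | Att_1 | Att_2", "0 | 2 | 0", "1 | 0 | 3"]
    print(to_sparse_arff(sample))
```

With the sample above, the two data rows come out as `{1 2}` and `{0 1, 2 3}`, so rows that are mostly zeros take far less space than in the dense form.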
