CSV format for OpenCV machine learning algorithms


Question

Machine learning algorithms in OpenCV appear to use data read in CSV format. See, for example, this cpp file. The data is read into an OpenCV machine learning class, CvMLData, using the following code:

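    #include <opencv2/ml/ml.hpp>  // declares CvMLData in OpenCV 2.x

    CvMLData data;
    data.read_csv(filename);  // filename is the path to the CSV file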
    

However, there does not appear to be any readily available documentation on the required format of the CSV file. Does anyone know how the CSV file should be arranged?

Other (non-OpenCV) programs tend to have one line per training example, beginning with an integer or string that indicates the class label.

Solution

Reading the source for that class, particularly the str_to_flt_elem function, together with the class documentation, I conclude that the valid formats for individual items in the file are:

1. Anything that can be parsed to a double by strtod.
2. A question mark (?) or the empty string, to represent a missing value.
3. Any string that doesn't parse to a double.

Items 1 and 2 are only valid for features. Anything matched by item 3 is assumed to be a class label, and as far as I can deduce the order of the items doesn't matter. The read_csv function automatically assigns each column in the CSV file the correct type, and (if you want) you can override which column holds the labels with set_response_idx. As for the delimiter, you can use the default (,) or set it to whatever you like with set_delimiter before calling read_csv (as long as you don't use the decimal point).
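The three rules above can be sketched in standard C++. This is only an illustration of the rules as described in this answer, not OpenCV's actual code: the names ItemKind and classify_item are hypothetical, and the check that strtod must consume the whole field is an assumption about how "can be parsed to a double" should be read.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical names for illustration; these are not OpenCV types.
enum class ItemKind { Number, Missing, Label };

// Classify a single CSV field according to the three rules in the list.
ItemKind classify_item(const std::string& item, char miss_ch = '?') {
    // Rule 2: an empty field or the missing-value character alone.
    if (item.empty() || (item.size() == 1 && item[0] == miss_ch)) {
        return ItemKind::Missing;
    }
    // Rule 1: strtod must convert something and consume the whole field
    // (assumption: a partially numeric field such as "1.2abc" is a label).
    char* end = nullptr;
    std::strtod(item.c_str(), &end);
    if (end != item.c_str() && *end == '\0') {
        return ItemKind::Number;
    }
    // Rule 3: anything else is treated as a class label.
    return ItemKind::Label;
}
```

Because strtod accepts an optional sign, a leading decimal point, and exponent notation, fields such as +4.1, .1, and -2.1e+3 all count as numbers under this sketch.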

So this should work, for example, for 6 data points in 3 classes with 3 features per point:

    A,1.2,3.2e-2,+4.1
    A,3.2,?,3.1
    B,4.2,,+0.2
    B,4.3,2.0e3,.1
    C,2.3,-2.1e+3,-.1
    C,9.3,-9e2,10.4
    

You can move your text label to any column you want, or even have multiple text labels.
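To see why the position of the label column should not matter, here is a rough sketch of per-column type inference in standard C++. It is not how OpenCV implements read_csv; split_line, is_label, and categorical_columns are hypothetical helpers, and the rule "a column is categorical if any of its fields is a label" is an assumption based on the description above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Split one line on a single-character delimiter (no quoting support).
std::vector<std::string> split_line(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = line.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}

// True if the field is neither missing nor fully parseable by strtod.
bool is_label(const std::string& field, char miss_ch = '?') {
    if (field.empty() || field == std::string(1, miss_ch)) return false;
    char* end = nullptr;
    std::strtod(field.c_str(), &end);
    return end == field.c_str() || *end != '\0';
}

// Mark a column as categorical (true) if any of its fields is a label.
std::vector<bool> categorical_columns(const std::vector<std::string>& lines,
                                      char delim = ',') {
    std::vector<bool> categorical;
    for (const std::string& line : lines) {
        std::vector<std::string> fields = split_line(line, delim);
        if (categorical.size() < fields.size()) {
            categorical.resize(fields.size(), false);
        }
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (is_label(fields[i])) categorical[i] = true;
        }
    }
    return categorical;
}
```

Run on the six example rows, this sketch flags only the first column as categorical; if the letters are moved to another column, the flag moves with them.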
