Fastest way to import CSV files in MATLAB

Question

I've written a script that saves its output to a CSV file for later reference, but the second script, which imports the data, takes an inordinate amount of time to read it back in.

The data is formatted like this:

Item1,val1,val2,val3
Item2,val4,val5,val6,val7
Item3,val8,val9

where the headers are in the left-most column and the data values take up the remainder of the row. One major difficulty is that the arrays of data values can be different lengths for each test item. I'd save it as a structure, but I need to be able to edit it outside the MATLAB environment, since sometimes I have to delete rows of bad data on a computer that doesn't have MATLAB installed. So really, part one of my question is: should I save the data in a different format?

Second part of the question: I've tried importdata, csvread, and dlmread, but I'm not sure which is best, or whether there's a better solution. Right now I'm using my own script, which reads the file with a loop and fgetl, and it is horribly slow for large files. Any suggestions?

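function [data,headers]=csvreader(filename) %V1_1
 % Read a file where each line is "header,val1,val2,...": the first field
 % is a text label and the rest is a variable-length list of numbers.
 fid=fopen(filename,'r');
 if fid==-1, error('Cannot open file: %s',filename), end
 data={};
 headers={};
 count=1;
 while 1
      textline=fgetl(fid);                    % next line, newline stripped
      if ~ischar(textline),   break,   end    % fgetl returns -1 at end of file
      if isempty(textline),   continue, end   % skip blank lines
      idx=1;
      % copy characters into the header one at a time until the first comma
      while ~isempty(textline) && textline(1)~=','
        headers{count}(idx)=textline(1);
        idx=idx+1;
        textline(1)=[];
      end
      if ~isempty(textline), textline(1)=[]; end   % drop the comma itself
      data{count}=str2num(textline);          % parse the remaining values
      count=count+1;
 end
 fclose(fid);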

(I know this is probably terribly written code - I'm an engineer, not a programmer, please don't yell at me - any suggestions for improvement would be welcome, though.)
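One way to keep the current unpadded format but avoid the character-by-character loop is to read the whole file in one call with `fileread` and split it with `regexp`. This is a sketch, not something from the original thread: the file name `uneven_data.csv` and its contents are hypothetical and are written first only so the snippet runs on its own. It produces the same `headers`/`data` cell arrays as `csvreader`, and it uses `str2double` instead of `str2num`, which avoids the `eval` call that `str2num` relies on. It is likely faster on large files, but it has not been benchmarked here.

```matlab
% Hypothetical sample file, written here only so the snippet is self-contained.
filename = 'uneven_data.csv';
fid = fopen(filename, 'w');
fprintf(fid, 'Item1,1,2,3\nItem2,4,5,6,7\nItem3,8,9\n');
fclose(fid);

% Read the whole file at once, then split it into lines.
% '\r?\n' handles both Windows and Unix line endings.
lines = regexp(fileread(filename), '\r?\n', 'split');
lines = lines(~cellfun(@isempty, lines));   % drop blank lines (e.g. after the final newline)

headers = cell(1, numel(lines));
data    = cell(1, numel(lines));
for k = 1:numel(lines)
    fields     = regexp(lines{k}, ',', 'split');   % split the line on commas
    headers{k} = fields{1};                        % first field is the item name
    data{k}    = str2double(fields(2:end));        % remaining fields -> numeric row vector
end
```

Empty fields come back as `NaN` from `str2double`, so a row like `Item3,8,9,,` would still parse.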

Answer

It would probably make the data easier to read if you could pad the file with NaN values when your first script creates it:

Item1,1,2,3,NaN
Item2,4,5,6,7
Item3,8,9,NaN,NaN

or you could even just print empty fields:

Item1,1,2,3,
Item2,4,5,6,7
Item3,8,9,,

Of course, in order to pad properly you would need to know beforehand the maximum number of values across all the items. With either format above, you could then use one of the standard file-reading functions, like TEXTSCAN:
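A minimal sketch of the padding step on the writing side, assuming the first script holds its results in a cell array `headers` of item names and a cell array `data` of numeric row vectors (hypothetical variable names, chosen to match what `csvreader` returns). It finds the longest row with `cellfun`, pads shorter rows with `NaN`, and writes the first (NaN-padded) format shown above:

```matlab
% Hypothetical output of the first script: one name and one variable-length
% row vector of values per test item.
headers = {'Item1', 'Item2', 'Item3'};
data    = {[1 2 3], [4 5 6 7], [8 9]};

% The padding width is the length of the longest row.
nCols = max(cellfun(@numel, data));

fid = fopen('uneven_data.txt', 'wt');
for k = 1:numel(data)
    row = nan(1, nCols);                 % start from an all-NaN row
    row(1:numel(data{k})) = data{k};     % copy the real values in
    fprintf(fid, '%s', headers{k});      % item name
    fprintf(fid, ',%.15g', row);         % format is reused for each element of row
    fprintf(fid, '\n');
end
fclose(fid);
```

`%.15g` is used instead of plain `%g` so the values keep about 15 significant digits rather than the default 6.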

>> fid = fopen('uneven_data.txt','rt');
>> C = textscan(fid,'%s %f %f %f %f','Delimiter',',','CollectOutput',1);
>> fclose(fid);
>> C{1}

ans = 

    'Item1'
    'Item2'
    'Item3'

>> C{2}

ans =

     1     2     3   NaN  %# TEXTSCAN sets empty fields to NaN anyway
     4     5     6     7
     8     9   NaN   NaN
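If the number of columns changes from file to file, the format string does not have to be typed by hand. The sketch below (an illustration, not part of the original answer) builds it with `repmat` from an assumed column count `nCols`, reads the file with the same `textscan` call, and then strips the `NaN` padding to get back one variable-length row per item. The sample file is written first only so the snippet is self-contained.

```matlab
% Hypothetical padded file in the format shown above.
fid = fopen('uneven_data.txt', 'wt');
fprintf(fid, 'Item1,1,2,3,NaN\nItem2,4,5,6,7\nItem3,8,9,NaN,NaN\n');
fclose(fid);

nCols = 4;                              % assumed known, e.g. the padding width used when writing
fmt = ['%s' repmat(' %f', 1, nCols)];   % builds '%s %f %f %f %f'

fid = fopen('uneven_data.txt', 'rt');
C = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);

headers = C{1};    % N-by-1 cell array of item names
values  = C{2};    % N-by-nCols numeric matrix, NaN where padded

% Recover the original variable-length rows by dropping the NaN padding.
data = cell(size(values, 1), 1);
for k = 1:size(values, 1)
    row = values(k, :);
    data{k} = row(~isnan(row));
end
```

Dropping NaNs this way also removes any NaN that was a genuine data value, so the round trip is only exact if the real data contains no NaNs.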
