Import CSV file with mixed data types


Question

I have been working with MATLAB for a few days and I am having difficulty importing a CSV file into a matrix.

My problem is that my CSV file contains almost only strings and a few integer values, so csvread() doesn't work. csvread() only handles numeric data.

How can I store my strings in some kind of two-dimensional array so that I can access each element freely?

Here's a sample CSV for my needs:

04;abc;def;ghj;klm;;;;;
;;;;;Test;text;0xFF;;
;;;;;asdfhsdf;dsafdsag;0x0F0F;;

The main things are the empty cells and the text within the cells. As you can see, the structure may vary.

Recommended answer


If you know how many columns of data your CSV file contains, a single call to TEXTSCAN (http://www.mathworks.com/help/techdoc/ref/textscan.html), as Amro suggests, will be your best solution.
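As a rough sketch of what such a single call might look like, the snippet below first writes the sample from the question to a temporary file and then reads it back. The file name mixed_sample.csv and the column count of 10 are assumptions taken from the sample, not something the answer prescribes.

```matlab
% Write the sample from the question to a temporary file.
% (The file name is hypothetical and only used for this illustration.)
fileName = fullfile(tempdir,'mixed_sample.csv');
fid = fopen(fileName,'w');
fprintf(fid,'%s\n','04;abc;def;ghj;klm;;;;;', ...
                   ';;;;;Test;text;0xFF;;', ...
                   ';;;;;asdfhsdf;dsafdsag;0x0F0F;;');
fclose(fid);

% Assumption: the number of columns is known in advance (10 for the sample).
nCols = 10;

% Read every field as a string; 'CollectOutput' merges the columns
% into a single cell array instead of one cell per column.
fid = fopen(fileName,'r');
data = textscan(fid,repmat('%s',1,nCols), ...
                'Delimiter',';','CollectOutput',true);
fclose(fid);
data = data{1};    % Remove the outer cell encapsulation

delete(fileName);  % Clean up the temporary file
```

For the sample this should give a cell array with one row per line and one column per field, so that, for instance, data{2,8} holds the string '0xFF'.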

However, if you don't know in advance how many columns your file has, you can use a more general approach, as in the function below. It first uses FGETL to read each line of the file into a cell array. It then uses TEXTSCAN to split each line into separate strings at a predefined field delimiter, treating the integer fields as strings for now (they can be converted to numeric values later). Here is the resulting code, placed in a function read_mixed_csv:

function lineArray = read_mixed_csv(fileName,delimiter)
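  fid = fopen(fileName,'r');   %# Open the file
  if fid == -1                 %# Stop early if the file could not be opened
    error('read_mixed_csv:openFailed','Could not open file: %s',fileName);
  end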
  lineArray = cell(100,1);     %# Preallocate a cell array (ideally slightly
                               %#   larger than is needed)
  lineIndex = 1;               %# Index of cell to place the next line in
  nextLine = fgetl(fid);       %# Read the first line from the file
  while ~isequal(nextLine,-1)         %# Loop while not at the end of the file
    lineArray{lineIndex} = nextLine;  %# Add the line to the cell array
    lineIndex = lineIndex+1;          %# Increment the line index
    nextLine = fgetl(fid);            %# Read the next line from the file
  end
  fclose(fid);                 %# Close the file
  lineArray = lineArray(1:lineIndex-1);  %# Remove empty cells, if needed
  for iLine = 1:lineIndex-1              %# Loop over lines
    lineData = textscan(lineArray{iLine},'%s',...  %# Read strings
                        'Delimiter',delimiter);
    lineData = lineData{1};              %# Remove cell encapsulation
    if ~isempty(lineArray{iLine}) && ...        %# Skip blank lines, and
       strcmp(lineArray{iLine}(end),delimiter)  %#   account for when the line
      lineData{end+1} = '';                     %#   ends with a delimiter
    end
    lineArray(iLine,1:numel(lineData)) = lineData;  %# Overwrite line data
  end
end
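One thing to keep in mind, since the structure of the file may vary: when lines have different numbers of fields, MATLAB grows the cell array to the widest row, and the cells that were never assigned hold the empty matrix [] rather than the empty string ''. A minimal sketch of normalizing such a result is shown below; the small data array is a hand-made stand-in for the function's output, not real output from it.

```matlab
% Hypothetical ragged result: the second row had only one field,
% so its remaining cells were left as [] when the array grew.
data = {'04','abc','def'; ...
        'x',  [],   []};

% Replace every empty cell ([] or '') with the empty string so that
% the whole array consistently contains character data.
emptyMask = cellfun(@isempty,data);
data(emptyMask) = {''};
```

After this step, functions that expect a cell array of strings (such as STRCMP or STR2DOUBLE) can be applied to the whole array without special-casing [].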

Running this function on the sample file content from the question gives this result:

>> data = read_mixed_csv('myfile.csv',';')

data = 

  Columns 1 through 7

    '04'    'abc'    'def'    'ghj'    'klm'    ''            ''        
    ''      ''       ''       ''       ''       'Test'        'text'    
    ''      ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

  Columns 8 through 10

    ''          ''    ''
    '0xFF'      ''    ''
    '0x0F0F'    ''    ''

The result is a 3-by-10 cell array with one field per cell, where missing fields are represented by the empty string ''. You can now access each cell, or any combination of cells, and format the contents as you like. For example, to change the fields in the first column from strings to numeric values, you could use the function STR2DOUBLE as follows:

>> data(:,1) = cellfun(@(s) {str2double(s)},data(:,1))

data = 

  Columns 1 through 7

    [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''        
    [NaN]    ''       ''       ''       ''       'Test'        'text'    
    [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

  Columns 8 through 10

    ''          ''    ''
    '0xFF'      ''    ''
    '0x0F0F'    ''    ''

Note that the empty fields result in NaN values.
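The hexadecimal-looking fields in column 8 can be converted in a similar way. The following is only a sketch under the assumption that such fields always start with the prefix 0x; it strips the prefix before calling HEX2DEC and leaves every other field as NaN. The variable names hexFields and hexValues are made up for this illustration.

```matlab
% Column 8 of the sample result, copied here so the snippet runs on its own.
hexFields = {''; '0xFF'; '0x0F0F'};

% Start with NaN everywhere, mirroring what STR2DOUBLE does for
% fields it cannot convert.
hexValues = nan(size(hexFields));

% Assumption: a field is hexadecimal only if it begins with '0x'.
isHex = strncmp(hexFields,'0x',2);

% Strip the prefix and convert the remaining digits.
hexValues(isHex) = hex2dec(strrep(hexFields(isHex),'0x',''));
```

With real output from read_mixed_csv, hexFields would be data(:,8) instead of the literal cell array used here.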
