Import CSV file with mixed data types

Question

I've been working with MATLAB for a few days and I'm having difficulty importing a CSV file into a matrix.

My problem is that my CSV file contains almost only strings and a few integer values, so csvread() doesn't work: csvread() only handles numeric data.

How can I store my strings in some kind of two-dimensional array so that I can freely access each element?

Here's a sample CSV that illustrates my needs:

04;abc;def;ghj;klm;;;;;
;;;;;Test;text;0xFF;;
;;;;;asdfhsdf;dsafdsag;0x0F0F;;

The main features are the empty cells and the text within the cells. As you can see, the structure may vary.

Recommended answer

If you know how many columns of data your CSV file will contain, a single call to textscan, as Amro suggests, will be your best solution.
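For illustration, here is a minimal sketch of what such a call might look like. It assumes the file has exactly 10 semicolon-separated columns, as in the sample from the question; the temporary file and the variable names (fileName, nCols, data) are made up for this example and are not taken from Amro's answer.

```matlab
% Write the sample from the question to a temporary file (illustration only)
fileName = [tempname '.csv'];
fid = fopen(fileName, 'w');
fprintf(fid, '%s\n', '04;abc;def;ghj;klm;;;;;', ...
                     ';;;;;Test;text;0xFF;;', ...
                     ';;;;;asdfhsdf;dsafdsag;0x0F0F;;');
fclose(fid);

nCols = 10;                                   % Assumed: column count known in advance
fid = fopen(fileName, 'r');
C = textscan(fid, repmat('%s', 1, nCols), ... % One %s specifier per column
             'Delimiter', ';', ...
             'CollectOutput', true);          % Merge the columns into one cell array
fclose(fid);
data = C{1};                                  % Cell array of strings, one field per cell
delete(fileName);                             % Clean up the temporary file
```

Because every field is read with %s, the numeric fields arrive as strings and would still need to be converted afterwards.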

However, if you don't know a priori how many columns your file has, you can use a more general approach, as in the following function. I first use the function fgetl to read each line of the file into a cell array. I then use the function textscan to parse each line into separate strings using a predefined field delimiter, treating the integer fields as strings for now (they can be converted to numeric values later). Here is the resulting code, placed in a function read_mixed_csv:

function lineArray = read_mixed_csv(fileName, delimiter)

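  fid = fopen(fileName, 'r');         % Open the file
  if fid == -1                        % Fail early if the file can't be opened
    error('read_mixed_csv:fileOpen', 'Could not open file: %s', fileName);
  end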
  lineArray = cell(100, 1);           % Preallocate a cell array (ideally slightly
                                      %   larger than is needed)
  lineIndex = 1;                      % Index of cell to place the next line in
  nextLine = fgetl(fid);              % Read the first line from the file
  while ~isequal(nextLine, -1)        % Loop while not at the end of the file
    lineArray{lineIndex} = nextLine;  % Add the line to the cell array
    lineIndex = lineIndex+1;          % Increment the line index
    nextLine = fgetl(fid);            % Read the next line from the file
  end
  fclose(fid);                        % Close the file

  lineArray = lineArray(1:lineIndex-1);              % Remove empty cells, if needed
  for iLine = 1:lineIndex-1                          % Loop over lines
    lineData = textscan(lineArray{iLine}, '%s', ...  % Read strings
                        'Delimiter', delimiter);
    lineData = lineData{1};                          % Remove cell encapsulation
    if ~isempty(lineArray{iLine}) && ...             % Account for when the line
       strcmp(lineArray{iLine}(end), delimiter)      %   ends with a delimiter
      lineData{end+1} = '';                          %   (blank lines are skipped)
    end
    lineArray(iLine, 1:numel(lineData)) = lineData;  % Overwrite line data
  end

end
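As a side note, the per-line parsing step can also be sketched with strsplit instead of textscan. This is a swapped-in alternative, not the method used in read_mixed_csv above, and it assumes a MATLAB release that provides strsplit. The variable names here are hypothetical.

```matlab
% One line from the question's sample file
lineText = '04;abc;def;ghj;klm;;;;;';
delimiter = ';';

% 'CollapseDelimiters', false keeps consecutive delimiters as separate
% (empty) fields, including the empty field after a trailing delimiter
fields = strsplit(lineText, delimiter, 'CollapseDelimiters', false);

fprintf('%d fields\n', numel(fields));   % Prints "10 fields"
```

With this option no special handling is needed for a line that ends with a delimiter, since the trailing empty field is returned as part of the result.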

Running this function on the sample file content from the question gives this result:

>> data = read_mixed_csv('myfile.csv', ';')

data = 

  Columns 1 through 7

    '04'    'abc'    'def'    'ghj'    'klm'    ''            ''        
    ''      ''       ''       ''       ''       'Test'        'text'    
    ''      ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

  Columns 8 through 10

    ''          ''    ''
    '0xFF'      ''    ''
    '0x0F0F'    ''    ''

The result is a 3-by-10 cell array with one field per cell, where missing fields are represented by the empty string ''. You can now access any cell or combination of cells and format them as you like. For example, if you wanted to change the fields in the first column from strings to numeric values, you could use the function str2double as follows:

>> data(:, 1) = cellfun(@(s) {str2double(s)}, data(:, 1))

data = 

  Columns 1 through 7

    [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''        
    [NaN]    ''       ''       ''       ''       'Test'        'text'    
    [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

  Columns 8 through 10

    ''          ''    ''
    '0xFF'      ''    ''
    '0x0F0F'    ''    ''

Note that the empty fields result in NaN values.
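The same idea could be applied to the hexadecimal strings in column 8. The following is only a sketch under the assumption that every non-empty field in that column starts with the prefix '0x'; the variable names (col, isHex, values) are invented for this example.

```matlab
% Column 8 of the result shown above, copied here so the example runs on its own
col = {''; '0xFF'; '0x0F0F'};

isHex = strncmp(col, '0x', 2);            % Fields that carry the assumed '0x' prefix
values = nan(size(col));                  % Empty fields stay NaN, as with str2double

% Strip the prefix, then convert the remaining hex digits to doubles
hexDigits = cellfun(@(s) s(3:end), col(isHex), 'UniformOutput', false);
values(isHex) = hex2dec(hexDigits);       % values is now [NaN; 255; 3855]
```

The converted numbers could then be written back into the cell array with something like data(:, 8) = num2cell(values).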
