Matlab Table / Dataset type optimization

Problem description

I am looking for an optimized datatype for an "observations-variables" table in Matlab that can be accessed quickly and easily both by column (by variable) and by row (by observation).

Here is a comparison of existing Matlab datatypes:

  1. Matrix is very fast; however, it has no built-in labels/enumerations for indexing its dimensions, and you can't always remember which variable name goes with which column index.
  2. Table performs very poorly, especially when reading individual rows/columns in a for loop (I suppose it runs some slow conversion methods, and is designed to be more Excel-like).
  3. Scalar structure (structure of column arrays) datatype - fast column-wise access to variables as vectors, but slow row-wise conversion to observations.
  4. Nonscalar structure (array of structures) - fast row-wise access to observations as vectors, but slow column-wise conversion to variables.

I wonder whether there is a simpler, optimized version of the Table datatype for when I just want to combine row-number and column-variable indexing, with only numerical variables -OR- with any variable type.

Test script results:

----
TEST1 - reading individual observations
Matrix: 0.072519 sec
Table: 18.014 sec
Array of structures: 0.49896 sec
Structure of arrays: 4.3865 sec
----
TEST2 - reading individual variables
Matrix: 0.0047834 sec
Table: 0.0017972 sec
Array of structures: 2.2715 sec
Structure of arrays: 0.0010529 sec

Test script:

Nobs = 1e5; % number of observations-rows
varNames = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O'};
Nvar = numel(varNames); % number of variables-columns

M = randn(Nobs, Nvar); % matrix

T = array2table(M, 'VariableNames', varNames); % table

NS = struct; % nonscalar structure = array of structures
for i=1:Nobs
    for v=1:Nvar
        NS(i).(varNames{v}) = M(i,v);
    end
end

SS = struct; % scalar structure = structure of arrays
for v=1:Nvar
    SS.(varNames{v}) = M(:,v);
end

%% TEST 1 - reading individual observations (row-wise)
disp('----'); disp('TEST1 - reading individual observations');

tic; % matrix
for i=1:Nobs
   x = M(i,:); end
disp(['Matrix: ', num2str(toc()), ' sec']);

tic; % table
for i=1:Nobs
   x = T(i,:); end
disp(['Table: ', num2str(toc), ' sec']);

tic;% nonscalar structure = array of structures
for i=1:Nobs
    x = NS(i); end
disp(['Array of structures: ', num2str(toc()), ' sec']);

tic;% scalar structure = structure of arrays 
for i=1:Nobs
    for v=1:Nvar
        x.(varNames{v}) = SS.(varNames{v})(i);
    end
end
disp(['Structure of arrays: ', num2str(toc()), ' sec']);

%% TEST 2 - reading individual variables (column-wise)
disp('----'); disp('TEST2 - reading individual variables');

tic; % matrix
for v=1:Nvar
   x = M(:,v); end
disp(['Matrix: ', num2str(toc()), ' sec']);

tic; % table
for v=1:Nvar
   x = T.(varNames{v}); end
disp(['Table: ', num2str(toc()), ' sec']);

tic; % nonscalar structure = array of structures
for v=1:Nvar
    for i=1:Nobs
        x(i,1) = NS(i).(varNames{v});
    end
end
disp(['Array of structures: ', num2str(toc()), ' sec']);

tic; % scalar structure = structure of arrays
for v=1:Nvar
    x = SS.(varNames{v}); end
disp(['Structure of arrays: ', num2str(toc()), ' sec']);

Recommended answer

I would use matrices, since they're the fastest and most straightforward to use, and then create a set of enumerated column labels to make indexing columns easier. Here are a few ways to do this:


Given your variable names, and assuming they map in order to columns 1 through N, you can create a containers.Map like so:

varNames = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O'};
col = containers.Map(varNames, 1:numel(varNames));

Now you can use the map to access columns of your data by variable name. For example, to fetch the columns for variables A and C (i.e. the first and third) from a matrix data, you would do this:

subData = data(:, [col('A') col('C')]);
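When several columns are needed at once, the lookups can be batched. The following is only a sketch: the matrix `data` and the list `wanted` are made-up example inputs, not part of the original question. It relies on `values(map, keySet)` returning the values in the order the keys are listed, and on `isKey` to catch a misspelled label before indexing.

```matlab
% Build the name-to-column map exactly as above.
varNames = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O'};
col = containers.Map(varNames, 1:numel(varNames));

% Hypothetical example data: 10 observations of 15 variables.
data = randn(10, numel(varNames));

% Look up several labels in one call; values() returns a cell array
% ordered like the requested keys, so cell2mat gives the index vector.
wanted = {'A','C','O'};
idx = cell2mat(values(col, wanted));   % [1 3 15]
subData = data(:, idx);

% isKey lets you check a label before using it, instead of hitting
% the error that col('Z') would raise for an unknown key.
hasZ = isKey(col, 'Z');
```

Each `col('A')` call goes through the map object, so inside a tight loop it is probably better to resolve `idx` once beforehand and index the matrix with the plain numbers.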


You can create a structure with the variable names as its fields and the corresponding column indices as their values like so:

enumData = [varNames; num2cell(1:numel(varNames))];
col = struct(enumData{:});

And here's what col contains:

struct with fields:

  A: 1
  B: 2
  C: 3
  D: 4
  E: 5
  F: 6
  G: 7
  H: 8
  I: 9
  J: 10
  K: 11
  L: 12
  M: 13
  N: 14
  O: 15

And you would access columns A and C like so:

subData = data(:, [col.A col.C]);
% ...or with dynamic field names...
subData = data(:, [col.('A') col.('C')]);
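Since the question asks about combining row-number and column-variable indexing, here is a small sketch of how the structure of column indices might be used for that. The matrix `data` and the chosen row number are made-up example values.

```matlab
% Build the structure of column indices as above.
varNames = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O'};
enumData = [varNames; num2cell(1:numel(varNames))];
col = struct(enumData{:});

% Hypothetical example data: 10 observations of 15 variables.
data = randn(10, numel(varNames));

% A single element: observation 5, variable C.
value = data(5, col.C);

% One observation (row), restricted to a few named variables.
obs = data(5, [col.A col.C col.O]);

% Rows selected by a condition on one variable, columns selected by name.
positiveA = data(data(:, col.A) > 0, [col.B col.D]);
```

Everything here is ordinary numeric indexing into a matrix; the structure only supplies readable names for the column numbers.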


You could just create a variable in your workspace for every column name and store the column indices in them. This will pollute your workspace with more variables, but gives you a terse way to access column data. Here's an easy way to do it, using the much-maligned eval:

enumData = [varNames; num2cell(1:numel(varNames))];
eval(sprintf('%s=%d;', enumData{:}));

And accessing columns A and C is as easy as:

subData = data(:, [A C]);


This is probably a good dose of overkill, but if you're going to use the same mapping of column labels and indices for many analyses you could create an enumeration class, save it somewhere on your MATLAB path (in a file named after the class, here ColVar.m), and never have to worry about defining your column enumerations again. For example, here's a ColVar class with 15 enumerated values:

classdef ColVar < double
  enumeration
    A (1)
    B (2)
    C (3)
    D (4)
    E (5)
    F (6)
    G (7)
    H (8)
    I (9)
    J (10)
    K (11)
    L (12)
    M (13)
    N (14)
    O (15)
  end
end

And you would access columns A and C like so:

subData = data(:, [ColVar.A ColVar.C]);
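If the data already lives in a table, as in the question's test script, one possible workflow is to pull the numeric content out once and then loop over the matrix. This is a sketch under the assumption that every table variable is numeric, so that `T{:,:}` can concatenate them into one matrix; the three-variable table and the running sum are made-up examples.

```matlab
% Hypothetical starting point: a small all-numeric table.
varNames = {'A','B','C'};
M = randn(1000, numel(varNames));
T = array2table(M, 'VariableNames', varNames);

% Extract the numeric content once, outside any loop.
data = T{:,:};

% Reuse the table's own variable names as column labels.
names = T.Properties.VariableNames;
enumData = [names; num2cell(1:numel(names))];
col = struct(enumData{:});

% Row-wise loop over the matrix instead of over T(i,:).
total = 0;
for i = 1:size(data, 1)
    row = data(i, :);
    total = total + row(col.B);
end
```

The per-row cost is then that of plain matrix indexing, while the labels still come from the table.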
