Extract large Matlab dataset subsets

Question

Referencing and assigning a subset of a MATLAB dataset appears to be extremely inefficient, and possibly scales like rows^2.

Example:

alldata is a large dataset of mixed data - say 150,000 rows by 25 columns (integer, boolean and string).

The dataset's format is:

'format', '%s%u%u%u%u%u%s%s%s%s%s%s%s%u%u%u%u%s%u%s%s%u%s%s%s%s%u%s%u%s%s%s%u%s'
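
For reference, a dataset like this would typically be read from the .csv along the following lines. This is only a sketch: the file name 'alldata.csv' and the comma delimiter are assumptions, not details from the original post.

%# sketch: read the mixed-type .csv into a dataset array (Statistics Toolbox);
%# the file name and delimiter below are assumed
fmt = '%s%u%u%u%u%u%s%s%s%s%s%s%s%u%u%u%u%s%u%s%s%u%s%s%s%s%u%s%u%s%s%s%u%s';
alldata = dataset('File', 'alldata.csv', 'Delimiter', ',', 'Format', fmt);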

I then convert 2 integer columns to boolean type.
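
For illustration, that conversion might look like the following; the column names Flag1 and Flag2 are hypothetical placeholders, not the actual variable names.

%# hypothetical example: convert two integer columns to logical
%# ('Flag1' and 'Flag2' are assumed names)
alldata.Flag1 = logical(alldata.Flag1);
alldata.Flag2 = logical(alldata.Flag2);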

The following subset assignment:

somedata = alldata(1:m,:)

takes >7 seconds for m = 10,000, and a ridiculous amount of time for larger values of m. Plotting time vs. m shows an m^2-type relationship, which is strange given that copying alldata is nearly instantaneous, as is using functions like sortrows and find. In fact, reading in the original .csv data file is faster than the above assignment for large values of m.
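
A timing loop along the following lines reproduces the time-vs-m measurement (a sketch, not the exact code used):

%# sketch: time the subset extraction for increasing m and plot the result
ms = 1000:1000:10000;
t = zeros(size(ms));
for k = 1:numel(ms)
    tic;
    somedata = alldata(1:ms(k), :);  %# the slow subset assignment under test
    t(k) = toc;
end
plot(ms, t, 'o-');
xlabel('m (rows extracted)');
ylabel('time (s)');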

Using the profiler, it appears there is a function subsref that includes a very slow line that checks string comparisons to determine unique values within the dataset. Is this related to how the dataset type is stored (i.e. a reference table)? The dataset includes a large number of unique string values.

Are there any solutions to extracting a subset of a dataset in MATLAB? Such as preallocation (how?), or copying the dataset and deleting rows rather than assigning an extract/subset.
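
A sketch of the copy-and-delete alternative mentioned above (whether it actually avoids the slowdown would need to be measured):

%# sketch: copy the whole dataset, then delete the unwanted rows,
%# instead of indexing out the first m rows directly
somedata = alldata;
somedata(m+1:end, :) = [];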

I am using a dual-core machine with 1.5 GB of RAM, but Task Manager reports less than 1 GB of RAM in use.

Answer

I have previously worked with MATLAB's dataset arrays for large data, and unfortunately it is true that they suffer from performance issues. One thing I found that helps speed things up is to clear the observation names (ObsNames) property.

Try the following fix:

%# I assume you have a 'dataset' object
ds = dataset(...);

%# clear the observation names property (it is simply a label for each record)
ds.Properties.ObsNames = [];
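
As a rough usage check (not part of the original answer, and assuming the alldata dataset from the question), clearing ObsNames before extracting the subset should make the indexing much faster:

%# assumed usage: clear ObsNames on the question's dataset, then time the extraction
alldata.Properties.ObsNames = [];
tic
somedata = alldata(1:10000, :);
toc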
