Processing huge amounts of data

Question

Hi, I want to ask: what is the best way to represent a huge matrix of numbers that need to be processed? It is too big to be processed in main memory all at once.

Recommended answer

It all depends on what you want to do with your matrix and what sort of data it holds.

If it's a sparse matrix, then you can represent it fairly efficiently as a std::map<std::pair<unsigned, unsigned>, double>.
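
A minimal sketch of that idea in C++, assuming entries that were never set should read as zero. The class name SparseMatrix and its methods are hypothetical, not from any library. Because std::map keeps its keys ordered, all entries of one row sit next to each other, so a row can be walked starting from lower_bound:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Sparse matrix: only non-zero entries are stored, keyed by (row, col).
// Entries that were never set (or were set to 0.0) read back as 0.0.
class SparseMatrix {
public:
    SparseMatrix(unsigned rows, unsigned cols) : rows_(rows), cols_(cols) {}

    // Store a value; writing 0.0 erases the entry so the map stays sparse.
    void set(unsigned r, unsigned c, double v) {
        assert(r < rows_ && c < cols_);
        if (v == 0.0) {
            cells_.erase(Key{r, c});
        } else {
            cells_[Key{r, c}] = v;
        }
    }

    // Look up a value without inserting anything (operator[] would insert).
    double get(unsigned r, unsigned c) const {
        assert(r < rows_ && c < cols_);
        auto it = cells_.find(Key{r, c});
        return it == cells_.end() ? 0.0 : it->second;
    }

    std::size_t nonZeros() const { return cells_.size(); }

    // Keys are ordered lexicographically, so row r's entries are contiguous
    // and start at lower_bound({r, 0}).
    double rowSum(unsigned r) const {
        double sum = 0.0;
        for (auto it = cells_.lower_bound(Key{r, 0u});
             it != cells_.end() && it->first.first == r; ++it) {
            sum += it->second;
        }
        return sum;
    }

private:
    using Key = std::pair<unsigned, unsigned>;
    unsigned rows_, cols_;
    std::map<Key, double> cells_;
};
```

Memory grows with the number of non-zero entries rather than with rows × cols; the price is O(log n) lookups and noticeable per-node overhead compared with a dense array, so this only pays off when the matrix really is mostly zeros.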

If you want sequential access, then store your data in a file and use an fstream, or an istream_iterator if you're feeling flash.
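
For the sequential case, here is a sketch assuming the matrix is stored as whitespace-separated text in row-major order and that the column count is known up front. The function names sumAll and columnSums are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Sum every number in a whitespace-separated text file. istream_iterator
// pulls one value at a time from the stream, so only a single double is
// held in memory regardless of file size.
double sumAll(const std::string& path) {
    std::ifstream in(path);
    return std::accumulate(std::istream_iterator<double>(in),
                           std::istream_iterator<double>(), 0.0);
}

// Per-column sums for a row-major matrix with `cols` columns.
// Memory use is O(cols), not O(rows * cols).
std::vector<double> columnSums(const std::string& path, std::size_t cols) {
    assert(cols > 0);
    std::ifstream in(path);
    std::vector<double> sums(cols, 0.0);
    std::size_t i = 0;
    double value;
    while (in >> value) {
        sums[i % cols] += value;  // i % cols is the column of the i-th value
        ++i;
    }
    return sums;
}
```

Only the running totals live in memory, so the file can be much larger than RAM as long as each pass reads it front to back.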

If you're doing clustered access, have a look at memory-mapped files. These let you treat a chunk of a file as memory by mapping it into the address space of your process.
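
A sketch of a read-only memory-mapped matrix, assuming a POSIX system (open, fstat, mmap, munmap) and a file of raw native-endian doubles in row-major order. The MappedMatrix class and the writeRaw helper are hypothetical; on Windows the equivalent calls would be CreateFileMapping and MapViewOfFile:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Read-only view of a row-major matrix of doubles stored as raw binary.
// The OS pages the file in on demand, so only the regions actually
// touched need to be resident in RAM.
class MappedMatrix {
public:
    MappedMatrix(const std::string& path, std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols) {
        bytes_ = rows * cols * sizeof(double);
        if (bytes_ == 0) throw std::invalid_argument("empty matrix");
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st;
        if (::fstat(fd, &st) != 0 ||
            static_cast<std::size_t>(st.st_size) < bytes_) {
            ::close(fd);
            throw std::runtime_error("file too small for matrix");
        }
        void* p = ::mmap(nullptr, bytes_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = static_cast<const double*>(p);
    }
    ~MappedMatrix() { ::munmap(const_cast<double*>(data_), bytes_); }
    MappedMatrix(const MappedMatrix&) = delete;
    MappedMatrix& operator=(const MappedMatrix&) = delete;

    double at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

private:
    std::size_t rows_, cols_, bytes_ = 0;
    const double* data_ = nullptr;
};

// Hypothetical helper for trying it out: dump doubles to a raw binary file.
void writeRaw(const std::string& path, const std::vector<double>& v) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(v.data()),
              static_cast<std::streamsize>(v.size() * sizeof(double)));
}
```

Clustered access suits this well because neighbouring elements share pages; access scattered across the whole file will still cause page faults and disk reads.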

If you want random access with a working set bigger than your system's memory, then whatever data format you use, you're going to be stuffed: it's going to be slow. You might be able to use some form of database-style hashing or indexing to alleviate this, but it really depends on your application.
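
One way to read the "database-style" suggestion, offered as an assumption rather than anything spelled out in the answer, is a buffer pool: split the on-disk matrix into fixed-size blocks and keep a bounded LRU cache of them, indexed by a hash map from block id to cached block. The BlockCache class is hypothetical and assumes a raw row-major file of doubles:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <list>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Random access to a row-major matrix of doubles on disk through an LRU
// cache of fixed-size blocks. At most `maxBlocks` blocks are in memory.
class BlockCache {
public:
    BlockCache(const std::string& path, std::size_t cols,
               std::size_t blockElems, std::size_t maxBlocks)
        : in_(path, std::ios::binary), cols_(cols),
          blockElems_(blockElems), maxBlocks_(maxBlocks) {
        if (!in_) throw std::runtime_error("cannot open " + path);
        assert(cols_ > 0 && blockElems_ > 0 && maxBlocks_ > 0);
    }

    double at(std::size_t r, std::size_t c) {
        assert(c < cols_);
        std::size_t index = r * cols_ + c;
        const std::vector<double>& block = load(index / blockElems_);
        std::size_t offset = index % blockElems_;
        if (offset >= block.size()) throw std::out_of_range("past end of file");
        return block[offset];
    }

    std::size_t misses() const { return misses_; }

private:
    using Entry = std::pair<std::size_t, std::vector<double>>;  // (block id, data)

    const std::vector<double>& load(std::size_t id) {
        auto hit = index_.find(id);
        if (hit != index_.end()) {
            // Cache hit: move the block to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, hit->second);
            return hit->second->second;
        }
        ++misses_;
        if (lru_.size() == maxBlocks_) {
            // Evict the least recently used block from the back.
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        std::vector<double> data(blockElems_);
        in_.clear();  // reset fail/EOF state left by an earlier short read
        in_.seekg(static_cast<std::streamoff>(id * blockElems_ * sizeof(double)));
        in_.read(reinterpret_cast<char*>(data.data()),
                 static_cast<std::streamsize>(data.size() * sizeof(double)));
        // The last block may be short; keep only what was actually read.
        data.resize(static_cast<std::size_t>(in_.gcount()) / sizeof(double));
        lru_.emplace_front(id, std::move(data));
        index_[id] = lru_.begin();
        return lru_.front().second;
    }

    std::ifstream in_;
    std::size_t cols_, blockElems_, maxBlocks_, misses_ = 0;
    std::list<Entry> lru_;
    std::unordered_map<std::size_t, std::list<Entry>::iterator> index_;
};
```

This only helps when accesses have some locality, so that cached blocks get reused. For truly uniform random access over a file much larger than maxBlocks × blockElems doubles, nearly every lookup is still a disk read, which is the slowness described above.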

Cheers,

Ash

