Large data array (operations) via disk files


Problem description





Hi,

I have a need to manipulate a large matrix, say, A(N,N) (of real), 8 GB,
which can't fit in physical memory (2 GB). But the nature of the computation
requires operating on only a portion of the data, e.g. 500 MB (0.5 GB),
at a time.

The procedure is as follows:

1. Generate data and store the data in array A(N,N), N is HUGE.

2. Do computation on A in loops, e.g.

for i = 1, N
    for j = 1, N
        compute something using A (a portion)
    end
end

How can I implement the procedure to accommodate the large memory
needs?

Thanks,
Zin
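For a sense of scale, here is the arithmetic behind the sizes in the question, assuming "real" means an 8-byte double-precision value and that GB means 2^30 bytes (with 4-byte reals, N would instead be about 46341):

```latex
\begin{aligned}
N^2 \cdot 8\,\text{B} = 8 \cdot 2^{30}\,\text{B} &\implies N = 2^{15} = 32768,\\
\text{one row} = 8N\,\text{B} &= 2^{18}\,\text{B} = 256\,\text{KiB},\\
\frac{0.5 \cdot 2^{30}\,\text{B}}{2^{18}\,\text{B per row}} &= 2^{11} = 2048 \text{ rows}.
\end{aligned}
```

So under these assumptions a 0.5 GB working set holds about 2048 complete rows, one sixteenth of the matrix, at a time.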

Recommended answer


Two solutions:

If performance is an issue, use a box with enough memory.

Otherwise use the memory-mapped file support provided by your operating
environment and map the required portion of the matrix into memory. How
you do this will be OS-specific and best asked on an OS group.
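As a minimal sketch of the memory-mapping suggestion, Python's standard `mmap` module (which wraps POSIX `mmap` and Windows file mappings) can map just a band of rows of a matrix stored row-major in a binary file. The 8-byte double element type, the file layout, and the helper names `create_matrix_file` and `row_band` are assumptions for illustration. The mapping offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`, so the sketch rounds it down and skips the extra leading bytes.

```python
import mmap
import struct

ITEM = 8  # bytes per element; assumes 8-byte doubles (struct code "d")


def create_matrix_file(path, n):
    """Write a hypothetical n x n test matrix, A[i][j] = i*n + j, one row at
    a time so the full matrix is never held in memory."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(struct.pack(f"{n}d", *(float(i * n + j) for j in range(n))))


def row_band(path, n, first_row, num_rows):
    """Map only rows [first_row, first_row + num_rows) of a row-major file of
    doubles into memory and yield (i, row) pairs, one row at a time."""
    start = first_row * n * ITEM
    length = num_rows * n * ITEM
    # The mapping offset must be a multiple of ALLOCATIONGRANULARITY, so round
    # it down and skip the extra leading bytes inside the mapping.
    aligned = start - start % mmap.ALLOCATIONGRANULARITY
    skip = start - aligned
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), skip + length, access=mmap.ACCESS_READ, offset=aligned
    ) as mm:
        for r in range(num_rows):
            row = struct.unpack_from(f"{n}d", mm, skip + r * n * ITEM)
            yield first_row + r, row
```

For example, after `create_matrix_file("matrix.bin", 1000)`, the call `sum(sum(row) for _, row in row_band("matrix.bin", 1000, 5, 3))` touches only rows 5 to 7. Each row is unpacked into Python floats only when visited, so just one row's worth of Python objects exists at a time; if NumPy is available, `numpy.memmap` is the more convenient way to get an array view of the mapped band.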

--
Ian Collins.




Two possibilities:

1) If the data has a lot of missing elements, or elements equal to an
inferred constant (like zero), then you could use sparse matrix processing,
where you keep a marked link to each next stored element. Marking the
element means you identify the value with:
either the previous and last element coordinates,
or just the element's row and column number.
Storing the row and column number needs three values per cell: the cell's
value plus its two coordinates.
Sometimes you can get by with just the column number and process by row,
so you only need to note when column index "1" appears, which starts the
next counted row.
This problem is often attacked with generalised linked-list processing
routines.
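A minimal sketch of the two storage schemes described in 1), in Python. `to_triplets` keeps one (row, column, value) triplet per stored element, which is what is usually called coordinate (COO) format. `to_row_wise` keeps only (column, value) pairs row after row, plus the position where each row starts; this is compressed sparse row (CSR) storage, a close relative of the "note where the next row starts" idea rather than that exact scheme. The function names, 0-based indices, and zero as the inferred constant are assumptions for illustration.

```python
def to_triplets(dense, zero=0.0):
    """COO-style storage: (row, col, value) for every element != zero."""
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != zero]


def to_row_wise(dense, zero=0.0):
    """CSR-style storage: (col, value) pairs for all rows back to back, plus
    row_start[i] = index of row i's first pair (and a final end sentinel)."""
    pairs, row_start = [], []
    for row in dense:
        row_start.append(len(pairs))
        pairs.extend((j, v) for j, v in enumerate(row) if v != zero)
    row_start.append(len(pairs))  # sentinel: end of the last row
    return pairs, row_start


def row_sums(pairs, row_start):
    """Example row-by-row computation that touches only stored elements."""
    return [sum(v for _, v in pairs[row_start[i]:row_start[i + 1]])
            for i in range(len(row_start) - 1)]
```

Building these from a dense list only serves the demonstration; for a matrix that does not fit in memory, the triplets or pairs would be produced directly while generating the data. Keeping an explicit `row_start` entry per row means an empty row is still represented.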

These methods will use less memory than a dense array if the needed
elements occupy less than one third of the theoretical maximum (N*N), or
one half in the linear case, but they will only be really useful if the
proportion is far less, like one fifth or lower.
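The thresholds above follow from counting stored values, assuming an index takes as much space as a value: a dense matrix always stores N*N values, while triplet storage spends 3 values on each of the k elements it keeps (and a sparse vector of length N spends 2, index plus value):

```latex
\begin{aligned}
3k < N^2 &\iff k < N^2/3 && \text{(matrix, triplet storage)}\\
2k < N &\iff k < N/2 && \text{(vector, index + value)}\\
k = N^2/5 &\implies 3k = \tfrac{3}{5}N^2 && \text{(40\% less memory than dense)}
\end{aligned}
```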

2) Rework the algorithm you wish to use so that, for the operation to
proceed, it needs fewer elements in memory at one time than the available
memory can hold.

If that doesn't work, then use virtual memory by treating the disk as a
random access file by row, with all of a row in each "record", and try
to use an algorithm that processes on a column-within-row basis.
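A minimal sketch of the "random access file by row" idea, assuming each record is one row of 8-byte doubles written with Python's standard `array` module. `write_rows`, `read_row` and `mat_vec_by_row` are hypothetical helper names, and the matrix-vector product y = A x merely stands in for whatever computation is actually needed; only one row of A is in memory at a time.

```python
from array import array


def write_rows(path, rows):
    """Store each row as one fixed-size record of doubles; `rows` can be a
    generator, so the whole matrix need not exist in memory."""
    with open(path, "wb") as f:
        for row in rows:
            array("d", row).tofile(f)


def read_row(f, n, i):
    """Random access: record i starts at byte i * n * itemsize."""
    rec = array("d")
    f.seek(i * n * rec.itemsize)
    rec.fromfile(f, n)
    return rec


def mat_vec_by_row(path, n, x):
    """y = A x, processed column-within-row: the outer loop reads one row
    record, the inner loop walks the columns of that row."""
    y = []
    with open(path, "rb") as f:
        for i in range(n):
            row = read_row(f, n, i)
            y.append(sum(row[j] * x[j] for j in range(n)))
    return y
```

Because each record has a fixed size, any row can be fetched with a single seek; for a purely sequential pass the seek is redundant but harmless.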



I suspect you need to look up the keyword "blocking", perhaps together with
the words "array" or "matrix". That does require breaking up the naive
sequence of operations of your double loop above, and depending on the nature
of your "compute something", might require some careful thinking to get the
reordered sequence to do the correct thing.
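A minimal sketch of blocking (also called tiling) in Python, again assuming a row-major file of 8-byte doubles. The computation, the sum of A(i,j)*A(j,i) over all i and j (the trace of A*A), is a hypothetical stand-in chosen because it shows why the reordering needs thought: each element must meet its mirror across the diagonal, so tile (I,J) has to be processed together with tile (J,I). The function names and the tile size b are illustrative.

```python
from array import array


def write_matrix(path, rows):
    """Row-major file of 8-byte doubles, one row after another."""
    with open(path, "wb") as f:
        for row in rows:
            array("d", row).tofile(f)


def load_tile(f, n, r0, c0, b):
    """Read the tile A[r0:r0+b][c0:c0+b], clipped at the matrix edges."""
    tile = []
    width = min(b, n - c0)
    for i in range(r0, min(r0 + b, n)):
        part = array("d")
        f.seek((i * n + c0) * part.itemsize)
        part.fromfile(f, width)
        tile.append(part)
    return tile


def trace_of_square_blocked(path, n, b):
    """Sum of A[i][j] * A[j][i] over all i, j, computed tile by tile so that
    only tile (I, J) and its mirror tile (J, I) are in memory at once."""
    total = 0.0
    with open(path, "rb") as f:
        for r0 in range(0, n, b):            # tile row I
            for c0 in range(0, n, b):        # tile column J
                t = load_tile(f, n, r0, c0, b)   # A[I, J]
                m = load_tile(f, n, c0, r0, b)   # A[J, I]
                for di in range(len(t)):
                    for dj in range(len(t[di])):
                        # A[r0+di][c0+dj] * A[c0+dj][r0+di]
                        total += t[di][dj] * m[dj][di]
    return total
```

With b x b tiles about 2*b*b values are resident at once; for example b = 4096 gives 2 * 4096^2 * 8 B = 256 MiB. Each tile pair is read twice here (as (I,J) and as (J,I)); visiting only pairs with I <= J would halve the disk traffic at the cost of a slightly more involved loop.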

Jan

