Using mmap on large (> 2 Gig) files


Question



Hi

Anyone ever done this? It looks like Python 2.4 won't take a length arg
> 2 Gig since it's not seen as an int.



Mathew


Recommended answers

my*****@jpl.nasa.gov wrote:

> Anyone ever done this? It looks like Python 2.4 won't take a length
> arg > 2 Gig since it's not seen as an int.



What architecture are you on? On a 32-bit architecture, it's likely
impossible to map in 2GiB, anyway (since it likely won't fit into the
available address space).

On a 64-bit architecture, this is a known limitation of Python 2.4:
you can't have containers with more than 2Gi items. This limitation
was removed in Python 2.5, so I recommend upgrading. Note that
the code has seen little testing, due to lack of proper hardware,
so I suggest that you review the mmap code first before using
it (or just test it out and report bugs as you find them).

Regards,
Martin
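
To make the 32-bit versus 64-bit distinction above concrete, here is a minimal sketch (assuming a reasonably recent Python 3) that reports the pointer width of the running interpreter and whether container indices can exceed 2 GiB. The helper names `pointer_bits` and `can_index_beyond_2gib` are made up for this illustration.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width, in bits, of the running interpreter."""
    # struct format "P" is a native void*, so its size reflects the build.
    return struct.calcsize("P") * 8


def can_index_beyond_2gib():
    """Return True when the largest container index exceeds 2**31 - 1."""
    # sys.maxsize is the largest value a container index can take.
    return sys.maxsize > 2**31 - 1


if __name__ == "__main__":
    print("pointer width:", pointer_bits(), "bits")
    print("indices beyond 2 GiB possible:", can_index_beyond_2gib())
```

On a 32-bit build both checks point the same way: a single mapping larger than 2 GiB is not going to work regardless of the Python version.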


Martin v. Löwis wrote:
> my*****@jpl.nasa.gov wrote:
>> Anyone ever done this? It looks like Python 2.4 won't take a length
>> arg > 2 Gig since it's not seen as an int.




> What architecture are you on? On a 32-bit architecture, it's likely
> impossible to map in 2GiB, anyway (since it likely won't fit into the
> available address space).
>
> On a 64-bit architecture, this is a known limitation of Python 2.4:
> you can't have containers with more than 2Gi items. This limitation
> was removed in Python 2.5, so I recommend upgrading. Note that
> the code has seen little testing, due to lack of proper hardware,




NumPy uses the mmap object and I saw a paper at SciPy 2006 that used
Python 2.5 + mmap + numpy to do some pretty nice and relatively fast
manipulations of very large data sets.

So, the very useful changes by Martin have seen more testing than he is
probably aware of.

-Travis



my*****@jpl.nasa.gov wrote:

> Anyone ever done this? It looks like Python 2.4 won't take a length arg


http://docs.python.org/lib/module-mmap.html

It seems that Python does take a length argument, but not an offset
argument (unlike Windows' CreateFileMapping/MapViewOfFile and UNIX's
mmap), so you always map from the beginning of the file. Of course, if
you have ever worked with memory-mapped files in C, you have probably
found that mapping a large file from beginning to end is a
major slowdown. And if the file is big enough, it does not even fit
inside the 32-bit address space of your process. Thus you have to limit
the portion of the file that is mapped, using the offset and length
arguments.
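
For readers on later Python releases: `mmap.mmap` there accepts an `offset` keyword, which the version discussed in this thread did not. The following is a minimal sketch, assuming Python 3, of mapping only a window of a file. The helper name `read_window` is hypothetical. The offset has to be a multiple of `mmap.ALLOCATIONGRANULARITY`, so the sketch rounds the start down and skips the extra bytes inside the window.

```python
import mmap
import os
import tempfile


def read_window(path, start, size):
    """Read `size` bytes beginning at byte `start` by mapping only a window."""
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (start // gran) * gran   # offset must be a multiple of gran
    delta = start - aligned            # where the wanted data begins in the window
    with open(path, "rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        # Never ask for more bytes than the file holds past the offset.
        length = min(delta + size, file_size - aligned)
        if length <= 0:
            return b""
        with mmap.mmap(f.fileno(), length,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            return m[delta:delta + size]


if __name__ == "__main__":
    gran = mmap.ALLOCATIONGRANULARITY
    # Small demo file: padding, a marker, more padding.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (2 * gran + 10) + b"needle" + b"\0" * 100)
    try:
        print(read_window(tmp.name, 2 * gran + 10, 6))
    finally:
        os.remove(tmp.name)
```

Only `length` bytes of address space are consumed per call, so the same approach can walk a file far larger than the process could map in one piece.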

But the question remains whether Python's "mmap" qualifies as a "memory
mapping" at all. Memory mapping a file means that the file is "mapped"
into the process address space. So if you access a certain address
(using a pointer type in C), you will actually read from or write to
the file. On Windows, this mechanism is even used to access "files"
that do not live on the file system. E.g. if CreateFileMapping is
called with the file handle set to INVALID_HANDLE_VALUE, it creates a file
mapping backed by the OS paging file. That is, you actually obtain a
shared memory segment, usable e.g. for inter-process communication.
How would you use Python's mmap for something like this?
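
As one possible answer to the question just posed, later versions of the standard `mmap` module accept `-1` as the file descriptor to request anonymous memory that is not backed by a named file. The sketch below assumes Python 3. The helper names `make_shared_counter` and `increment` are made up for illustration, and the demo stays inside a single process. Actually sharing the region with another process (for example via `fork` on Unix or a `tagname` on Windows) is left out here.

```python
import mmap
import struct


def make_shared_counter():
    """Create an anonymous mapping holding one 64-bit counter set to zero."""
    buf = mmap.mmap(-1, mmap.PAGESIZE)   # -1: anonymous memory, no file
    struct.pack_into("<Q", buf, 0, 0)
    return buf


def increment(buf):
    """Add one to the counter stored at the start of the mapping."""
    (value,) = struct.unpack_from("<Q", buf, 0)
    struct.pack_into("<Q", buf, 0, value + 1)
    return value + 1


if __name__ == "__main__":
    counter = make_shared_counter()
    increment(counter)
    print(increment(counter))
    counter.close()
```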

I haven't looked at the source, but I'd be surprised if Python actually
maps the file into the process image when mmap is called. I believe
Python is not memory mapping at all; rather, it just opens a file in
the file system and uses fseek to move around. That is, you can use
slicing operators on Python's "memory mapped file object" as if it were
a list or a string, but it's not really memory mapping, it's just a
syntactic convenience. Because of this, you even need to manually
"flush" the memory mapping object. If you were talking to a real
memory-mapped file, flushing would obviously not be required.
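
For reference, the slicing and flushing usage mentioned above looks roughly like the sketch below with the standard `mmap` module in Python 3. The helper name `patch_file` is hypothetical. The module documentation describes `flush()` as writing changes made to the in-memory copy of the file back to disk. The sketch does not by itself settle how the object is implemented underneath.

```python
import mmap
import os
import tempfile


def patch_file(path, position, data):
    """Overwrite len(data) bytes at `position` through a mapping of the file."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as m:
            # Slice assignment must keep the length unchanged.
            m[position:position + len(data)] = data
            m.flush()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    try:
        patch_file(tmp.name, 6, b"WORLD")
        with open(tmp.name, "rb") as f:
            print(f.read())
    finally:
        os.remove(tmp.name)
```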

This probably means that your problem is irrelevant. Even if the file
is too large to fit inside a 32-bit process image, Python's memory
mapping would not be affected by this, as it is not memory mapping the
file when "mmap" is called.

