How to solve the memory error in Python


Problem Description

I am working with several large txt files, each of which has about 8,000,000 lines. A short example of the lines:

usedfor zipper fasten_coat
usedfor zipper fasten_jacket
usedfor zipper fasten_pant
usedfor your_foot walk
atlocation camera cupboard
atlocation camera drawer
atlocation camera house
relatedto more plenty

The code that stores them in a dictionary is:

import collections

dicCSK = collections.defaultdict(list)
for line in finCSK:  # finCSK is the open file handle of one txt file
    line = line.strip('\n')
    try:
        r, c1, c2 = line.split(" ")
    except ValueError:
        print line  # report malformed lines (Python 2 print statement)
        continue    # skip them rather than appending stale values
    dicCSK[c1].append(r + " " + c2)

It runs fine on the first txt file, but when it gets to the second txt file I get a MemoryError.

I am using Windows 7 64-bit with Python 2.7 32-bit, an Intel i5 CPU, and 8 GB of RAM. How can I solve this problem?

Further explanation: I have four large files, and each file contains different information about many entities. For example, I want to find all the information for cat, its parent node animal, its child node persian cat, and so on. So my program first reads all the txt files into dictionaries, and then scans all the dictionaries to find the information for cat and its parent and children.
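As a rough illustration of that lookup step (an editorial sketch only, not code from the question, using the dicCSK dictionary built above and the sample key camera from the example lines; Python 3 print syntax):

# Hypothetical lookup against the dictionary built above.
# With the sample lines loaded, dicCSK['camera'] would hold
# ['atlocation cupboard', 'atlocation drawer', 'atlocation house'].
for entry in dicCSK.get('camera', []):
    relation, other = entry.split(" ", 1)
    print(relation, other)

Using .get here avoids the defaultdict silently inserting empty lists for keys that are merely queried.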

Solution

Simplest solution: You're probably running out of virtual address space (any other form of error usually means running really slowly for a long time before you finally get a MemoryError). This is because a 32-bit application on Windows (and most OSes) is limited to 2 GB of user-mode address space (Windows can be tweaked to make it 3 GB, but that's still a low cap). You've got 8 GB of RAM, but your program can't use (at least) 3/4 of it. Python has a fair amount of per-object overhead (object header, allocation alignment, etc.), so odds are the strings alone are using close to a GB of RAM, and that's before you account for the overhead of the dictionary, the rest of your program, the rest of Python, etc. If memory space fragments enough and the dictionary needs to grow, it may not have enough contiguous space to reallocate, and you'll get a MemoryError.
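To confirm that diagnosis, here is a small check (not part of the original answer) of the interpreter build and of the cost of a single short string, using only the standard sys module (Python 3 print syntax):

import sys

# On a 32-bit build sys.maxsize is 2**31 - 1; on a 64-bit build it is 2**63 - 1.
print("64-bit interpreter:", sys.maxsize > 2**32)

# Per-object overhead: even a short string costs tens of bytes, so 8,000,000 lines
# split into three strings each add up quickly.
print("bytes for 'fasten_coat':", sys.getsizeof("fasten_coat"))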

Install a 64-bit version of Python (if you can, I'd recommend upgrading to Python 3 for other reasons); it will use more memory, but then, it will have access to a lot more memory space (and more physical RAM as well).

If that's not enough, consider converting to a sqlite3 database (or some other DB), so the data naturally spills to disk when it gets too large for main memory, while still having fairly efficient lookups.
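A minimal sketch of that approach, assuming Python 3 and the same space-separated triples; the database file name, table name, and input file name below are made up for illustration:

import sqlite3

conn = sqlite3.connect("csk.db")  # hypothetical database file name
conn.execute("CREATE TABLE IF NOT EXISTS triples (r TEXT, c1 TEXT, c2 TEXT)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_c1 ON triples (c1)")  # index for lookups by entity

with open("file1.txt") as finCSK:  # hypothetical input file name
    for line in finCSK:
        parts = line.strip("\n").split(" ")
        if len(parts) != 3:
            continue  # skip malformed lines, as the original try/except did
        conn.execute("INSERT INTO triples VALUES (?, ?, ?)", parts)
conn.commit()

# Everything known about one entity, e.g. 'camera' from the sample data:
for r, c2 in conn.execute("SELECT r, c2 FROM triples WHERE c1 = ?", ("camera",)):
    print(r, c2)

Because the index is on c1, a lookup like the camera query above stays reasonably fast even once the table is far too large to fit in RAM.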

