What does 'Killed' mean when a Python script processing a huge CSV suddenly stops?


Question

I have a Python script that imports a large CSV file, counts the number of occurrences of each word in the file, and then exports the counts to another CSV file.

What happens, though, is that once the counting part has finished and the export begins, the terminal prints Killed.

I don't think this is a memory problem (if it were, I assume I would get a memory error rather than Killed).

Could it be that the process is taking too long? If so, is there a way to extend the time-out period so I can avoid this?

Here is the code:

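import csv
import sys

# Allow very large fields in the input file.
csv.field_size_limit(sys.maxsize)

# Count how often each pair (first two columns of a row) occurs.
counter = {}
with open("/home/alex/Documents/version2/cooccur_list.csv", 'rb') as file_name:
    reader = csv.reader(file_name)
    for row in reader:
        if len(row) > 1:
            pair = row[0] + ' ' + row[1]
            if pair in counter:
                counter[pair] += 1
            else:
                counter[pair] = 1
print 'finished counting'

# Export the counts (Python 2: the csv module expects binary mode).
writer = csv.writer(open('/home/alex/Documents/version2/dict.csv', 'wb'))
for key, value in counter.items():
    writer.writerow([key, value])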

The Killed happens after finished counting has been printed, and the full message is:

killed (program exited with code: 137)

Recommended answer

Exit code 137 (128+9) indicates that your program exited because it received signal 9, which is SIGKILL. This also explains the killed message. The question is, why did you receive that signal?
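
The 128+N arithmetic can be checked with a small sketch. This is written for Python 3 on a POSIX system (not the Python 2 of the question), and `describe_exit_code` is a hypothetical helper name used only for illustration. It assumes the shell convention that a status above 128 means "terminated by signal N", although a program can also choose such a status on its own.

```python
import signal

def describe_exit_code(code):
    """Return a readable description of a shell-style exit code."""
    if code > 128:
        # Shell convention: 128 + signal number.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "terminated by signal %d (%s)" % (signum, name)
    return "exited normally with status %d" % code

print(describe_exit_code(137))
```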

The most likely reason is that your process crossed some limit on the system resources it is allowed to use. Depending on your OS and configuration, this could mean you had too many open files, used too much filesystem space, or something else. The most likely culprit is that your program was using too much memory. Rather than risk things breaking when memory allocations start failing, the system sent a kill signal to the process that was using too much memory. That is also why you saw Killed rather than a Python memory error: SIGKILL cannot be caught, so the interpreter never gets a chance to raise an exception.
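
One way to check whether memory is the culprit is to have the script report its own peak memory use and any configured limit. The following is a rough sketch for Python 3 on a Unix-like system, using the standard `resource` module. The helper names `peak_rss_mib` and `address_space_limit` are made up for this example, and the unit handling assumes Linux reports `ru_maxrss` in KiB while macOS reports bytes.

```python
import resource
import sys

def peak_rss_mib():
    """Peak resident memory of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # macOS reports bytes.
        return peak / (1024.0 * 1024.0)
    # Linux reports KiB.
    return peak / 1024.0

def address_space_limit():
    """Soft and hard limits on the address space of this process.

    A value equal to resource.RLIM_INFINITY means no limit is set.
    """
    return resource.getrlimit(resource.RLIMIT_AS)

print("peak RSS: %.1f MiB" % peak_rss_mib())
print("RLIMIT_AS (soft, hard):", address_space_limit())
```

Printing the peak value right after the counting phase would show how close the script gets to the machine's available memory before the export starts.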

As I commented earlier, one reason you might hit a memory limit after finished counting has been printed is that the call to counter.items() in your final loop allocates a list containing all the keys and values from your dictionary. If your dictionary holds a lot of data, this can be a very big list. A possible solution is to use counter.iteritems(), which returns an iterator. Rather than returning all the items in a list, it lets you iterate over them with much less memory usage.

So, I'd suggest trying this as your final loop:

for key, value in counter.iteritems():
    writer.writerow([key, value])
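
To get a feel for the overhead being avoided, here is a rough Python 3 sketch in which `list(counter.items())` stands in for the list that Python 2's `items()` builds, and the plain `counter.items()` view stands in for lazy iteration. The dictionary contents and its size of 100,000 entries are made up for illustration.

```python
import sys

# Hypothetical stand-in for the question's counter: 100,000 distinct keys.
counter = {"word%d other%d" % (i, i): 1 for i in range(100000)}

# Python 2's items() eagerly builds a list like this one.
items_list = list(counter.items())
# A lazy alternative: no copy of the items is made up front.
items_view = counter.items()

# getsizeof counts only the container itself, not the tuples it points to,
# so the real cost of the list is higher than the number printed here.
list_bytes = sys.getsizeof(items_list)
view_bytes = sys.getsizeof(items_view)
print("list of items: %d bytes" % list_bytes)
print("items view:    %d bytes" % view_bytes)
```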

Note that in Python 3, items returns a "dictionary view" object, which does not have the same overhead as Python 2's version. It replaces iteritems, so if you later upgrade Python versions, you'll end up changing the loop back to the way it was.
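
For reference, a Python 3 version of the whole script might look like the sketch below. It is not the asker's code: `count_pairs` is a hypothetical function name, `collections.Counter` is swapped in for the hand-written dictionary logic, and the files are opened in text mode with `newline=""` as the Python 3 `csv` module expects. The dictionary itself still has to fit in memory, so this only avoids the extra copy made by Python 2's `items()`.

```python
import csv
from collections import Counter

def count_pairs(in_path, out_path):
    """Count (first column, second column) pairs and write them to a CSV.

    Returns the number of distinct pairs found.
    """
    counter = Counter()
    with open(in_path, newline="") as in_file:
        for row in csv.reader(in_file):
            if len(row) > 1:
                counter[row[0] + " " + row[1]] += 1
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        # In Python 3, items() is a view, so no full list is built here.
        for key, value in counter.items():
            writer.writerow([key, value])
    return len(counter)

# Example call, with file names taken from the question:
# count_pairs("cooccur_list.csv", "dict.csv")
```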
