Sorting a text file using Python
Problem description
I have a text file with more than 10 million lines. The lines look like this:
37024469;196672001;255.0000000000
37024469;196665001;396.0000000000
37024469;196664001;396.0000000000
37024469;196399002;85.0000000000
37024469;160507001;264.0000000000
37024469;160506001;264.0000000000
As you can see, the delimiter is ";". I would like to sort this text file by the second field using Python. I can't use the split function because it causes a MemoryError. How can I manage this?
Recommended answer
Don't sort 10 million lines in memory. Split the work into batches instead:
- Run 100 sorts of 100k lines each (using the file as an iterator, combined with islice() or similar to pick a batch). Write each sorted batch out to a separate file elsewhere.
- Merge the sorted files. Here is a merge generator that you can pass 100 open files to, and it will yield lines in sorted order. Write them to a new file line by line:
import operator

def mergeiter(*iterables, **kwargs):
    """Given a set of sorted iterables, yield the next value in merged order

    Takes an optional `key` callable to compare values by.
    """
    iterables = [iter(it) for it in iterables]
    # Map each index to [current value, index, iterator]
    iterables = {i: [next(it), i, it] for i, it in enumerate(iterables)}
    if 'key' not in kwargs:
        key = operator.itemgetter(0)
    else:
        key = lambda item, key=kwargs['key']: key(item[0])

    while True:
        # Pick the smallest current value across all remaining iterables
        value, i, it = min(iterables.values(), key=key)
        yield value
        try:
            iterables[i][0] = next(it)
        except StopIteration:
            del iterables[i]
            if not iterables:
                # All inputs are exhausted. Use return rather than a bare
                # raise: since Python 3.7 (PEP 479) a StopIteration escaping
                # a generator is turned into a RuntimeError.
                return
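The two steps above can be sketched end to end as follows. This is only a minimal illustration, not the answer's own code: the names second_field, sort_in_batches and external_sort are hypothetical, the merge step uses the standard library's heapq.merge (which accepts a key argument since Python 3.5) in place of mergeiter, and it assumes every line has at least two ";"-separated fields with an integer in the second one.

```python
import heapq
import os
import tempfile
from itertools import islice


def second_field(line):
    # Sort key (assumption): the second ';'-separated field is an integer.
    return int(line.split(';')[1])


def sort_in_batches(src_path, tmp_dir, batch_size=100_000):
    """Sort src_path in batches of batch_size lines; return the batch file paths."""
    paths = []
    with open(src_path) as src:
        while True:
            # Read at most batch_size lines, so only one batch is in memory.
            batch = list(islice(src, batch_size))
            if not batch:
                break
            # Make sure every line ends with a newline so lines stay
            # separate after merging (the last line of a file may lack one).
            batch = [ln if ln.endswith('\n') else ln + '\n' for ln in batch]
            batch.sort(key=second_field)
            path = os.path.join(tmp_dir, 'batch_%d.txt' % len(paths))
            with open(path, 'w') as out:
                out.writelines(batch)
            paths.append(path)
    return paths


def external_sort(src_path, dst_path, batch_size=100_000):
    """Sort src_path by its second field and write the result to dst_path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        paths = sort_in_batches(src_path, tmp_dir, batch_size)
        files = [open(p) for p in paths]
        try:
            with open(dst_path, 'w') as dst:
                # heapq.merge lazily yields lines from the sorted batch files
                # in overall sorted order, one line at a time.
                dst.writelines(heapq.merge(*files, key=second_field))
        finally:
            for f in files:
                f.close()
```

Calling external_sort('input.txt', 'sorted.txt') (both file names are placeholders) would then keep at most one batch of lines in memory at a time, plus one pending line per batch file during the merge.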