Sorting a text file using Python

Question

I have a text file with more than 10 million lines. The lines look like this:

37024469;196672001;255.0000000000
37024469;196665001;396.0000000000
37024469;196664001;396.0000000000
37024469;196399002;85.0000000000
37024469;160507001;264.0000000000
37024469;160506001;264.0000000000

As you can see, the delimiter is ";". I would like to sort this text file by the second field using Python. I could not use the split function, because it causes a MemoryError. How can I handle this?

Recommended answer

Don't sort 10 million lines in memory. Split the work into batches instead:

  • Run 100 sorts of 100k lines each (use the file as an iterator, combined with islice() or similar, to pick each batch). Write each sorted batch to its own separate file.
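
The batch step above can be sketched roughly as follows. This is a minimal sketch, not the answer author's code: the names second_field and write_sorted_chunks are hypothetical, and it assumes the second field is always an integer, as in the sample lines.

```python
import itertools
import os
import tempfile

def second_field(line):
    # Assumed sort key: the second ';'-separated field, compared as an integer.
    return int(line.split(";")[1])

def write_sorted_chunks(path, chunk_size=100_000, key=second_field, tmp_dir=None):
    """Sort `path` in batches of `chunk_size` lines; return the batch file paths."""
    chunk_paths = []
    with open(path) as src:
        while True:
            # islice pulls at most chunk_size lines from the file iterator,
            # so only one batch is held in memory at a time.
            chunk = list(itertools.islice(src, chunk_size))
            if not chunk:
                break
            # The last line of the input may lack a trailing newline.
            chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
            chunk.sort(key=key)
            fd, chunk_path = tempfile.mkstemp(suffix=".chunk", dir=tmp_dir)
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Splitting one line at a time inside the key function keeps memory use bounded by the batch size, which is what avoids the MemoryError from splitting the whole file at once.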

Then merge the sorted files. Here is a merge generator to which you can pass the 100 open files; it yields lines in sorted order. Write them to a new file line by line:

import operator

def mergeiter(*iterables, **kwargs):
    """Given a set of sorted iterables, yield the next value in merged order

    Takes an optional `key` callable to compare values by.
    """
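    # Map each input's index to [current value, index, iterator];
    # inputs that are empty from the start are skipped.
    heads = {}
    for i, it in enumerate(iterables):
        it = iter(it)
        try:
            heads[i] = [next(it), i, it]
        except StopIteration:
            pass
    iterables = heads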
    if 'key' not in kwargs:
        key = operator.itemgetter(0)
    else:
        key = lambda item, key=kwargs['key']: key(item[0])

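    while iterables:
        # Yield the smallest current value among the remaining inputs.
        value, i, it = min(iterables.values(), key=key)
        yield value
        try:
            iterables[i][0] = next(it)
        except StopIteration:
            # This input is exhausted; the loop ends once none remain.
            # (A bare `raise` here becomes RuntimeError under PEP 479.)
            del iterables[i]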
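
For the final step, the standard library's heapq.merge does the same job as the generator above and accepts a key argument (Python 3.5+). The following is a minimal sketch that swaps it in. The names second_field and merge_sorted_files are hypothetical, and the key assumes the second field is an integer, as in the sample lines.

```python
import contextlib
import heapq

def second_field(line):
    # Assumed sort key: the second ';'-separated field, compared as an integer.
    return int(line.split(";")[1])

def merge_sorted_files(chunk_paths, out_path, key=second_field):
    """Merge already-sorted batch files into out_path, one line at a time."""
    with contextlib.ExitStack() as stack:
        # ExitStack closes every batch file when the block exits.
        files = [stack.enter_context(open(p)) for p in chunk_paths]
        with open(out_path, "w") as out:
            # heapq.merge is lazy: it keeps only one line per input in memory.
            out.writelines(heapq.merge(*files, key=key))
```

Calling mergeiter(*files, key=key) in place of heapq.merge(*files, key=key) should work the same way, since both take the open files as positional arguments and an optional key.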
