Merge sort in Python

Question

Basically, I have a bunch of files containing domains. I've sorted each individual file by its TLD using .sort(key=func_that_returns_tld).

Now that I've done that, I want to merge all the files and end up with one massive sorted file. I assume I need something like this:

open all files
read one line from each file into a list
sort list with .sort(key=func_that_returns_tld)
output that list to file
loop by reading next line

Am I thinking about this right? Any advice on how to accomplish this would be appreciated.

Recommended answer

If your files are not very large, then simply read them all into memory (as S. Lott suggests). That would definitely be simplest.
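A minimal sketch of the in-memory approach, assuming Python 3 and that all files together fit comfortably in RAM. The names naive_tld and merge_in_memory are hypothetical; naive_tld simply takes the last dot-separated label and stands in for your own func_that_returns_tld.

```python
def naive_tld(domain):
    # Hypothetical stand-in for func_that_returns_tld:
    # returns the last dot-separated label, e.g. 'com' for 'google.com'.
    return domain.rsplit('.', 1)[-1]


def merge_in_memory(filenames, key=naive_tld):
    # Read every non-empty line from every file, then sort once.
    domains = []
    for filename in filenames:
        with open(filename) as fh:
            domains.extend(line.strip() for line in fh if line.strip())
    # list.sort is stable, so domains with equal keys keep their read order.
    domains.sort(key=key)
    return domains
```

With this approach the per-file sort becomes unnecessary, since the single sort over the combined list does all the work.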

However, you mention collation creates one "massive" file. If it's too massive to fit in memory, then perhaps use heapq.merge. It may be a little harder to set up, but it has the advantage of not requiring that all the iterables be pulled into memory at once.

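import heapq
import contextlib

class Domain(object):
    def __init__(self, domain):
        self.domain = domain

    @property
    def tld(self):
        # Placeholder key: put your function for calculating the TLD here.
        # As written, this returns the first label (e.g. 'google').
        return self.domain.split('.', 1)[0]

    def __lt__(self, other):
        # Strict comparison; heapq.merge only needs a consistent "less than"
        return self.tld < other.tld

    def __str__(self):
        return self.domain

def domains(fh):
    # Lazily wrap each line of an open file in a Domain
    for line in fh:
        yield Domain(line.strip())

filenames = ('data1.txt', 'data2.txt')
# ExitStack replaces contextlib.nested, which no longer exists in Python 3
with contextlib.ExitStack() as stack:
    fhs = [stack.enter_context(open(filename, 'r')) for filename in filenames]
    for elt in heapq.merge(*(domains(fh) for fh in fhs)):
        print(elt)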

with data1.txt:

google.com
stackoverflow.com
yahoo.com

and data2.txt:

standards.freedesktop.org
www.imagemagick.org

yields:

google.com
stackoverflow.com
standards.freedesktop.org
www.imagemagick.org
yahoo.com
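As an alternative sketch, assuming Python 3.5 or later: heapq.merge accepts a key argument, so no wrapper class is needed. The helper names naive_tld, stripped_lines, and merge_sorted_files are hypothetical, and naive_tld (last dot-separated label) is only a stand-in for your own func_that_returns_tld. Each input file must already be sorted by that same key.

```python
import contextlib
import heapq


def naive_tld(domain):
    # Hypothetical stand-in for func_that_returns_tld:
    # returns the last dot-separated label, e.g. 'org' for 'www.imagemagick.org'.
    return domain.rsplit('.', 1)[-1]


def stripped_lines(fh):
    # Lazily yield non-empty lines without their trailing newline.
    for line in fh:
        line = line.strip()
        if line:
            yield line


def merge_sorted_files(filenames, out_path, key=naive_tld):
    # Assumes every input file is already sorted by `key`.
    with contextlib.ExitStack() as stack:
        inputs = [stack.enter_context(open(name)) for name in filenames]
        out = stack.enter_context(open(out_path, 'w'))
        # heapq.merge holds only one pending line per input file in memory.
        merged = heapq.merge(*(stripped_lines(fh) for fh in inputs), key=key)
        for domain in merged:
            out.write(domain + '\n')
```

Writing straight to the output file keeps memory use proportional to the number of input files rather than to the total number of domains.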
