Append contents from one file to another with newline separation

Question

I'm trying to replicate the cat functionality of the Linux shell in a platform-agnostic way, so that I can take two text files and merge their contents in the following manner:

file_1 contains:

42 bottles of beer on the wall

file_2 contains:

Beer is clearly the answer

The merged file should contain:

42 bottles of beer on the wall  
Beer is clearly the answer

Most of the techniques I've read about, however, end up producing:

42 bottles of beer on the wallBeer is clearly the answer

Another issue is that the actual files I'd like to work with are incredibly large text files (FASTA-formatted protein sequence files), so I think most methods that read line by line are inefficient. Hence, I have been trying to figure out a solution using shutil, as below:

def concatenate_fasta(file1, file2, newfile):
    destination = open(newfile,'wb')
    shutil.copyfileobj(open(file1,'rb'), destination)
    destination.write('\n...\n')
    shutil.copyfileobj(open(file2,'rb'), destination)
    destination.close()

However, this produces the same problem as earlier, except with "..." in between. Clearly, the newlines are being ignored, but I'm at a loss as to how to manage them properly.

Any help would be appreciated.

I tried Martijn's suggestion, but the line_sep value returned is None, which throws an error when the function attempts to write it to the output file. I have it working now via the os.linesep method, which was mentioned as the less optimal option, as follows:

with open(newfile,'wb') as destination:
    with open(file_1,'rb') as source:
        shutil.copyfileobj(source, destination)
    destination.write(os.linesep*2)
    with open(file_2,'rb') as source:
        shutil.copyfileobj(source, destination)
    destination.close()

This gives me the functionality I need, but I'm still at a bit of a loss as to why the (seemingly more elegant) solution is failing.

Recommended answer

You have opened the files in binary mode, so no newline translation will take place. Different platforms use different line endings, and if you are on Windows, \n is not enough.
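To make the point about newline translation concrete, here is a minimal sketch, assuming Python 3. The helper name `written_bytes` is hypothetical and exists only for this illustration. It writes the same one-line payload once in binary mode and once in text mode with an explicit `newline='\r\n'`, then reads the raw bytes back to show which mode translated the line ending.

```python
import os
import tempfile


def written_bytes(mode, newline=None):
    # Hypothetical helper: write one short line with the given mode and
    # return the raw bytes that actually ended up on disk.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if 'b' in mode:
            # Binary mode: bytes are written exactly as given.
            with open(path, mode) as f:
                f.write(b'a\n')
        else:
            # Text mode: '\n' is translated to `newline` on write.
            with open(path, mode, newline=newline) as f:
                f.write('a\n')
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)


binary_result = written_bytes('wb')               # no translation
crlf_result = written_bytes('w', newline='\r\n')  # '\n' becomes '\r\n'
```

In binary mode the bytes reach the file untouched, so a bare `\n` written between two CRLF-terminated files stays a bare `\n`, which some Windows tools will not display as a line break.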

The simplest method would be to write os.linesep here:

destination.write(os.linesep + '...' + os.linesep)

but this could violate the actual newline convention used in the files.

The better approach would be to open the text files in text mode, read a line or two, then inspect the file.newlines attribute to see what the convention is for that file:

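import os
import shutil

def concatenate_fasta(file_1, file_2, newfile):
    # newlines is only tracked with universal newlines (Python 2 needs mode 'rU')
    with open(file_1, 'r') as source:
        next(source, None)  # try to read a line
        line_sep = source.newlines
        if isinstance(line_sep, tuple):
            # mixed newlines, let's just pick the first one
            line_sep = line_sep[0]
    if line_sep is None:
        # no newline detected, fall back to the platform default
        line_sep = os.linesep
    # the destination is opened in binary mode, so write bytes
    separator = (line_sep + '...' + line_sep).encode('ascii')

    with open(newfile, 'wb') as destination:
        with open(file_1, 'rb') as source:
            shutil.copyfileobj(source, destination)
        destination.write(separator)

        with open(file_2, 'rb') as source:
            shutil.copyfileobj(source, destination)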

You may want to test file_2 as well, perhaps raising an exception if the newline convention used doesn't match the first file.
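One way that check might look is sketched below, assuming Python 3. The function names `detect_newline` and `check_same_newlines` are hypothetical, and raising `ValueError` is just one reasonable choice of exception. Only the start of each file is read, so a large FASTA file is not loaded in full.

```python
def detect_newline(path):
    # Hypothetical helper: read one line in text mode with universal
    # newlines and report the newline style(s) seen so far.
    # Returns a string, a tuple of strings (mixed endings), or None.
    with open(path, 'r', newline=None) as source:
        next(source, None)
        return source.newlines


def check_same_newlines(file_1, file_2):
    # Raise if the two files do not appear to share a newline convention.
    sep_1 = detect_newline(file_1)
    sep_2 = detect_newline(file_2)
    if sep_1 != sep_2:
        raise ValueError(
            'newline mismatch: %r uses %r, %r uses %r'
            % (file_1, sep_1, file_2, sep_2)
        )
    return sep_1
```

A caller could run `check_same_newlines(file_1, file_2)` before concatenating and reuse the returned separator. Note that the result can still be None (no newline seen) or a tuple (mixed endings), so those cases need handling as in the function above.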
