Append contents from one file to another with newline separation
Question
I'm trying to replicate the cat functionality of the Linux shell in a platform-agnostic way, so that I can take two text files and merge their contents in the following manner:
file_1 contains:
42 bottles of beer on the wall
file_2 contains:
Beer is clearly the answer
The merged file should contain:
42 bottles of beer on the wall
Beer is clearly the answer
Most of the techniques I've read about, however, end up producing:
42 bottles of beer on the wallBeer is clearly the answer
Another issue is that the actual files I'd like to work with are incredibly large text files (FASTA-formatted protein sequence files), so I think most methods that read line by line would be inefficient. Hence, I have been trying to figure out a solution using shutil, as below:
def concatenate_fasta(file1, file2, newfile):
    destination = open(newfile,'wb')
    shutil.copyfileobj(open(file1,'rb'), destination)
    destination.write('\n...\n')
    shutil.copyfileobj(open(file2,'rb'), destination)
    destination.close()
However, this produces the same problem as before, except with "..." in between. Clearly, the newlines are being ignored, but I'm at a loss as to how to manage them properly.
Any help would be greatly appreciated.
I tried Martijn's suggestion, but the line_sep value returned is None, which throws an error when the function attempts to write it to the output file. I have now gotten this working via the os.linesep method, which was mentioned as less optimal, as follows:
with open(newfile,'wb') as destination:
    with open(file_1,'rb') as source:
        shutil.copyfileobj(source, destination)
    destination.write(os.linesep*2)
    with open(file_2,'rb') as source:
        shutil.copyfileobj(source, destination)
destination.close()
This gives me the functionality I need, but I'm still at a bit of a loss as to why the (seemingly more elegant) solution is failing.
Recommended answer
You have opened the files in binary mode, so no newline translation will take place. Different platforms use different line endings, and if you are on Windows, \n is not enough.
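As a rough illustration of the difference, here is a minimal Python 3 sketch. The helper name write_and_read_raw is hypothetical, and the explicit newline='\r\n' argument is an assumption made so that the result is the same on every platform (by default, text mode translates to the platform's own line ending). It writes the same content once in text mode and once in binary mode, then reads back the raw bytes:

```python
import os
import tempfile

def write_and_read_raw(mode, data, **kwargs):
    """Write data to a temporary file in the given mode; return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode, **kwargs) as handle:
            handle.write(data)
        with open(path, 'rb') as handle:
            return handle.read()
    finally:
        os.remove(path)

# Text mode: '\n' is translated to the requested line ending on write.
text_bytes = write_and_read_raw('w', 'a\nb', newline='\r\n')

# Binary mode: bytes are written exactly as given, with no translation.
binary_bytes = write_and_read_raw('wb', b'a\nb')

print(repr(text_bytes))
print(repr(binary_bytes))
```

The binary-mode write leaves the lone \n untouched, which is why a Windows editor that expects \r\n may show the two parts on the same line.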
The simplest method would be to write os.linesep here:
destination.write(os.linesep + '...' + os.linesep)
but this could violate the actual newline convention used in the files.
The better approach would be to open the text files in text mode, read a line or two, then inspect the file.newlines attribute to see what the convention is for that file:
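import os
import shutil

def concatenate_fasta(file_1, file_2, newfile):
    # .newlines is only populated in universal-newlines text mode
    # (the default in Python 3; on Python 2 open with mode 'rU' instead)
    with open(file_1, 'r') as source:
        next(source, None)  # try and read a line
        line_sep = source.newlines
    if isinstance(line_sep, tuple):
        # mixed newlines, let's just pick the first one
        line_sep = line_sep[0]
    if line_sep is None:
        # no line ending seen (empty or single-line file): use the platform default
        line_sep = os.linesep
    # the output file is opened in binary mode, so the separator must be bytes
    separator = (line_sep + '...' + line_sep).encode('ascii')
    with open(newfile, 'wb') as destination:
        with open(file_1, 'rb') as source:
            shutil.copyfileobj(source, destination)
        destination.write(separator)
        with open(file_2, 'rb') as source:
            shutil.copyfileobj(source, destination)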
You may want to test file_2 as well, perhaps raising an exception if the newline convention used doesn't match the first file.