Splitting textfile into section with special delimiter line - python
Problem description
I have an input file such as this:
This is a text block start
This is the end

And this is another
with more than one line
and another line.
The desired task is to read the file by sections delimited by some special line, in this case an empty line, e.g. [out]:
[['This is a text block start', 'This is the end'],
['And this is another','with more than one line', 'and another line.']]
I have been getting the desired output by doing this:
def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
But if the special line is a line that starts with #, e.g.:
# Some comments, maybe the title of the following section
This is a text block start
This is the end
# Some other comments and also the title
And this is another
with more than one line
and another line.
I have to do this:
def per_section(it):
    """ Read a file and yield sections using lines starting with # as delimiter """
    section = []
    for line in it:
        if line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
If I were to allow per_section() to have a delimiter parameter, I could try this:
def per_section(it, delimiter='\n'):
    """ Read a file and yield sections using the given delimiter ('\n' or '#') """
    section = []
    for line in it:
        # blank-line mode: keep every non-empty line
        if line.strip('\n') and delimiter == '\n':
            section.append(line)
        # comment mode: keep every line that does not start with '#'
        elif delimiter == '#' and line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
But is there a way to do this without hard-coding every possible delimiter?
Recommended answer
How about passing a predicate?
def per_section(it, is_delimiter=lambda x: x.isspace()):
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ret  # OR  ''.join(ret)
                ret = []
        else:
            ret.append(line.rstrip())  # OR  ret.append(line)
    if ret:
        yield ret
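As a quick check, the generator above can be run on the two sample inputs from the question without touching the file system. The sketch below repeats the function so that it runs on its own, and it assumes the first sample had a single blank line between its two blocks. The names blank_separated and hash_separated are made up for illustration.

```python
def per_section(it, is_delimiter=lambda x: x.isspace()):
    """Yield lists of lines, split wherever is_delimiter(line) is true."""
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:  # skip empty sections, e.g. a delimiter on the first line
                yield ret
                ret = []
        else:
            ret.append(line.rstrip())
    if ret:
        yield ret


# Sample inputs from the question as lists of lines (illustrative names).
# Assumption: the first sample separates its blocks with one blank line.
blank_separated = [
    "This is a text block start\n",
    "This is the end\n",
    "\n",
    "And this is another\n",
    "with more than one line\n",
    "and another line.\n",
]
hash_separated = [
    "# Some comments, maybe the title of the following section\n",
    "This is a text block start\n",
    "This is the end\n",
    "# Some other comments and also the title\n",
    "And this is another\n",
    "with more than one line\n",
    "and another line.\n",
]

by_blank = list(per_section(blank_separated))
by_hash = list(per_section(hash_separated, lambda line: line.startswith("#")))

print(by_blank)
# [['This is a text block start', 'This is the end'],
#  ['And this is another', 'with more than one line', 'and another line.']]
print(by_hash == by_blank)
# True
```

Iterating over a file keeps the trailing newline, so a blank line arrives as '\n' and x.isspace() is true. If the lines were stripped beforehand, a blank line becomes '' and ''.isspace() is False, so a predicate such as lambda x: not x.strip() would be needed instead.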
Usage:
with open('/path/to/file.txt') as f:
    sections = list(per_section(f))  # default delimiter

with open('/path/to/file.txt.txt') as f:
    sections = list(per_section(f, lambda line: line.startswith('#')))  # comment
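Since the delimiter is just a callable, other delimiter styles can be plugged in without changing per_section(). The following is only a sketch: starts_with and is_rule are hypothetical names introduced here, the sample text is invented, and the function is repeated so the block runs on its own. It relies on two standard-library facts: str.startswith accepts a tuple of prefixes, and a compiled pattern's match method returns None (falsy) when the line does not match.

```python
import io
import re


def per_section(it, is_delimiter=lambda x: x.isspace()):
    """Yield lists of lines, split wherever is_delimiter(line) is true."""
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ret
                ret = []
        else:
            ret.append(line.rstrip())
    if ret:
        yield ret


def starts_with(*prefixes):
    """Hypothetical helper: true for lines starting with any of the prefixes."""
    return lambda line: line.startswith(prefixes)


# Pattern.match anchors at the start of the line, so this is truthy only for
# lines made of three or more dashes followed by optional whitespace.
is_rule = re.compile(r"-{3,}\s*$").match

# Invented sample text; io.StringIO iterates line by line like a file object.
text = "a\nb\n---\nc\n# note\nd\n"

print(list(per_section(io.StringIO(text), is_rule)))
# [['a', 'b'], ['c', '# note', 'd']]
print(list(per_section(io.StringIO(text), starts_with("#", "---"))))
# [['a', 'b'], ['c'], ['d']]
```

Any function that takes a line and returns a truthy or falsy value works here, so the caller decides what counts as a delimiter and nothing needs to be hard-coded inside the generator.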