How to strip the beginning of a file with Python's re.sub?
Question
I'm happy to ask my first Python question! I would like to strip the beginning of the sample file below (the part before the first occurrence of "article"). To do this I am using re.sub.
Here is my file, sample.txt:
fdasfdadfa
adfadfasdf
afdafdsfas
adfadfadf
adfadsf
afdaf
article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc
And here is my Python code to parse this file:
import re
test = ''
for line in open('sample.txt'):
    test = test + line
# Replace everything up to "article:" (at most one substitution)
result = re.sub(r'.*article:', 'article:', test, count=1, flags=re.S)
print(result)
Sadly, this code only displays the last article. The output of the code:
article: name of the first article
ccccccc
ccccccc
ccccccc
Does anyone know how to strip only the beginning of the file and display all 3 articles?
Recommended answer
You can use itertools.dropwhile to get this effect:
from itertools import dropwhile
with open('filename.txt') as f:
    articles = ''.join(dropwhile(lambda line: not line.startswith('article'), f))
print(articles)
This prints:
article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc
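If you would rather stay with re.sub, the likely cause of the original behavior is that `.*` is greedy: with `re.S` it first consumes the whole text and then backtracks to the last "article:", so everything before the final article is removed. A minimal sketch of a non-greedy variant follows. It assumes the text fits in memory, uses a shortened inline copy of the sample data instead of reading sample.txt, and the variable names (`text`, `greedy`, `lazy`, `anchored`) are purely illustrative.

```python
import re

# Shortened inline copy of the sample data, so the sketch runs without sample.txt.
text = (
    "fdasfdadfa\n"
    "afdaf\n"
    "article: name of the first article\n"
    "aaaaaaa\n"
    "article: name of the first article\n"
    "bbbbbbb\n"
    "article: name of the first article\n"
    "ccccccc\n"
)

# Greedy: .* (with re.S) runs to the end of the text and backtracks
# to the LAST "article:", so only the final article survives.
greedy = re.sub(r'.*article:', 'article:', text, count=1, flags=re.S)

# Non-greedy: .*? stops at the FIRST "article:", so all articles survive.
lazy = re.sub(r'.*?article:', 'article:', text, count=1, flags=re.S)

# Variant: a lookahead leaves "article:" in place, so nothing has to be
# re-inserted; re.M lets ^ match at the start of each line.
anchored = re.sub(r'\A.*?(?=^article:)', '', text, count=1, flags=re.S | re.M)

print(greedy.count('article:'))  # 1
print(lazy.count('article:'))    # 3
print(anchored == lazy)          # True
```

The dropwhile approach above reads the file line by line and needs no regular expression, so it is probably the simpler choice for large files; the regex variant is mainly useful when the text is already held in a single string.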