Python: Find a Sentence between some website-tags using regex
Problem description
I want to find the sentence between the ...class="question-hyperlink"> tags, using this code:
import urllib2
import re
response = urllib2.urlopen('https://stackoverflow.com/questions/tagged/python')
html = response.read(20000)
a = re.search('question-hyperlink', html)
print html[a.end()+3:a.end()+100]
I get:
DF5 for Python: high level vs low level interfaces. h5py</a></h3> <div class="excerpt">
How can I stop at the next <? And how do I find the next sentence? I want to do it with regex.
EDIT: To the downvoters: I want to do it like he does in "RegEx match open tags except XHTML self-contained tags".
Recommended answer
If you must do it with regular expressions, try something like this:
# The non-greedy (.+?) captures the link text up to the first closing </a>.
a = re.finditer('<a.+?question-hyperlink">(.+?)</a>', html)
for m in a:
    print m.group(1)
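As a rough illustration of how that pattern behaves, here is a minimal Python 3 sketch that runs it against a small made-up HTML string instead of the live page. The sample markup and the names sample_html and titles are assumptions for the example, not the real Stack Overflow source.

```python
import re

# Hypothetical sample markup, loosely modelled on the output shown in the question.
sample_html = (
    '<h3><a href="/questions/1" class="question-hyperlink">'
    'HDF5 for Python: high level vs low level interfaces. h5py</a></h3>'
    '<div class="excerpt">...</div>'
    '<h3><a href="/questions/2" class="question-hyperlink">'
    'Second question title</a></h3>'
)

# Same pattern as in the answer: the lazy group stops at the first </a>.
pattern = re.compile(r'<a.+?question-hyperlink">(.+?)</a>')

# finditer yields one match per link, so every title is found in turn.
titles = [m.group(1) for m in pattern.finditer(sample_html)]
for title in titles:
    print(title)
```

Because .+? is non-greedy, each capture ends at the first </a>, which is what stops the match at the next tag; finditer then continues with the next link. Note that . does not match newlines unless re.DOTALL is passed, so a title that spans several lines would be missed by this pattern.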
Just for reference, this code does the same, but in a far more robust way:
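# Assumes BeautifulSoup 4; BeautifulSoup 3 used `from BeautifulSoup import BeautifulSoup`.
from bs4 import BeautifulSoup
doc = BeautifulSoup(html)
# The second argument filters on the CSS class of the <a> tags.
for a in doc.findAll('a', 'question-hyperlink'):
    print a.text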
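If installing BeautifulSoup is not an option, a parser-based approach can also be sketched with the standard library's html.parser module in Python 3. This is a different technique from the BeautifulSoup code above, not a drop-in equivalent; the class name QuestionLinkParser and the sample markup are made up for the illustration.

```python
from html.parser import HTMLParser


class QuestionLinkParser(HTMLParser):
    """Collect the text of every <a> element with class "question-hyperlink"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "question-hyperlink" in classes:
            self._in_link = True
            self.titles.append("")

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is appended to the current title.
        if self._in_link:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


# Hypothetical sample markup for the demonstration.
sample = (
    '<h3><a class="question-hyperlink" href="/q/1">First <b>bold</b> title</a></h3>'
    '<a class="other" href="/x">ignored</a>'
    '<h3><a href="/q/2" class="question-hyperlink big">Second title</a></h3>'
)

parser = QuestionLinkParser()
parser.feed(sample)
parser.close()
print(parser.titles)
```

Unlike the regex, this sketch does not depend on the order of the attributes and still works when the link text contains nested tags.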