Extract text with line breaks in BeautifulSoup
Question
I'd like to extract the text from the HTML below with BeautifulSoup, keeping a line break wherever there is a "br" tag.
html = '<td class="s4 softmerge" dir="ltr"><div class="softmerge-inner" style="width: 5524px; left: -1px;">But when he saw many of the Pharisees and Sadducees come to his baptism, he said unto them, <br/>O generation of vipers, who hath warned you to flee from the wrath to come?<br/>Bring forth therefore fruits meet for repentance:<br/>And think not to say within yourselves, We have Abraham to our father: for I say unto you, that God is able of these stones to raise up children unto Abraham.<br/>And now also the axe is laid unto the root of the trees: therefore every tree which bringeth not forth good fruit is hewn down, and cast into the fire.<br/>I indeed baptize you with water unto repentance. but he that cometh after me is mightier than I, whose shoes I am not worthy to bear: he shall baptize you with the Holy Ghost, and with fire:<br/>Whose fan is in his hand, and he will throughly purge his floor, and gather his wheat into the garner; but he will burn up the chaff with unquenchable fire.</div></td>'
I want to get a result like this, as a string:
But when he saw many of the Pharisees and Sadducees come to his baptism, he said unto them,
O generation of vipers, who hath warned you to flee from the wrath to come?
Bring forth therefore fruits meet for repentance:
And think not to say within yourselves, We have Abraham to our father: for I say unto you, that God is able of these stones to raise up children unto Abraham.
And now also the axe is laid unto the root of the trees: therefore every tree which bringeth not forth good fruit is hewn down, and cast into the fire.
I indeed baptize you with water unto repentance. but he that cometh after me is mightier than I, whose shoes I am not worthy to bear: he shall baptize you with the Holy Ghost, and with fire:
Whose fan is in his hand, and he will throughly purge his floor, and gather his wheat into the garner; but he will burn up the chaff with unquenchable fire.
How can I write code to get this result?
Recommended answer
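There are two ways to get the result:
- Iterate over every child of the tag and keep the ones that are instances of NavigableString.
- Match the text between the tags with a regular expression.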
Code
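import re
from bs4 import BeautifulSoup, NavigableString

# Method 1: walk the children of the div and print only the text nodes,
# skipping the <br/> tags between them.
soup = BeautifulSoup(html, "lxml")
for ele in soup.find("div", class_="softmerge-inner"):
    if isinstance(ele, NavigableString):
        print(ele)
print()
# Method 2: capture the text that follows the opening <div> or each <br/>,
# up to the next short tag such as <br/> or </div>.
result = [ele[1] for ele in re.findall(r"""(<div.*?>|<br.>)(.*?)(?=<\w{1,4}/>|</\w{1,4}>)""", html)]
for e in result:
    print(e)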
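The question asks for the result as a single string rather than printed output, so here is a minimal sketch of the NavigableString approach that joins the pieces with "\n". The helper name text_with_line_breaks, the shortened sample markup, and the choice of the built-in "html.parser" (so that lxml is not required) are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup, NavigableString


def text_with_line_breaks(html, css_class="softmerge-inner"):
    """Return the text of the first matching div, one line per <br/>-separated piece.

    Hypothetical helper: the name and the css_class default are illustrative only.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=css_class)
    if div is None:
        return ""
    # Keep only the direct text nodes; the <br/> tags between them are skipped.
    lines = [str(child).strip() for child in div.children
             if isinstance(child, NavigableString)]
    return "\n".join(lines)


# Shortened sample markup with the same structure as the HTML in the question.
sample = ('<td><div class="softmerge-inner">first line, <br/>'
          'second line<br/>third line</div></td>')
joined = text_with_line_breaks(sample)
print(joined)

# For markup this simple, get_text with a separator should give the same string.
alt = BeautifulSoup(sample, "html.parser").find("div").get_text("\n", strip=True)
```

Stripping each piece removes the trailing space that the source leaves before the first "br" tag; drop the .strip() call if that whitespace matters.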