Wrap the contents of a tag with BeautifulSoup
Question
I'm trying to wrap the contents of a tag with BeautifulSoup. This:
<div class="footnotes">
<p>Footnote 1</p>
<p>Footnote 2</p>
</div>
should become this:
<div class="footnotes">
<ol>
<p>Footnote 1</p>
<p>Footnote 2</p>
</ol>
</div>
So I use the following code:
footnotes = soup.findAll("div", { "class" : "footnotes" })
footnotes_contents = ''
new_ol = soup.new_tag("ol")
for content in footnotes[0].children:
    new_tag = soup.new_tag(content)
    new_ol.append(new_tag)
footnotes[0].clear()
footnotes[0].append(new_ol)
print footnotes[0]
But I get the following:
<div class="footnotes"><ol><
></
><<p>Footnote 1</p>></<p>Footnote 1</p>><
></
><<p>Footnote 2</p>></<p>Footnote 2</p>><
></
></ol></div>
Any suggestions?
Recommended answer
Using lxml:
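import lxml.html as LH
import lxml.builder as builder

E = builder.E

doc = LH.parse('data')  # 'data' is the file that holds the HTML
footnote = doc.find('.//div[@class="footnotes"]')
ol = E.ol()
# Appending a child to ol moves it out of footnote, so loop over a copy.
for tag in list(footnote):
    ol.append(tag)
footnote.append(ol)
print(LH.tostring(doc.getroot()))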
prints
<html><body><div class="footnotes">
<ol><p>Footnote 1</p>
<p>Footnote 2</p>
</ol></div></body></html>
Note that with lxml, an Element (tag) can be in only one place in the tree (since every Element has only one parent), so appending tag to ol also removes it from footnote. So, unlike with BeautifulSoup, you do not need to iterate over the contents in reverse order, nor use insert(0, ...). You just append in order.
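The move-on-append behaviour described in this note can be checked on a small in-memory fragment. This is a minimal sketch, assuming lxml is installed; the inline HTML string and the names `root`, `first`, `parent_before`, `parent_after`, and `remaining` are illustrative and not part of the original answer.

```python
import lxml.html as LH
from lxml.builder import E

# Illustrative fragment; the answer itself reads its HTML from a file.
root = LH.fromstring(
    '<div class="footnotes"><p>Footnote 1</p><p>Footnote 2</p></div>'
)
ol = E.ol()

first = root[0]
parent_before = first.getparent().tag  # the <p> starts out inside the <div>
ol.append(first)                       # appending re-parents the element
parent_after = first.getparent().tag   # it now lives inside the <ol>
remaining = len(root)                  # children still left in the <div>

# Move the remaining children, then put the <ol> into the <div>.
for tag in list(root):
    ol.append(tag)
root.append(ol)

print(parent_before, parent_after, remaining)
print(LH.tostring(root, encoding="unicode"))
```

Looping over `list(root)` rather than `root` itself is a defensive choice here, so the loop does not depend on how the iterator copes with children being moved while it runs.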
Using BeautifulSoup:
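import bs4 as bs

with open('data', 'r') as f:
    # Assumption: the <html><body> wrapper in the output suggests the lxml parser.
    soup = bs.BeautifulSoup(f, 'lxml')

footnote = soup.find("div", {"class": "footnotes"})
new_ol = soup.new_tag("ol")
# Walk the children back to front; extract() detaches each one from footnote.
for content in reversed(footnote.contents):
    new_ol.insert(0, content.extract())
footnote.append(new_ol)
print(soup)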
prints
<html><body><div class="footnotes"><ol>
<p>Footnote 1</p>
<p>Footnote 2</p>
</ol></div></body></html>
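As a possible alternative, the same result can be reached by appending in order, as long as the loop runs over a copy of the child list. The sketch below assumes bs4 is installed and uses the built-in "html.parser", so no `<html><body>` wrapper is added. The helper name `wrap_contents` and the inline `html` string are hypothetical, introduced only for illustration. It also hints at why the code in the question misbehaves: `soup.new_tag(content)` treats each existing child as the name of a brand-new tag instead of moving that child.

```python
from bs4 import BeautifulSoup

# Illustrative input; the answer above reads the HTML from a file instead.
html = """<div class="footnotes">
<p>Footnote 1</p>
<p>Footnote 2</p>
</div>"""


def wrap_contents(soup, tag, wrapper_name):
    """Move every child of `tag` into a new wrapper element placed inside `tag`."""
    wrapper = soup.new_tag(wrapper_name)
    # Copy the child list first: append() moves each node out of `tag`,
    # which would otherwise shrink the list while it is being iterated.
    for child in list(tag.contents):
        wrapper.append(child)
    tag.append(wrapper)
    return wrapper


soup = BeautifulSoup(html, "html.parser")
footnotes = soup.find("div", class_="footnotes")
wrap_contents(soup, footnotes, "ol")
print(soup)
```

The whitespace text nodes between the paragraphs are moved along with the `<p>` tags, so the `<div>` ends up with the `<ol>` as its only child.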