Add parent tags with Beautiful Soup


Problem description

I have many pages of HTML, each with various sections containing code snippets like this:

<div class="footnote" id="footnote-1">
<h3>Reference:</h3>
<table cellpadding="0" cellspacing="0" class="floater" style="margin-bottom:0;" width="100%">
<tr>
<td valign="top" width="20px">
<a href="javascript:void(0);" onclick='javascript:toggleFootnote("footnote-1");' title="click to hide this reference">1.</a>
</td>
<td>
<p> blah </p>
</td>
</tr>
</table>
</div>

I can parse the HTML successfully and extract the relevant tags:

tags = soup.find_all(class_="footnote")

Now I need to add a new parent tag around each of these, so that the code snippet becomes:

<div class="footnote-out"><CODE></div>

But I can't find a way of adding parent tags in bs4 so that they wrap the identified tags. insert() and insert_before() add content inside or beside the identified tags rather than around them.

I started by trying string manipulation:

for tag in soup.find_all(class_="footnote"):
    # Builds a new, separate soup; the original tree in `soup` is not modified
    tag = BeautifulSoup('<div class="footnote-out">' + str(tag) + "</div>", "html.parser")

but I believe this isn't the best approach.

Thanks for any help. I've just started using bs/bs4 and can't seem to crack this.

Recommended answer

How about this:

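def wrap(to_wrap, wrap_in):
    # replace_with() puts wrap_in into the tree and returns the removed element
    contents = to_wrap.replace_with(wrap_in)
    # Re-attach the removed element as a child of the new wrapper
    wrap_in.append(contents)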

A simple example:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<body><a>Some text</a></body>", "html.parser")
wrap(soup.a, soup.new_tag("b"))
print(soup.body)
# <body><b><a>Some text</a></b></body>

Applied to your document:

for footnote in soup.find_all("div", "footnote"):
    new_tag = soup.new_tag("div")
    new_tag['class'] = 'footnote-out'
    wrap(footnote, new_tag)
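
As an alternative to the hand-written helper, Beautiful Soup 4 also provides a built-in `wrap()` method on elements that does the same replace-then-append step and returns the new wrapper. The following is a minimal sketch, assuming a shortened version of the footnote markup from the question (the two `div` elements and their contents here are made up for illustration) and the standard-library `html.parser` backend:

```python
from bs4 import BeautifulSoup

# Shortened, illustrative stand-in for the footnote markup in the question
html = """
<div class="footnote" id="footnote-1"><p> blah </p></div>
<div class="footnote" id="footnote-2"><p> more </p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a complete list up front, so changing the tree
# inside the loop does not disturb the iteration
for footnote in soup.find_all("div", class_="footnote"):
    wrapper = soup.new_tag("div")
    wrapper["class"] = "footnote-out"
    # Built-in equivalent of the wrap() helper above
    footnote.wrap(wrapper)

outer = soup.find_all("div", class_="footnote-out")
print(soup.prettify())
```

Each `div.footnote` should now sit inside its own `div.footnote-out`. Because class matching is done per class name, searching for `footnote` does not also pick up the new `footnote-out` wrappers.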
