Search and Replace in HTML with BeautifulSoup

Question

I want to use BeautifulSoup to search and replace </a> with </a><br>. I know how to open a page with urllib2 and then parse it to extract all the <a> tags. What I want to do is search and replace the closing tag with the closing tag plus the break. Any help much appreciated.

Edit

I would assume it would be something similar to:

soup.findAll('a').

In the documentation, there is:

find(text="ahh").replaceWith('Hooray')

So I would assume it would be along the lines of:

soup.findAll(tag = '</a>').replaceWith(tag = '</a><br>')

But that doesn't work, and Python's help() doesn't give much.
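For comparison, here is a minimal sketch of what the documentation snippet does, written against the modern bs4 package (an assumption: the question uses the older BeautifulSoup 3, and the sample markup is made up). find() returns a single node, here the text "ahh", and replace_with() swaps that whole node for another one in the parse tree. It does not edit the raw markup text.

```python
# Assumes the bs4 package (pip install beautifulsoup4); sample HTML is hypothetical.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>ahh</b>", "html.parser")

# find(string=...) returns the matching text node (a NavigableString).
text_node = soup.find(string="ahh")

# replace_with() swaps that node for a new one; the <b> element is untouched.
text_node.replace_with("Hooray")

print(soup)  # <b>Hooray</b>
```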

Answer

This will insert a <br> tag after the end of each <a>...</a> element:

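from BeautifulSoup import BeautifulSoup, Tag  # BeautifulSoup 3 (Python 2)

# ....

soup = BeautifulSoup(data)
for a in soup.findAll('a'):
    # Insert a new <br> Tag into the <a> element's parent, right after the <a>.
    a.parent.insert(a.parent.index(a)+1, Tag(soup, 'br'))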
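The code above uses the BeautifulSoup 3 API. As a rough sketch, assuming you are on the current bs4 package instead, the same idea can be written with new_tag() and insert_after(). The sample HTML and the choice of the built-in "html.parser" are illustrative assumptions.

```python
# Assumes the bs4 package (pip install beautifulsoup4); sample HTML is hypothetical.
from bs4 import BeautifulSoup

html = '<div><a href="/one">One</a><a href="/two">Two</a></div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so changing the tree while looping over it is safe.
for a in soup.find_all("a"):
    # new_tag() creates a detached <br> element; insert_after() places it
    # immediately after the whole <a>...</a> element.
    a.insert_after(soup.new_tag("br"))

result = str(soup)
print(result)
```

Each <a> element should now be followed by a <br> sibling. bs4 serializes empty void elements in the self-closing form, <br/>.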

You can't use soup.findAll(tag = '</a>') because BeautifulSoup doesn't handle end tags separately: an opening tag and its end tag are parsed into a single element.
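To see this concretely, here is a small sketch, assuming the modern bs4 package and a made-up snippet of HTML. The <a>...</a> pair comes back as one Tag object, and searching for a tag named "</a>" finds nothing because no such node exists in the tree.

```python
# Assumes the bs4 package (pip install beautifulsoup4); sample HTML is hypothetical.
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>See <a href="/x">this link</a> here.</p>', "html.parser")
link = soup.find("a")

# The opening and closing tags belong to one Tag object.
print(type(link).__name__)  # Tag
print(str(link))            # <a href="/x">this link</a>

# The <p> has three children: text, the whole <a> element, and more text.
print([type(c).__name__ for c in soup.p.children])
# ['NavigableString', 'Tag', 'NavigableString']

# There is no separate node for the end tag, so this search finds nothing.
print(soup.find("</a>"))  # None
```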

If you wanted to put the <a> elements inside a <p> element, as you asked in a comment, you can use this:

for a in soup.findAll('a'):
    p = Tag(soup, 'p')  # create a <p> element
    a.replaceWith(p)    # put it where the <a> element is
    p.insert(0, a)      # put the <a> element inside the <p> (between <p> and </p>)

Again, you don't create the <p> and </p> separately, because they are part of the same element.
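If you are using the newer bs4 package (an assumption; the answer targets BeautifulSoup 3), the replace-then-insert steps above collapse into a single wrap() call. This is a minimal sketch with made-up sample HTML: wrap() moves the element into the tag you pass and returns that tag.

```python
# Assumes the bs4 package (pip install beautifulsoup4); sample HTML is hypothetical.
from bs4 import BeautifulSoup

html = '<div><a href="/one">One</a><a href="/two">Two</a></div>'
soup = BeautifulSoup(html, "html.parser")

for a in soup.find_all("a"):
    # wrap() puts the new <p> where the <a> was and moves the <a> inside it.
    a.wrap(soup.new_tag("p"))

result = str(soup)
print(result)
# <div><p><a href="/one">One</a></p><p><a href="/two">Two</a></p></div>
```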