Search and Replace in HTML with BeautifulSoup
Question
I want to use BeautifulSoup to search for </a> and replace it with </a><br>. I know how to open a page with urllib2 and then parse it to extract all the <a> tags. What I want to do is search for the closing tag and replace it with the closing tag plus the line break. Any help is much appreciated.
Edit
I would assume it would be something similar to:
soup.findAll('a').
In the documentation, there is:
find(text="ahh").replaceWith('Hooray')
So I would assume it would be along the lines of:
soup.findAll(tag = '</a>').replaceWith(tag = '</a><br>')
But that doesn't work, and Python's help() doesn't give much information.
Recommended answer
This will insert a <br> tag after the end of each <a>...</a> element:
from BeautifulSoup import BeautifulSoup, Tag  # BeautifulSoup 3 import
# ....
soup = BeautifulSoup(data)
for a in soup.findAll('a'):
    # insert a new <br> element right after this <a> inside its parent
    a.parent.insert(a.parent.index(a)+1, Tag(soup, 'br'))
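With the newer bs4 package, the same idea can probably be written more directly. The following is only a sketch, assuming bs4 is installed and using a made-up sample string in place of the page fetched with urllib2; `find_all`, `new_tag` and `insert_after` are the bs4 counterparts of the calls above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input; the question fetched its HTML with urllib2 instead.
html = '<div><a href="/one">One</a><a href="/two">Two</a></div>'

soup = BeautifulSoup(html, "html.parser")
for a in soup.find_all("a"):
    # insert_after places the new <br> element immediately after this <a>
    a.insert_after(soup.new_tag("br"))

result = str(soup)
print(result)
```

Each `<a>` gets its own freshly created `<br>` tag, because a single tag object can only live in one place in the tree.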
You can't use soup.findAll(tag = '</a>') because BeautifulSoup doesn't operate on end tags separately; they are considered part of the same element.
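A small check makes this visible. This is a sketch assuming the bs4 package and a made-up one-link document: searching for the end tag finds nothing, while searching by element name returns one object that covers both the start and the end tag.

```python
from bs4 import BeautifulSoup

# Hypothetical one-link document used only for illustration.
html = '<p><a href="/one">One</a></p>'
soup = BeautifulSoup(html, "html.parser")

# There is no element named "</a>", so this search comes back empty.
end_tags = soup.find_all("</a>")

# Searching by element name returns the whole element, start tag through end tag.
link = soup.find("a")
link_markup = str(link)

print(end_tags)
print(link_markup)
```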
If you wanted to put the <a> elements inside a <p> element, as you ask in a comment, you can use this:
for a in soup.findAll('a'):
    p = Tag(soup, 'p')  # create a P element
    a.replaceWith(p)    # put it where the A element is
    p.insert(0, a)      # put the A element inside the P (between <p> and </p>)
Again, you don't create the <p> and </p> separately, because they are part of the same element.
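In bs4, the replace-then-insert steps for the <p> case can likely be collapsed into one call to `wrap`. Again, this is a sketch rather than the original answer's code: it assumes bs4 is installed, and the sample HTML is invented.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input with two links side by side.
html = '<div><a href="/one">One</a><a href="/two">Two</a></div>'

soup = BeautifulSoup(html, "html.parser")
for a in soup.find_all("a"):
    # wrap() puts the new <p> where the <a> was and moves the <a> inside it
    a.wrap(soup.new_tag("p"))

wrapped = str(soup)
print(wrapped)
```

As before, a new `<p>` is created per link, and its opening and closing tags come from the single element object.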