Problem using replaceWith to replace HTML tags with BeautifulSoup on Python
Question
I am using BeautifulSoup in Python and am having trouble replacing some tags. I find <div> tags and check their children. If those children have no children of their own (they are text nodes, NODE_TYPE = 3), I copy them into a <p>.
from BeautifulSoup import Tag, BeautifulSoup

class bar:

    self.soup = BeautifulSoup(self.input)
    foo()

    def foo(self):
        elements = soup.findAll(True)

        for node in elements:
            # ....other stuff here if not <div> tags.
            if node.name.lower() == "div":
                if not node.find('a'):
                    newTag = Tag(self.soup, "p")
                    newTag.setString(node.text)
                    node.replaceWith(newTag)
                    nodesToScore.append(newTag)
                else:
                    for n in node.findAll(True):
                        if n.getString():  # False if has children
                            newTag = Tag(self.soup, "p")
                            newTag.setString(n.text)
                            n.replaceWith(newTag)
I'm getting an AttributeError:
  File "file.py", line 125, in function
    node.replaceWith(newTag)
  File "BeautifulSoup.py", line 131, in replaceWith
    myIndex = self.parent.index(self)
AttributeError: 'NoneType' object has no attribute 'index'
I do the same replacement on node higher up in the for loop and it works correctly. I assume the problem comes from the additional iteration through node as n.
What am I doing wrong, or what would be a better way to do this? Thanks!
PS. I'm using Python 2.5 on Google App Engine with BeautifulSoup 3.0.8.1.
Recommended answer
The error says:
myIndex = self.parent.index(self)
AttributeError: 'NoneType' object has no attribute 'index'
This code is on line 131 of BeautifulSoup.py. The error means that self.parent is None.
Looking at the surrounding code shows that self should be node in your code, since node is the object whose replaceWith method is being called. (Note: the error message says node.replaceWith, but the code you posted shows n.replaceWith. The code you posted does not correspond to the error message/traceback.) So apparently node.parent is None.
You could probably avoid the error by placing
    if node.parent is not None:
at some point in the code before node.replaceWith is called.
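One plausible way for a node to end up without a parent, offered here as an assumption rather than a confirmed diagnosis: the outer loop iterates over a list of tags collected up front, while the nested loop replaces tags that are still in that list. The sketch below illustrates the idea with the standard library's xml.dom.minidom as a stand-in for BeautifulSoup 3 (parentNode and replaceChild play the roles of parent and replaceWith). The helper names text_only and replace_with_p and the sample markup are hypothetical.

```python
from xml.dom import minidom

def text_only(node):
    """True when every child is a text node (the question's NODE_TYPE = 3 case)."""
    return all(child.nodeType == child.TEXT_NODE for child in node.childNodes)

def replace_with_p(doc, node):
    """Replace node with a <p> holding its text. Returns False if node is detached."""
    parent = node.parentNode
    if parent is None:
        # Guard: an earlier replacement already removed this node from the tree.
        return False
    p = doc.createElement("p")
    text = "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)
    p.appendChild(doc.createTextNode(text))
    parent.replaceChild(p, node)
    return True

# Hypothetical input: a <div> containing a link and a nested <div>.
doc = minidom.parseString("<body><div><a>link</a><div>text</div></div></body>")

# Snapshot taken once, before any replacement: body, outer div, a, inner div.
snapshot = list(doc.getElementsByTagName("*"))
skipped = []

for node in snapshot:
    if node.tagName != "div":
        continue
    if not node.getElementsByTagName("a"):
        # Same branch as the question's node.replaceWith(newTag).
        if not replace_with_p(doc, node):
            skipped.append(node.tagName)
    else:
        # Same branch as the question's inner loop over n.
        for n in list(node.getElementsByTagName("*")):
            if text_only(n):
                replace_with_p(doc, n)

result = doc.documentElement.toxml()
print(result)
print(skipped)
```

In this sketch the inner div is replaced by the nested loop while it is still present in the snapshot, so the outer loop later reaches it with parentNode set to None. Without the guard, the replacement would fail at that point in much the same way as the traceback in the question.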
Edit: I suggest you use print statements to investigate where in the HTML you are when node.parent is None (i.e. where the error occurs). Try print node.contents, print node.previous.contents, or print node.next.contents to see where you are. Once you see the HTML, it may become obvious what pathological situation is causing node.parent to be None.
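As a rough illustration of that kind of debugging output, the sketch below builds a one-line summary of a node before and after it is detached from its tree. It again uses xml.dom.minidom from the standard library as a stand-in for BeautifulSoup 3, so the attribute names differ (parentNode instead of parent, toxml() instead of printing contents). The describe helper is hypothetical.

```python
from xml.dom import minidom

def describe(node):
    """Return a one-line summary of node for debugging output."""
    if node.parentNode is None:
        state = "detached"
    else:
        state = "child of <%s>" % node.parentNode.nodeName
    return "<%s> is %s; markup: %s" % (node.nodeName, state, node.toxml())

doc = minidom.parseString("<body><div>text</div></body>")
div = doc.getElementsByTagName("div")[0]

before = describe(div)
doc.documentElement.removeChild(div)  # detach the node, as a replacement would
after = describe(div)

print(before)
print(after)
```

Printing a summary like this just before each replacement shows which markup the loop is visiting when the parent is missing.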