Problem using replaceWith to replace HTML tags with BeautifulSoup on Python

Question

I am using BeautifulSoup in Python and am having trouble replacing some tags. I am finding <div> tags and checking for children. If those children do not have children (are a text node of NODE_TYPE = 3), I am copying them to be a <p>.

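from BeautifulSoup import Tag, BeautifulSoup  # BeautifulSoup 3.x, Python 2

class bar:

    # The original fragment set self.soup and called foo() at class level;
    # wrapping them in __init__ is an assumption about the intended structure.
    def __init__(self, html):
        self.input = html
        self.soup = BeautifulSoup(self.input)
        self.foo()

    def foo(self):
        nodesToScore = []  # not defined in the posted fragment; assumed to be a list
        elements = self.soup.findAll(True)

        for node in elements:

            # ....other stuff here if not <div> tags.

            if node.name.lower() == "div":
                if not node.find('a'):
                    # No links inside: replace the whole <div> with a <p>.
                    newTag = Tag(self.soup, "p")
                    newTag.setString(node.text)
                    node.replaceWith(newTag)
                    nodesToScore.append(newTag)
                else:
                    # Otherwise replace each text-only descendant with a <p>.
                    for n in node.findAll(True):
                        if n.getString():  # False if has children
                            newTag = Tag(self.soup, "p")
                            newTag.setString(n.text)
                            n.replaceWith(newTag)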

I'm getting an AttributeError:

  File "file.py", line 125, in function
    node.replaceWith(newTag)
  File "BeautifulSoup.py", line 131, in replaceWith
    myIndex = self.parent.index(self)
AttributeError: 'NoneType' object has no attribute 'index'

I do the same replacing on node higher up in the for loop and it works correctly. I'm assuming it's having problems because of the additional iterating through node as n.

What am I doing wrong, or what would be a better way to do this? Thanks! P.S. I'm using Python 2.5 on Google App Engine and BeautifulSoup 3.0.8.1.

Recommended answer

The error says:

    myIndex = self.parent.index(self)
AttributeError: 'NoneType' object has no attribute 'index'

This code occurs on line 131 of BeautifulSoup.py. It says that self.parent is None.

Looking at the surrounding code shows that self should equal node in your code, since replaceWith is being called on node. (Note: the traceback shows node.replaceWith, and your posted code contains both a node.replaceWith call and an n.replaceWith call, so it is not certain that the posted code is exactly what produced this traceback.) So apparently node.parent is None.

You could probably avoid the error by placing

if node.parent is not None:

at some point in the code before node.replaceWith is called.
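
One plausible way node.parent can end up as None (an assumption, since the answer above does not pin down the cause) is that the inner loop replaces a nested tag that is still waiting in the outer loop's list, so the outer loop later reaches a tag that has already been detached from the tree. The sketch below uses a hypothetical Node class as a stand-in for a parsed tag; it is not BeautifulSoup's API, and the names Node, convert_divs and build_tree are made up for illustration. It shows the failure without the guard and the skip with it.

```python
class Node(object):
    """Hypothetical stand-in for a parsed tag; NOT BeautifulSoup's API."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = []
        for child in children:
            child.parent = self
            self.children.append(child)

    def descendants(self):
        """Return a snapshot list of all nodes below this one, in document order."""
        found = []
        for child in self.children:
            found.append(child)
            found.extend(child.descendants())
        return found

    def replace_with(self, new_node):
        """Swap this node for new_node in its parent, then detach this node."""
        index = self.parent.children.index(self)  # AttributeError if parent is None
        self.parent.children[index] = new_node
        new_node.parent = self.parent
        self.parent = None


def convert_divs(root, guard):
    """Replace every div below root with a p; return the names of skipped nodes."""
    skipped = []
    for node in root.descendants():        # snapshot taken before any replacement
        if node.name != "div":
            continue
        if guard and node.parent is None:  # already detached by an earlier step
            skipped.append(node.name)
            continue
        # Inner pass, similar in spirit to the nested loop in the question:
        # it can detach nodes that the outer loop has not reached yet.
        for inner in node.descendants():
            if inner.name == "div":
                inner.replace_with(Node("p"))
        node.replace_with(Node("p"))
    return skipped


def build_tree():
    """A body containing a div that itself contains a nested div."""
    return Node("body", [Node("div", [Node("div")])])


# Without the guard, the nested div is detached first and then hit again.
try:
    convert_divs(build_tree(), guard=False)
    unguarded_error = None
except AttributeError as exc:
    unguarded_error = str(exc)

# With the guard, the detached node is skipped instead of raising.
guarded_skipped = convert_divs(build_tree(), guard=True)
```

If this is what is happening in the real code, the guard only hides the symptom: the skipped node has already been replaced by the inner pass, so skipping it is harmless here, but it is worth confirming that against the actual HTML.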

Edit: I suggest you use print statements to find out where in the HTML you are when node.parent is None (i.e. where the error occurs). Maybe use print node.contents, print node.previous.contents, or print node.next.contents to see where you are. Once you see the HTML, it might become obvious what pathological situation is causing node.parent to be None.
