BeautifulSoup not working, getting NoneType error


Problem description

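I am using the following code (taken from "Retrieve links from web page using Python and BeautifulSoup", http://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup#1080472):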

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print link['href']

However, I don't understand why I am getting the following error message:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

Environment: BeautifulSoup 3.2.0, Python 2.7

EDIT:

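I tried the solution from a similar question ("Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable", http://stackoverflow.com/questions/19424009/type-error-if-link-has-attrhref-typeerror-nonetype-object-is-not-callabl), but it gives me the following error: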

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
    for link in BeautifulSoup(response).find_all('a', href=True):
TypeError: 'NoneType' object is not callable

Solution

First of all:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4

and change your import to:

from bs4 import BeautifulSoup, SoupStrainer
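
To confirm which of the two packages an interpreter can actually import, a small check along these lines may help. It is only a sketch for Python 3 that relies on the standard library; the helper name `installed_soup_packages` is made up for this example.

```python
import importlib.util


def installed_soup_packages():
    """Map each BeautifulSoup import style to whether it is importable here."""
    candidates = {
        "bs4": "BeautifulSoup 4 (from bs4 import BeautifulSoup)",
        "BeautifulSoup": "BeautifulSoup 3 (from BeautifulSoup import BeautifulSoup)",
    }
    found = {}
    for module_name, label in candidates.items():
        # find_spec returns None when no top-level module of that name exists.
        found[label] = importlib.util.find_spec(module_name) is not None
    return found


for label, available in installed_soup_packages().items():
    print(label, "->", "installed" if available else "not installed")
```

If only the BeautifulSoup 3 line reports "installed", the old package is still the one being picked up and the import has to change after installing beautifulsoup4.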


Also:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

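Here, link is a Tag instance, and in BeautifulSoup 3 a Tag has no has_attr method. Recall what dot notation means in BeautifulSoup: an attribute name the object does not recognize is treated as a search for a child element with that name. So link.has_attr looks for a has_attr element inside the link, finds nothing, and evaluates to None; calling None('href') then raises the TypeError. The error in your edit has the same cause: BeautifulSoup 3 spells the method findAll, so find_all is looked up as a tag name and is also None.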
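
The lookup behaviour described above can be imitated in a few lines of plain Python. This is only a sketch: `FakeTag` is a hypothetical stand-in written for illustration, not BeautifulSoup's actual class, and it assumes that unknown attribute names are treated as child-tag searches.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup 3 Tag (illustration only)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with the given tag name, else None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails: treat the unknown
        # attribute name as a child-tag search.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


link = FakeTag("a", children=[FakeTag("b")])

print(link.b.name)    # a real child tag is found
print(link.has_attr)  # no child named "has_attr", so the lookup gives None

try:
    link.has_attr("href")  # equivalent to calling None("href")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The point of the sketch is that the failure happens at the call, not at the attribute lookup: the lookup quietly succeeds with None, which is why the traceback mentions NoneType rather than a missing method.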

Instead, do:

soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])


FYI, here is the complete working code that I used to debug your problem (using requests):

import requests
from bs4 import BeautifulSoup, SoupStrainer

# Fetch the raw HTML bytes of the page
response = requests.get('http://www.nytimes.com').content

# Parse only <a> tags that carry an href attribute, then print each target
soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])
