BeautifulSoup not working, getting NoneType error
Problem description
I am using the following code (taken from "Retrieve links from web page using Python and BeautifulSoup", http://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup#1080472):

    import httplib2
    from BeautifulSoup import BeautifulSoup, SoupStrainer

    http = httplib2.Http()
    status, response = http.request('http://www.nytimes.com')

    for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
        if link.has_attr('href'):
            print link['href']

However, I don't understand why I am getting the following error message:

    Traceback (most recent call last):
      File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
        if link.has_attr('href'):
    TypeError: 'NoneType' object is not callable

Versions: BeautifulSoup 3.2.0, Python 2.7
EDIT:
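I tried the solution from a similar question ("Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable", http://stackoverflow.com/questions/19424009/type-error-if-link-has-attrhref-typeerror-nonetype-object-is-not-callabl), but it gives me the following error: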

    Traceback (most recent call last):
      File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
        for link in BeautifulSoup(response).find_all('a', href=True):
    TypeError: 'NoneType' object is not callable

Solution
First of all:

    from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it via:

    pip install beautifulsoup4

and change your import to:

    from bs4 import BeautifulSoup

Also:

    Traceback (most recent call last):
      File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
        if link.has_attr('href'):
    TypeError: 'NoneType' object is not callable

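Here link is a Tag instance, which in BeautifulSoup 3 does not have a has_attr method. Remembering what dot notation means in BeautifulSoup, link.has_attr is therefore treated as a search for a child element named has_attr inside the link element, and that search finds nothing. In other words, link.has_attr is None, and None('href') obviously raises an error. The same thing happens with find_all in your edit: BeautifulSoup 3 names that method findAll, so find_all is also looked up as a tag and comes back as None.

Instead, after switching to BeautifulSoup 4, do: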

    soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True))
    for link in soup.find_all("a", href=True):
        print(link['href'])

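To make the dot-notation explanation concrete, here is a minimal sketch of the mechanism. ToyTag is a hypothetical stand-in written for this illustration; it is not BeautifulSoup's actual Tag class, and it only assumes that an unknown attribute name is treated as a child-tag lookup that yields None when nothing matches.

```python
class ToyTag(object):
    """Hypothetical stand-in for a BeautifulSoup 3 Tag (not the real class)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        # Treat the unknown name as a child-tag lookup and return None
        # when no child with that name exists.
        for child in self.children:
            if child.name == attr:
                return child
        return None


link = ToyTag('a', children=[ToyTag('b')])
print(link.b.name)    # an existing child tag is found by name
print(link.has_attr)  # no child tag is called 'has_attr'

try:
    link.has_attr('href')  # calls whatever the lookup returned
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the lookup silently returns None instead of raising AttributeError, a misspelled or missing method name only fails at the moment it is called, which is why the traceback points at the call rather than at the name.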
FYI, here is the complete working code that I used to debug your problem (using requests):

    import requests
    from bs4 import BeautifulSoup, SoupStrainer

    response = requests.get('http://www.nytimes.com').content
    for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True):
        print(link['href'])

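If you want to check the BeautifulSoup 4 behaviour without a network request, a small offline sketch along these lines may help. The HTML string is made up for the example, and passing 'html.parser' explicitly is my own choice rather than part of the code above; it assumes beautifulsoup4 is installed.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Made-up markup so the example runs without fetching a page.
html = """
<html><body>
  <a href="http://example.com/one">One</a>
  <a name="anchor-without-href">No link here</a>
  <p>Some text <a href="/two">Two</a></p>
</body></html>
"""

# Parse only <a> tags that carry an href attribute.
only_links = SoupStrainer('a', href=True)
soup = BeautifulSoup(html, 'html.parser', parse_only=only_links)

# Collect the href of every link that survived the filter.
hrefs = [link['href'] for link in soup.find_all('a', href=True)]
print(hrefs)

# In BeautifulSoup 4, has_attr is a real method on Tag objects.
first = soup.find('a')
print(first.has_attr('href'))
```

The anchor without an href is dropped by the SoupStrainer, so only the two real links are collected.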