BeautifulSoup unable to extract data using attrs=class


Problem description

I am extracting data for a research project, and I have successfully used findAll('div', attrs={'class':'someClassName'}) on many websites. This particular website is the exception:

Website link: http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585

When I use the attrs option, findAll returns nothing. When I leave attrs out, I get the entire HTML DOM.

Here is the simple code that I started with to test it out:

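from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup as bs
url = 'http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585'
soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class': 'data'}):
    print div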
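An empty result from findAll with attrs means either the class name does not match or the HTML that was actually fetched contains no such div. One way to tell the two apart is to scan the raw HTML for the class directly. The following is a minimal sketch using Python 3's standard-library html.parser rather than BeautifulSoup; the names DivClassFinder and find_divs_with_class are hypothetical, and the sketch treats class as a whitespace-separated list, matching any token (the way BeautifulSoup 4 treats the class attribute).

```python
from html.parser import HTMLParser


class DivClassFinder(HTMLParser):
    """Hypothetical helper: collect <div> tags whose class list contains a given name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []  # attribute dicts of the matching <div> tags

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; attrs is a list of (name, value)
        if tag != "div":
            return
        attr_map = dict(attrs)
        # a bare attribute (class with no value) arrives as None, so fall back to ""
        classes = (attr_map.get("class") or "").split()
        if self.class_name in classes:
            self.matches.append(attr_map)


def find_divs_with_class(html, class_name):
    """Return the attribute dicts of every <div> whose class list includes class_name."""
    parser = DivClassFinder(class_name)
    parser.feed(html)
    parser.close()
    return parser.matches


sample = '<div class="data price">1</div><div class="other">2</div><DIV CLASS="data">3</DIV>'
print(len(find_divs_with_class(sample, "data")))  # 2
```

If this returns an empty list for the HTML that urlopen fetched but not for the HTML fetched another way, the two requests received different pages, and the parser is not at fault.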

Solution

My code works fine when I fetch the page with requests:

import requests
from BeautifulSoup import BeautifulSoup as bs

# grab the HTML
r = requests.get(r'http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585')
html = r.text

# parse the HTML
soup = bs(html)

results = soup.findAll('div', attrs={'class': 'data'})
print results
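The answer does not say why requests succeeds where urlopen did not. Two plausible factors, both assumptions rather than anything the answer confirms, are that the server varies its response by request headers such as User-Agent, and that r.text hands the parser already-decoded text while urlopen hands it raw bytes. A minimal Python 3 sketch that builds a urllib request with explicit headers and decodes the body using the charset the server declares, so the result can be compared against r.text; the header value, DEFAULT_HEADERS, build_request and fetch_text are all hypothetical.

```python
from urllib.request import Request, urlopen

# Hypothetical browser-like header; the exact value is an illustrative assumption.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; research-scraper)"}


def build_request(url, headers=None):
    """Build a urllib Request carrying explicit headers (nothing is sent yet)."""
    return Request(url, headers=headers or DEFAULT_HEADERS)


def fetch_text(url, headers=None, timeout=10):
    """Fetch url and decode the body to str using the server-declared charset."""
    with urlopen(build_request(url, headers), timeout=timeout) as resp:
        # fall back to UTF-8 when the Content-Type header carries no charset
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Passing fetch_text(url) to the parser instead of urlopen(url) mirrors what the answer does with r.text; whether it changes the result for this particular site is not established here.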
