BeautifulSoup unable to extract data using attrs=class
Problem description
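I am extracting data for a research project, and I have successfully used `findAll('div', attrs={'class': 'someClassName'})` on many websites. On this particular website (http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585), however, it doesn't return any values when I use the `attrs` option. When I don't use the `attrs` option, I get the entire HTML DOM.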
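For reference, a class filter like this matches tags whose `class` attribute contains the given name as one of its space-separated tokens (this is how Beautiful Soup 4 treats `class`, as a multi-valued attribute). The minimal sketch below re-implements that check with the standard library's `html.parser`, so it is a stand-in and not BeautifulSoup's own code; `ClassFilter` and `find_by_class` are hypothetical names. It illustrates that an empty result just means no matching tag was present in the markup that was parsed.

```python
from html.parser import HTMLParser


class ClassFilter(HTMLParser):
    """Hypothetical stdlib stand-in for findAll(tag, attrs={'class': cls}).

    Records the full class attribute of every matching tag whose
    whitespace-separated class list contains `cls`.
    """

    def __init__(self, tag, cls):
        super().__init__()
        self.tag = tag
        self.cls = cls
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag != self.tag:
            return
        class_value = dict(attrs).get("class") or ""
        if self.cls in class_value.split():
            self.matches.append(class_value)


def find_by_class(html, tag, cls):
    parser = ClassFilter(tag, cls)
    parser.feed(html)
    parser.close()
    return parser.matches


page = '<div class="data">a</div><div class="data result">b</div><div class="database">c</div>'
print(find_by_class(page, "div", "data"))  # ['data', 'data result']
print(find_by_class("<div id='main'></div>", "div", "data"))  # []
```

Note that `class="database"` does not match `data`: the comparison is per token, not a substring search.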
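Here is the simple code I started with to test it:

```python
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup as bs
url = 'http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585'
soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class': 'data'}):
    print div
```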
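Before changing the query, it can help to check which classes actually occur in the HTML that was downloaded. The hypothetical `div_classes` helper below (standard library only, not part of BeautifulSoup) tallies the class tokens used on `<div>` tags. If `data` is missing from the tally for the page you are about to parse, the problem lies in the fetched markup rather than in the `attrs` filter.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Hypothetical helper: counts every class token used on <div> tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            class_value = dict(attrs).get("class") or ""
            self.counts.update(class_value.split())


def div_classes(html):
    counter = DivClassCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


sample = '<div class="result data"></div><div class="result"></div><div></div>'
counts = div_classes(sample)
print(counts.most_common())  # [('result', 2), ('data', 1)]
print("data" in counts)      # True
```

Pass it the same HTML string you hand to the parser, so that both checks see identical input.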
My code works fine when I fetch the page with requests:
```python
import requests
from BeautifulSoup import BeautifulSoup as bs

# grab the HTML
r = requests.get('http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585')
html = r.text

# parse the HTML
soup = bs(html)
results = soup.findAll('div', attrs={'class': 'data'})
print results
```
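The cause is not stated, but two differences between the two fetches seem worth ruling out (an assumption, not a confirmed diagnosis): `r.text` is already-decoded text while `urlopen(url)` hands BeautifulSoup a raw byte stream, and the two libraries send different default User-Agent headers, which some sites use to vary the markup they return. For anyone who wants to stay on the standard library, this Python 3 sketch sets an explicit User-Agent and decodes the body with the charset from the Content-Type header. `build_request`, `fetch_html`, the User-Agent string and the UTF-8 fallback are all hypothetical choices.

```python
import urllib.request

# Hypothetical browser-like User-Agent; substitute any realistic value.
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=DEFAULT_UA):
    """Build a GET request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=DEFAULT_UA, timeout=30):
    """Download url and return decoded text instead of a byte stream.

    The charset comes from the Content-Type header; falling back to
    UTF-8 when none is declared is an assumption of this sketch.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The returned string can then be passed to the parser exactly as `r.text` is above. This is only roughly analogous to requests, whose own encoding fallback rules differ.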