BeautifulSoup not grabbing dynamic content
Question
The issue I'm having is that I want to grab the related links from this page: http://support.apple.com/kb/TS1538
If I use Inspect Element in Chrome or Safari, I can see the <div id="outer_related_articles"> and all the articles listed. If I try to grab it with BeautifulSoup, I get the page and everything except the related articles.
Here's what I have so far:
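from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup
url = "http://support.apple.com/kb/TS1538"
# Fetch the raw HTML exactly as the server sends it; no JavaScript runs here
response = urlopen(url)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(response.read(), "html.parser")
print(soup)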
Recommended answer
This section is loaded using JavaScript. Disable JavaScript in your browser to see how BeautifulSoup "sees" the page.
From here you have two options:
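- Use a headless browser, which will execute the JavaScript. See this question: Headless Browser for Python (Javascript support REQUIRED!)
- Try to figure out how the Apple site loads the content and emulate it; it probably makes an AJAX call to some address.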
After some digging, it seems the page makes a request to this address (http://km.support.apple.com/kb/index?page=kmdata&requestid=2&query=iOS%3A%20Device%20not%20recognized%20in%20iTunes%20for%20Windows&locale=en_US&src=support_site.related_articles.TS1538&excludeids=TS1538&callback=KmLoader.receiveSuccess) and uses JSONP to load the results, with KmLoader.receiveSuccess being the name of the receiving function. Use Firebug or Chrome dev tools to inspect the page in more detail.
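A minimal sketch of the second option, written for Python 3. It assumes the endpoint above still responds with text of the form KmLoader.receiveSuccess(<json>), which may no longer be true. The helper names unwrap_jsonp and fetch_related are hypothetical, and since the layout of the returned JSON is not known here, the decoded data is returned as-is rather than picked apart.

```python
import json
import re
from urllib.request import urlopen

# Address copied from the answer above; it may no longer respond.
JSONP_URL = (
    "http://km.support.apple.com/kb/index?page=kmdata&requestid=2"
    "&query=iOS%3A%20Device%20not%20recognized%20in%20iTunes%20for%20Windows"
    "&locale=en_US&src=support_site.related_articles.TS1538"
    "&excludeids=TS1538&callback=KmLoader.receiveSuccess"
)


def unwrap_jsonp(payload, callback="KmLoader.receiveSuccess"):
    """Strip the `callback(...)` wrapper and parse the JSON inside it."""
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, payload, re.DOTALL)
    if match is None:
        raise ValueError("payload is not wrapped in " + callback)
    return json.loads(match.group(1))


def fetch_related(url=JSONP_URL):
    """Download the JSONP response and return the decoded JSON data."""
    with urlopen(url) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return unwrap_jsonp(response.read().decode(charset))
```

Calling fetch_related() performs the request and returns whatever JSON the server wrapped in the callback. Print the result first to see where the related article links sit before writing code that depends on its structure.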