BeautifulSoup not grabbing dynamic content

Question

My issue is that I want to grab the related links from this page: http://support.apple.com/kb/TS1538

If I use Inspect Element in Chrome or Safari, I can see the <div id="outer_related_articles"> and all the articles listed in it. If I fetch the page with BeautifulSoup, I get everything except the related articles.

Here's what I have so far:

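from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

url = "http://support.apple.com/kb/TS1538"
response = urlopen(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.read(), "html.parser")
print(soup)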

Answer

This section is loaded using JavaScript, which BeautifulSoup does not execute. Disable JavaScript in your browser to see how BeautifulSoup "sees" the page.

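From here you have two options: render the page with something that executes JavaScript (for example a headless browser), or work out how the page loads the content and request that data directly.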

After some digging, it seems the page makes a request to this address (http://km.support.apple.com/kb/index?page=kmdata&requestid=2&query=iOS%3A%20Device%20not%20recognized%20in%20iTunes%20for%20Windows&locale=en_US&src=support_site.related_articles.TS1538&excludeids=TS1538&callback=KmLoader.receiveSuccess) and uses JSONP to load the results, with KmLoader.receiveSuccess being the name of the receiving function. Use Firebug or Chrome dev tools to inspect the page in more detail.
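
If you go the second route, one way to consume a JSONP response from Python is to strip the callback wrapper and parse what is left as JSON. The following is only a sketch under assumptions: the helper names strip_jsonp and fetch_jsonp are made up for illustration, the sample payload is hypothetical (the real response format is not shown above, so inspect it before relying on any field names), and the endpoint may no longer respond.

```python
import json
import re
from urllib.request import urlopen


def strip_jsonp(text):
    """Parse a JSONP body such as `callback({...});` and return the JSON value.

    Hypothetical helper: assumes the body is a single function call whose
    only argument is valid JSON.
    """
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))


def fetch_jsonp(url):
    """Download a JSONP URL and return the decoded payload (hypothetical helper)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return strip_jsonp(response.read().decode(charset))


# Hypothetical sample payload, NOT the real Apple response format.
sample = (
    'KmLoader.receiveSuccess({"results": '
    '[{"title": "Example article", "url": "/kb/EXAMPLE"}]});'
)
data = strip_jsonp(sample)
print(data["results"][0]["title"])
```

To try it against the live endpoint, pass the address found in the browser's network tab to fetch_jsonp and print the result to see which keys actually hold the related articles.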
