BeautifulSoup not grabbing dynamic content

Problem description

I want to grab the related links from this page: http://support.apple.com/kb/TS1538

If I use Inspect Element in Chrome or Safari, I can see the <div id="outer_related_articles"> and all the articles listed in it. If I try to grab it with BeautifulSoup, I get the page and everything on it except the related articles.

Here is what I have so far:

import urllib2
from bs4 import BeautifulSoup
url = "http://support.apple.com/kb/TS1538"
response = urllib2.urlopen(url)
soup = BeautifulSoup(response.read())
print soup

Recommended answer

This section is loaded using JavaScript, so it is not part of the HTML that the server returns. Disable JavaScript in your browser to see the page the way BeautifulSoup "sees" it.
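To make the point concrete, here is a minimal sketch in Python 3 (the question's code is Python 2). The HTML string is a made-up stand-in for what the server might return, and `related_links` and `loadRelatedArticles` are hypothetical names used only for illustration. BeautifulSoup parses markup but never runs the script, so the container stays empty.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server's HTML: the container exists,
# but a script would fill it in only after the page loads in a browser.
STATIC_HTML = """
<html><body>
  <h1>Article</h1>
  <div id="outer_related_articles"></div>
  <script>
    loadRelatedArticles("outer_related_articles");
  </script>
</body></html>
"""


def related_links(html):
    """Return the hrefs found inside the related-articles container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="outer_related_articles")
    if container is None:
        return []
    return [a["href"] for a in container.find_all("a", href=True)]


links = related_links(STATIC_HTML)
print(links)  # prints [] because the script is never executed
```

The same function returns links as soon as they are present in the markup itself, which is the situation a browser creates after running the script.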

From here you have two options:

After some digging, it seems the page makes a request to this address (http://km.support.apple.com/kb/index?page=kmdata&requestid=2&query=iOS%3A%20Device%20not%20recognized%20in%20iTunes%20for%20Windows&locale=en_US&src=support_site.related_articles.TS1538&excludeids=TS1538&callback=KmLoader.receiveSuccess) and uses JSONP to load the results, with KmLoader.receiveSuccess being the name of the receiving function. Use Firebug or Chrome dev tools to inspect the page in more detail.
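A JSONP response is JSON wrapped in a call to the callback function, so one way to use such an endpoint directly is to strip the wrapper and decode what is left. The following is only a sketch in Python 3: `parse_jsonp` and `fetch_jsonp` are hypothetical helper names, the sample payload is invented (the answer does not show what the real response looks like), and the endpoint itself may no longer respond.

```python
import json
import re
from urllib.request import urlopen


def parse_jsonp(text, callback):
    """Strip the callback(...) wrapper from a JSONP response and decode the JSON."""
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in %s(...)" % callback)
    return json.loads(match.group(1))


def fetch_jsonp(url, callback):
    """Download a JSONP endpoint and return the decoded payload."""
    with urlopen(url) as response:
        # Assumes UTF-8; the real endpoint's encoding is not confirmed.
        return parse_jsonp(response.read().decode("utf-8"), callback)


# Invented payload for illustration; the real response shape is unknown.
sample = 'KmLoader.receiveSuccess({"results": [{"title": "Example", "url": "/kb/EXAMPLE"}]});'
data = parse_jsonp(sample, "KmLoader.receiveSuccess")
print(data["results"][0]["url"])  # prints /kb/EXAMPLE
```

With a live endpoint, `fetch_jsonp(url, "KmLoader.receiveSuccess")` would be called with the address quoted above; the keys to read from the result would have to be checked against the actual response in the browser's network panel.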
