Scraping dynamic content in a website

Question

I need to scrape news announcements from this website: Link. The announcements seem to be generated dynamically; they don't appear in the page source. I usually use mechanize, but I assume it wouldn't work here. What can I do? I'm fine with either Python or Perl.

Answer

The polite option would be to ask the owners of the site if they have an API which allows you access to their news stories.

The less polite option would be to trace the HTTP transactions that take place while the page is loading and work out which one is the AJAX call which pulls in the data.
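Once you have found the XHR request in the browser's developer tools (Network tab) while the page loads, you can often call that URL directly instead of scraping the rendered page. The sketch below shows the idea with the standard library only. `AJAX_URL` is a hypothetical placeholder, and the JSON shape `{"items": [{"date": ..., "title": ...}]}` is an assumption; adjust both to whatever the real endpoint returns. The `X-Requested-With` header is included because jQuery-style libraries send it and some servers check for it, not because this particular site is known to require it.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the XHR URL shown in the browser's
# developer tools (Network tab) while the news page loads.
AJAX_URL = "https://example.com/news/ajax?page=1"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that looks like the browser's XHR call."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            # Sent by jQuery and similar libraries; some servers check for it.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
        },
    )


def parse_announcements(body: bytes) -> list[tuple[str, str]]:
    """Return (date, title) pairs.

    Assumes a hypothetical JSON shape:
    {"items": [{"date": "...", "title": "..."}, ...]}
    """
    data = json.loads(body.decode("utf-8"))
    return [(item["date"], item["title"]) for item in data.get("items", [])]


def fetch_announcements(url: str = AJAX_URL) -> list[tuple[str, str]]:
    """Call the endpoint directly and parse its JSON response."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return parse_announcements(resp.read())
```

Usage: `for date, title in fetch_announcements(): print(date, title)`. If the endpoint returns an HTML fragment rather than JSON, feed the response body to your usual HTML parser instead of `json.loads`.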

Looks like it's this one. But it looks like it might contain session data, so I don't know how long it will continue to work for.
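If the session data lives in a cookie, one way to avoid depending on a copied, soon-to-expire session is to start a fresh one on every run: load the HTML page first so the server sets its cookies, then replay the XHR call with the same cookie jar. This is a minimal sketch under that assumption; `PAGE_URL` and `AJAX_URL` are hypothetical placeholders. If the session token is embedded in the AJAX URL itself (for example as a query parameter), cookies alone won't help and you would need to extract the token from the freshly loaded page instead.

```python
import http.cookiejar
import urllib.request

# Hypothetical URLs: replace with the news page and the XHR endpoint you observed.
PAGE_URL = "https://example.com/news"
AJAX_URL = "https://example.com/news/ajax?page=1"


def make_session_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Return an opener that stores cookies and sends them back, like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar


def fetch_with_fresh_session(page_url: str = PAGE_URL, ajax_url: str = AJAX_URL) -> bytes:
    """Start a new session on every run instead of reusing a stale one."""
    opener, _jar = make_session_opener()
    # 1. Load the HTML page so the server can set its session cookie(s).
    with opener.open(page_url, timeout=10) as resp:
        resp.read()
    # 2. Replay the XHR call; HTTPCookieProcessor attaches the stored cookies.
    xhr = urllib.request.Request(ajax_url, headers={"X-Requested-With": "XMLHttpRequest"})
    with opener.open(xhr, timeout=10) as resp:
        return resp.read()
```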
