Retrieve comments from a website using Disqus

Question

I would like to write a scraping script to retrieve comments from CNN articles. For example, this article: http://www.cnn.com/2012/01/19/politics/gop-debate/index.html?hpt=hp_t1

I realize that CNN uses Disqus for its comment discussions. Because the comments are not split across web pages (i.e., prev page, next page) but are loaded dynamically (i.e., you need to click "load next 25"), I have no idea how to retrieve all 5000+ comments for this article.

Any ideas or suggestions?

Thanks a lot!

Recommended answer

One option for scraping (other than just fetching the page) is to use some kind of wrapper around a full-fledged web browser, literally code the usage pattern, and extract the relevant data. It might be less robust, depending on your needs, but it will solve the problem you have. Since you didn't mention which programming language you know, I'll give 3 examples: 1) Watir - Ruby, 2) WatiN - IE & Firefox via .NET, 3) Selenium - IE and other browsers via C#/Java/Perl/PHP/Ruby/Python

I'll provide a little example using WatiN & C#:

using WatiN.Core;  // requires the WatiN library

IE browser = new IE();
browser.GoTo("YOUR CNN URL");
// the comments currently visible live in the list element with id "dsq-comments"
List visibleComments = browser.List(Find.ById("dsq-comments"));
// do your scraping thing
Link moreComments = browser.Link(Find.ByClass("dsq-paginate-append-text"));
moreComments.Click();
// wait until the AJAX call has ended by searching for some indicator
browser.WaitUntilContainsText("SOME TEXT");
// do your scraping thing

Note: I'm not familiar with Disqus, but it might be a better option to force all the comments to show by looping the Link & Click parts of the code I posted until all the comments are visible, and then scrape the List element dsq-comments.
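The looping idea in the note can be sketched roughly as follows, here in Python since Selenium was mentioned as an option. This is only a sketch under assumptions, not a tested scraper for CNN: the function names `collect_all_comments` and `scrape_with_selenium` are made up for illustration, the CSS selector `#dsq-comments > li` assumes each comment is a direct list item of the `dsq-comments` element, and the class name `dsq-paginate-append-text` is taken from the example above and may not match the live page.

```python
from typing import Callable, List


def collect_all_comments(
    read_comments: Callable[[], List[str]],
    click_load_more: Callable[[], bool],
    max_clicks: int = 500,
) -> List[str]:
    """Click "load more" until it is gone, then read every visible comment.

    read_comments   -- returns the text of the comments currently visible
    click_load_more -- clicks the link and waits for the next batch; returns
                       False when there is nothing left to click
    max_clicks      -- safety cap so a misbehaving page cannot loop forever
    """
    clicks = 0
    while clicks < max_clicks and click_load_more():
        clicks += 1
    return read_comments()


def scrape_with_selenium(url: str) -> List[str]:
    """Hypothetical wiring of the loop to a real browser through Selenium."""
    # Imported here so the loop above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    item_selector = "#dsq-comments > li"  # assumption about the page markup
    driver = webdriver.Firefox()
    try:
        driver.get(url)

        def count() -> int:
            return len(driver.find_elements(By.CSS_SELECTOR, item_selector))

        def click_load_more() -> bool:
            links = driver.find_elements(By.CLASS_NAME, "dsq-paginate-append-text")
            if not links or not links[0].is_displayed():
                return False
            before = count()
            links[0].click()
            # Wait until the AJAX call has added at least one new comment;
            # raises TimeoutException if nothing arrives within 30 seconds.
            WebDriverWait(driver, 30).until(lambda d: count() > before)
            return True

        def read_comments() -> List[str]:
            items = driver.find_elements(By.CSS_SELECTOR, item_selector)
            return [item.text for item in items]

        return collect_all_comments(read_comments, click_load_more)
    finally:
        driver.quit()
```

Keeping the loop separate from the browser calls means the same logic could be reused with Watir or WatiN, and it can be checked against a fake page before pointing it at a real site. At 25 comments per click, 5000+ comments take a few hundred clicks, so the default `max_clicks` of 500 is only a guess and may need raising.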
