Scraping a Facebook App for Data

Question

I'm using a Facebook application that has a rich set of information that I'd like to get at offline. To do this, I essentially need to read the information from the web pages into my own database. Obviously, I'd prefer not to save pages manually; I want my application to read the pages and pull the relevant details from them. Unfortunately, I am roadblocked by the requirement to authenticate to Facebook first. So when I run this code:

private static string getPage(string pageAddress)
{
    // baseUri is a field of the enclosing class; its definition is not shown in the question.
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(new Uri(baseUri, pageAddress));
    // The using blocks close the response and the reader even if ReadToEnd throws.
    using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
    using (StreamReader readStream = new StreamReader(response.GetResponseStream()))
    {
        return readStream.ReadToEnd();
    }
}

I get this response:

<script type="text/javascript">
if (parent != self) 
top.location.href = "http://www.facebook.com/login.php?api_key=<obscured>&canvas&v=1.0";
else self.location.href = "http://www.facebook.com/login.php?api_key=<obscured>&canvas&v=1.0";
</script>

Any ideas on how to tell the app that I really am the authentic me?

Recommended Answer

As far as I understand, you just need to log in to the Facebook application, right? Use a web scraping/crawling framework for it. These tools emulate ordinary web browsing: they keep cookies between requests, and the browser-driving ones also run JavaScript. For example, try these:

http://scrapy.org/

http://wwwsearch.sourceforge.net/mechanize/

http://watin.sourceforge.net/

See also

.NET screen scraping and sessions
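
If you would rather stay with HttpWebRequest, the session idea comes down to sharing one CookieContainer across requests, so that cookies set by the login response are sent back on later page fetches. The following is only a minimal sketch under assumptions, not Facebook's actual login flow: the class name SessionScraper, its helper methods, and the URLs and form field names in the usage comment are all hypothetical, and a real login form may also require hidden fields that you would have to read from the login page first.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class (not from the original question or answer).
static class SessionScraper
{
    // One container shared by every request, so cookies set at login
    // are replayed automatically on later requests.
    private static readonly CookieContainer Cookies = new CookieContainer();

    // POSTs an already url-encoded form body and returns the response body.
    public static string Post(Uri uri, string formBody)
    {
        var req = (HttpWebRequest)WebRequest.Create(uri);
        req.CookieContainer = Cookies;
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        req.ContentLength = payload.Length;
        using (Stream body = req.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }
        return ReadBody(req);
    }

    // GETs a page using the same cookie container.
    public static string Get(Uri uri)
    {
        var req = (HttpWebRequest)WebRequest.Create(uri);
        req.CookieContainer = Cookies;
        return ReadBody(req);
    }

    private static string ReadBody(HttpWebRequest req)
    {
        using (var response = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // True when the body looks like the login bounce shown in the question,
    // i.e. it points the browser at facebook.com/login.php.
    public static bool IsLoginRedirect(string html)
    {
        return html.IndexOf("facebook.com/login.php",
                            StringComparison.OrdinalIgnoreCase) >= 0;
    }
}

// Usage sketch (URL and field names are placeholders, not verified values):
//   SessionScraper.Post(new Uri("https://example.com/login"),
//       "email=" + Uri.EscapeDataString(user) + "&pass=" + Uri.EscapeDataString(password));
//   string page = SessionScraper.Get(new Uri("https://example.com/app/page"));
//   if (SessionScraper.IsLoginRedirect(page)) { /* login did not stick */ }
```

Checking each fetched page with IsLoginRedirect is a cheap way to notice that the session has expired before the login bounce gets parsed as if it were application data.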
