wget for fetching Facebook profile/friend pages


Question

I am trying to fetch a Facebook user's profile page using "wget", but I keep getting a non-profile page called "browser.php" that has nothing to do with that particular user. The profile page's URL, as I see it in the browser, has the following format:

http://www.facebook.com/user-name

and that's what I have been using as the argument to the wget command:

wget http://www.facebook.com/user-name

I am also interested in using wget to fetch a user's friends list, but even that gives me the same unhelpful result ("browser.php"):

wget http://www.facebook.com/user-name?sk=friends&v=friends

Could someone kindly advise me what I'm doing wrong here? In other words, am I missing some key options for the wget command, or is wget not suited to this scenario at all?

Any help will be greatly appreciated.

To add context to this query: I need to figure out how to fetch these pages from Facebook using wget, because that would help me write a script/program to look up friends' profile URLs in the HTML source and then look up some other keywords on those pages, etc. I am basically hoping that this would let me do some kind of selective crawling (with Facebook's permission, of course) of people I am not connected to.

Recommended answer

First, Facebook has probably set things up so that certain user agents (e.g. wget) cannot crawl its pages. Requests from those user agents are redirected to a different page, which probably says something like "your browser is not supported". They do that to prevent people from doing exactly what you are doing. However, you can tell wget to identify itself as a different agent using the -U argument (read the wget man page), e.g. wget -U Mozilla http://....
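
As a rough sketch of that suggestion, the shell snippet below wraps wget -U in a small helper. The helper name fetch_page, the user-agent string, and the output file names are assumptions made for illustration, and nothing here guarantees that Facebook will actually return the profile page. The friends URL is quoted because the shell would otherwise treat an unquoted & as "run the command in the background", which cuts the URL short.

```shell
#!/bin/sh
# Placeholder values for illustration only; none refer to a real account.
USER_AGENT='Mozilla/5.0'
PROFILE_URL='http://www.facebook.com/user-name'
FRIENDS_URL="${PROFILE_URL}?sk=friends&v=friends"

# fetch_page URL OUTFILE (hypothetical helper)
# Requests URL while identifying as $USER_AGENT and saves the body to OUTFILE.
# -U sets the User-Agent header; -O chooses the output file.
fetch_page() {
    wget -U "$USER_AGENT" -O "$2" "$1"
}

# Example calls, commented out so that running this file sends no requests:
# fetch_page "$PROFILE_URL" profile.html
# fetch_page "$FRIENDS_URL" friends.html
```

If the saved file is still "browser.php"-style content, the user-agent check is probably not the only obstacle, which is what the next two points address.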

Second, Facebook's privacy settings rarely allow you to read any (or much) information unless you are logged in as a user, and probably only as a user who is a friend of the profile you are trying to scrape.

Third, there is a Facebook API which you need to use to crawl and extract information from Facebook; you are likely in violation of the Acceptable Use policy if you try to obtain information in any other way.
