Facebook URL returning a mobile version URL response in Scrapy

Question

I have a question that may sound weird, but I would like to understand it.

I tried to access Facebook through Scrapy using the URL www.facebook.com, which I set in start_urls. After running the spider, the response URL was http://m.facebook.com/?refsrc=http%3A%2F%2Fwww.facebook.com%2F&_rdr. When I open this URL in a browser, it shows the mobile view of Facebook. Why is the response the mobile view and not the regular view we see when opening the site on a desktop?

Thanks in advance.

Solution

There is a global setting for that: USER_AGENT
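As a minimal sketch of what that could look like, the setting goes into the project's settings.py, which is an ordinary Python module. The browser string below is only an assumed example of a desktop User-Agent and is not taken from the original answer; whether Facebook then serves the desktop page depends on the site.

```python
# settings.py of the Scrapy project (a plain Python module).

# Assumption for illustration: a desktop Firefox User-Agent string.
# Substitute the string of whichever desktop browser you want to imitate.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) "
    "Gecko/20100101 Firefox/115.0"
)
```

Without this setting Scrapy identifies itself with its own default User-Agent, which a site may not recognise as a desktop browser.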

UPDATE:

You know, maybe dealing with the mobile version is an advantage after all. Other sites redirect browsers to other pages when no javascript can be executed:

<noscript> <meta http-equiv="refresh" content="0; URL=/homedepot?_fb_noscript=1" /> </noscript>
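To check whether a downloaded page carries such a redirect, the target can be pulled out of the meta tag. The following is a rough sketch using only the Python standard library; the class and function names are hypothetical and not part of Scrapy.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the target URLs of <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta":
            return
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; URL=/some/path".
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url" and url:
            self.targets.append(url.strip())


def find_meta_refresh_targets(html):
    """Return every meta-refresh target found in the given HTML text."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets


snippet = (
    '<noscript> <meta http-equiv="refresh" '
    'content="0; URL=/homedepot?_fb_noscript=1" /> </noscript>'
)
print(find_meta_refresh_targets(snippet))
```

Inside a spider the same function could be given response.text to see where a no-JavaScript client is being sent.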

Dealing with a no-JS version or a mobile version of the site means smaller pages with less extra information on them, so the HTML will not change much over time and your XPath queries are more likely to keep working.

In that case, just disable JS in Firefox or set a different User-Agent in it to get the same pages Scrapy gets. The Scrapy documentation has more hints on how to use Firefox for testing Scrapy, under "Using Firefox for scraping".
