Facebook debugger won't scrape my site


Problem description


I'm building the site http://Meer.li, and when I run it through the Facebook debugger (http://developers.facebook.com/tools/debug/og/object?q=meer.li) it can't find my meta tags.

When I look at the source of what Facebook scrapes, it shows a stripped-down version of my site: the doctype has been changed and there are no meta tags. See http://developers.facebook.com/tools/debug/og/echo?q=http%3A%2F%2Fmeer.li%2F

What am I doing wrong here?

I'm running Rails 3.2 and Ruby 1.9.3, and the whole thing runs on Heroku with a MongoDB database.

Edit

It seems that I do have the right Accept header in my app. If I put this in the different views:

<%= request.headers["Accept"] %>

I get:

text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Why can we scrape the whole site if we use curl -H with the right headers? Why doesn't Facebook scrape my site?

Solution

When I try your URL in the debugger, it reports a response status code of 206, which means "Partial Content".

I tried to curl the URL, and the response I got is indeed partial: it does not include the html, head and body tags (or their closing tags), and it looks like a JSONP response with the HTML wrapped in

$("#designs_content").append

I'm not sure why that happens. Maybe your server checks the user agent string of the request and responds accordingly?
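To check quickly whether a given response body is a full page or a script fragment like the one above, a small Ruby sketch can help. The helper names `full_html_document?` and `open_graph_properties` are hypothetical, the sample strings are made up for illustration, and the regular expression is only a rough check, not a real HTML parser.

```ruby
# Hypothetical helpers (not from the original post) for inspecting a response body.

# True when the body contains the basic skeleton of an HTML page.
def full_html_document?(body)
  lowered = body.downcase
  %w[<html <head <body].all? { |tag| lowered.include?(tag) }
end

# Returns the og:* property names declared in <meta property="og:..."> tags.
def open_graph_properties(body)
  body.scan(/<meta[^>]+property=["'](og:[^"']+)["']/i).flatten
end

# Made-up sample bodies: a script fragment and a minimal full page.
fragment = '$("#designs_content").append("<div>...</div>");'
page = '<!DOCTYPE html><html><head>' \
       '<meta property="og:title" content="Example" />' \
       '</head><body></body></html>'

p full_html_document?(fragment)   # false
p full_html_document?(page)       # true
p open_graph_properties(page)     # ["og:title"]
```

Running the body returned by a plain curl request through these helpers should show whether the scraper has any meta tags to read at all.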


Edit

I'm not sure if this has anything to do with Heroku; I've never worked with them. I also know nothing about Rails, so I can't help with that.

Wget has nothing to do with this; what matters is the response your web server returns based on the headers of the HTTP request. When you make a request with a browser, it adds some headers to help the server figure out a few things. You can view the sent headers in Firebug or in the developer tools of Chrome (Safari, etc.) under the Network tab (they all have one), or with a network sniffer.

To make life easier for you, I checked which header causes the problem. Try this:

curl "http://meer.li/"

And you'll see that the response is JSONP and not the entire HTML page. Now try this:

curl -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" "http://meer.li/"

And you'll get the full html version of your page.
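The same comparison can be scripted in Ruby with the standard `net/http` library. This is a minimal sketch: `build_request`, `fetch_summary` and `BROWSER_ACCEPT` are names made up for illustration. Note that plain curl and `Net::HTTP` both send `Accept: */*` by default, so the header has to be deleted explicitly to send none at all.

```ruby
require "net/http"
require "uri"

# The Accept value a browser sent in the question above.
BROWSER_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

# Builds a GET request. Pass accept: nil to send no Accept header at all.
def build_request(url, accept: "*/*")
  request = Net::HTTP::Get.new(URI(url))
  if accept.nil?
    request.delete("Accept") # Net::HTTP sets "*/*" by default
  else
    request["Accept"] = accept
  end
  request
end

# Sends the request and returns [status code, Content-Type, start of body].
def fetch_summary(url, accept: "*/*")
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_request(url, accept: accept))
  end
  [response.code, response["Content-Type"], response.body.to_s[0, 80]]
end

# Offline check of the Accept header each variant would carry.
[nil, "*/*", BROWSER_ACCEPT].each do |accept|
  p build_request("http://meer.li/", accept: accept)["Accept"]
end
```

Calling `fetch_summary("http://meer.li/", accept: BROWSER_ACCEPT)` and then the same URL with `accept: nil` would show the two responses side by side; no output is shown here because it depends on the live server.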

Since Facebook does not send the "Accept" header when scraping your page, the response it gets is not what you see when you view the source in a browser.

I have no idea how you can solve this, since it surely depends on your specific setup, but now at least you know what the problem is.
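As an illustration of the mechanism rather than a fix for this particular app, here is a simplified Ruby sketch of picking a response format from the Accept header with an HTML fallback. The function `negotiate_format` is hypothetical and is not Rails' actual negotiation code; it ignores q-value weights and only looks at the order in which the client lists media types.

```ruby
# Simplified, hypothetical content negotiation: returns :html or :js.
# Real negotiation (including Rails') also weighs q-values; this sketch does not.
def negotiate_format(accept_header, default: :html)
  # A missing or blank header tells us nothing, so use the fallback.
  return default if accept_header.nil? || accept_header.strip.empty?

  # Drop parameters such as ";q=0.9" and keep the media types in listed order.
  types = accept_header.split(",").map { |entry| entry.split(";").first.to_s.strip.downcase }

  types.each do |type|
    case type
    when "text/html", "application/xhtml+xml" then return :html
    when "text/javascript", "application/javascript" then return :js
    end
  end

  default # covers "*/*" and any type not listed above
end

p negotiate_format(nil)                               # :html
p negotiate_format("*/*")                             # :html
p negotiate_format("text/javascript, */*; q=0.01")    # :js
p negotiate_format(nil, default: :js)                 # :js
```

The last call mimics the behaviour described above: when the fallback is the JavaScript format, a client that sends no usable Accept header gets the script fragment instead of the page. Where an equivalent default would live in a Rails 3.2 app depends on the controller setup, which is not shown in the question.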
