How do I get the final, redirected, canonical URL of a website using PHP?

Question

In the days of link shorteners and Ajax, there can be many links that ultimately point to the same content. I was wondering what the best way is to get the final, best link for a web site in PHP, hopefully with a library. I was unable to find anything on Google or GitHub.

I have seen this example code, but it doesn't handle things like rel="canonical" tags or default SSL ports: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/

Facebook seems to handle this pretty well: you can see how they follow 301s, rel="canonical", etc. To see examples of the way Facebook handles it, use their Open Graph tool:

https://developers.facebook.com/tools/debug

and enter these links:

http://dlvr.it/xxb0W
https://twitter.com/#!/twitter/statuses/136946408275193856

Is there a PHP library out there that already has this pre-built, where it will check for these headers, resolve 301 redirects, parse rel="canonical", detect redirect loops and properly just grab the best resulting URL to use?

As an alternative, I am open to APIs that can be used, but would prefer something that runs on my own server.

Recommended answer

Since I wasn't able to find any libraries that really did what I was looking for, and I was hoping to do more than just follow HTTP redirects, I have gone ahead and created a library that accomplishes the goals and released it under the MIT license. You can get it here:

https://github.com/mattwright/URLResolver.php

URLResolver.php is a PHP class that attempts to resolve URLs to a final, canonical link:

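  • Follows 301 and 302 redirects found in HTTP headers
  • Follows Open Graph URL <meta> tags found in web pages
  • Follows canonical URL <link> tags found in web pages
  • Aborts the download quickly if the content type is not an HTML page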
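
To illustrate the two in-page signals listed above, here is a minimal sketch of pulling a declared URL out of HTML with PHP's built-in DOMDocument. It is not URLResolver.php's actual code: the function name `findDeclaredUrl` is hypothetical, and checking the canonical <link> before the og:url <meta> is an assumption made for this example. Relative hrefs are returned as-is and would still need to be resolved against the page URL.

```php
<?php
// Hypothetical helper, not part of URLResolver.php: returns the URL a page
// declares for itself, or null when it declares none.
function findDeclaredUrl(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: prefer <link rel="canonical" href="...">.
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $href = trim($link->getAttribute('href'));
        if (in_array('canonical', $rels, true) && $href !== '') {
            return $href;
        }
    }

    // Otherwise fall back to <meta property="og:url" content="...">.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $content = trim($meta->getAttribute('content'));
        if (strtolower($meta->getAttribute('property')) === 'og:url' && $content !== '') {
            return $content;
        }
    }

    return null;
}
```

A caller would typically run this on the body fetched from the last URL in the redirect chain, then treat any returned URL as one more hop to resolve.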

I am certainly not an expert on the rules of HTTP redirection, so if anyone has suggestions on how to improve this library, they would be greatly appreciated. I have tested it on thousands of URLs and it seems to do pretty well. I followed Mario's advice and used the PHP Simple HTML DOM Parser library where needed.
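
For readers who want to see the redirect-following part in isolation, here is a rough sketch of walking a redirect chain one hop at a time while guarding against loops. It is not taken from URLResolver.php; the function names `fetchRedirectTarget` and `followRedirects`, the 10-hop limit, and the 10-second timeout are all assumptions for illustration. The lookup is passed in as a callable so the loop logic can be exercised without network access.

```php
<?php
// Hypothetical sketch, not URLResolver.php's actual code.

// Requests headers only and returns the URL the server redirects to,
// or null when the response is not a redirect (requires the cURL extension).
function fetchRedirectTarget(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request; some servers handle it badly
        CURLOPT_FOLLOWLOCATION => false, // hops are followed manually below
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,    // assumed timeout in seconds
    ]);
    curl_exec($ch);
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return $target ?: null;
}

// Follows redirects until a URL no longer redirects. Throws when a URL
// repeats (a loop) or when the chain exceeds $maxHops.
function followRedirects(string $url, callable $fetchTarget, int $maxHops = 10): string
{
    $seen = [$url => true];
    for ($hop = 0; $hop < $maxHops; $hop++) {
        $next = $fetchTarget($url);
        if ($next === null) {
            return $url; // no further redirect: this is the final URL
        }
        if (isset($seen[$next])) {
            throw new RuntimeException("Redirect loop detected at $next");
        }
        $seen[$next] = true;
        $url = $next;
    }
    throw new RuntimeException("Gave up after $maxHops redirects");
}
```

With network access, a call such as `followRedirects('http://dlvr.it/xxb0W', 'fetchRedirectTarget')` would walk the chain for one of the links from the question. Following hops manually instead of setting CURLOPT_FOLLOWLOCATION keeps every intermediate URL visible, which is what makes the loop check possible.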
