Fixing relative links in PHP


Problem description

I'm parsing an external document and making all of the links in it absolute. For instance:

    <link rel="stylesheet" type="text/css" href="/css/style.css" />

would be replaced with:

    <link rel="stylesheet" type="text/css" href="http://www.hostsite.com/css/style.css" />

where http://www.hostsite.com is the base URL for the document.

This is what I tried, without success:

    $linkfix1 = str_replace('href=\"\/', 'href=\"$url\/', $code);

There are several questions on the site about doing this replacement on a single URL string, but I couldn't find any that work on URLs embedded in a document. Are there any good suggestions on how to make all of these links absolute?

Recommended answer

Public service announcement: do not use regexes to rewrite elements of a formatted document.

The correct way to do this is to load the document as an entity (either DOMDocument or SimpleXMLElement) and do your processing based on nodes and values. The original solution also didn't handle src attributes or the resolution of base-relative URLs (e.g. /css/style.css).
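As a minimal illustration of node-based processing, here is a sketch that assumes the document is already held in a string. The `$code` and `$base` values are made-up stand-ins for the question's variables, and only base-relative `href` values (those starting with a single `/`) are handled:

```php
<?php
// Hypothetical input standing in for the fetched document from the question.
$code = '<html><head><link rel="stylesheet" type="text/css" href="/css/style.css"></head>'
      . '<body><a href="http://example.org/x">x</a></body></html>';
$base = 'http://www.hostsite.com';    // base URL, without a trailing slash

$doc = new DOMDocument();
libxml_use_internal_errors(true);      // collect parser warnings instead of printing them
$doc->loadHTML($code);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Visit every element that carries an href attribute
foreach ($xpath->query('//*[@href]') as $el) {
    $href = $el->getAttribute('href');
    // Base-relative: starts with "/" but not "//" (which would be protocol-relative)
    if ($href !== '' && $href[0] === '/' && substr($href, 0, 2) !== '//') {
        $el->setAttribute('href', $base . $href);
    }
}

$result = $doc->saveHTML();
```

Because the attribute is changed on the parsed node, quoting and escaping in the serialized output are handled by DOMDocument rather than by string matching.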

Here's a mostly proper solution that could be expanded upon if need be:

    # Example URL
    $url = "http://www.stackoverflow.com/";

    # Get the root and current directory
    $pattern = '/^([^\/]*\/\/[^\/]+\/)([^?#]*\/)?/';
    /*  The pattern has two groups: one for the root (anything before
        the first two slashes, the slashes, the host, and the next slash)
        and one for the current directory (anything that isn't a query
        string or anchor, up to the last slash before either).
        For http://stackoverflow.com/question/123412341234 this yields:
        - [0]: http://stackoverflow.com/question/
        - [1]: http://stackoverflow.com/
        - [2]: question/
        We only need [0] (root plus directory) and [1] (just the root).
    */
    $matches = array();
    if(!preg_match($pattern, $url, $matches))
        die("Cannot parse base URL: $url");
    $cd = $matches[0];
    $root = $matches[1];

    # Normalizes the URL on the provided element's attribute
    function normalizeAttr($element, $attr){
        global $cd, $root;
        $href = $element->getAttribute($attr);
        # Ignore empty values, anchors, protocol-relative URLs, and anything
        # that already has a scheme (http:, https:, mailto:, data:, ...)
        if($href === '' || preg_match('/^(#|\/\/|[a-z][a-z0-9+.\-]*:)/i', $href))
            return;
        # If this is a base-relative URL, prepend the root
        if(substr($href, 0, 1) == '/')
            $element->setAttribute($attr, $root . substr($href, 1));
        # Otherwise this is a relative URL: prepend the current directory
        # ("../" segments are left in place, not collapsed)
        else
            $element->setAttribute($attr, $cd . $href);
    }

    # Load in the data, ignoring HTML5 errors
    $page = new DOMDocument();
    libxml_use_internal_errors(true);
    $page->loadHTMLFile($url);
    libxml_use_internal_errors(false);
    $page->normalizeDocument();

    # Normalize <link href="..."/>
    foreach($page->getElementsByTagName('link') as $link)
        normalizeAttr($link, 'href');
    # Normalize <a href="...">...</a>
    foreach($page->getElementsByTagName('a') as $anchor)
        normalizeAttr($anchor, 'href');
    # Normalize <img src="..."/>
    foreach($page->getElementsByTagName('img') as $image)
        normalizeAttr($image, 'src');
    # Normalize <script src="..."></script>
    foreach($page->getElementsByTagName('script') as $script)
        normalizeAttr($script, 'src');

    # Render normalized data
    print $page->saveHTML();
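
One way the solution could be expanded is to collapse `.` and `..` segments when resolving a relative URL, since the code above prepends the current directory as-is. The sketch below is an illustration only: `resolveUrl` is a hypothetical helper name, not part of the answer, and it assumes an http(s)-style base URL with a host. It follows the general shape of relative-reference resolution but is not a full RFC 3986 implementation (for example, the base query string is dropped for fragment-only references).

```php
<?php
// Hypothetical helper: resolve $rel against the absolute URL $base.
function resolveUrl(string $base, string $rel): string
{
    // Already absolute (has a scheme such as http: or mailto:): nothing to do
    if (preg_match('/^[a-z][a-z0-9+.\-]*:/i', $rel)) {
        return $rel;
    }
    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    // Protocol-relative URL: borrow the scheme of the base
    if (substr($rel, 0, 2) === '//') {
        return $scheme . ':' . $rel;
    }
    $origin = $scheme . '://' . ($parts['host'] ?? '')
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $basePath = $parts['path'] ?? '/';
    // Empty, query-only, or fragment-only reference: keep the base path
    if ($rel === '' || $rel[0] === '?' || $rel[0] === '#') {
        return $origin . $basePath . $rel;
    }
    // Split off "?query#fragment" so only the path part is normalized
    $cut    = strcspn($rel, '?#');
    $path   = substr($rel, 0, $cut);
    $suffix = substr($rel, $cut);
    if ($path[0] !== '/') {
        // Relative path: start from the directory of the base path
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }
    // Collapse "." and ".." segments; never pop above the root
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    // A trailing "." or ".." refers to a directory, so keep the final slash
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }
    return $origin . implode('/', $out) . $suffix;
}
```

With a helper like this, `normalizeAttr` could call `resolveUrl($url, $href)` instead of concatenating `$root` or `$cd` by hand, at the cost of a little more code.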
