PHP Regex to determine relative or absolute path

Question

I'm using cURL to pull the contents of a remote site. I need to check every href attribute, determine whether its value is a relative or an absolute path, and then rewrite the link to something like href="http://www.website.com/index.php?url=[ABSOLUTE_PATH]"

Any help would be greatly appreciated.

Recommended answer

A combination of a regex* and PHP's parse_url() should help:

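// find all links in a page written with href="" or href='' syntax
$matches = array();
preg_match_all('/href=(?:"([^"]+)"|\'([^\']+)\')/i', $page_contents, $matches, PREG_SET_ORDER);

// iterate through each match and check if the link is "absolute"
$urls = array();
foreach ($matches as $match) {
    // group 1 holds a double-quoted value, group 2 a single-quoted one
    $link = ($match[1] !== '') ? $match[1] : $match[2];
    $path = $link;
    if ((substr($link, 0, 7) == 'http://') || (substr($link, 0, 8) == 'https://')) {
        // the current link is an "absolute" URL - parse it to get just the path
        $parsed = parse_url($link);
        // parse_url() omits the 'path' key for URLs such as http://example.com
        $path = isset($parsed['path']) ? $parsed['path'] : '/';
    }
    // encode the path so characters like ? and & survive inside the query string
    $urls[] = 'http://www.website.com/index.php?url=' . urlencode($path);
}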

To determine whether a URL is absolute, I simply check whether it begins with http:// or https://. If your URLs use other schemes such as ftp:// or tel:, you may need to handle those as well.
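
If other schemes do need handling, one option is to test for any scheme prefix instead of listing each one. The following is only a sketch: is_absolute_url() is a hypothetical helper name, not part of the original answer, and treating protocol-relative links (//host/path) as absolute is an assumption you may want to change.

```php
<?php
// Hypothetical helper: returns true when $url starts with a URI scheme
// (http:, https:, ftp:, tel:, mailto:, ...) or is protocol-relative (//host/path).
function is_absolute_url($url)
{
    $url = trim($url);
    // RFC 3986 scheme: a letter, then letters, digits, "+", "-" or ".", followed by ":"
    if (preg_match('/^[a-z][a-z0-9+.\-]*:/i', $url) === 1) {
        return true;
    }
    // assumption: protocol-relative URLs count as absolute because they name a host
    return substr($url, 0, 2) === '//';
}
```

With this helper, is_absolute_url('tel:+15551234') and is_absolute_url('ftp://example.com/file') both return true, while is_absolute_url('/about.html') and is_absolute_url('page.php?id=1') return false. Note that only http and https links can sensibly be passed on to the proxy URL; links such as tel: or mailto: are probably best left untouched.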

* This solution does use regex to parse HTML, which is often frowned upon. To avoid that, you could switch to DOMDocument, but there's no need for the extra code if the regex isn't causing any issues.
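
For comparison, a DOMDocument-based version of the link extraction could look roughly like the sketch below. extract_hrefs() is a hypothetical helper name, and the sketch assumes the dom extension is enabled (it is in most default PHP builds). Unlike the regex, it only looks at <a> elements, so href attributes on other tags such as <link> are skipped.

```php
<?php
// Hypothetical helper: collect the href value of every <a> element in $html.
function extract_hrefs($html)
{
    $hrefs = array();
    if ($html === '') {
        // loadHTML() rejects an empty string, so return early
        return $hrefs;
    }

    $doc = new DOMDocument();
    // real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}
```

The returned array can then be fed into the same loop as before, for example foreach (extract_hrefs($page_contents) as $link) { ... }, with no need to pick between capture groups.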
