Replace URLs in text with HTML links

Question

Here is a design thought: for example, if I put a link such as

http://example.com

in a textarea, how do I get PHP to detect that it's an http:// link and then print it as

print "<a href='http://www.example.com'>http://www.example.com</a>";

I remember doing something like this before, but it was not foolproof: it kept breaking on complex links.

Another good idea would be, if you have a link such as

http://example.com/test.php?val1=bla&val2blablabla%20bla%20bla.bl

to fix it so that it does

print "<a href='http://example.com/test.php?val1=bla&val2=bla%20bla%20bla.bla'>";
print "http://example.com/test.php";
print "</a>";

This one is just an afterthought. Stack Overflow could probably use this as well :D

Any ideas?

Recommended answer

Let's look at the requirements. You have some user-supplied plain text, which you want to display with hyperlinked URLs.

  1. The "http://" protocol prefix should be optional.
  2. Both domains and IP addresses should be accepted.
  3. Any valid top-level domain should be accepted, e.g. .aero and .xn--jxalpdlp.
  4. Port numbers should be allowed.
  5. URLs must be allowed in normal sentence contexts. For instance, in "Visit stackoverflow.com.", the final period is not part of the URL.
  6. You probably want to allow "https://" URLs too, and perhaps other schemes.
  7. As always when displaying user-supplied text in HTML, you want to prevent cross-site scripting (XSS). You will also want ampersands in URLs to be correctly escaped as &amp;.
  8. You probably don't need support for IPv6 addresses.
  9. Edit: As noted in the comments, support for email addresses is definitely a plus.
  10. Edit: Only plain text input is to be supported; HTML tags in the input should not be honoured. (The Bitbucket version supports HTML input.)
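
Requirements 5 and 7 can be tried out in isolation before reading the full solution. The sketch below is only an illustration, not the answer's code: the function name linkifySimple and its simplified pattern are assumptions made for this example, it only recognises URLs with an explicit http:// or https:// prefix, and it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper for illustration only; not part of the answer's code.
function linkifySimple(string $text): string
{
    // Lazy URL body plus a lookahead: one trailing punctuation mark that is
    // followed by whitespace or the end of input stays outside the match.
    $pattern = '~\bhttps?://[^\s<>"]+?(?=[?.!,;:"]?(?:\s|$))~';

    $html = '';
    $position = 0;
    while (preg_match($pattern, $text, $match, PREG_OFFSET_CAPTURE, $position)) {
        [$url, $urlPosition] = $match[0];

        // Escape the plain text in front of the URL, then the URL itself,
        // so "<" cannot open a tag and "&" becomes "&amp;".
        $html .= htmlspecialchars(substr($text, $position, $urlPosition - $position));
        $html .= sprintf('<a href="%1$s">%1$s</a>', htmlspecialchars($url));

        $position = $urlPosition + strlen($url);
    }

    // Escape whatever follows the last URL.
    return $html . htmlspecialchars(substr($text, $position));
}

echo linkifySimple('Visit http://example.com/?a=1&b=2. <b>Bye</b>'), "\n";
// Visit <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a>. &lt;b&gt;Bye&lt;/b&gt;
```

The final period stays outside the link, the ampersand is written as &amp; in both the href and the link text, and the <b> tag from the input is shown as text rather than interpreted.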

Edit: Check out GitHub for the latest version, with support for email addresses, authenticated URLs, URLs in quotes and parentheses, HTML input, as well as an updated TLD list.

Here's my take:

<?php
$text = <<<EOD
Here are some URLs:
stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php
Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question?
A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful.
There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm?
Ports: 192.168.0.1:8080, https://example.net:1234/.
Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp.
And remember.Nobody is perfect.

<script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script>
EOD;

$rexProtocol = '(https?://)?';
$rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
$rexPort     = '(:[0-9]{1,5})?';
$rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
$rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';

// Solution 1:

function callback($match)
{
    // Prepend http:// if no protocol specified
    $completeUrl = $match[1] ? $match[0] : "http://{$match[0]}";

    return '<a href="' . $completeUrl . '">'
        . $match[2] . $match[3] . $match[4] . '</a>';
}

print "<pre>";
print preg_replace_callback("&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&",
    'callback', htmlspecialchars($text));
print "</pre>";

  • To escape < and & characters properly, I pass the whole text through htmlspecialchars before processing. This is not ideal, because the HTML escaping can cause URL boundaries to be misdetected.
  • As the "And remember.Nobody is perfect." line shows (remember.Nobody is treated as a URL because of the missing space), further checking for valid top-level domains might be in order.
  • Edit: The following code fixes the above two problems, but is quite a bit more verbose since I'm more or less re-implementing preg_replace_callback using preg_match.

    // Solution 2:
    
    $validTlds = array_fill_keys(explode(" ", ".aero .asia .biz .cat .com .coop .edu .gov .info .int .jobs .mil .mobi .museum .name .net .org .pro .tel .travel .ac .ad .ae .af .ag .ai .al .am .an .ao .aq .ar .as .at .au .aw .ax .az .ba .bb .bd .be .bf .bg .bh .bi .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gn .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .io .iq .ir .is .it .je .jm .jo .jp .ke .kg .kh .ki .km .kn .kp .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .me .mg .mh .mk .ml .mm .mn .mo .mp .mq .mr .ms .mt .mu .mv .mw .mx .my .mz .na .nc .ne .nf .ng .ni .nl .no .np .nr .nu .nz .om .pa .pe .pf .pg .ph .pk .pl .pm .pn .pr .ps .pt .pw .py .qa .re .ro .rs .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .tt .tv .tw .tz .ua .ug .uk .us .uy .uz .va .vc .ve .vg .vi .vn .vu .wf .ws .ye .yt .yu .za .zm .zw .xn--0zwm56d .xn--11b5bs3a9aj6g .xn--80akhbyknj4f .xn--9t4b11yi5a .xn--deba0ad .xn--g6w251d .xn--hgbk6aj7f53bba .xn--hlcj6aya9esc7a .xn--jxalpdlp .xn--kgbechtv .xn--zckzah .arpa"), true);
    
    $position = 0;
    // $match is passed by reference automatically; call-time "&$match" is a fatal error since PHP 5.4.
    while (preg_match("{\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))}", $text, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($url, $urlPosition) = $match[0];
    
        // Print the text leading up to the URL.
        print(htmlspecialchars(substr($text, $position, $urlPosition - $position)));
    
        $domain = $match[2][0];
        $port   = $match[3][0];
        $path   = $match[4][0];
    
        // Check if the TLD is valid - or that $domain is an IP address.
        $tld = strtolower(strrchr($domain, '.'));
        if (preg_match('{\.[0-9]{1,3}}', $tld) || isset($validTlds[$tld]))
        {
            // Prepend http:// if no protocol specified
            $completeUrl = $match[1][0] ? $url : "http://$url";
    
            // Print the hyperlink.
            printf('<a href="%s">%s</a>', htmlspecialchars($completeUrl), htmlspecialchars("$domain$port$path"));
        }
        else
        {
            // Not a valid URL.
            print(htmlspecialchars($url));
        }
    
        // Continue text parsing from after the URL.
        $position = $urlPosition + strlen($url);
    }
    
    // Print the remainder of the text.
    print(htmlspecialchars(substr($text, $position)));
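
To see why the order of escaping and matching matters, the following small sketch runs the same pattern on one quoted URL twice: once on the escaped text, as Solution 1 does, and once on the raw text, as Solution 2 does. The sample sentence and the variable names $pattern, $escapedFirst and $rawFirst are made up for this illustration; the sub-patterns are copied from the code above.

```php
<?php
// Sub-patterns copied from the answer's code.
$rexProtocol = '(https?://)?';
$rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
$rexPort     = '(:[0-9]{1,5})?';
$rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
$rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';

$pattern = "&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&";

// Made-up sample input: a URL wrapped in double quotes.
$text = 'See "http://example.com/a"';

// Solution 1 order: escape first, then match. The closing quote has become
// "&quot;", and both "&" and the letters are legal path characters.
preg_match($pattern, htmlspecialchars($text), $escapedFirst);

// Solution 2 order: match the raw text, escape afterwards.
preg_match($pattern, $text, $rawFirst);

echo $escapedFirst[0], "\n"; // http://example.com/a&quot
echo $rawFirst[0], "\n";     // http://example.com/a
```

In the escaped text the match swallows part of the entity for the closing quote, so the link target is wrong; matching the raw text first and escaping each piece afterwards avoids this.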
    
