Simple Wiki Parser And Link Autodetection


Question

I am using the following functions:

function MakeLinks($source){
 return preg_replace('!(((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)!i', '<a href="$1">$1</a>', $source);
}

function simpleWiki($text){
 $text = preg_replace('/\[\[Image:(.*)\]\]/', '<a href="$1"><img src="$1" /></a>', $text);
 return $text;
}

The first one turns a bare URL such as http://example.com into a clickable link.

The second function turns strings like [[Image:http://example.com/logo.png]] into an image.

Now, if I have the text

$text = 'this is my image [[Image:http://example.com/logo.png]]';

and convert it like this: simpleWiki(makeLinks($text)), it outputs something similar to:

this is my image <a href="url"><img src="<a href="url">url</a>"/></a>

How can I prevent this? How can I check that a URL is not part of an [[Image:URL]] construction?

Recommended answer

Your immediate problem can be solved by combining the two expressions into one (with two alternatives) and then using the not-so-well-known but very powerful preg_replace_callback() function, which handles each case separately in a single pass through the target string, like so:

<?php // test.php 20110312_1200
$data = "[[Image:http://example.com/logo1.png]]\n".
        "http://example1.com\n".
        "[[Image:http://example.com/logo2.png]]\n".
        "http://example2.com\n";

$re = '!# Capture WikiImage URLs in $1 and other URLs in $2.
      # Either $1: WikiImage URL
      \[\[Image:(.*?)\]\]
    | # Or $2: Non-WikiImage URL.
      (((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)
      !ixu';

$data = preg_replace_callback($re, '_my_callback', $data);

// The callback function is called once for each
// match found and is passed one parameter: $matches.
function _my_callback($matches)
{ // Either $1 or $2 matched, but never both.
    if ($matches[1]) {  // $1: WikiImage URL
        return '<a href="'. $matches[1] .
            '"><img src="'. $matches[1] .'" /></a>';
    }
    else {              // $2: Non-WikiImage URL.
        return '<a href="'. $matches[2] .
            '">'. $matches[2] .'</a>';
    }
}
echo($data);
?>

This script implements both of your regexes and does what you are asking. Note that I changed the greedy (.*) to the lazy (.*?), because the greedy version does not work correctly: it fails when more than one WikiImage appears on the same line. I also added the 'u' modifier to the regex (which is needed when a pattern contains Unicode characters). As you can see, the preg callback function is very powerful. (This technique can be used to do some pretty heavy lifting, text-processing-wise.)
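To make the greedy versus lazy point concrete, here is a minimal sketch. The sample line and the variable names are made up for illustration and are not part of the script above; it only compares what the two patterns capture when two WikiImage tags share a line.

```php
<?php
// Hypothetical sample input: two WikiImage tags on the same line.
$line = '[[Image:a.png]] and [[Image:b.png]]';

// Greedy: (.*) runs to the LAST "]]" on the line,
// so both tags collapse into a single match.
preg_match_all('/\[\[Image:(.*)\]\]/', $line, $greedy);

// Lazy: (.*?) stops at the FIRST "]]",
// so each tag is matched on its own.
preg_match_all('/\[\[Image:(.*?)\]\]/', $line, $lazy);

print_r($greedy[1]); // one capture: "a.png]] and [[Image:b.png"
print_r($lazy[1]);   // two captures: "a.png" and "b.png"
```

Because "." does not match a newline by default, the greedy pattern only misbehaves when the tags are on one line, which is why it can appear to work on input that has one tag per line.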

However, please note that the regex you are using to pick out URLs can be significantly improved. Check out the following resources for more information on "linkifying" URLs (hint: there are a bunch of "gotchas"):
The Problem With URLs
An Improved Liberal, Accurate Regex Pattern for Matching URLs
URL Linkification (HTTP/FTP)
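As one illustration of such a gotcha, a URL at the end of a clause tends to swallow the punctuation that follows it. The sketch below is only an assumption about how that could be handled, not the approach from the resources above: linkifyTrimmed() is a hypothetical helper, the simplified pattern is mine, and the htmlspecialchars() escaping is an extra precaution that the original functions do not perform.

```php
<?php
// Hypothetical helper (not from the original answer): linkify http/https/ftp
// URLs while keeping trailing sentence punctuation outside the link.
function linkifyTrimmed(string $text): string
{
    return preg_replace_callback(
        // Simplified pattern (an assumption): scheme, then anything up to
        // whitespace, an angle bracket or a double quote.
        '!(?:https?|ftp)://[^\s<>"]+!i',
        function (array $m): string {
            $url = $m[0];
            // Peel off trailing characters that are more likely to be
            // sentence punctuation than part of the URL.
            $trimmed = rtrim($url, '.,;:!?)');
            $tail = substr($url, strlen($trimmed));
            // Escape the URL before placing it in HTML.
            $safe = htmlspecialchars($trimmed, ENT_QUOTES);
            return '<a href="' . $safe . '">' . $safe . '</a>' . $tail;
        },
        $text
    );
}

echo linkifyTrimmed('See http://example.com/page, or (http://example.org).');
```

The trade-off is that a URL which legitimately ends in ")" loses that character, so this trimming rule is a heuristic rather than a complete fix.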
