Simple Wiki Parser And Link Autodetection
Question
I am using the following functions:
function MakeLinks($source){
    // Wrap every http:// or ftp:// URL in an anchor that points at the URL itself.
    return preg_replace('!(((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)!i', '<a href="$1">$1</a>', $source);
}
function simpleWiki($text){
    // Turn [[Image:URL]] into a linked image.
    $text = preg_replace('/\[\[Image:(.*)\]\]/', '<a href="$1"><img src="$1" /></a>', $text);
    return $text;
}
The first one turns http://example.com into a link to http://example.com.
The second function turns strings like [[Image:http://example.com/logo.png]] into an image.
Now, if I have the text
$text = 'this is my image [[Image:http://example.com/logo.png]]';
and convert it like this: simpleWiki(makeLinks($text)), it outputs something similar to:
this is my image <a href="url"><img src="<a href="url">url</a>"/></a>
How can I prevent this? How can I check that the URL is not part of a [[Image:URL]] construction?
Recommended answer
Your immediate problem can be solved by combining the two expressions into one (with two alternatives) and then using the lesser-known but very powerful preg_replace_callback() function, which handles each case separately in one pass through the target string, like so:
<?php // test.php 20110312_1200
$data = "[[Image:http://example.com/logo1.png]]\n".
        "http://example1.com\n".
        "[[Image:http://example.com/logo2.png]]\n".
        "http://example2.com\n";
$re = '!# Capture WikiImage URLs in $1 and other URLs in $2.
    # Either $1: WikiImage URL
    \[\[Image:(.*?)\]\]
    | # Or $2: Non-WikiImage URL.
    (((f|ht){1}tp://)[-a-zA-Zа-яА-Я()0-9@:%_+.~#?&;//=]+)
    !ixu';
$data = preg_replace_callback($re, '_my_callback', $data);
// The callback function is called once for each
// match found and is passed one parameter: $matches.
function _my_callback($matches)
{ // Either $1 or $2 matched, but never both.
    if ($matches[1]) { // $1: WikiImage URL
        return '<a href="'. $matches[1] .
            '"><img src="'. $matches[1] .'" /></a>';
    }
    else { // $2: Non-WikiImage URL.
        return '<a href="'. $matches[2] .
            '">'. $matches[2] .'</a>';
    }
}
echo($data);
?>
This script implements your two regexes and does what you are asking. Note that I changed the greedy (.*) to the lazy (.*?) because the greedy version does not work correctly: it fails when a line contains more than one WikiImage, since (.*) runs on to the last ]] on that line. I also added the 'u' modifier to the regex, which is needed when a pattern contains Unicode characters (here, the Cyrillic range). As you can see, the preg callback function is very powerful. (This technique can be used to do some pretty heavy lifting, text-processing-wise.)
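To illustrate the greedy versus lazy difference, here is a minimal sketch. The sample string and variable names are made up for illustration and are not part of the original script. Because the dot does not match a newline by default, the greedy form only misbehaves when two image tags share a line.

```php
<?php
// Hypothetical sample: two WikiImage tags on the same line.
$line = '[[Image:a.png]] and [[Image:b.png]]';

// Greedy: (.*) runs to the LAST "]]", so both tags collapse into one match.
preg_match_all('/\[\[Image:(.*)\]\]/', $line, $greedy);

// Lazy: (.*?) stops at the FIRST "]]", so each tag is matched on its own.
preg_match_all('/\[\[Image:(.*?)\]\]/', $line, $lazy);

print_r($greedy[1]); // one element: "a.png]] and [[Image:b.png"
print_r($lazy[1]);   // two elements: "a.png" and "b.png"
```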
However, please note that the regex you are using to pick out URLs can be significantly improved. Check out the following resources for more information on "linkifying" URLs (hint: there are a bunch of gotchas):
The Problem With URLs
An Improved Liberal, Accurate Regex Pattern for Matching URLs
URL Linkification (HTTP/FTP)
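One commonly cited gotcha is trailing sentence punctuation: the character class above accepts a dot, so a URL at the end of a sentence swallows the full stop. The sketch below shows one possible way to handle that inside a callback. The helper name linkify_trimmed, the simplified pattern, and the set of trimmed characters are assumptions made for illustration, not something taken from the resources listed above.

```php
<?php
// Hypothetical helper: build the link, but keep trailing sentence
// punctuation outside of it.
function linkify_trimmed(array $m): string
{
    $url  = rtrim($m[0], '.,;:!?');       // assumed set of punctuation to drop
    $tail = substr($m[0], strlen($url));  // the dropped characters, re-appended after the link
    return '<a href="' . $url . '">' . $url . '</a>' . $tail;
}

// Simplified ASCII-only variant of the URL pattern, for the demo only.
$pattern = '!(?:f|ht)tp://[-a-zA-Z()0-9@:%_+.~#?&;/=]+!i';

$text = 'See http://example.com/page.';
$out  = preg_replace_callback($pattern, 'linkify_trimmed', $text);
echo $out, "\n";
```

This only covers one case; balanced parentheses and other edge cases would need further handling.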