PHP preg_split on spaces, but not within tags
Problem description
I am using preg_split("/\"[^\"]*\"(*SKIP)(*F)|\x20/", $input_line);
and running it on phpliveregex.com.
It produces this array:
array(10
0=><b>test</b>
1=>or
2=><em>oh
3=>yeah</em>
4=>and
5=><i>
6=>oh
7=>yeah
8=></i>
9=>"ye we 'hold' it"
)
This is not what I want. It should be split on spaces only outside HTML tags, like this:
array(6
0=><b>test</b>
1=>or
2=><em>oh yeah</em>
3=>and
4=><i>oh yeah</i>
5=>"ye we 'hold' it"
)
With this regex I can only add an exception for text in "double quotes", but I really need help adding more exceptions, such as the tags <img/> <a></a> <pre></pre> <code></code> <strong></strong> <b></b> <em></em> <i></i>.
Any explanation of how that regex works would also be appreciated.
Solution
It's easier to use DOMDocument, since you don't need to describe what an HTML tag is and how it looks. You only need to check the nodeType. When the node is a text node, split it with preg_match_all (which is handier than designing a pattern for preg_split):
$html = 'spaces in a text node <b>test</b> or <em>oh yeah</em> and <i>oh yeah</i>
"ye we \'hold\' it"
"unclosed double quotes at the end';

$dom = new DOMDocument;
// Wrap the fragment in a <div> so that it has a single root element.
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED);
$nodeList = $dom->documentElement->childNodes;

$results = [];
foreach ($nodeList as $childNode) {
    if ($childNode->nodeType == XML_TEXT_NODE) {
        // Text node: collect unquoted words and double-quoted parts.
        // Whitespace-only text nodes yield no match and are skipped.
        if (preg_match_all('~[^\s"]+|"[^"]*"?~', $childNode->nodeValue, $m)) {
            $results = array_merge($results, $m[0]);
        }
    } else {
        // Any other node (an element, for example) is kept whole.
        $results[] = $dom->saveHTML($childNode);
    }
}
print_r($results);
Note: when a double-quoted part stays unclosed (no closing quote), I have chosen a default behaviour: the part runs to the end of the text node. Feel free to change it.
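As one possible way to change that default, the sketch below treats an unclosed quote as an ordinary character attached to the next word, so the rest of the text is still split on whitespace. The pattern and the function name splitTextNode are illustrative assumptions, not part of the code above.

```php
<?php
// Hypothetical helper (the name is an assumption): tokenises the text of one text node.
// Closed "..." parts stay whole; an unclosed quote sticks to the word that follows it.
function splitTextNode(string $text): array
{
    // First branch: a complete double-quoted part.
    // Second branch: an optional stray quote followed by a run of non-space, non-quote characters.
    preg_match_all('~"[^"]*"|"?[^\s"]+~', $text, $m);
    return $m[0];
}

print_r(splitTextNode("\"ye we 'hold' it\"\n\"unclosed double quotes at the end"));
```

Here the closed part "ye we 'hold' it" stays in one piece, while the unclosed part is split into "unclosed, double, quotes, at, the, end.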
Note 2: Sometimes the LIBXML_ constants are not defined. You can work around this by testing for the constant first and defining it when needed:
if (!defined('LIBXML_HTML_NOIMPLIED'))
    define('LIBXML_HTML_NOIMPLIED', 8192);
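On how the pattern in the question works: in "[^"]*"(*SKIP)(*F)|\x20 the left branch matches a whole double-quoted part, (*SKIP) marks the position reached, and (*F) then forces a failure, so the engine discards the quoted part and resumes searching after it. Only the right branch, \x20 (a space), can ever succeed, which is why spaces inside quotes are never split points. If the regex route is preferred anyway, more "skip" branches can be added in the same way. The sketch below is only an illustration under assumptions, not the DOMDocument approach above: the branch for paired tags and the function name splitOutsideTags are hypothetical, and it handles neither nested tags of the same name nor spaces inside the attributes of unpaired tags such as <img/>.

```php
<?php
// Hypothetical helper (the name is an assumption): split on spaces,
// except inside paired tags or double-quoted parts.
function splitOutsideTags(string $input): array
{
    $pattern = '~<(\w+)[^>]*>.*?</\1>(*SKIP)(*F)'  // match <tag>...</tag>, then skip it
             . '|"[^"]*"(*SKIP)(*F)'               // match "...", then skip it
             . '|\x20~s';                          // the only branch that can succeed: a space
    return preg_split($pattern, $input);
}

print_r(splitOutsideTags('<b>test</b> or <em>oh yeah</em> and <i>oh yeah</i> "ye we \'hold\' it"'));
```

For the sample input this yields the six parts asked for in the question. For real-world HTML the DOMDocument approach remains the more robust choice.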