PHP preg_split on spaces, but not within tags

Views: 119 Posted: 2018/6/26 19:40:49 Tags: php html regex preg-split


Problem description

I am using preg_split("/\"[^\"]*\"(*SKIP)(*F)|\x20/", $input_line); and running it on phpliveregex.com.
It produces this array:
array(10
  0=><b>test</b>
  1=>or
  2=><em>oh
  3=>yeah</em>
  4=>and
  5=><i>
  6=>oh
  7=>yeah
  8=></i>
  9=>"ye we 'hold' it"
)
This is not what I want. The input should be split on spaces only outside HTML tags, like this:
array(6
  0=><b>test</b>
  1=>or
  2=><em>oh yeah</em>
  3=>and
  4=><i>oh yeah</i>
  5=>"ye we 'hold' it"
)
With this regex I can only add an exception for "double quotes", but I really need help adding more exceptions, such as the tags <img/><a></a><pre></pre><code></code><strong></strong><b></b><em></em><i></i>

Any explanation of how that regex works would also be appreciated.
Solution
It's easier to use DOMDocument, since you don't need to describe what an HTML tag is and what it looks like. You only need to check the nodeType. When the node is a text node, split it with preg_match_all (which is handier than designing a pattern for preg_split):
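// Sample input: plain text, inline tags, a quoted part and an unclosed quote.
$html = 'spaces in a text node <b>test</b> or <em>oh yeah</em> and <i>oh yeah</i>
"ye we \'hold\' it"
"unclosed double quotes at the end';

// Wrap the fragment in one root element so that every top-level piece
// becomes a direct child of the <div>.
$dom = new DOMDocument;
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED);

$nodeList = $dom->documentElement->childNodes;

$results = [];

foreach ($nodeList as $childNode) {
    if ($childNode->nodeType == XML_TEXT_NODE &&
        preg_match_all('~[^\s"]+|"[^"]*"?~', $childNode->nodeValue, $m)) {
        // Text node: keep each run of non-space characters and each quoted part.
        $results = array_merge($results, $m[0]);
    } else {
        // Any other node (for example an element): keep its HTML as one item.
        $results[] = $dom->saveHTML($childNode);
    }
}

print_r($results);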
Note: I have chosen a default behaviour for a double-quoted part that stays unclosed (without a closing quote): everything from the opening quote up to the end of the text node is kept as one item. Feel free to change it.
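
As a rough illustration of how that default could be changed, here is a minimal sketch that is not part of the original answer. It compares the answer's text-node pattern with a hypothetical stricter variant (the names $loosePattern and $strictPattern are made up for this example) in which a quoted part only counts as one item when it has a closing quote:

```php
<?php
// Pattern from the answer: an unclosed quote swallows the rest of the text.
$loosePattern = '~[^\s"]+|"[^"]*"?~';

// Hypothetical stricter variant: a quoted part needs its closing quote;
// anything else is split on whitespace as usual.
$strictPattern = '~"[^"]*"|\S+~';

$text = 'a "b c" "d e';

preg_match_all($loosePattern, $text, $loose);
preg_match_all($strictPattern, $text, $strict);

print_r($loose[0]);  // a, "b c", "d e
print_r($strict[0]); // a, "b c", "d, e
```

One side effect of the stricter variant, assuming it is used as written: a quote embedded in a word (such as say"hi") stays attached to that word, because \S+ does not stop at quotes.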

Note 2: Sometimes the LIBXML_ constants are not defined. You can work around this by testing for the constant first and defining it when needed:
if (!defined('LIBXML_HTML_NOIMPLIED'))
    define('LIBXML_HTML_NOIMPLIED', 8192);
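
On the question of how the original pattern works, the following sketch may help. It is not from the original answer. The first pattern is the one from the question, rewritten with ~ delimiters. The second is an assumed extension that also skips paired tags; it is fragile (it does not handle nested tags of the same name, and void tags such as <img/> are only safe when they contain no unquoted spaces), which is part of why the DOMDocument approach above is usually preferable.

```php
<?php
// Pattern from the question, with ~ as delimiter to avoid escaping quotes.
//   "[^"]*"(*SKIP)(*F)  match a double-quoted part, then fail on purpose;
//                       (*SKIP) makes the next attempt start after that part
//   \x20                otherwise match (and split on) a single space
$quotesOnly = '~"[^"]*"(*SKIP)(*F)|\x20~';

$simple = preg_split($quotesOnly, 'one two "three four" five');
print_r($simple);

// Assumed extension: also skip a paired tag, using a backreference (\1) to
// find the closing tag with the same name. The s flag lets . match newlines.
$tagsAndQuotes = '~<(\w+)\b[^>]*>.*?</\1>(*SKIP)(*F)|"[^"]*"(*SKIP)(*F)|\x20~s';

$input = '<b>test</b> or <em>oh yeah</em> and <i>oh yeah</i> "ye we \'hold\' it"';
$parts = preg_split($tagsAndQuotes, $input);
print_r($parts);
```

With this input, the second call yields the six items the question asks for: <b>test</b>, or, <em>oh yeah</em>, and, <i>oh yeah</i>, and "ye we 'hold' it".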


                        