Splitting up html code tags and content


Question

Does anyone who knows more about regular expressions than I do know how to split up HTML code so that all tags and all words are separated? For example:

<p>Some content <a href="www.test.com">A link</a></p>

should be separated like this:

array = { [0]=>"<p>",
          [1]=>"Some",
          [2]=>"content",
          [3]=>"<a href='www.test.com'>",
          [4]=>"A",
          [5]=>"Link",
          [6]=>"</a>",
          [7]=>"</p>" }

I've been using preg_split so far and have managed to split the string either by whitespace or by tags. When I split by tags, though, all the content between two tags ends up in a single array element, and I need that to be split too.

Can anyone help me out?

Solution

preg_split shouldn't be used in that case. Try preg_match_all:

$text = '<p>Some content <a href="www.test.com">A link</a></p>';
preg_match_all('/<[^>]++>|[^<>\s]++/', $text, $tokens);
print_r($tokens);

Output:

Array
(
    [0] => Array
        (
            [0] => <p>
            [1] => Some
            [2] => content
            [3] => <a href="www.test.com">
            [4] => A
            [5] => link
            [6] => </a>
            [7] => </p>
        )

)
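
The outer array in that output exists because preg_match_all stores the full-pattern matches in element 0 of its third argument. As a minimal sketch, the same pattern can be written with the x flag so that each alternative carries a comment, and wrapped in a function that returns the flat list directly. The function name tokenize_html is made up for illustration and is not part of PHP.

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer,
// written with the x flag so each alternative can be commented.
function tokenize_html(string $text): array
{
    $pattern = '~
        <[^>]++>        # a tag: "<", then everything up to the next ">"
      | [^<>\s]++       # or a word: a run without "<", ">" or whitespace
    ~x';
    preg_match_all($pattern, $text, $matches);
    // $matches[0] holds the full-pattern matches in document order.
    return $matches[0];
}

$tokens = tokenize_html('<p>Some content <a href="www.test.com">A link</a></p>');
print_r($tokens);
```

The possessive quantifiers (++) only stop the engine from backtracking into a run it has already consumed; the tokens are the same as with plain + here.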

I assume you forgot to include the 'A' in 'A link' in your example.

Be aware that if your HTML contains a < or > that is not meant as the start or end of a tag, the regex will mess things up badly! (Hence the usual warnings about parsing HTML with regular expressions.)
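
To make that failure mode concrete, here is a small sketch that runs the same pattern on a made-up input with a bare < and > inside the text. The input string is an assumption chosen for illustration, not something from the question.

```php
<?php
// Same pattern as in the answer; the input is an invented example
// with a bare "<" and ">" inside the paragraph text.
$text = '<p>1 < 2 and 3 > 2</p>';
preg_match_all('/<[^>]++>|[^<>\s]++/', $text, $m);
$tokens = $m[0];
print_r($tokens);
// The tokens are: "<p>", "1", "< 2 and 3 >", "2", "</p>".
// Everything between the stray "<" and the next ">" is taken as one "tag".
```

If the input cannot be trusted to be this clean, a real HTML parser (PHP ships one in its DOM extension, DOMDocument) is the usual alternative to a regex.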
