Removing string inside brackets


Question


Good day!

I would like some help removing the strings inside square brackets, along with the square brackets themselves.

The string looks like this:

$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";

I only want to remove the brackets (and their contents) that contain "www.example.com". I want to keep "[test]" in the string, along with any other brackets that do not have "www.example.com" in them.

Thanks!

Solution

Note: The OP has substantially changed the question. This solution was designed to handle the question in its original, more difficult form, before the "www.example.com" constraint was added. Although the solution below has been modified to handle this additional constraint, a simpler solution (e.g. anubhava's answer) would now probably suffice.
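For the simplified, plain-string version of the question, a single pattern may be enough. The following is only a minimal sketch and not necessarily what anubhava's answer does; the function name `remove_example_brackets` is hypothetical. It ignores HTML structure entirely, so it would also strip matching brackets found inside attributes, scripts, or comments.

```php
<?php
// Hypothetical helper for the simplified question (plain strings only).
function remove_example_brackets(string $text): string {
    // Match "[", any run of non-"]" characters containing www.example.com
    // (case-insensitive), then the closing "]", and delete the whole match.
    return preg_replace('/\[[^\]]*www\.example\.com[^\]]*\]/i', '', $text);
}

$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";
echo remove_example_brackets($string), "\n";
```

With the string from the question, the bracket containing "www.example.com" is dropped and "[test]" is left in place; the whitespace around the removed bracket is not trimmed.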

Here is my tested solution:

function strip_bracketed_special($text) {
    $re = '% # Remove bracketed text having "www.example.com" within markup.
          # Skip comments, CDATA, SCRIPT & STYLE elements, and HTML tags.
          (                      # $1: HTML stuff to be left alone.
            <!--.*?-->           # HTML comments (non-SGML compliant).
          | <!\[CDATA\[.*?\]\]>  # CDATA sections
          | <script.*?</script>  # SCRIPT elements.
          | <style.*?</style>    # STYLE elements.
          | <\w+                 # HTML element start tags.
            (?:                  # Group optional attributes.
              \s+                # Attributes separated by whitespace.
              [\w:.-]+           # Attribute name is required
              (?:                # Group for optional attribute value.
                \s*=\s*          # Name and value separated by "="
                (?:              # Group for value alternatives.
                  "[^"]*"        # Either double quoted string,
                | \'[^\']*\'     # or single quoted string,
                | [\w:.-]+       # or un-quoted string (limited chars).
                )                # End group of value alternatives.
              )?                 # Attribute values are optional.
            )*                   # Zero or more start tag attributes.
            \s*/?>               # End of start tag (optional self-close).
          | </\w+>               # HTML element end tags.
          )                      # End #1: HTML Stuff to be left alone.
        | # Or... Bracketed structures containing www.example.com
          \s*\[                  # (optional ws), Opening bracket.
          [^\]]*?                # Match up to required content.
          www\.example\.com      # Required bracketed content.
          [^\]]*                 # Match up to closing bracket.
          \]\s*                  # Closing bracket, (optional ws).
        %six';
    return preg_replace($re, '$1', $text);
}

Note that the regex does not remove bracketed material found inside HTML comments, CDATA sections, SCRIPT and STYLE elements, or HTML tag attribute values. Given the following XHTML markup (which tests these scenarios), the function above removes only the bracketed text within HTML element contents:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal. [Remove this www.example.com]</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal. [Remove this www.example.com]</h1>
<p>Test special removal. [Remove this www.example.com]</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal. [Remove this www.example.com]
</p>
</div>
</body>
</html>

Here is the same markup after being run through the PHP function above:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal.</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal.</h1>
<p>Test special removal.</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal.</p>
</div>
</body>
</html>

This solution should work quite well for just about any valid (X)HTML you can throw at it. (But please, no funky shorttags or SGML comments!)
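The technique the regex relies on can be shown in a much smaller form: the first alternative captures the material to leave alone into group 1, the second alternative matches the bracketed text to drop, and replacing every match with '$1' restores the former while deleting the latter. The sketch below is a deliberately simplified illustration, not a substitute for the full function: the name `strip_marked_brackets_demo` is hypothetical, and its tag pattern is naive (it assumes attribute values never contain ">" and does not treat SCRIPT, STYLE, or CDATA specially).

```php
<?php
// Simplified demo of "capture what to keep, delete the rest".
function strip_marked_brackets_demo(string $text): string {
    $re = '%
          ( <!--.*?--> | <[^>]*> )                  # $1: comments and tags, kept.
        | \[ [^\]]* www\.example\.com [^\]]* \]     # Bracketed text to delete.
        %six';
    // When the second alternative matches, group 1 is empty,
    // so the match is replaced by nothing.
    return preg_replace($re, '$1', $text);
}

$html = '<p title="[a www.example.com]">x [b www.example.com] y [test]</p>';
echo strip_marked_brackets_demo($html), "\n";
```

Here the bracket inside the title attribute survives because the whole start tag is consumed by group 1 before the bracket alternative gets a chance to match at that position.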
