Regex correction for text replacement in HTML document

Problem description

I have the following regex:

/<(?:textarea|select)[\s\S]*?>[\s\S]*?(\{\{\{variable:(.+?)\}\}\})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]+?(value=[\s\S]+?)(\{\{\{variable:(.+?)\}\}\})[\s\S]+?>|(\{\{\{variable:(.+?)\}\}\})/im

And this (shortened) HTML document:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Test</title>
</head>
<body>
    <section id="about">
        <div class="container about-container">
            <div class="row">
                <div class="col-md-12">
                    {{{block:welcome-intro}}}
                </div>
            </div>
        </div>
    </section>
    <section id="services">
        <div class="container">
            <div class="row">
                <div class="col-md-12">
                                        <p>You are using system version: {{{variable:system_version}}}</p>
                    <p>Your address: {{{variable:contact-email-address}}}</p>
                    <form action="http://k.loc/content/view/welcome"  class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                    <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />

                        <div class="row">
                            <div class="col-sm-12 form-error"></div>
                        </div>
                    <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testinput">Name<span class="form-validation-required"> * </span></label>

                    </div>
                <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div><input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testpassword">Password</label>

                    </div>
                <div class="hint-text">Your password must be at least 12 characters long, contain 1 special character, 1 nunber, 1 lower case character and 1 upper case character.</div><input id="testpassword" name="testpassword" placeholder="Enter your password here." class="input-group width-50" type="password"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><fieldset id="bioinfo"><legend>Biographical information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testtextarea">Biography</label>
                <span class="hint-text">A minimum of 40 characters and a maximum of 255 is allowed. This hint is displayed inline.</span>
                    </div>
                <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}}

{{{variable:system_login}}}</textarea><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testsummernote">Interests</label>
                <span class="hint-text">A minimum of 40 characters is required. This hint is displayed inline.</span>
                    </div>
                <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><button name="testsubmit" id="testsubmit" type="submit" class="btn primary">Submit<i class="zmdi zmdi-arrow-forward"></i></button></div></div>
        </form>                </div>
            </div>
        </div>
    </section>
</body>
</html>

Parsing the above HTML document to find {{{variable:whatever}}} yields this result:

Array
(
    [0] => Array
        (
            [0] => {{{variable:system_version}}}
            [1] => {{{variable:contact-email-address}}}
            [2] => <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />
                   <div class="row"><div class="col-sm-12 form-error"></div></div>
                   <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                   <div class="control-label"><label for="testinput">Name<span class="form-validation-required"> * </span></label></div>
                   <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div>
                   <input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}">
            [3] => <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}} {{{variable:system_login}}}</textarea>
            [4] => <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea>
        )
)

I am still learning regexes and do not yet understand all the concepts, so please excuse any wrong terminology. It appears that some sort of greedy match is happening. I expected to see only <input id="testinput"...{{{variable:...}}}"> at index [2].

The end goal is to only replace these placeholders with different data if they are not inside a textarea/select/input.

Why would index [2] match so many elements, and how can this be fixed?

Solution

Parsing HTML with regular expressions is generally frowned upon, but I'm guessing this expression may be closer to what you have in mind, though I'm not certain:

<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})

It can also be simplified; for instance, the braces do not need escaping in PCRE (the / in the closing tag still does if / is the pattern delimiter):

<(?:textarea|select).*?>.*?({{{variable:(.*?)}}}).*?</(?:textarea|select)>|<(?:input).+?(value=.*?)({{{variable:(.+?)}}})?.*?>|({{{variable:(.*?)}}}) 

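Here, we add an optional group to the input branch so that it distinguishes input elements that contain a variable from those that do not. In the original expression the placeholder was mandatory, so a match that started at the first <input (the hidden one, which has no placeholder) kept expanding across later tags until it reached one. With the placeholder optional, that match can end at the hidden input's own closing >.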
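
The over-matching at index [2] can be reproduced in isolation. The following is a minimal sketch, not part of the original answer: the shortened $html string and the variable names $loose and $tight are assumptions made for illustration. It contrasts the original [\s\S]+? style, which may cross tag boundaries, with a [^>] character class, which cannot leave the tag it started in.

```php
<?php
// Hypothetical, shortened input: a hidden input without a placeholder,
// some markup in between, then a text input whose value holds a placeholder.
$html = '<input type="hidden" name="token" value="abc" />'
      . '<div class="row"></div>'
      . '<input id="testinput" type="text" value="{{{variable:system_name}}}">';

// Original style: [\s\S]+? is lazy, but it is still allowed to run past ">".
// The match starts at the first <input and expands until a placeholder is found.
$loose = '/<input[\s\S]+?value=[\s\S]+?\{\{\{variable:.+?\}\}\}[\s\S]+?>/i';

// Constrained style: [^>] stops at the end of the current tag, so the hidden
// input fails to match on its own and the engine moves on to the next <input.
$tight = '/<input\b[^>]*?\bvalue=[^>]*?\{\{\{variable:[^}]+\}\}\}[^>]*>/i';

preg_match($loose, $html, $looseMatch);
preg_match($tight, $html, $tightMatch);

var_dump($looseMatch[0] === $html); // bool(true): everything from the first <input was swallowed
var_dump($tightMatch[0]);           // only the <input id="testinput" ...> tag
```

Lazy quantifiers only make a match as short as possible from the position where it started; they do not move the starting point forward. Note that the [^>] approach assumes no attribute value contains a literal > character, which a regex cannot guarantee for arbitrary HTML.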

Test

$re = '/<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})/si';
$str = '<section id="services">
        <div class="container">
            <div class="row">
                <div class="col-md-12">
                                        <p>You are using system version: {{{variable:system_version}}}</p>
                    <p>Your address: {{{variable:contact-email-address}}}</p>
                    <form action="http://k.loc/content/view/welcome"  class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                    <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />

                        <div class="row">
                            <div class="col-sm-12 form-error"></div>
                        </div>
                    <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);
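
For the stated end goal, replacing placeholders only when they are not inside a textarea, select, or input, one possible approach is to let the first alternatives consume the protected elements whole and replace only what the last alternative captures. This is a sketch under assumptions rather than the answer's method: the $values array, the sample $html, and the pattern itself are hypothetical, and preg_replace_callback is used in place of preg_match_all.

```php
<?php
// Hypothetical replacement values; real ones would come from the application.
$values = ['system_version' => '1.2.3', 'system_name' => 'Demo'];

$html = '<p>Version: {{{variable:system_version}}}</p>'
      . '<input type="text" value="{{{variable:system_name}}}">'
      . '<textarea>{{{variable:system_name}}}</textarea>';

// Branches 1 and 2 match protected elements in full, so placeholders inside
// them are consumed without being captured. Branch 3 captures the name of a
// placeholder that appears anywhere else.
$re = '~<(?:textarea|select)\b[^>]*>.*?</(?:textarea|select)>'
    . '|<input\b[^>]*>'
    . '|\{\{\{variable:([^}]+)\}\}\}~si';

$result = preg_replace_callback($re, function (array $m) use ($values) {
    // Group 1 is absent when a protected element matched: return it unchanged.
    if (!isset($m[1])) {
        return $m[0];
    }
    // Unknown variable names are left as they are.
    return $values[$m[1]] ?? $m[0];
}, $html);

echo $result;
// <p>Version: 1.2.3</p><input type="text" value="{{{variable:system_name}}}"><textarea>{{{variable:system_name}}}</textarea>
```

The order of the alternatives matters: at any position the engine tries the protected-element branches first, so a placeholder is only reached by the third branch when it lies outside those elements. For anything beyond simple templates, a DOM parser is likely the more robust choice.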
