What to do when a regular expression pattern doesn't match anywhere in a string?


Question

I am trying to match <input> fields of type "hidden" using this pattern:

/<input type="hidden" name="([^"]*?)" value="([^"]*?)" />/

This is the sample form data:

<input type="hidden" name="SaveRequired" value="False" /><input type="hidden" name="__VIEWSTATE1" value="1H4sIAAtzrkX7QfL5VEGj6nGi+nP" /><input type="hidden" name="__VIEWSTATE2" value="0351118MK" /><input type="hidden" name="__VIEWSTATE3" value="ZVVV91yjY" /><input type="hidden" name="__VIEWSTATE0" value="3" /><input type="hidden" name="__VIEWSTATE" value="" /><input type="hidden" name="__VIEWSTATE" value="" />

But I am not sure that the type, name, and value attributes will always appear in the same order. If the type attribute comes last, the match will fail, because my pattern expects it at the start.
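
To make the order dependence concrete, here is a minimal sketch in Python (not the Adobe AIR tool from the question) that runs the same pattern against two tags. The second tag is a hypothetical variant with type moved to the end; the names PATTERN, in_order, and reordered are introduced only for illustration.

```python
import re

# The pattern from the question, unchanged apart from Python string syntax.
PATTERN = re.compile(r'<input type="hidden" name="([^"]*?)" value="([^"]*?)" />')

# A tag in the order the pattern expects (taken from the sample form data).
in_order = '<input type="hidden" name="SaveRequired" value="False" />'

# Hypothetical variant: the same attributes, but type comes last.
reordered = '<input name="SaveRequired" value="False" type="hidden" />'

# The first call finds one (name, value) pair; the second finds nothing,
# because the pattern hard-codes the type, name, value order.
print(PATTERN.findall(in_order))
print(PATTERN.findall(reordered))
```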

Question:
How can I change my pattern so that it matches regardless of the order of the attributes in the <input> tag?

P.S.: By the way, I am using the Adobe Air based RegEx Desktop Tool for testing regular expressions.

Recommended answer

Contrary to all the answers here, for what you're trying to do regex is a perfectly valid solution. This is because you are NOT trying to match balanced tags; THAT would be impossible with regex! You are only matching what's inside one tag, and that's perfectly regular.

Here's the problem, though. You can't do it with just one regex: you need to do one match to capture an <input> tag, then do further processing on that. Note that this will only work if none of the attribute values contain a > character, so it's not perfect, but it should suffice for sane inputs.
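
The > caveat can be checked with a small sketch. This is a Python rendering of the tag-capturing pattern used in this answer, written on one line; TAG, safe, and tricky are names made up for the example, and the tricky tag is a hypothetical input, not part of the sample form data.

```python
import re

# One-line form of the tag-capturing regex: "<input", a lookahead for
# type="hidden", then everything up to the closing "/>".
TAG = re.compile(r'<input(?=[^>]*?type="hidden")[^>]+/>')

safe = '<input type="hidden" name="a" value="1" />'

# Hypothetical tag whose value contains a ">" character.
tricky = '<input type="hidden" name="expr" value="x>y" />'

# The safe tag is captured whole. For the tricky tag, [^>]+ stops at the
# ">" inside the value, and no "/>" can be found there, so nothing matches.
print(TAG.findall(safe))
print(TAG.findall(tricky))
```

In other words, a stray > in a value makes this approach skip the tag rather than return a wrong result, at least for this input.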

Here's some Perl (pseudo)code to show you what I mean:

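use strict;
use warnings;

# readLargeInputFile() was only a placeholder; here the HTML is slurped from STDIN.
my $html = do { local $/; <STDIN> };

# m{...} delimiters are used so that the "/" in "/>" (in the pattern or in
# a comment) cannot end the pattern early, as it would with m/.../x.
my @input_tags = $html =~ m{
    (
        <input                      # Starts with "<input"
        (?=[^>]*?type="hidden")     # Use lookahead to make sure that type="hidden" is present
        [^>]+                       # Grab the rest of the tag...
        />                          # ...except for the />, which is grabbed here
    )}xg;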

# Now each member of @input_tags is something like <input type="hidden" name="SaveRequired" value="False" />

foreach my $input_tag (@input_tags)
{
  my $hash_ref = {};
  # Now extract each of the fields one at a time.

  ($hash_ref->{"name"}) = $input_tag =~ /name="([^"]*)"/;
  ($hash_ref->{"value"}) = $input_tag =~ /value="([^"]*)"/;

  # Put $hash_ref in a list or something, or otherwise process it
}

The basic principle here is: don't try to do too much with one regular expression. As you noticed, regular expressions enforce a certain amount of order. So what you need to do instead is first match the CONTEXT of what you're trying to extract, then do submatching on the data you want.
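
For readers who don't use Perl, the same context-then-submatch idea might look like this in Python. This is a sketch of the approach rather than a drop-in implementation; hidden_fields and the sample string are hypothetical, and the sample mixes one tag from the form data with reordered and non-hidden variants invented for the example.

```python
import re

# Step 1: capture each whole <input ... /> tag that has type="hidden".
TAG = re.compile(r'<input(?=[^>]*?type="hidden")[^>]+/>')

# Step 2: pull the individual attributes out of an already-captured tag.
NAME = re.compile(r'name="([^"]*)"')
VALUE = re.compile(r'value="([^"]*)"')


def hidden_fields(html):
    """Return a list of {"name": ..., "value": ...} dicts for hidden inputs."""
    fields = []
    for tag in TAG.findall(html):
        name = NAME.search(tag)
        value = VALUE.search(tag)
        fields.append({
            "name": name.group(1) if name else None,
            "value": value.group(1) if value else None,
        })
    return fields


sample = (
    '<input type="hidden" name="SaveRequired" value="False" />'
    '<input value="3" name="__VIEWSTATE0" type="hidden" />'  # reordered
    '<input type="text" name="visible" value="x" />'         # not hidden
)

for field in hidden_fields(sample):
    print(field["name"], "=", field["value"])
```

Because each attribute is searched for separately inside the captured tag, attribute order no longer matters.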

However, I will agree that in general, using an HTML parser is probably easier and better, and you really should consider redesigning your code or re-examining your objectives. :-) But I had to post this answer as a counter to the knee-jerk reaction that parsing any subset of HTML is impossible: HTML and XML are both irregular when you consider the entire specification, but the specification of a tag is decently regular, certainly within the power of PCRE.
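
As a rough illustration of the parser route, here is a sketch using Python's standard-library html.parser module. The class name HiddenInputCollector and the helper collect_hidden_fields are made up for this example; other languages and parser libraries would look different.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects (name, value) pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also reach this method,
        # because the default handle_startendtag delegates to it.
        if tag != "input":
            return
        attr_map = dict(attrs)
        if (attr_map.get("type") or "").lower() == "hidden":
            self.fields.append((attr_map.get("name"), attr_map.get("value")))


def collect_hidden_fields(html):
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = (
    '<input name="__VIEWSTATE" value="" type="hidden" />'
    '<input type="text" name="q" value="x" />'
)
print(collect_hidden_fields(sample))
```

The parser hands over attributes as a list of pairs, so their order in the tag is irrelevant and no tag-matching pattern has to be maintained.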
