What to do when a regular expression pattern doesn't match anywhere in the string?

Problem description

I am trying to match hidden <input> fields (type "hidden") using this pattern:

/<input type="hidden" name="([^"]*?)" value="([^"]*?)" />/

Here is some sample form data:

<input type="hidden" name="SaveRequired" value="False" /><input type="hidden" name="__VIEWSTATE1" value="1H4sIAAtzrkX7QfL5VEGj6nGi+nP" /><input type="hidden" name="__VIEWSTATE2" value="0351118MK" /><input type="hidden" name="__VIEWSTATE3" value="ZVVV91yjY" /><input type="hidden" name="__VIEWSTATE0" value="3" /><input type="hidden" name="__VIEWSTATE" value="" /><input type="hidden" name="__VIEWSTATE" value="" />

But I am not sure that the type, name, and value attributes will always appear in the same order. If the type attribute comes last, the match will fail because in my pattern it’s at the start.
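The failure is easy to reproduce by running the fixed-order pattern against the same tag written in two ways. This is a minimal sketch. The second, reordered tag is a hypothetical example made up for the test and does not come from the sample data above.

```perl
use strict;
use warnings;

# The fixed-order pattern from the question (braces as delimiters, so "/>" needs no escaping).
my $fixed = qr{<input type="hidden" name="([^"]*?)" value="([^"]*?)" />};

# The same tag twice: once in the order the pattern expects,
# once (hypothetically) with the type attribute moved to the end.
my $in_order  = '<input type="hidden" name="SaveRequired" value="False" />';
my $reordered = '<input name="SaveRequired" value="False" type="hidden" />';

my $matches_in_order  = $in_order  =~ $fixed ? 1 : 0;
my $matches_reordered = $reordered =~ $fixed ? 1 : 0;

print "in order:  $matches_in_order\n";   # prints "in order:  1"
print "reordered: $matches_reordered\n";  # prints "reordered: 0"
```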

Question:
How can I change my pattern so it will match regardless of the positions of the attributes in the <input> tag?

P.S.: By the way, I am using the Adobe Air-based RegEx Desktop Tool for testing regular expressions.

Recommended answer

Contrary to all the other answers here, for what you're trying to do, regex is a perfectly valid solution. This is because you are NOT trying to match balanced tags (THAT would be impossible with regex!). You are only matching what's inside one tag, and that's perfectly regular.

Here's the catch, though: you can't do it with just one regex. You need one match to capture each <input> tag, then further processing on that tag. Note that this only works if none of the attribute values contain a > character, so it's not perfect, but it should suffice for sane inputs.
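Here is a minimal sketch of that limitation. It applies the tag-capturing pattern this answer uses (written with braces as delimiters) to a hypothetical input whose second tag has a > inside its value attribute. The second tag is not reported at all, so the failure is silent rather than an error.

```perl
use strict;
use warnings;

# Tag-capturing pattern from the answer; braces avoid escaping the "/" in "/>".
my $tag_re = qr{
    (
        <input
        (?=[^>]*?type="hidden")   # lookahead: type="hidden" appears before the first ">"
        [^>]+                     # rest of the tag, up to (not including) any ">"
        />
    )
}x;

# Hypothetical input: the second tag's value contains a ">".
my $html = '<input type="hidden" name="a" value="1" />'
         . '<input type="hidden" name="b" value="x>y" />';

my @tags = $html =~ /$tag_re/g;

# Only the first tag is found; [^>]+ stops at the ">" inside "x>y",
# and no "/>" follows it, so the second tag never matches.
print scalar(@tags), "\n";
print "$_\n" for @tags;
```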

Here's some Perl (pseudo)code to show you what I mean:

use strict;
use warnings;

# Slurp the whole input (the first file named on the command line, or STDIN) into one string
my $html = do { local $/; <> };

# Braces as delimiters, so the "/" in "/>" (and in the comments) doesn't end the pattern
my @input_tags = $html =~ m{
    (
        <input                      # Starts with "<input"
        (?=[^>]*?type="hidden")     # Use lookahead to make sure that type="hidden"
        [^>]+                       # Grab the rest of the tag...
        />                          # ...except for the />, which is grabbed here
    )
}xg;

# Now each member of @input_tags is something like <input type="hidden" name="SaveRequired" value="False" />

my @records;
foreach my $input_tag (@input_tags)
{
  my $hash_ref = {};
  # Now extract each of the fields one at a time; their order inside the tag doesn't matter.

  ($hash_ref->{"name"}) = $input_tag =~ /name="([^"]*)"/;
  ($hash_ref->{"value"}) = $input_tag =~ /value="([^"]*)"/;

  # Collect the record for later processing
  push @records, $hash_ref;
}

The basic principle here is: don't try to do too much with one regular expression. As you noticed, a regular expression enforces a certain order on what it matches. So instead, first match the CONTEXT of what you're trying to extract, then do submatching on the data you want.
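Here is a sketch of that two-stage idea. The hypothetical helper hidden_inputs first finds each hidden <input> tag (the context). It then runs a global submatch over just that one tag, collecting every attribute="value" pair into a hash. The function name and the extra reordered tag are illustrative assumptions. The attribute pattern only handles double-quoted values.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one hash ref of attributes per hidden <input> tag.
sub hidden_inputs {
    my ($html) = @_;
    my @records;
    # Stage 1: find each hidden <input ... /> tag (same idea as the answer's pattern).
    while ($html =~ m{(<input(?=[^>]*?type="hidden")[^>]+/>)}g) {
        my $tag = $1;    # copy before the inner match overwrites $1
        my %attr;
        # Stage 2: global submatch over just this tag; attribute order is irrelevant.
        while ($tag =~ /([\w-]+)="([^"]*)"/g) {
            $attr{$1} = $2;
        }
        push @records, \%attr;
    }
    return @records;
}

# Two tags from the question's sample data, plus a hypothetical one with type last.
my $html = '<input type="hidden" name="SaveRequired" value="False" />'
         . '<input type="hidden" name="__VIEWSTATE2" value="0351118MK" />'
         . '<input name="Reordered" value="42" type="hidden" />';

for my $r (hidden_inputs($html)) {
    print "$r->{name} = $r->{value}\n";
}
```

Because the second stage reads whatever attributes are present, the tag with type="hidden" at the end is extracted just like the others.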

However, I will agree that in general, using an HTML parser is probably easier and better and you really should consider redesigning your code or re-examining your objectives. :-) But I had to post this answer as a counter to the knee-jerk reaction that parsing any subset of HTML is impossible: HTML and XML are both irregular when you consider the entire specification, but the specification of a tag is decently regular, certainly within the power of PCRE.