Regular expression for extracting tag attributes
Problem description
I'm trying to extract the attributes of an anchor tag (<a>). So far I have this expression:
(?<name>\b\w+\b)\s*=\s*("(?<value>[^"]*)"|'(?<value>[^']*)'|(?<value>[^"'<> \s]+)\s*)+
which works for strings like
<a href="test.html" class="xyz">
and (single quotes)
<a href='test.html' class="xyz">
but not for strings without quotes:
<a href=test.html class=xyz>
How can I modify my regex to make it work with unquoted attributes? Or is there a better way to do this?
Thanks!
Update: Thanks for all the good comments and advice so far. There is one thing I didn't mention: sadly, I have to patch/modify code that I didn't write myself, and there is no time or money to rewrite it from the ground up.
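Given the constraint of patching rather than rewriting, one small change worth trying is to drop the outer `+` (and the trailing `\s*`) from the expression in the question. A likely culprit for the unquoted case is that the repeated group lets the unquoted branch run on into the next attribute, since `=` is not excluded from its character class. The sketch below is a Python adaptation, not the original code: Python's `re` needs `(?P<name>...)` and unique group names, so the three `value` groups are renamed `dq`, `sq` and `uq`, and `parse_attributes` is a hypothetical helper name.

```python
import re

# Python adaptation of the question's pattern (assumption: the original
# targets an engine that allows (?<name>...) and repeated group names).
# The outer `+` and trailing `\s*` are removed so that each match covers
# exactly one name=value pair.
ATTR_RE = re.compile(
    r"""(?P<name>\b\w+\b)\s*=\s*
        (?: "(?P<dq>[^"]*)"       # double-quoted value
          | '(?P<sq>[^']*)'       # single-quoted value
          | (?P<uq>[^"'<>\s]+)    # unquoted value
        )""",
    re.VERBOSE,
)


def parse_attributes(tag):
    """Return (name, value) pairs found in a single tag string."""
    pairs = []
    for m in ATTR_RE.finditer(tag):
        # Exactly one of the three value groups takes part in each match.
        value = next(g for g in m.group("dq", "sq", "uq") if g is not None)
        pairs.append((m.group("name"), value))
    return pairs


if __name__ == "__main__":
    print(parse_attributes("<a href=test.html class=xyz>"))
```

Iterating over matches (here with `finditer`) replaces the repetition that the outer `+` was presumably meant to provide.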
If you have an element like
<name attribute=value attribute="value" attribute='value'>
this regex could be used to find each attribute name and value in turn:
(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?
Applied to:
<a href=test.html class=xyz>
<a href="test.html" class="xyz">
<a href='test.html' class="xyz">
it would yield:
'href' => 'test.html'
'class' => 'xyz'
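To try this out, here is a minimal sketch that applies the regex above with Python's `re` module. The language choice and the helper name `extract_attributes` are assumptions made for illustration; the pattern itself is copied unchanged.

```python
import re

# The regex from the answer, unchanged. A triple-quoted raw string lets the
# pattern contain both quote characters without escaping.
ATTRIBUTE_RE = re.compile(r'''(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?''')


def extract_attributes(tag):
    """Map each attribute name in `tag` to its value (quotes stripped)."""
    # findall returns one (name, value) tuple per match because the
    # pattern has exactly two capturing groups.
    return dict(ATTRIBUTE_RE.findall(tag))


if __name__ == "__main__":
    samples = [
        "<a href=test.html class=xyz>",
        '<a href="test.html" class="xyz">',
        "<a href='test.html' class=\"xyz\">",
    ]
    for sample in samples:
        print(extract_attributes(sample))
```

One caveat: the value part of the pattern needs at least two characters, so a single-character value such as `b=1` is not matched at all.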