Regular expression to get an attribute from an HTML tag
Question
I am looking for a regular expression that can extract the src attribute (matched case-insensitively) from the following HTML snippets in Java.
<html><img src="kk.gif" alt="text"/></html>
<html><img src='kk.gif' alt="text"/></html>
<html><img src = "kk.gif" alt="text"/></html>
Recommended answer
One possibility:
String imgRegex = "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>";
This works as long as the pattern is matched case-insensitively. It's a bit of a mess, and it deliberately ignores the case where quotes aren't used. Here is the same pattern without Java string escaping:
<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>
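As a minimal sketch of how the pattern might be used from Java, the class below compiles it with Pattern.CASE_INSENSITIVE and collects capture group 1 from every match. The class name ImgSrcExtractor and the method extractSrc are hypothetical names chosen for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcExtractor {
    // The pattern from the answer, compiled case-insensitively so that
    // <IMG SRC="..."> is matched as well as <img src="...">.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src value (capture group 1) of every quoted img tag in html. */
    public static List<String> extractSrc(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        // The three snippets from the question; each line prints [kk.gif].
        System.out.println(extractSrc("<html><img src=\"kk.gif\" alt=\"text\"/></html>"));
        System.out.println(extractSrc("<html><img src='kk.gif' alt=\"text\"/></html>"));
        System.out.println(extractSrc("<html><img src = \"kk.gif\" alt=\"text\"/></html>"));
    }
}
```

Using find() in a loop rather than matches() lets the same method handle documents that contain several img tags.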
The pattern matches, in order:
- <img
- one or more characters that aren't > (i.e. possible other attributes)
- src
- optional whitespace
- =
- optional whitespace
- a starting delimiter of ' or "
- the image source (which may not include a single or double quote)
- an ending delimiter
- although the expression could stop here, I then added:
  - zero or more characters that are not > (more possible attributes)
  - > to close the tag
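The breakdown above can also be written into the pattern itself. The sketch below assumes Java's Pattern.COMMENTS flag, which ignores whitespace in the pattern and treats # to end of line as a comment; the class name AnnotatedImgRegex and the method firstSrc are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotatedImgRegex {
    // Same pattern as in the answer, one component per line.
    // Each line ends with \n so that the # comment stops there.
    static final Pattern IMG_SRC = Pattern.compile(
            "<img        # literal start of the tag\n"
          + "[^>]+       # one or more characters that are not > (other attributes)\n"
          + "src         # the attribute name\n"
          + "\\s*=\\s*   # equals sign with optional whitespace on either side\n"
          + "['\"]       # starting delimiter, single or double quote\n"
          + "([^'\"]+)   # group 1: the image source, quotes not allowed\n"
          + "['\"]       # ending delimiter\n"
          + "[^>]*       # zero or more characters that are not > (more attributes)\n"
          + ">           # close the tag\n",
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);

    /** Returns the first src value found in html, or null if nothing matches. */
    static String firstSrc(String html) {
        Matcher m = IMG_SRC.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstSrc("<html><IMG SRC = 'kk.gif' alt=\"text\"/></html>"));
    }
}
```

Because whitespace inside the pattern is ignored in this mode, the optional whitespace around = has to be spelled \s* rather than typed as literal spaces.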
Notes:
- If you want the match to include the src= as well, move the opening parenthesis of the capture group further left :-)
- This does not care about delimiter balancing or attribute values without delimiters, and it can also choke on badly-formed attributes (such as attributes that include > or image sources that include ' or ").
- Parsing HTML with regular expressions like this is non-trivial, and at best a quick hack that works in the majority of cases.
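The first two notes can be illustrated with a short sketch. VALUE_ONLY is the pattern from the answer and WITH_NAME moves the opening parenthesis left as described. BALANCED is a hypothetical variation that is not part of the original answer: it uses a backreference (\1) to require that the closing quote matches the opening one. All class, field, and method names here are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgRegexNotes {
    // Pattern from the answer: group 1 is the bare source value.
    static final Pattern VALUE_ONLY = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Opening parenthesis moved left: group 1 now includes src= and the quotes.
    static final Pattern WITH_NAME = Pattern.compile(
            "<img[^>]+(src\\s*=\\s*['\"][^'\"]+['\"])[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical variation (not from the answer): group 1 captures the
    // opening quote and \1 requires the same character to close the value.
    static final Pattern BALANCED = Pattern.compile(
            "<img[^>]+src\\s*=\\s*(['\"])([^'\">]+)\\1[^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns group n of the first match of p in html, or null if none. */
    static String group(Pattern p, int n, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String mismatched = "<img src=\"kk.gif' alt=\"text\"/>";
        String unquoted = "<img src=kk.gif alt=\"text\"/>";

        // Group 1 includes the attribute name and delimiters.
        System.out.println(group(WITH_NAME, 1, "<img src = \"kk.gif\" alt=\"text\"/>"));
        // The original pattern accepts mismatched quotes.
        System.out.println(group(VALUE_ONLY, 1, mismatched));
        // The backreference variant rejects them, so this is null.
        System.out.println(group(BALANCED, 2, mismatched));
        // Unquoted values are deliberately not matched, so this is null.
        System.out.println(group(VALUE_ONLY, 1, unquoted));
    }
}
```

None of these variants handles a > inside an attribute value, which is one reason an HTML parser is usually the more robust choice when the input is not under your control.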