Regular expression to get an attribute from HTML tag

Views: 130 Published: 2018/11/29 18:46:23 Tags: java regex


Problem description

I am looking for a regular expression that can get me the src attribute (matched case-insensitively) from the following HTML snippets in Java.

<html><img src="kk.gif" alt="text"/></html>
<html><img src='kk.gif' alt="text"/></html>
<html><img src = "kk.gif" alt="text"/></html>

Recommended answer

One possibility:

String imgRegex = "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>";

This works if the pattern is matched case-insensitively. It's a bit of a mess, and it deliberately ignores the case where quotes aren't used. Here it is without the Java string escapes:

<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>

This matches:

<img
one or more characters that aren't > (i.e. possible other attributes)
src
optional whitespace
=
optional whitespace
a starting delimiter of ' or "
the image source (which may not include a single or double quote)
an ending delimiter of ' or "

Although the expression could stop here, I then added:

zero or more characters that aren't > (more possible attributes)
> to close the tag
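
The breakdown above can be tried out with a minimal sketch like the following. The class name ImgSrcExtractor and the helper extractSrc are hypothetical names chosen for illustration; only the pattern string comes from the answer. Case-insensitive matching is assumed to be done with Pattern.CASE_INSENSITIVE, and the image source is read from capture group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcExtractor {
    // The expression from the answer, compiled case-insensitively so that
    // tags such as <IMG SRC="..."> are matched as well.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects capture group 1 (the text between the
    // delimiters) for every img tag the pattern finds in the input.
    public static List<String> extractSrc(String html) {
        List<String> sources = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            sources.add(matcher.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        // The three snippets from the question.
        String[] snippets = {
            "<html><img src=\"kk.gif\" alt=\"text\"/></html>",
            "<html><img src='kk.gif' alt=\"text\"/></html>",
            "<html><img src = \"kk.gif\" alt=\"text\"/></html>",
        };
        for (String snippet : snippets) {
            System.out.println(extractSrc(snippet));
        }
    }
}
```

Using find() in a loop rather than matches() lets the same helper pick up several img tags from a larger document, which seems closer to how such a pattern would be used in practice.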

Notes:

If you want to capture the src= as well, move the opening parenthesis further left :-)
This does not care about delimiter balancing or about attribute values without delimiters, and it can also choke on badly formed attributes (such as attributes that include > or image sources that include ' or ").
Parsing HTML with regular expressions like this is non-trivial, and at best a quick hack that works in the majority of cases.
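
The caveats above can be seen in a small sketch. The class name ImgSrcCaveats, the constant names, and the helper firstGroup are hypothetical; WITH_NAME is one way of reading "move the opening parenthesis further left", assumed here to mean placing it directly before src.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcCaveats {
    // The expression from the answer: group 1 is the value only.
    static final Pattern VALUE_ONLY = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Same expression with the opening parenthesis moved left of "src",
    // so group 1 covers the attribute name, the equals sign and the quotes.
    static final Pattern WITH_NAME = Pattern.compile(
            "<img[^>]+(src\\s*=\\s*['\"][^'\"]+['\"])[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: group 1 of the first match, or null if none.
    static String firstGroup(Pattern pattern, String html) {
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Whole attribute captured instead of just the value.
        System.out.println(firstGroup(WITH_NAME, "<img src = 'kk.gif' alt=\"text\"/>"));
        // Unquoted value: deliberately not matched.
        System.out.println(firstGroup(VALUE_ONLY, "<img src=kk.gif alt=\"text\"/>"));
        // Unbalanced delimiters: still matched, balancing is not checked.
        System.out.println(firstGroup(VALUE_ONLY, "<img src=\"kk.gif' alt=\"text\"/>"));
        // An earlier attribute containing > stops the match entirely.
        System.out.println(firstGroup(VALUE_ONLY, "<img alt=\"a > b\" src=\"kk.gif\"/>"));
        // A quote inside the source truncates the captured value.
        System.out.println(firstGroup(VALUE_ONLY, "<img src=\"it's.gif\"/>"));
    }
}
```

Inputs like the last three are where a real HTML parser would be the safer choice; the pattern is only a shortcut for markup whose shape is already known.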

