Regex to get src value from an img tag
Problem description
I am using the following regex to get the src value of the first img tag in an HTML document.
string match = "src=(?:\"|\')?(?<imgSrc>[^>]*[^/].(?:jpg|png))(?:\"|\')?";
Now it captures the whole src attribute, which I don't need. I just need the URL inside the src attribute. How do I do that?
Recommended answer
Parse your HTML with something else. HTML is not a regular language, so regular expressions aren't at all suited to parsing it.
Use an HTML parser, or an XML parser if the HTML is strict. It's a lot easier to get the src attribute's value using XPath:
//img/@src
XML parsing is built into the System.Xml namespace. It's incredibly powerful. HTML parsing is a bit more difficult if the HTML isn't strict, but there are lots of libraries around that will do it for you.
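As a rough illustration of the XPath approach, here is a minimal sketch using XmlDocument from System.Xml. It assumes the markup is well-formed XML (strict XHTML) with no default namespace. The class name ImgSrcExample, the helper FirstImgSrc, and the sample markup are made up for this example and are not part of the original question.

```csharp
using System;
using System.Xml;

public static class ImgSrcExample
{
    // Hypothetical helper: returns the src value of the first <img> element
    // in a well-formed (X)HTML string, or null if no img has a src attribute.
    public static string FirstImgSrc(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed XML

        // SelectSingleNode returns the first match in document order.
        XmlNode src = doc.SelectSingleNode("//img/@src");
        return src?.Value;
    }

    public static void Main()
    {
        // Assumed sample input; real pages are often not well-formed enough for this.
        string html = "<html><body><p>Logo:</p>"
                    + "<img src=\"images/logo.png\" alt=\"logo\" />"
                    + "<img src=\"images/other.jpg\" alt=\"other\" />"
                    + "</body></html>";

        Console.WriteLine(FirstImgSrc(html)); // prints images/logo.png
    }
}
```

If the document declares the XHTML namespace on its root element, the plain //img expression will not match and the query would need an XmlNamespaceManager with a prefix. For HTML that is not well-formed, LoadXml will throw, which is where a dedicated HTML parsing library is the better fit.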