Regex for HTML using C#
Problem description
I'm trying to get the text "BlaBla" between the HTML tags with C#, but I get an error on the two \s in the regex.
The text can contain any characters.
Regex:
Match m = Regex.Match(file, "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\s*(.+?)\s*</span>");
Example html:
<h1 class="header"> <span class="itemprop" itemprop="name">Text I_need</span>
<span class="nobr">(<a href="/year/2013/?ref_=tt_ov_inf" >2013</a>)</span>
Thanks!
Solution
It comes down to how C# processes string literals: the regex needs the two characters \s, but inside a regular string literal the compiler treats a backslash as the start of an escape sequence, and \s is not one it recognizes. Replace it with "\\s" and it should be fine:
Match m = Regex.Match(file, "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\\s*(.+?)\\s*</span>");
(You could instead prefix the string with an '@' character to make it a verbatim string, but then you would have to double up all the quotes and remove the backslashes before them.)
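To compare the two ways of writing the pattern, here is a minimal sketch of a complete console program. The class name Program, the variable names escaped and verbatim, and the hard-coded value of file (copied from the first line of the example HTML) are assumptions made for illustration; they are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Assumed sample input: the first line of the example HTML from the question.
        string file = "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">Text I_need</span>";

        // Regular string literal: quotes escaped as \" and regex backslashes doubled.
        string escaped = "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\\s*(.+?)\\s*</span>";

        // Verbatim string literal: backslashes are literal, quotes are doubled.
        string verbatim = @"<h1 class=""header""> <span class=""itemprop"" itemprop=""name"">\s*(.+?)\s*</span>";

        // Both literals produce exactly the same pattern text.
        Console.WriteLine(escaped == verbatim); // prints "True"

        Match m = Regex.Match(file, verbatim);
        if (m.Success)
        {
            // Group 1 is the lazily matched text between the tags.
            Console.WriteLine(m.Groups[1].Value); // prints "Text I_need"
        }
    }
}
```

Which form to use is mostly a matter of taste: the verbatim form keeps the regex escapes readable, while the regular form keeps the HTML attribute quotes closer to how they appear in the page.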