Regex for HTML using C#


Question

I'm trying to get the text "BlaBla" between the HTML tags with C#, but I get an error on the two \s in my regex.
The text can contain any character.

Regex:

Match m = Regex.Match(file, "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\s*(.+?)\s*</span>");




Example html:

<h1 class="header"> <span class="itemprop" itemprop="name">Text I_need</span> 
            <span class="nobr">(<a href="/year/2013/?ref_=tt_ov_inf" >2013</a>)</span>



Thanks!

Solution

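It's down to string processing. The regex engine needs to see \s, but in a regular C# string literal the compiler treats the backslash as the start of an escape sequence. \s is not a valid C# escape, so compilation fails with error CS1009 ("Unrecognized escape sequence"). Replace it with "\\s" and it should be fine: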

Match m = Regex.Match(file, "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\\s*(.+?)\\s*</span>");
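The call above only produces a Match. To read the text itself, take group 1, which is the lazy (.+?) capture. Below is a minimal sketch using C# top-level statements. It assumes the page markup looks exactly like the question's example. `ExtractItemName` is a hypothetical helper name, and the sample string is simply the question's snippet inlined as a C# literal.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text inside the itemprop="name" span, or null if the pattern is absent.
static string? ExtractItemName(string file)
{
    // "\\s" in a regular string literal reaches the regex engine as \s.
    Match m = Regex.Match(file, "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\\s*(.+?)\\s*</span>");
    // Group 1 is the lazy (.+?) capture; the surrounding \s* runs trim leading/trailing whitespace.
    return m.Success ? m.Groups[1].Value : null;
}

// The example HTML from the question, as a C# string.
string html = "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">Text I_need</span>\n" +
              "            <span class=\"nobr\">(<a href=\"/year/2013/?ref_=tt_ov_inf\" >2013</a>)</span>";

Console.WriteLine(ExtractItemName(html)); // prints: Text I_need
```

The lazy (.+?) stops at the first closing </span> after the opening tag. Because . does not match a newline by default, the match stays on one line of the markup.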

(You could prefix the string with an '@' character to make it a verbatim string literal, but then you would have to double up all the quotes and remove the backslashes before them.)
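For comparison, here is a sketch of the same pattern written both ways. The `@` prefix stops the compiler from processing backslash escapes, so \s is written only once, and each embedded quote becomes "". The variable names are illustrative, and the sample input is trimmed to the first line of the question's HTML.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string literal: backslashes doubled, quotes escaped as \".
string regular = "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\\s*(.+?)\\s*</span>";

// Verbatim string literal: no backslash escapes, so \s is written once; quotes are doubled instead.
string verbatim = @"<h1 class=""header""> <span class=""itemprop"" itemprop=""name"">\s*(.+?)\s*</span>";

// Both literals compile to exactly the same pattern text.
Console.WriteLine(regular == verbatim); // prints: True

string file = "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">Text I_need</span>";
Match m = Regex.Match(file, verbatim);
Console.WriteLine(m.Groups[1].Value); // prints: Text I_need
```

Which form to use is a readability trade-off. The verbatim form keeps regex escapes intact, at the cost of doubled quotes in HTML-heavy patterns.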

