Extract Content from <div class=" "> </div> Tag C# RegEx


Problem description

I have this code:

string tag = "div";
string pattern = string.Format(@"\<{0}.*?\>(?<tegData>.+?)\<\/{0}\>", tag.Trim());
Regex regex = new Regex(pattern, RegexOptions.ExplicitCapture);
MatchCollection matches = regex.Matches(data);

I need to get the content between the <div class="in"> ... </div> tags:

   <div class="in">
        <a href="/a/show/7184569" class="mm">ВАЗ 2121</a> <span class="for">за</span>    <span class="price">2 700 $</span></span><br/><span class="year">1990 г.</span><br/><div style="margin: 3px 0 3px 0">1.6 л, бензин, КПП механика, с пробегом, белый, литые диски, тонировка, спойлер, ветровики, противотуманки, Движок после капитального ремонта!</div><div>
     <span style="display:block; padding: 4px 0 0 0;"><span class="region">Костанай</span><span class="adv-phones">, +7 (777) 4464451</span></span>

            <small class="gray air">24 просмотра</small>


            <small class="gray air">13 июня</small>
    </div>
    <div class="selectItem" title="Выбрать" id="fv_sic_7184569">
        <a href="#" class="fav-button" id="fav_7184569">&nbsp;</a>           </div>
</div>

How can I do it? My code doesn't work.

Solution

Here's a regex that might extract simple div tags:

// <div[^>]*>(.+?)</div>

string tag = "div";
string pattern = string.Format(@"<{0}[^>]*>(?<tegData>.+?)</{0}>", tag.Trim());
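
One likely reason the pattern finds nothing useful on the sample input is that `.` does not match a newline by default in .NET, so `.+?` cannot span a multi-line div. The sketch below wraps the pattern above in a helper and adds `RegexOptions.Singleline`. The class and method names (`DivRegexSketch`, `ExtractSimple`) are hypothetical and used only for illustration, and the helper still only handles simple, non-nested tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DivRegexSketch
{
    // Hypothetical helper: returns the inner text of each simple <tag ...>...</tag> pair.
    public static List<string> ExtractSimple(string data, string tag)
    {
        // Same pattern as above; Regex.Escape guards against special characters in the tag name.
        string pattern = string.Format(@"<{0}[^>]*>(?<tegData>.+?)</{0}>", Regex.Escape(tag.Trim()));

        // Singleline lets '.' match '\n'; ExplicitCapture keeps only the named group.
        var regex = new Regex(pattern, RegexOptions.Singleline | RegexOptions.ExplicitCapture);

        var results = new List<string>();
        foreach (Match m in regex.Matches(data))
        {
            results.Add(m.Groups["tegData"].Value);
        }
        return results;
    }
}
```

Calling `DivRegexSketch.ExtractSimple(data, "div")` returns one string per matched pair. Because the quantifier is lazy, each match ends at the first closing tag it meets.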

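However, using RegEx to parse HTML is almost always inappropriate and tends to break on real input. That is because markup languages such as HTML are not regular languages: a pattern like the one above cannot reliably pair each opening tag with its own closing tag once tags are nested.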
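
To make the limitation concrete, here is a small demonstration on a nested fragment shaped like the one in the question. The input string and the class name `NestedDivDemo` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedDivDemo
{
    // Returns the first captured div body, or null when nothing matches.
    public static string FirstMatch(string html)
    {
        var regex = new Regex(@"<div[^>]*>(?<tegData>.+?)</div>",
                              RegexOptions.Singleline | RegexOptions.ExplicitCapture);
        Match m = regex.Match(html);
        return m.Success ? m.Groups["tegData"].Value : null;
    }

    public static void Main()
    {
        // The outer div contains an inner div, as in the question's markup.
        string html = "<div class=\"in\">A<div>B</div>C</div>";

        // The lazy match stops at the FIRST </div>, which closes the inner div.
        Console.WriteLine(FirstMatch(html)); // prints: A<div>B
    }
}
```

The capture is cut off in the middle of the outer div, and switching to a greedy quantifier would instead overshoot when several sibling divs appear in the same document.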

That being said, you would be much better off using an XML parser to parse the document or fragment and then extracting what you need. In fact, a forward-only parser would probably even be faster than RegEx.

You should look at the XmlReader class in .NET.
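
A minimal sketch of that approach follows, assuming the input is well-formed XML. The names `DivXmlReaderSketch` and `ReadDivs` are hypothetical. Note that the markup in the question is not well-formed as posted (it contains `&nbsp;`, which XML does not predefine, and a stray `</span>`), so `XmlReader` would throw an `XmlException` on it unless it is cleaned up first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DivXmlReaderSketch
{
    // Hypothetical helper: returns the inner markup of every <div> whose
    // class attribute equals cssClass. Assumes well-formed XML input.
    public static List<string> ReadDivs(string xml, string cssClass)
    {
        var results = new List<string>();

        // Fragment mode allows input with more than one top-level element.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Name == "div"
                    && reader.GetAttribute("class") == cssClass)
                {
                    // ReadInnerXml returns the content, nested markup included,
                    // and moves the reader past the matching end tag.
                    results.Add(reader.ReadInnerXml());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return results;
    }
}
```

With this sketch, `DivXmlReaderSketch.ReadDivs(data, "in")` returns the whole body of each matching div, including any nested divs, because the parser tracks nesting itself.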
