Extract Content from Div Tag C# RegEx


Question

I need to extract the content inside the divtestimonial1 div. I am using the following regex, but it only returns the first line:

Regex r = new Regex("<div([^<]*<(?!/div>))");

  <div class="testimonial_content" id="divtestimonial1">
          <a name="T1"></a>
          <div class="testimonial_headline">%testimonial1headline</div>
          <p align="left"><img src="" alt="" width="193" height="204" align="left" hspace="10" id="img_T1"/><span class="testimonial_text">%testimonial1text</span><br />
          </p>
  </div>

Solution

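Regular expressions are generally not a good choice for parsing HTML (see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). You might be better off using a tool such as HTML Agility Pack, so I would suggest you use that.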
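
For illustration, here is a minimal sketch of the HTML Agility Pack route. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (DivReader, GetInnerHtmlById) are made up for this example and are not part of the library.

```csharp
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

public static class DivReader
{
    // Hypothetical helper: returns the inner markup of the element with the
    // given id, or null when no such element exists in the document.
    public static string GetInnerHtmlById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);                      // parse the markup into a DOM
        HtmlNode node = doc.GetElementbyId(id);  // note the lowercase "b" in the method name
        return node?.InnerHtml;                  // null-safe: no match gives null
    }
}
```

Calling DivReader.GetInnerHtmlById(html, "divtestimonial1") on the sample markup should return everything between the outer div's opening and closing tags, nested div included, because the parser tracks nesting rather than guessing at it.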

That being said, you can match your particular sample input using this Regex:

<div.*?id="divtestimonial1".*?>.*</div>

But it might break in your real-world scenario. One of the troubles with Regex and HTML is properly detecting nesting of tags, etc.
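
As a rough sketch of how the pattern above could be applied in C#: in .NET, "." does not match a newline unless RegexOptions.Singleline is set, so that option is assumed here because the sample spans several lines. A capture group has been added around the inner ".*" so the content can be read back, and the id is passed in as a parameter. The names DivExtractor and ExtractInner are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class DivExtractor
{
    // Returns the markup between the opening tag of the div with the given id
    // and the LAST </div> in the input, or null if nothing matches.
    // The greedy (.*) is what lets the match run past the nested </div>, but it
    // also means any later, unrelated div would be swallowed into the result.
    public static string ExtractInner(string html, string id)
    {
        string pattern = "<div.*?id=\"" + Regex.Escape(id) + "\".*?>(.*)</div>";
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

On the sample input, DivExtractor.ExtractInner(html, "divtestimonial1") captures the anchor, the nested headline div, and the paragraph. If another div followed the target in the same string, the capture would extend through it as well, which is the nesting problem described above.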
