Regex for HTML <tr> tag


Question

I have an HTML page with <tr> elements and I need to capture the text between those tags.

I tried with Regex:

(?i)<tr[^>]*?>([^<]*)</tr> 

But it doesn't work.

This is all my code in C#:

string patternPost = @"(?i)<tr[^>]*?>([^<]*)</tr>";
MatchCollection m1 = Regex.Matches(html, patternPost, RegexOptions.Multiline);
foreach (Match m in m1)
{
    MessageBox.Show(m.Groups[1].Value);
}

Here you can find an example of HTML page: http://pastebin.com/ewN5NZis

The page contains two blocks. For each block I need to store three pieces of information in three separate lists:

List 1: Title1, Title2
List 2: John, Antony
List 3: 29/04/14, 28/04/14

With my first regex I want to capture every block and skip irrelevant content, such as tags other than tr. Then I want to use three further regexes to capture the three pieces of information from each block. Is this the right approach? I hope my question is clear now.

Solution

EDIT: In your last comment, you said: <tr ....> <tag> ... </tag> <tag2>...</tag2> </tr>, which is quite an expansion of the original problem. At this stage, I concur with all the other advice: you are going to need a DOM parser.
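To illustrate what the DOM-parser route could look like, here is a minimal sketch that uses only System.Xml.Linq from the base class library. It rests on several assumptions that are not in the question: the markup is well-formed XML (real HTML often is not, in which case a dedicated HTML parser is needed), each row holds exactly the three cells title, author and date, and the names RowExtractor, SampleHtml and ExtractColumns are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class RowExtractor
{
    // Hypothetical sample; the real page layout is not shown in the post.
    public const string SampleHtml =
        "<table>" +
        "<tr class=\"post\"><td>Title1</td><td>John</td><td>29/04/14</td></tr>" +
        "<tr class=\"post\"><td>Title2</td><td>Antony</td><td>28/04/14</td></tr>" +
        "</table>";

    // Returns one list per column: [0] titles, [1] authors, [2] dates.
    // Assumes the input is well-formed XML; XDocument.Parse throws otherwise.
    public static List<string>[] ExtractColumns(string wellFormedMarkup)
    {
        var columns = new[] { new List<string>(), new List<string>(), new List<string>() };
        XDocument doc = XDocument.Parse(wellFormedMarkup);

        foreach (XElement row in doc.Descendants("tr"))
        {
            // Text of each direct <td> child, with surrounding whitespace removed.
            List<string> cells = row.Elements("td").Select(td => td.Value.Trim()).ToList();
            if (cells.Count < 3) continue; // skip rows that do not have the assumed shape

            for (int i = 0; i < 3; i++) columns[i].Add(cells[i]);
        }
        return columns;
    }
}
```

Calling RowExtractor.ExtractColumns(RowExtractor.SampleHtml) yields three lists holding Title1/Title2, John/Antony and 29/04/14/28/04/14. Because the parser walks the element tree, tags nested inside a row no longer break the extraction the way they do with [^<]*.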

Older Edit: Originally you asked to match contents of <tr> tags. Specs have changed, so this answer contains the evolving versions.

For a plain <tr> tag: extract Group 1 from

(?i)<tr>([^<]*)</tr>

or for a <tr with stuff>:

(?i)<tr[^>]*>([^<]*)</tr>
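
A short sketch of where this pattern stops working may help. The group [^<]* accepts only text that contains no "<", so a row that wraps its text in a child tag cannot match. TrPatternDemo and Capture are hypothetical names used for illustration only.

```csharp
using System.Text.RegularExpressions;

static class TrPatternDemo
{
    // Same pattern as above: optional attributes, then text with no nested tags.
    static readonly Regex RowText = new Regex(@"(?i)<tr[^>]*>([^<]*)</tr>");

    // Returns the captured text, or null when the pattern does not match.
    public static string Capture(string input)
    {
        Match m = RowText.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Capture("<TR class=\"a\">plain text</TR>") returns "plain text", while Capture("<tr><td>cell</td></tr>") returns null, because the "<" of the nested <td> ends [^<]* before </tr> is reached.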

or for <tr stuff><td stuff>Grab Me</td>:

(?i)<tr[^>]*?>\s*<td[^>]*?>(.*)</td

Here is a code sample:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s1 = "<tr stuff><td stuff>Grab Me</td>";

        // Group 1 captures the text of the first <td> that follows the <tr>.
        var r = new Regex(@"(?i)<tr[^>]*?>\s*<td[^>]*?>(.*)</td");
        string capture = r.Match(s1).Groups[1].Value;

        Console.WriteLine(capture);
        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program

Output: Grab Me
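
One caveat with the last pattern is that (.*) is greedy, so on a line with several cells it runs up to the last </td. The sketch below compares it with a lazy (.*?) variant and shows one way to collect every cell with Regex.Matches. The lazy pattern, the AllCells pattern and the CellCapture helper are illustrations added here, not part of the original answer, and they are still not a substitute for a real parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CellCapture
{
    // Original pattern: greedy (.*) extends to the last "</td" on the line.
    public static string FirstCellGreedy(string row) =>
        Regex.Match(row, @"(?i)<tr[^>]*?>\s*<td[^>]*?>(.*)</td").Groups[1].Value;

    // Lazy variant: (.*?) stops at the first "</td".
    public static string FirstCellLazy(string row) =>
        Regex.Match(row, @"(?i)<tr[^>]*?>\s*<td[^>]*?>(.*?)</td").Groups[1].Value;

    // Collects the text of every <td>; (?s) lets "." span line breaks.
    public static List<string> AllCells(string row)
    {
        var cells = new List<string>();
        foreach (Match m in Regex.Matches(row, @"(?is)<td[^>]*>(.*?)</td>"))
            cells.Add(m.Groups[1].Value);
        return cells;
    }
}
```

For the input <tr><td>Title1</td><td>John</td></tr>, FirstCellGreedy returns "Title1</td><td>John", FirstCellLazy returns "Title1", and AllCells returns the two entries Title1 and John.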
