How do I find specific matches using regex and put them in a string array?


Question

I have an HTML file that I'm trying to extract data from. The regex I'm using is:

"<tr.+?>.+?<td class=\"table_row_col2\"><b>(.+?)&.+?</b>.+?<td class=\"table_row_col5\">(.+?)</td>.+?<td class=\"table_row_col6\">(.+?)</td>.+?</tr>"



It works in Python but not in C#. Here's some sample data:

<tr class="table_row" style="background-color: #d3d3d3;">
    <td class="table_row_col1">271</td>
    <td class="table_row_col2"><b>16/09/2015&nbsp;05:28&nbsp;PM</b></font></small></sup></td>
    <td class="table_row_col3"><span style="color:#e30613">14.3</span></td>
    <td class="table_row_col4">-</td>
    <td class="table_row_col5">8</td>
    <td class="table_row_col6">-</td>
    <td class="table_row_col7">-</td>
    <td class="table_row_col8">Before dinner</td>
    <td class="table_row_col9">-</td>
    <td class="table_row_col10">-</td>
    <td class="table_row_col11">-</td>
</tr>

<tr class="table_row" style="background-color: #ffffff;">
    <td class="table_row_col1">272</td>
    <td class="table_row_col2"><b>16/09/2015&nbsp;02:54&nbsp;PM</b></font></small></sup></td>
    <td class="table_row_col3"><span style="color:#e30613">17.6</span></td>
    <td class="table_row_col4">-</td>
    <td class="table_row_col5">20</td>
    <td class="table_row_col6">32</td>
    <td class="table_row_col7">-</td>
    <td class="table_row_col8">Other</td>
    <td class="table_row_col9">-</td>
    <td class="table_row_col10">-</td>
    <td class="table_row_col11">-</td>
</tr>

<tr class="table_row" style="background-color: #d3d3d3;">
    <td class="table_row_col1">273</td>
    <td class="table_row_col2"><b>15/09/2015&nbsp;11:09&nbsp;PM</b></font></small></sup></td>
    <td class="table_row_col3">-</td>
    <td class="table_row_col4">-</td>
    <td class="table_row_col5">-</td>
    <td class="table_row_col6">34</td>
    <td class="table_row_col7">-</td>
    <td class="table_row_col8">Before Bed</td>
    <td class="table_row_col9">-</td>
    <td class="table_row_col10">-</td>
    <td class="table_row_col11">-</td>
</tr>



I'm trying to extract the date from table_row_col2 and the numbers from table_row_col5 and table_row_col6.

Recommended answer

If you know the HTML structure never changes, you can skip the regex and split the text on the surrounding markup, using a small helper class called Split:

List<string> rows = Split.Extract(htmlString, "class=\"table_row\"", "</tr>");
foreach (string row in rows)
{
    string col2 = Split.Extract(row, "class=\"table_row_col2\"><b>", "</b>")[0];
    string col5 = Split.Extract(row, "class=\"table_row_col5\">", "</td>")[0];
    string col6 = Split.Extract(row, "class=\"table_row_col6\">", "</td>")[0];

    Console.WriteLine(col2 + ", " + col5 + ", " + col6);
}

The helper class Split:

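using System;
using System.Collections.Generic;

public class Split
{
    // Returns every substring of source that lies between splitStart
    // and the next occurrence of splitEnd.
    public static List<string> Extract(string source, string splitStart, string splitEnd)
    {
        var results = new List<string>();

        string[] start = new string[] { splitStart };
        string[] end = new string[] { splitEnd };
        string[] temp = source.Split(start, StringSplitOptions.None);

        // temp[0] is the text before the first splitStart, so begin at index 1.
        for (int i = 1; i < temp.Length; i++)
        {
            results.Add(temp[i].Split(end, StringSplitOptions.None)[0]);
        }

        // No try/catch here: rethrowing as new Exception(e.Message) would
        // discard the original exception type and stack trace.
        return results;
    }
}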
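
If you would rather keep the regex from the question, one likely reason it fails in C# is that '.' does not match a newline in .NET unless RegexOptions.Singleline is set, and every row in the sample data spans several lines. This assumes the Python version was run with re.DOTALL, which the question does not show. The following is a minimal sketch under that assumption. The class name RowRegex and the method ExtractRows are made up for illustration. Each matched row is returned as a string array holding the date, col5 and col6.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class RowRegex
{
    // The pattern from the question, written as verbatim strings
    // so the quotes are doubled instead of backslash-escaped.
    private const string Pattern =
        @"<tr.+?>.+?<td class=""table_row_col2""><b>(.+?)&.+?</b>.+?" +
        @"<td class=""table_row_col5"">(.+?)</td>.+?" +
        @"<td class=""table_row_col6"">(.+?)</td>.+?</tr>";

    // Returns one string[] { date, col5, col6 } per matched table row.
    public static List<string[]> ExtractRows(string html)
    {
        var rows = new List<string[]>();

        // Singleline makes '.' match '\n' too, so one match can span several lines.
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.Singleline))
        {
            rows.Add(new[]
            {
                m.Groups[1].Value, // text of col2 up to the first '&' (the date)
                m.Groups[2].Value, // col5
                m.Groups[3].Value  // col6
            });
        }

        return rows;
    }
}
```

Calling RowRegex.ExtractRows(htmlString) on the sample data above should give three arrays, the first holding "16/09/2015", "8" and "-". Because the first group stops at the first '&', the time of day is not captured. Like the Split approach, this only holds up while the markup keeps its current shape.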
