How do I find specific matches using regex and put them in a string array?
Problem description
I have an HTML file that I'm trying to extract data from. The regex I'm using is
"<tr.+?>.+?<td class=\"table_row_col2\"><b>(.+?)&.+?</b>.+?<td class=\"table_row_col5\">(.+?)</td>.+?<td class=\"table_row_col6\">(.+?)</td>.+?</tr>"
It works in Python but not in C#. Here is some sample data:
<tr class="table_row" style="background-color: #d3d3d3;">
<td class="table_row_col1">271</td>
<td class="table_row_col2"><b>16/09/2015 05:28 PM</b></font></small></sup></td>
<td class="table_row_col3"><span style="color:#e30613">14.3</span></td>
<td class="table_row_col4">-</td>
<td class="table_row_col5">8</td>
<td class="table_row_col6">-</td>
<td class="table_row_col7">-</td>
<td class="table_row_col8">Before dinner</td>
<td class="table_row_col9">-</td>
<td class="table_row_col10">-</td>
<td class="table_row_col11">-</td>
</tr>
<tr class="table_row" style="background-color: #ffffff;">
<td class="table_row_col1">272</td>
<td class="table_row_col2"><b>16/09/2015 02:54 PM</b></font></small></sup></td>
<td class="table_row_col3"><span style="color:#e30613">17.6</span></td>
<td class="table_row_col4">-</td>
<td class="table_row_col5">20</td>
<td class="table_row_col6">32</td>
<td class="table_row_col7">-</td>
<td class="table_row_col8">Other</td>
<td class="table_row_col9">-</td>
<td class="table_row_col10">-</td>
<td class="table_row_col11">-</td>
</tr>
<tr class="table_row" style="background-color: #d3d3d3;">
<td class="table_row_col1">273</td>
<td class="table_row_col2"><b>15/09/2015 11:09 PM</b></font></small></sup></td>
<td class="table_row_col3">-</td>
<td class="table_row_col4">-</td>
<td class="table_row_col5">-</td>
<td class="table_row_col6">34</td>
<td class="table_row_col7">-</td>
<td class="table_row_col8">Before Bed</td>
<td class="table_row_col9">-</td>
<td class="table_row_col10">-</td>
<td class="table_row_col11">-</td>
</tr>
I'm trying to extract the date from table_row_col2 and the numbers from table_row_col5 and table_row_col6.
Recommended answer
If you know the HTML structure never changes, you can skip the regex and cut the text between known markers with a small helper class, Split:
// Each entry is the text between class="table_row" and the closing </tr>.
List<string> rows = Split.Extract(htmlString, "class=\"table_row\"", "</tr>");
foreach (string row in rows)
{
    // [0] takes the first occurrence; it throws if the marker is missing from the row.
    string col2 = Split.Extract(row, "class=\"table_row_col2\"><b>", "</b>")[0];
    string col5 = Split.Extract(row, "class=\"table_row_col5\">", "</td>")[0];
    string col6 = Split.Extract(row, "class=\"table_row_col6\">", "</td>")[0];
    Console.WriteLine(col2 + ", " + col5 + ", " + col6);
}
The additional Split class:
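using System;
using System.Collections.Generic;

public class Split
{
    // Returns every substring of source that lies between splitStart and the
    // next splitEnd. Text before the first splitStart is ignored.
    public static List<string> Extract(string source, string splitStart, string splitEnd)
    {
        var results = new List<string>();
        string[] start = new string[] { splitStart };
        string[] end = new string[] { splitEnd };
        string[] temp = source.Split(start, StringSplitOptions.None);
        // temp[0] is whatever precedes the first start marker, so begin at 1.
        for (int i = 1; i < temp.Length; i++)
        {
            results.Add(temp[i].Split(end, StringSplitOptions.None)[0]);
        }
        // Exceptions are left to propagate; catching and rethrowing as
        // new Exception(e.Message) would discard the type and stack trace.
        return results;
    }
}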
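For readers who would rather stay with a regex, as in the question, here is a minimal sketch in C#. It rests on two assumptions that the source does not confirm. First, that the input spans multiple lines, in which case a likely cause of the failure is that `.` does not match a newline in .NET unless RegexOptions.Singleline is set. Second, that the date cell looks like the sample shown, so the literal `&` from the question's pattern is dropped. The class name RowRegex and the method ExtractRows are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowRegex
{
    // Pattern adapted from the question (an assumption, not the asker's exact regex):
    // the literal '&' is removed and '<tr' is matched with [^>]* instead of '.+?'.
    // RegexOptions.Singleline lets '.' match '\n', so one match can span a whole row.
    private static readonly Regex RowPattern = new Regex(
        "<tr\\b[^>]*>.*?<td class=\"table_row_col2\"><b>(.*?)</b>" +
        ".*?<td class=\"table_row_col5\">(.*?)</td>" +
        ".*?<td class=\"table_row_col6\">(.*?)</td>.*?</tr>",
        RegexOptions.Singleline);

    // Returns one string array per matched row: { col2 date, col5 value, col6 value }.
    public static List<string[]> ExtractRows(string html)
    {
        var rows = new List<string[]>();
        foreach (Match m in RowPattern.Matches(html))
        {
            rows.Add(new[]
            {
                m.Groups[1].Value.Trim(),
                m.Groups[2].Value.Trim(),
                m.Groups[3].Value.Trim()
            });
        }
        return rows;
    }
}
```

Calling RowRegex.ExtractRows(htmlString) and looping over the result gives the same three fields per row as the Split approach. Because the quantifiers are lazy but not bounded to a single row, a row that lacks one of the three cells can cause a match to run into the following row, so this sketch is only suitable when every row has the same layout.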