Extract the contents of a string between two string delimiters using Match in C#


Question

So, say I'm parsing the following HTML string:

<html>
    <head>
        RANDOM JAVASCRIPT AND CSS AHHHHHH!!!!!!!!
    </head>
    <body>
        <table class="table">
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
            <tr><a href="/subdir/members/Name">Name</a></tr>
        </table>
    </body>
</html>

and I want to isolate the contents of the table element (everything inside the table with that class).

Now, I used regex to accomplish this:

string pagesource = GetPageSource(); // hypothetical method that extracts the HTML source and stores it in a string
string[] splitSource = Regex.Split(pagesource, "<table class=\"member\">");
string[] memberList = Regex.Split(splitSource[1], "</table>");
// the list of table members will be in memberList[0]
// method to extract links from the table
ExtractLinks(memberList[0]);

I've been looking at other ways to do this extraction, and I came across the Match object in C#.

I'm attempting to do something like this:

Match match = Regex.Match(pageSource, "<table class=\"members\">(.|\n)*?</table>");

The goal was to extract the matched value between the two delimiters, but when I run it, the match value is:

match.value = </table>

My question, then, is: is there a way to extract data from my string that is slightly easier, more readable, or shorter than my method using regex? For this simple example, regex is fine, but for more complex examples, I end up with the coding equivalent of scribbles all over my screen.

I would really like to use Match, because it seems like a very neat and tidy class, but I can't seem to get it working for my needs. Can anyone help me with this?

Thanks a lot!

Recommended answer

Use an HTML parser, like HTML Agility Pack.

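// Requires the HtmlAgilityPack package, plus: using System.Net; using HtmlAgilityPack;
var doc = new HtmlDocument();

// url is assumed to hold the address of the page to download
using (var wc = new WebClient())
using (var stream = wc.OpenRead(url))
{
    doc.Load(stream);
}

// DocumentNode is the document root in HTML Agility Pack
var table = doc.DocumentNode.Element("html").Element("body").Element("table");
string tableHtml = table.OuterHtml;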
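
If you still want to stay with Regex.Match, as the question asks, the usual trick is to put a capture group between the two delimiters and read Groups[1] rather than the whole match. The following is only a minimal sketch under some assumptions: the class name DelimiterExtractor and the helper Between are made up for illustration, the delimiters are treated as literal text, and the sample page source is a shortened version of the HTML in the question with class="members" assumed. It is not a substitute for a real HTML parser on messy markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterExtractor
{
    // Hypothetical helper: returns the text between the first occurrence of
    // `start` and the nearest following `end`, or null when nothing matches.
    public static string Between(string input, string start, string end)
    {
        // Regex.Escape makes both delimiters match as literal text.
        // RegexOptions.Singleline lets '.' match newlines as well;
        // the lazy '*?' stops at the first `end` after `start`.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        Match match = Regex.Match(input, pattern, RegexOptions.Singleline);

        // Groups[0] is the whole match including the delimiters;
        // Groups[1] is only the captured text between them.
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Shortened sample input; the class name "members" is an assumption.
        string pageSource =
            "<body>\n<table class=\"members\">\n" +
            "<tr><a href=\"/subdir/members/Name\">Name</a></tr>\n" +
            "</table>\n</body>";

        string inner = Between(pageSource, "<table class=\"members\">", "</table>");
        Console.WriteLine(inner ?? "(no match)");
    }
}
```

Using Singleline with a single lazy capture group avoids the (.|\n)*? construct from the question, and checking match.Success before reading the group keeps a missing table from being mistaken for an empty one.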
