Match Table w/ Regex

Problem description

I'm trying to match a table w/ regex but I'm having some issues. I can't figure out exactly why it will not match properly. Here is the HTML:

    <table class="integrationteamstats">
    <tbody>
    <tr>
        <td class="right">
            <span class="mediumtextBlack">Queue:</span>
        </td>
        <td class="left">
            <span class="mediumtextBlack">0</span>
        </td>
        <td class="right">
            <span class="mediumtextBlack">Aban:</span>
        </td>
        <td class="left">
            <span class="mediumtextBlack">0%</span>
        </td>
        <td class="right">
            <span class="mediumtextBlack">Staffed:</span>
        </td>
        <td class="left">
            <span class="mediumtextBlack">0</span>
        </td>
    </tr>
    <tr>
        <td class="right">
            <span class="mediumtextBlack">Wait:</span>
        </td>
        <td class="left">
            <span class="mediumtextBlack">0:00</span>
        </td>
        <td class="right">
            <span class="mediumtextBlack">Total:</span>
        </td>
        <td class="left">
            <span class="mediumtextBlack">0</span>
        </td>
        <td class="right">
            <span class="mediumtextBlack">On ACD:</span>
        </td>
        <td class="left">
            <span class="mediumtextBlack">0</span>
        </td>
    </tr>
    </tbody>
    </table>

I need to get 2 pieces of information: the data inside of the td below Queue and the data inside the td below Wait (so the Queue count and wait time). Obviously, the numbers are going to update frequently.

This is the regex I have for pulling the initial table, but it isn't working:

    Match statstable = Regex.Match(this.html, "<table class=\"integrationteamstats\">(.*?)</table>");
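
The most likely reason this pattern fails on the HTML above: in .NET, `.` does not match a newline unless `RegexOptions.Singleline` is set, so the lazy `(.*?)` group cannot span the many lines between the opening and closing table tags. A minimal sketch with that option added (the `StatsTableRegex` class and `ExtractTable` method are hypothetical names, not from the original):

```csharp
using System.Text.RegularExpressions;

public static class StatsTableRegex
{
    // Hypothetical helper: returns the inner HTML of the integrationteamstats table,
    // or null if the table is not found.
    public static string ExtractTable(string html)
    {
        // RegexOptions.Singleline makes '.' also match '\n', so (.*?) can cross
        // line breaks and stop at the first </table>.
        Match m = Regex.Match(
            html,
            "<table class=\"integrationteamstats\">(.*?)</table>",
            RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Without `RegexOptions.Singleline`, the same call returns an unsuccessful `Match` for any table that spans more than one line.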

And I'm not sure what regex I should use to get the data from the td's.

Before anyone asks: no, there is no way for me to update the HTML to add an ID or anything of that nature. It's pretty much as is. The only thing that is consistent is the location of the td's.
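
If regex has to be used anyway, the one stable feature mentioned above (the position of the td's) can be exploited: each value sits in the td that directly follows its label td. A minimal sketch, assuming the markup is exactly as shown apart from whitespace; `StatsCellRegex.GetValue` is a hypothetical helper, not from the original:

```csharp
using System.Text.RegularExpressions;

public static class StatsCellRegex
{
    // Hypothetical helper: returns the text of the value cell that follows the label
    // cell whose span text is exactly `label` (e.g. "Queue:" or "Wait:"), or null.
    // Assumes the markup shown in the question: a label <td>, then a value <td>,
    // each wrapping one <span class="mediumtextBlack">. Any markup change breaks it.
    public static string GetValue(string html, string label)
    {
        string pattern =
            "<span class=\"mediumtextBlack\">" + Regex.Escape(label) + "</span>\\s*</td>\\s*" +
            "<td[^>]*>\\s*<span class=\"mediumtextBlack\">([^<]*)</span>";
        // \s* already matches newlines, so RegexOptions.Singleline is not needed here.
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```

For example, `StatsCellRegex.GetValue(html, "Queue:")` and `StatsCellRegex.GetValue(html, "Wait:")` would return the queue count and wait time as strings. This is brittle by design, which is the main argument for a real parser.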

Recommended answer

Instead of regex, I suggest using the HTML Agility Pack to parse the HTML and query its structure.

What exactly is the Html Agility Pack (HAP)?

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you actually don't HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml proposes, but for HTML documents (or streams).
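
A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced; the `StatsTableHap.GetValue` helper and the XPath expression are illustrative, not taken from the original answer. It loads the page with `HtmlDocument.LoadHtml` and queries it with `SelectSingleNode`, relying only on the td positions the question says are stable:

```csharp
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack", not part of the .NET base library

public static class StatsTableHap
{
    // Hypothetical helper: returns the text of the value cell that follows the label
    // cell whose span text equals `label` (e.g. "Queue:" or "Wait:"), or null.
    public static string GetValue(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Inside the stats table, find the <td> whose <span> text equals the label,
        // then take the first following sibling <td> and its <span>.
        string xpath =
            "//table[@class='integrationteamstats']//td[span='" + label + "']" +
            "/following-sibling::td[1]/span";

        // SelectSingleNode returns null when nothing matches.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText.Trim();
    }
}
```

The label is concatenated directly into the XPath string, so this sketch assumes labels never contain a single quote; the fixed labels in this table ("Queue:", "Wait:") satisfy that.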

In general, regex is a poor choice for parsing HTML.
