Parsing HTML String

Question

Is there a way to parse an HTML string in .NET code-behind, the same way you would with DOM parsing?

For example: GetElementByTagName("abc").GetElementByTagName("tag")

I have this code:

using System.IO;
using System.Net;

private void LoadProfilePage()
{
    string sURL = "http://www.abcd1234.com/abcd1234";

    WebRequest wrGETURL = WebRequest.Create(sURL);

    //WebProxy myProxy = new WebProxy("myproxy", 80);
    //myProxy.BypassProxyOnLocal = true;

    //wrGETURL.Proxy = WebProxy.GetDefaultProxy();

    // Dispose the response, its stream and the reader once the page has been read
    using (WebResponse response = wrGETURL.GetResponse())
    using (Stream objStream = response.GetResponseStream())
    {
        if (objStream != null)
        {
            using (StreamReader objReader = new StreamReader(objStream))
            {
                // Read the whole HTML page into a single string
                string sLine = objReader.ReadToEnd();

                if (!string.IsNullOrEmpty(sLine))
                {
                    // .... (parse the HTML in sLine here)
                }
            }
        }
    }
}

Answer

You can use the excellent HTML Agility Pack.

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams).
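A minimal sketch of what the `....` in the question could become, assuming the HtmlAgilityPack NuGet package is referenced. The class name `HtmlParsingDemo` and the helper `InnerTexts` are hypothetical names chosen for illustration. The tag names `abc` and `tag` come from the question. The XPath expression `//abc//tag` plays the role of `GetElementByTagName("abc").GetElementByTagName("tag")`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the "HtmlAgilityPack" NuGet package is installed

public static class HtmlParsingDemo // hypothetical class name
{
    // Hypothetical helper: returns the trimmed inner text of every <inner> element
    // that sits anywhere inside an <outer> element.
    public static List<string> InnerTexts(string html, string outer, string inner)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // builds the DOM; tolerant of malformed markup

        // XPath: any <inner> descendant of any <outer>, in document order
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes($"//{outer}//{inner}");

        // With default options SelectNodes returns null (not an empty collection) on no match
        if (nodes == null)
            return new List<string>();

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        // Stand-in for the sLine string downloaded in LoadProfilePage()
        string sLine = "<abc><tag>first</tag><div><tag>second</tag></div></abc><tag>outside</tag>";

        foreach (string text in InnerTexts(sLine, "abc", "tag"))
            Console.WriteLine(text);
    }
}
```

With this sample string the helper should return `first` and `second` but not `outside`, because only `<tag>` elements nested inside `<abc>` match. If you prefer DOM-style chaining to XPath, `HtmlNode.Descendants("abc")` returns the matching nodes as an `IEnumerable<HtmlNode>` that can be combined with LINQ.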
