Can HtmlAgilityPack handle an XML file that comes with an XSL file to render HTML?

Question

I was wondering what the best way is for HtmlAgilityPack to read an XML file that references an XSL file to render HTML. Are there any settings on the HtmlDocument class that would help with this, or do I have to find a way to run the transformation before loading the result with HtmlAgilityPack? If the latter, does anybody know of a good library or method for such a transformation? Below is an example of a website that returns XML with an XSL file, along with the code I would like to use.

var uri = new Uri("http://www.skechers.com/");
var request = (HttpWebRequest)WebRequest.Create(uri);
var cookieContainer = new CookieContainer();

request.CookieContainer = cookieContainer;
request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
request.Method = "GET";
request.AllowAutoRedirect = true;
request.Timeout = 15000;

var response = (HttpWebResponse)request.GetResponse();
var page = new HtmlDocument();
page.OptionReadEncoding = false;
var stream = response.GetResponseStream();
page.Load(stream); 

This code does not throw any errors, but what gets parsed is the raw XML, not the transformed output, which is what I want.

Recommended answer

Html Agility Pack can help you here in two ways:

1) It makes it easier to get at an XML processing instruction, because it parses the PI data as HTML and so exposes it as attributes.

2) HtmlDocument implements IXPathNavigable, so it can be transformed directly by the .NET XSLT transformation engine.
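To illustrate the second point without any network access, here is a minimal sketch that feeds an IXPathNavigable input to XslCompiledTransform and captures the result as a string. It uses XPathDocument from the base class library as a stand-in for HtmlDocument, and the sample XML, the stylesheet, and the class name InMemoryTransformSketch are all made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class InMemoryTransformSketch
{
    // Hypothetical sample data; not taken from the skechers site.
    public const string SampleXml =
        "<catalog><item>Shoe A</item><item>Shoe B</item></catalog>";

    public const string SampleXslt =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='html'/>" +
        "<xsl:template match='/catalog'>" +
        "<ul><xsl:for-each select='item'><li><xsl:value-of select='.'/></li></xsl:for-each></ul>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string xml, string xsltText)
    {
        // Compile the stylesheet from a string.
        var xslt = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(xsltReader);
        }

        // Any IXPathNavigable works as input; XPathDocument is used here
        // only so the sketch has no dependency beyond the framework.
        IXPathNavigable input = new XPathDocument(new StringReader(xml));

        // Write the transformed markup into a string instead of a file.
        using (var output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Transform(SampleXml, SampleXslt));
    }
}
```

The returned string could then be passed to HtmlDocument.LoadHtml if the transformed HTML itself needs to be parsed with Html Agility Pack.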

Here is a piece of code that works. I had to add a specific XmlResolver to handle the XSLT transform properly, but I think this is specific to this skechers case.

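using System;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Xsl;
using HtmlAgilityPack;

public static void DownloadAndProcessXml(string url, string userAgent, string outputFilePath)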
{
    using (XmlTextWriter writer = new XmlTextWriter(outputFilePath, Encoding.UTF8))
    {
        DownloadAndProcessXml(url, userAgent, writer);
    }
}

public static void DownloadAndProcessXml(string url, string userAgent, XmlWriter output)
{
    UserAgentXmlUrlResolver resolver = new UserAgentXmlUrlResolver(url, userAgent);

    // WebClient is an easy to use class.
    using (WebClient client = new WebClient())
    {
        // download Xml doc. set User-Agent header or the site won't answer us...
        client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
        HtmlDocument xmlDoc = new HtmlDocument();
        xmlDoc.Load(client.OpenRead(url));

        // determine xslt (note the xpath trick as Html Agility Pack does not support xml processing instructions)
        string xsltUrl = xmlDoc.DocumentNode.SelectSingleNode("//*[name()='?xml-stylesheet']").GetAttributeValue("href", null);

        // download Xslt doc
        client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
        XslCompiledTransform xslt = new XslCompiledTransform();
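        // combine the page url and the stylesheet href as URIs rather than by string concatenation
        xslt.Load(new XmlTextReader(client.OpenRead(new Uri(new Uri(url), xsltUrl))), new XsltSettings(true, false), null);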

        // transform Html/Xml doc into new Xml doc, easy as HtmlDocument implements IXPathNavigable
        // note the custom resolver, which handles the resolve requests this Xslt makes
        xslt.Transform(xmlDoc, null, output, resolver);
    }
}

// This class is needed during transformation otherwise there are errors.
// This is probably due to this very specific Xslt file that needs to go back to the root document itself.
public class UserAgentXmlUrlResolver : XmlUrlResolver
{
    public UserAgentXmlUrlResolver(string rootUrl, string userAgent)
    {
        RootUrl = rootUrl;
        UserAgent = userAgent;
    }

    public string RootUrl { get; set; }
    public string UserAgent { get; set; }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        WebClient client = new WebClient();
        if (!string.IsNullOrEmpty(UserAgent))
        {
            client.Headers[HttpRequestHeader.UserAgent] = UserAgent;
        }
        return client.OpenRead(absoluteUri);
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        if ((relativeUri == "/") && (!string.IsNullOrEmpty(RootUrl)))
            return new Uri(RootUrl);

        return base.ResolveUri(baseUri, relativeUri);
    }
}

You call it like this:

    string url = "http://www.skechers.com/";
    string ua = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
    DownloadAndProcessXml(url, ua, "skechers.html");
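
The href read from the xml-stylesheet instruction may be relative, so it has to be combined with the page URL before the stylesheet can be downloaded. A minimal sketch of doing that with System.Uri follows; the class name StylesheetUriSketch and the example.com addresses are hypothetical and only illustrate how the framework resolves each kind of href.

```csharp
using System;

public static class StylesheetUriSketch
{
    // Resolve a stylesheet href against the URL of the page that referenced it.
    // An absolute href is returned unchanged; a relative one is resolved
    // against the page URL.
    public static Uri Resolve(string pageUrl, string href)
    {
        return new Uri(new Uri(pageUrl), href);
    }

    public static void Main()
    {
        // Hypothetical addresses, used only for illustration.
        Console.WriteLine(Resolve("http://www.example.com/shop/", "/xsl/page.xsl"));
        Console.WriteLine(Resolve("http://www.example.com/shop/", "page.xsl"));
        Console.WriteLine(Resolve("http://www.example.com/shop/", "http://cdn.example.com/a.xsl"));
    }
}
```

Resolving through Uri avoids doubled or missing slashes and also copes with a stylesheet hosted on a different domain.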
