Need a workaround for the HTTP 503 problem in XmlDocument.Load when loading an XHTML document

Problem description

Hi!

I want to write an application (eventually it should be part of an SSIS script task) that loads an XHTML document from a drive and manipulates it in memory. The problem is the Load method of the XmlDocument class, which throws a "503 Service unavailable" error when invoked. I found out that the reason is that the W3.org website blocks certain requests (see [1]). I then found a solution at [2], but it doesn't work (in my case?) because the "-//W3C//DTD XHTML 1.0 Transitional//EN" part causes problems. The line "return xur.GetEntity(absoluteUri, role, t);" raises this error:

Ein Teil des Pfades "U:\Eigene Dateien\Visual Studio 2005\Projects\Mail2CL\bin\Debug\-\W3C\DTD XHTML 1.0 Transitional\EN" konnte nicht gefunden werden.

This means roughly: "A part of the path "..." could not be found."

That makes sense, because the path really doesn't exist. But as you can see, I just forwarded the return value of xur.ResolveUri, which I believed should be okay.

Solutions that don't work for me:


  • Setting the XmlResolver to null or removing the DTD
  • Changing the proxy settings (I have no influence over our company's proxy)

Can anyone help me?

Best regards,

Michael

[1] http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

[2] http://stackoverflow.com/questions/2766357/how-to-validate-xml-using-a-dtd-via-a-proxy-and-not-using-system-net-defaultprox

using System;
using System.Collections.Generic;
using System.Net.Mail;
using System.Text;
using System.Xml;
using System.Diagnostics;
using System.IO;

namespace Mail2CL
{
  class XhtmlResolver : XmlResolver
  {
    private static readonly Dictionary<String, String> knownDtds;

    static XhtmlResolver()
    {
      knownDtds = new Dictionary<String, String>();
      knownDtds.Add("xhtml", "http://www.w3.org/TR/xhtml1/DTD/");
    }

    public override System.Net.ICredentials Credentials
    {
      set { throw new NotSupportedException(); }
    }

    public override object GetEntity(Uri absoluteUri, string role, Type t)
    {
      if (t != typeof(System.IO.Stream))
      {
        throw new ArgumentException();
      }

      if (absoluteUri == null)
      {
        //throw new ArgumentException();
        MemoryStream ms = new MemoryStream(new byte[] { 0 });
        return ms;
      }

      string uri = absoluteUri.AbsoluteUri;
      foreach (string key in knownDtds.Keys)
      {
        if (uri.StartsWith(knownDtds[key]))
        {
          string resourceName = uri.Replace(knownDtds[key], @"\\path_to_dtd\"); //GetResourceName(key, uri.Substring(knownDtds[key].Length));
          return GetStreamForNamedResource(resourceName);
        }
      }

      return xur.GetEntity(absoluteUri, role, t);
      //throw new ArgumentException();
      //return null;
    }

    private String GetResourceName(string key, string filename)
    {
      if (filename.StartsWith(key)) return filename;
      return key + "-" + filename;
    }

    private Stream GetStreamForNamedResource(string resourceName)
    {
      Debug.Print(resourceName);
      return new FileStream(resourceName, FileMode.Open, FileAccess.Read);
    }

    XmlUrlResolver xur = new XmlUrlResolver();

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
      if (baseUri == null)
      {
        if (relativeUri.StartsWith("http://") || relativeUri.Substring(1, 1) == ":" )
        {
          Trace(" returning {0}", relativeUri);
          return new Uri(relativeUri);
        }
        else
        {
          Debug.Print((baseUri == null ? "baseUri is null\n" : baseUri.ToString()) + relativeUri);
          Uri uriX = xur.ResolveUri(null, relativeUri);
          return uriX;
        }
        //throw new ArgumentException();
        //Trace(" returning base.ResolveUri()");
        //return base.ResolveUri(baseUri, relativeUri);
      }

      if (relativeUri == null)
      {
        Trace(" returning base [{0}]", baseUri.ToString());
        return baseUri;
      }

      // both are non-null
      string uri = baseUri.AbsoluteUri;
      foreach (string key in knownDtds.Keys)
      {
        string dtdUriRoot = knownDtds[key];
        if (uri.StartsWith(dtdUriRoot))
        {
          string newUri = uri.Substring(0, dtdUriRoot.Length) + relativeUri;
          Trace(" returning [{0}]", newUri);
          return new Uri(newUri);
        }
      }

      throw new ArgumentException();
    }

    private void Trace(string p, params object[] args)
    {
      Debug.Print(p, args);
    }
  }

  class Program
  {
    static void Main(string[] args)
    {
      XmlDocument mail2CL = new XmlDocument();
      //mail2CL.Load(@"c:\temp\mini.xml");
      //mail2CL.XmlResolver = new XhtmlResolver();
      XmlReaderSettings xmlReaderSettings = new XmlReaderSettings();
      xmlReaderSettings.ProhibitDtd = false;
      xmlReaderSettings.XmlResolver = new XhtmlResolver();
      using (FileStream stream = File.OpenRead(@"c:\temp\Mail_2CL.html"))
      {
        XmlReader reader = XmlReader.Create(stream, xmlReaderSettings);
        mail2CL.Load(reader);
      }
      //mail2CL.Load(@"c:\temp\Mail_2CL.html");
      string str2CLMonat = DateTime.Now.AddMonths(-13).ToString("yyyy.MM");
      XmlNode nodeMonat2CL = mail2CL.GetElementById("monat2cl");
      nodeMonat2CL.InnerText = str2CLMonat;
      //mail2CL.Load(@"c:\temp\mini.xml");
    }
  }
}

Recommended answer

I'm a bit confused about what you're looking for. The Stack Overflow answer is excellent guidance on how to solve this problem. The only part it leaves out is where to get the DTD in question.

There are several places in your code above that I don't understand. You fall back to the base class implementation in some cases but not in others. I don't see any reason to do that, but that's not the main problem.

The reason you see the strange path into the Visual Studio 2005 directories is that the default implementation of ResolveUri uses the current directory as the base URI when the base URI is null, which is the case here. That's how you end up there.
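
To see this behavior in isolation, here is a minimal sketch that passes the public identifier from the question to a plain XmlUrlResolver with a null base URI. The class and method names (ResolveUriDemo, ResolvePublicId) are made up for illustration; the exact path you get depends on the process's current directory.

```csharp
using System;
using System.Xml;

// Illustration only: shows what the default ResolveUri does with a DTD
// public identifier when no base URI is available.
public static class ResolveUriDemo
{
    public static Uri ResolvePublicId()
    {
        XmlUrlResolver resolver = new XmlUrlResolver();

        // The public identifier is not an absolute URI, so with a null base
        // it is treated as a relative file path and expanded against the
        // current directory.
        return resolver.ResolveUri(null, "-//W3C//DTD XHTML 1.0 Transitional//EN");
    }
}
```

Printing the result next to Environment.CurrentDirectory (for example with Console.WriteLine) should show a file URI that starts in the current directory and continues with the pieces of the public identifier, which matches the shape of the path in the error message above.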

Second, you do correctly recognize the XHTML DTD as one of the known DTDs, but then you try to open a file at the path \\path_to_dtd\thexhtmldtd.dtd. Unless you have a file server called path_to_dtd on your network, that is obviously not going to work. As noted in the Stack Overflow answer, a great way of storing the known DTDs is as embedded resources in your assembly (so that you don't need a common path that is accessible when you run your application). I would try that.
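
A rough sketch of what the embedded-resource approach could look like, under some assumptions: the class name EmbeddedXhtmlResolver and the resource prefix "Mail2CL.Dtd." are hypothetical, and the DTD and entity files (xhtml1-transitional.dtd, xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent) are assumed to have been downloaded once and added to the project as embedded resources. This is not the code from the Stack Overflow answer, just one way to arrange it. It also maps the public identifiers to the W3C system URIs so they never get resolved against the current directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Sketch: serves the XHTML 1.0 Transitional DTD and its entity files from
// embedded resources instead of requesting them from www.w3.org.
public class EmbeddedXhtmlResolver : XmlUrlResolver
{
    // Root URI of the XHTML 1.0 DTD files, as in the question's code.
    private const string DtdRoot = "http://www.w3.org/TR/xhtml1/DTD/";

    // ASSUMPTION: hypothetical resource prefix (default namespace "Mail2CL",
    // project folder "Dtd"). Adjust to wherever the files are embedded.
    private const string ResourcePrefix = "Mail2CL.Dtd.";

    // Public identifiers used by the Transitional DTD, mapped to file names.
    private static readonly Dictionary<string, string> PublicIds = CreatePublicIds();

    private static Dictionary<string, string> CreatePublicIds()
    {
        Dictionary<string, string> ids = new Dictionary<string, string>();
        ids.Add("-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml1-transitional.dtd");
        ids.Add("-//W3C//ENTITIES Latin 1 for XHTML//EN", "xhtml-lat1.ent");
        ids.Add("-//W3C//ENTITIES Symbols for XHTML//EN", "xhtml-symbol.ent");
        ids.Add("-//W3C//ENTITIES Special for XHTML//EN", "xhtml-special.ent");
        return ids;
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        // Known public identifiers are mapped straight to their system URI.
        string fileName;
        if (relativeUri != null && PublicIds.TryGetValue(relativeUri, out fileName))
        {
            return new Uri(DtdRoot + fileName);
        }
        return base.ResolveUri(baseUri, relativeUri);
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri == null)
        {
            throw new ArgumentNullException("absoluteUri");
        }

        string uri = absoluteUri.AbsoluteUri;
        if (uri.StartsWith(DtdRoot, StringComparison.OrdinalIgnoreCase))
        {
            // Anything below the DTD root is looked up in this assembly.
            string resourceName = ResourcePrefix + uri.Substring(DtdRoot.Length);
            Stream stream = typeof(EmbeddedXhtmlResolver).Assembly
                .GetManifestResourceStream(resourceName);
            if (stream == null)
            {
                throw new FileNotFoundException("Embedded resource not found: " + resourceName);
            }
            return stream;
        }

        // Everything else (e.g. local files) goes to the default resolver.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

It would be plugged in the same way as in the question, by assigning an instance to xmlReaderSettings.XmlResolver before calling XmlReader.Create. How the reader reacts when a lookup fails differs between framework versions, so the fallback branch is worth testing on the runtime you actually target.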

Thanks,

