How to prevent System.Xml.XmlException: Invalid character in the given encoding


Problem description

I have a Windows desktop app written in C# that loops through a bunch of XML files stored on disk and created by a 3rd party program. Almost all of the files are loaded and processed successfully by the LINQ code that follows this statement:

XDocument xmlDoc = XDocument.Load(inFileName);
List<DocMetaData> docList =
      (from d in xmlDoc.Descendants("DOCUMENT")
       select new DocMetaData
       {
      File = d.Element("FILE").SafeGetAttributeValue("filename")
         ,
      Folder = d.Element("FOLDER").SafeGetAttributeValue("name")
         ,
      ItemID = d.Elements("INDEX")
          .Where(i => (string)i.Attribute("name") == "Item ID(idmId)")
          .Select(i => (string)i.Attribute("value"))
          .FirstOrDefault()
         ,
      Comment = d.Elements("INDEX")
          .Where(i => (string)i.Attribute("name") == "Comment(idmComment)")
          .Select(i => (string)i.Attribute("value"))
          .FirstOrDefault()
         ,
      Title = d.Elements("INDEX")
          .Where(i => (string)i.Attribute("name") == "Title(idmName)")
          .Select(i => (string)i.Attribute("value"))
          .FirstOrDefault()
         ,
      DocClass = d.Elements("INDEX")
          .Where(i => (string)i.Attribute("name") == "Document Class(idmDocType)")
          .Select(i => (string)i.Attribute("value"))
          .FirstOrDefault()
       }
      ).ToList<DocMetaData>();

...where inFileName is a full path and filename such as:

     Y:\S2Out\B0000004\Pet Tab\convert.B0000004.Pet Tab.xml

But a few of the files cause problems like this:

System.Xml.XmlException: Invalid character in the given encoding. Line 52327, position 126.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.InvalidCharRecovery(Int32& bytesCount, Int32& charsCount)
at System.Xml.XmlTextReaderImpl.GetChars(Int32 maxCharsCount)
at System.Xml.XmlTextReaderImpl.ReadData()
at System.Xml.XmlTextReaderImpl.ParseAttributeValueSlow(Int32 curPos, Char quoteChar, NodeData attr)
at System.Xml.XmlTextReaderImpl.ParseAttributes()
at System.Xml.XmlTextReaderImpl.ParseElement()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Load(String uri, LoadOptions options)
at System.Xml.Linq.XDocument.Load(String uri)
at CBMI.WinFormsUI.GridForm.processFile(StreamWriter oWriter, String inFileName, Int32 XMLfileNumber) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 147
at CBMI.WinFormsUI.GridForm.btnProcess_Click(Object sender, EventArgs e) in C:\ProjectsVS2010\CBMI.LatitudePostConverter\CBMI.LatitudePostConverter\CBMI.WinFormsUI\GridForm.cs:line 105

The XML files look like this (this sample shows only 2 DOCUMENT elements but there are many):

<?xml version="1.0" ?>
<DOCUMENTCOLLECTION>
   <DOCUMENT>
       <FILE filename="e:\S2Out\B0000005\General\D003712420.0001.pdf" outputpath="e:\S2Out\B0000005\General"/>
       <ANNOTATION filename=""/>
       <INDEX name="Comment(idmComment)" value=""/>
       <INDEX name="Document Class(idmDocType)" value="General"/>
       <INDEX name="Item ID(idmId)" value="003712420"/>
       <INDEX name="Original File Name(idmDocOriginalFile)" value="Matrix Aligning 603.24 Criteria to Petition Pages.pdf"/>
       <INDEX name="Title(idmName)" value="Matrix for 603.24"/>
       <FOLDER name="/Accreditation/PASBVE/2004-06"/>
   </DOCUMENT>
   <DOCUMENT>
       <FILE filename="e:\S2Out\B0000005\General\D003712442.0001.pdf" outputpath="e:\S2Out\B0000005\General"/>
       <ANNOTATION filename=""/>
       <INDEX name="Comment(idmComment)" value=""/>
       <INDEX name="Document Class(idmDocType)" value="General"/>
       <INDEX name="Item ID(idmId)" value="003712442"/>
       <INDEX name="Original File Name(idmDocOriginalFile)" value="Contacts at NDU.pdf"/>
       <INDEX name="Title(idmName)" value="Contacts at NDU"/>
       <FOLDER name="/Accreditation/NDU/2006-12/Self-Study"/>
   </DOCUMENT>

The LINQ statements have their own complexities, but I think they work OK; it is the Load call that fails. I have looked at the various overloads of XDocument.Load and I've researched some other questions where this exception is thrown, but I am confused about how to prevent it.

Lastly, the exception reports line 52327, position 126, in the file that failed to load, but the data on line 52327 does not look like it should have caused the problem (and the last character on that line is at position 103!):

<FILE filename="e:\S2Out\B0000004\Pet Tab\D003710954.0001.pdf" outputpath="e:\S2Out\B0000004\Pet Tab"/>

Solution

In order to control the encoding (once you know what it is), you can load the files using the XDocument.Load overload that accepts a TextReader.

Then you can create a new StreamReader against your file specifying the appropriate Encoding in the constructor.
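If it is not yet clear what encoding a failing file uses, it can help to locate the first byte that is not valid UTF-8. The sample file's XML declaration carries no encoding attribute, so the reader treats the bytes as UTF-8 unless a byte order mark says otherwise. Since the position reported in the exception did not match the data in the question, a byte-level scan may be easier to trust. The following is a minimal sketch, not part of the original answer: `Utf8Probe` and `FirstInvalidUtf8Offset` are hypothetical names, and the check is deliberately simplified (it validates lead and continuation byte structure only, not overlong forms or surrogates).

```csharp
using System;
using System.IO;

public static class Utf8Probe
{
    // Returns the offset of the first byte that cannot start or complete a
    // structurally well-formed UTF-8 sequence, or -1 if none is found.
    // Simplified check: overlong forms and surrogate ranges are not rejected.
    public static int FirstInvalidUtf8Offset(byte[] data)
    {
        int i = 0;
        while (i < data.Length)
        {
            byte b = data[i];
            int extra; // number of continuation bytes expected after b

            if (b < 0x80) extra = 0;                      // ASCII
            else if (b >= 0xC2 && b <= 0xDF) extra = 1;   // 2-byte lead
            else if (b >= 0xE0 && b <= 0xEF) extra = 2;   // 3-byte lead
            else if (b >= 0xF0 && b <= 0xF4) extra = 3;   // 4-byte lead
            else return i;                                // stray or illegal byte

            for (int k = 1; k <= extra; k++)
            {
                // Each continuation byte must look like 10xxxxxx.
                if (i + k >= data.Length || (data[i + k] & 0xC0) != 0x80)
                    return i;
            }
            i += extra + 1;
        }
        return -1;
    }

    // Convenience wrapper that reads the whole file into memory first.
    public static int FirstInvalidUtf8Offset(string path)
    {
        return FirstInvalidUtf8Offset(File.ReadAllBytes(path));
    }
}
```

Calling `Utf8Probe.FirstInvalidUtf8Offset(inFileName)` gives a byte offset you can inspect in a hex viewer. A lone byte such as 0xE9 surrounded by ASCII text would suggest a single-byte encoding like ISO-8859-1 or Windows-1252, although the bytes alone cannot prove which one the third-party program used.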

For example, to open the file using Western European encoding, replace the following line of code in the question:

XDocument xmlDoc = XDocument.Load(inFileName);

with this code:

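// Requires: using System.IO; using System.Text; using System.Xml.Linq;
XDocument xmlDoc = null;

// Decode the file as ISO-8859-1 instead of letting the XML reader guess.
using (StreamReader oReader = new StreamReader(inFileName, Encoding.GetEncoding("ISO-8859-1"))) {
    xmlDoc = XDocument.Load(oReader);
}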

The list of supported encodings can be found in the MSDN documentation.
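Since most of the files in the question load without trouble, one option is to try strict UTF-8 first and switch to the known encoding only when decoding fails. The sketch below illustrates that idea under stated assumptions: `XmlLoader` and `LoadWithFallback` are hypothetical names, the whole file is read into memory, and the fallback encoding is whatever you have determined the problem files to use (ISO-8859-1 in the usage note only because the answer above uses it).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XmlLoader
{
    // Tries to decode the file as strict UTF-8; if any byte sequence is
    // invalid, decodes the same bytes with the supplied fallback encoding.
    public static XDocument LoadWithFallback(string path, Encoding fallback)
    {
        byte[] bytes = File.ReadAllBytes(path);
        string text;
        try
        {
            // throwOnInvalidBytes: true makes bad input raise an exception
            // instead of being silently replaced with U+FFFD.
            var strictUtf8 = new UTF8Encoding(false, true);
            using (var reader = new StreamReader(new MemoryStream(bytes), strictUtf8, true))
            {
                text = reader.ReadToEnd();
            }
        }
        catch (DecoderFallbackException)
        {
            text = fallback.GetString(bytes);
        }
        return XDocument.Parse(text);
    }
}
```

With this in place, the line in the question would become `XDocument xmlDoc = XmlLoader.LoadWithFallback(inFileName, Encoding.GetEncoding("ISO-8859-1"));`. The trade-off is that a file in some other single-byte encoding will still load without an exception but may contain wrong characters, so the fallback is only as good as your knowledge of how the files were produced.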
