The Invulnerable XMLException

Problem description

I serialize a very large List<string> using this code:

using System.IO;
using System.Xml.Serialization;

public static string SerializeObjectToXML<T>(T item)
{
    XmlSerializer xs = new XmlSerializer(typeof(T));
    using (StringWriter writer = new StringWriter())
    {
        xs.Serialize(writer, item);
        return writer.ToString();
    }
}

And deserialize it using this code:

using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static T DeserializeXMLToObject<T>(string xmlText)
{
    if (string.IsNullOrEmpty(xmlText)) return default(T);
    XmlSerializer xs = new XmlSerializer(typeof(T));
    using (MemoryStream memoryStream = new MemoryStream(new UnicodeEncoding().GetBytes(xmlText.Replace((char)0x1A, ' '))))
    using (XmlTextReader xsText = new XmlTextReader(memoryStream))
    {
        xsText.Normalization = true;
        return (T)xs.Deserialize(xsText);
    }
}

But I get this exception when I deserialize it:

XMLException: There is an error in XML document (217388, 15). '[]', hexadecimal value 0x1A, is an invalid character. Line 217388, position 15.

   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)

   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader)

Question

Why isn't the xmlText.Replace((char)0x1A, ' ') line working? What witchery is this?

  • My code is in C#, targeting .NET Framework 4, built in VS2010 Pro.
  • I can't view the value of xmlText in debug mode because the List<string> is too big; the Watch window just displays the error message "Unable to evaluate the expression. Not enough storage is available to complete this operation."

Recommended answer

I think I've found the problem. By default, when it writes to a TextWriter, XmlSerializer will let you generate invalid XML.

Given this code:

var input = "\u001a";

var writer = new StringWriter();
var serializer = new XmlSerializer(typeof(string));
serializer.Serialize(writer, input);

Console.WriteLine(writer.ToString());

the output is:

<?xml version="1.0" encoding="utf-16"?>
<string>&#x1A;</string>
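To see why the question's .Replace((char)0x1A, ' ') finds nothing, the sketch below serializes a string containing U+001A the same way SerializeObjectToXML does and checks what actually lands in the output. It assumes C# 9 top-level statements; SerializeToString is a hypothetical stand-in for the question's method. On .NET Framework 4 / VS2010 you would move the body into a Main method and use string.Format instead of interpolated strings.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for SerializeObjectToXML: XmlSerializer writing
// straight to a StringWriter, with no character checking.
string SerializeToString<T>(T item)
{
    var serializer = new XmlSerializer(typeof(T));
    using (var writer = new StringWriter())
    {
        serializer.Serialize(writer, item);
        return writer.ToString();
    }
}

string xml = SerializeToString("a\u001ab");

// The literal control character never reaches the output...
bool hasRawChar = xml.IndexOf('\u001a') >= 0;
// ...because it was escaped as the character reference &#x1A; instead.
bool hasReference = xml.Contains("&#x1A;");

Console.WriteLine($"raw U+001A present: {hasRawChar}");
Console.WriteLine($"&#x1A; reference present: {hasReference}");
```

So a Replace on the raw character has nothing to match; the offending text is the six-character reference.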

This is invalid XML. According to the XML 1.0 specification, every character reference must refer to a legal character. Legal characters are:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

As you can see, U+001A (like the other C0 control characters apart from tab, line feed and carriage return) is not allowed as a reference, since it is not a valid character.
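The Char production quoted above can be expressed as a small predicate. This is a sketch: IsLegalXmlChar is a hypothetical name, and it takes an int code point rather than a char so the supplementary range is representable. Note that U+0085, a C1 control, falls inside [#x20-#xD7FF] and is therefore legal in XML 1.0, whereas U+001A is not. Newer frameworks (.NET 4 onward) also ship XmlConvert.IsXmlChar for a per-char check.

```csharp
using System;

// Mirrors the XML 1.0 Char production for a single Unicode code point:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool IsLegalXmlChar(int codePoint) =>
    codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD ||
    (codePoint >= 0x20 && codePoint <= 0xD7FF) ||
    (codePoint >= 0xE000 && codePoint <= 0xFFFD) ||
    (codePoint >= 0x10000 && codePoint <= 0x10FFFF);

foreach (int cp in new[] { 0x09, 0x1A, 0x1F, 0x20, 0x85, 0xFFFE })
{
    Console.WriteLine($"U+{cp:X4}: {(IsLegalXmlChar(cp) ? "legal" : "illegal")}");
}
```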

The error message given by the reader is a bit misleading; it would be clearer if it said that there was an invalid character reference.

There are several options for what you can do.

You can use an XmlWriter, which by default will not allow invalid characters:

var input = "\u001a";

var writer = new StringWriter();
var serializer = new XmlSerializer(typeof(string));

// added following line; disposing the XmlWriter flushes it into writer:
using (var xmlWriter = XmlWriter.Create(writer))
{
    // then, write via the xmlWriter rather than writer
    // (this throws, because XmlWriterSettings.CheckCharacters defaults to true):
    serializer.Serialize(xmlWriter, input);
}

Console.WriteLine(writer.ToString());

This will throw an exception during serialization, which you will have to handle and report with an appropriate error.
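One way to handle that exception is sketched below, again as C# 9 top-level statements. TrySerializeStrict is a hypothetical helper, not a framework API. XmlSerializer normally wraps the writer's ArgumentException in an InvalidOperationException, but the sketch catches both rather than relying on the exact wrapping.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper: serialize through a checking XmlWriter and return null
// (plus an error message) instead of emitting invalid XML.
string? TrySerializeStrict<T>(T item, out string? error)
{
    var serializer = new XmlSerializer(typeof(T));
    var writer = new StringWriter();
    try
    {
        // XmlWriter.Create uses default settings, so CheckCharacters is true.
        // Disposing the XmlWriter flushes everything into the StringWriter.
        using (var xmlWriter = XmlWriter.Create(writer))
        {
            serializer.Serialize(xmlWriter, item);
        }
        error = null;
        return writer.ToString();
    }
    catch (Exception ex) when (ex is InvalidOperationException || ex is ArgumentException)
    {
        // Report the innermost message (the writer's "invalid character" text).
        error = (ex.InnerException ?? ex).Message;
        return null;
    }
}

string? ok = TrySerializeStrict("plain text", out _);
string? bad = TrySerializeStrict("a\u001ab", out string? badError);

Console.WriteLine(ok != null);
Console.WriteLine(bad == null);
Console.WriteLine(badError);
```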

This probably isn't useful for you, though, because you already have data stored with these invalid characters.

Instead, replace the character reference rather than the character. Your .Replace((char)0x1a, ' ') isn't actually replacing anything at the moment, because the document contains the reference &#x1A; rather than a literal U+001A; use .Replace("&#x1A;", " ") instead. (This match is case-sensitive, but it is what .NET generates. A more robust solution would be to use a case-insensitive regex.)
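A minimal sketch of the case-insensitive regex variant follows, as C# 9 top-level statements. Serialize and Deserialize are hypothetical local functions modelled on the question's methods; the pattern also accepts leading zeros (&#x001A;) and, as an extra assumption beyond what .NET emits, the decimal form &#26;. The reader keeps Normalization = true as in the question, which is what makes it reject the reference in the first place.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Serialization;

string Serialize<T>(T item)
{
    var serializer = new XmlSerializer(typeof(T));
    using (var writer = new StringWriter())
    {
        serializer.Serialize(writer, item);
        return writer.ToString();
    }
}

T Deserialize<T>(string xmlText)
{
    // Replace the character reference (hex, any case, optional leading zeros,
    // or decimal) rather than the raw character, which never appears.
    string cleaned = Regex.Replace(xmlText, "&#(x0*1a|0*26);", " ", RegexOptions.IgnoreCase);

    var serializer = new XmlSerializer(typeof(T));
    using (var reader = new XmlTextReader(new StringReader(cleaned)))
    {
        reader.Normalization = true; // same setting as the question's reader
        return (T)serializer.Deserialize(reader)!;
    }
}

var original = new List<string> { "before", "a\u001ab", "after" };
string xml = Serialize(original);
List<string> restored = Deserialize<List<string>>(xml);

Console.WriteLine(string.Join("|", restored)); // before|a b|after
```

The replacement is lossy by design: the control character becomes a space, matching the question's original intent.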

As an aside, XML 1.1 does allow references to control characters (other than U+0000), as long as they appear as references rather than as literal characters in the document. That would solve your problem, except that the .NET XmlSerializer doesn't support XML 1.1.
