XML (de)serialization invalid string inconsistent in C#?


Question

In C# (.NET 4.0 and 4.5, VS2010 and VS12), when I serialize an object containing a string with an illegal character using XmlSerializer, no error is thrown. However, when I deserialize the result, an "invalid character" error is thrown.

        // add to XML
        Items items = new Items();
        items.Item = "\v hello world"; // contains "illegal" character \v

        // variables
        System.Xml.Serialization.XmlSerializer serializer = new System.Xml.Serialization.XmlSerializer(typeof(Items));
        string tmpFile = Path.GetTempFileName();

        // serialize
        using (FileStream tmpFileStream = new FileStream(tmpFile, FileMode.Open, FileAccess.ReadWrite))
        {
            serializer.Serialize(tmpFileStream, items);
        }
        Console.WriteLine("Success! XML serialized in file " + tmpFile);

        // deserialize
        Items result = null;
        using (FileStream plainTextFile = new FileStream(tmpFile, FileMode.Open, FileAccess.Read))
        {
            result = (Items)serializer.Deserialize(plainTextFile); //FAILS here
        }

        Console.WriteLine(result.Item);



"Items" is just a small class autogenerated by xsd /c Items.xsd. Items.xsd is nothing more than a root element (Items) containing one child (Item):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="Items">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Item" type="xs:string" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>



The error thrown during deserialization is:

Unhandled Exception: System.InvalidOperationException: There is an error in XML document (3, 12). ---> System.Xml.XmlException: '♂', hexadecimal value 0x0B, is an invalid character. Line 3, position 12.

Line 3 of the serialized XML file contains:

<Item>&#xB; hello world</Item>



I know that \v (serialized as &#xB;) is an illegal character, but why does XmlSerializer allow it to be serialized without error? I find it inconsistent that .NET lets me serialize something without a problem, only for me to find out that I cannot deserialize it.

Is there a solution, so that either XmlSerializer removes the illegal characters automatically before serializing, or I can instruct the deserializer to ignore the illegal characters?

Currently I work around it by reading the file contents as a string, replacing the illegal characters "manually", and then deserializing it, but I find that an ugly hack.

Recommended answer

1.

You can set the CheckCharacters property of XmlWriterSettings to avoid writing illegal characters. (The Serialize method will then throw an exception.)

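// CheckCharacters is true by default for XmlWriterSettings; it is set explicitly here for clarity.
// The writer is disposed as well, so that buffered output is flushed to the stream.
using (FileStream tmpFileStream = new FileStream(tmpFile, FileMode.OpenOrCreate, FileAccess.ReadWrite))
using (XmlWriter writer = XmlWriter.Create(tmpFileStream, new XmlWriterSettings() { CheckCharacters = true }))
{
    serializer.Serialize(writer, items);
}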
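
To see what "would throw exception" looks like in practice, here is a minimal, self-contained sketch. The `Items` class is a hand-written stand-in for the xsd-generated one, and `StrictWriterDemo` and `TrySerialize` are hypothetical names used only for illustration. It assumes that XmlSerializer wraps the writer's error in an InvalidOperationException, which is how the deserialization error in the question is reported.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the xsd-generated class from the question.
public class Items
{
    public string Item { get; set; }
}

public static class StrictWriterDemo
{
    // Returns null on success, or the message of the error that stopped serialization.
    public static string TrySerialize(Items items)
    {
        var serializer = new XmlSerializer(typeof(Items));
        var settings = new XmlWriterSettings { CheckCharacters = true };
        try
        {
            using (var output = new StringWriter())
            using (var writer = XmlWriter.Create(output, settings))
            {
                serializer.Serialize(writer, items);
            }
            return null;
        }
        catch (InvalidOperationException ex)
        {
            // Assumption: the serializer wraps the writer's exception,
            // so the inner message is the more specific one.
            return (ex.InnerException ?? ex).Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TrySerialize(new Items { Item = "hello world" }) ?? "ok");
        Console.WriteLine(TrySerialize(new Items { Item = "\v hello world" }) ?? "ok");
    }
}
```

With this approach the problem surfaces when the file is written rather than when it is read back, which may be enough if the goal is simply to stop producing files that cannot be deserialized.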



2.

You can create your own XmlTextWriter to filter out unwanted characters while serializing:

using (FileStream tmpFileStream = new FileStream(tmpFile, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    var writer = new MyXmlWriter(tmpFileStream);
    serializer.Serialize(writer, items);
}

public class MyXmlWriter : XmlTextWriter
{
    public MyXmlWriter(Stream s) : base(s, Encoding.UTF8)
    {
    }

    public override void WriteString(string text)
    {
        // Note: char.IsControl also matches tab, CR and LF, so those are removed as well.
        string newText = String.Join("", text.Where(c => !char.IsControl(c)));
        base.WriteString(newText);
    }
}
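
The char.IsControl filter above also drops tab, carriage return and line feed, which XML 1.0 does allow. If those should survive, a narrower filter is possible. The following is only a sketch, assuming XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair (available from .NET 4.0) are acceptable; `XmlCharFilter` and `StripInvalidXmlChars` are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlCharFilter
{
    // Removes only the characters that XML 1.0 forbids; tab, CR and LF are kept.
    public static string StripInvalidXmlChars(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var cleaned = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                cleaned.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // A valid high/low surrogate pair: keep both halves together.
                cleaned.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (for example \v) is silently dropped.
        }
        return cleaned.ToString();
    }

    public static void Main()
    {
        string cleaned = StripInvalidXmlChars("\v hello\tworld");
        Console.WriteLine(cleaned == " hello\tworld"); // prints True
    }
}
```

A helper like this could be called from MyXmlWriter.WriteString in place of the char.IsControl expression, or applied to the string before it is assigned to the object being serialized.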



3.

By creating your own XmlTextReader you can filter out unwanted characters while deserializing:

using (FileStream plainTextFile = new FileStream(tmpFile, FileMode.Open, FileAccess.Read))
{
    var reader = new MyXmlReader(plainTextFile);
    result = (SomeObject)serializer.Deserialize(reader); 
}

public class MyXmlReader : XmlTextReader
{
    public MyXmlReader(Stream s) : base(s)
    {
    }

    public override string ReadString()
    {
        string text =  base.ReadString();
        string newText = String.Join("", text.Where(c => !char.IsControl(c)));
        return newText;
    }
}



4.

You can set the CheckCharacters property of XmlReaderSettings to false. Deserialization will then work smoothly. (You will get \v back.)

using (FileStream plainTextFile = new FileStream(tmpFile, FileMode.Open, FileAccess.Read))
{
    var reader = XmlReader.Create(plainTextFile, new XmlReaderSettings() { CheckCharacters = false });
    result = (SomeObject)serializer.Deserialize(reader); 
}
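
For completeness, here is a self-contained round-trip sketch of this last option. It rests on several assumptions: `Items` is a hand-written stand-in for the xsd-generated class, `LenientRoundTripDemo` and `RoundTrip` are hypothetical names, and the writer is also created with CheckCharacters = false so that the example does not depend on which writer XmlSerializer picks by default. The relaxed reader setting is understood to apply to character entity references such as &#xB;, which is the form that appears in the serialized file in the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the xsd-generated class from the question.
public class Items
{
    public string Item { get; set; }
}

public static class LenientRoundTripDemo
{
    // Serializes to an in-memory string and reads it back with character checks relaxed.
    public static Items RoundTrip(Items items)
    {
        var serializer = new XmlSerializer(typeof(Items));
        string xml;

        using (var output = new StringWriter())
        {
            // Assumption: with checking off, the writer emits the illegal
            // character as a numeric character reference.
            using (var writer = XmlWriter.Create(output, new XmlWriterSettings { CheckCharacters = false }))
            {
                serializer.Serialize(writer, items);
            }
            xml = output.ToString();
        }

        using (var input = new StringReader(xml))
        using (var reader = XmlReader.Create(input, new XmlReaderSettings { CheckCharacters = false }))
        {
            return (Items)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Items result = RoundTrip(new Items { Item = "\v hello world" });
        Console.WriteLine(result.Item == "\v hello world");
    }
}
```

Note that the resulting file is still not well-formed XML 1.0, so other parsers may reject it; this option mainly helps when the same application both writes and reads the data.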
