Ignoring specified encoding when deserializing XML

Question

I am trying to read some XML received from an external interface over a socket. The problem is that the encoding declared in the XML header is wrong: it says iso-8859-1, but the data is actually utf-16BE. The interface documentation states that the encoding is utf-16BE, but apparently they forgot to set the declaration accordingly.

To ignore the declared encoding when I deserialize, I use a StringReader like this:

    private static T DeserializeXmlData<T>(byte[] xmlData)
    {
        var xmlString = Encoding.BigEndianUnicode.GetString(xmlData);
        using (var reader = new StringReader(xmlString))
        {
            reader.ReadLine(); // Eat header line
            using (var xmlReader = XmlReader.Create(reader))
            {
                var serializer = new XmlSerializer(typeof(T));
                return (T)serializer.Deserialize(xmlReader);
            }
        }
    }

The above actually works fine, but I don't like the part where I just skip the header line by calling ReadLine. Is there a less brittle way to bypass the encoding specified in the XML header?
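One way to see why skipping the header with ReadLine is brittle is sketched below. It assumes, purely for illustration, that the sender might emit the XML declaration and the root element on a single line; the class and method names are hypothetical and not part of the original code.

```csharp
using System;
using System.IO;

public static class ReadLineBrittlenessDemo
{
    // Mimics the "eat header line" step: drops the first line and
    // returns whatever text is left for the XmlReader to parse.
    public static string SkipHeaderLine(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            reader.ReadLine();
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Declaration on its own line: the root element survives.
        string twoLines = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<msg>hi</msg>";
        Console.WriteLine("[" + SkipHeaderLine(twoLines) + "]");

        // Assumed alternative layout: no line break after the declaration,
        // so ReadLine consumes the entire document.
        string oneLine = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><msg>hi</msg>";
        Console.WriteLine("[" + SkipHeaderLine(oneLine) + "]");
    }
}
```

In the single-line case nothing is left to parse, so the approach depends on the sender's line layout rather than on the XML itself.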

StreamReader solution

By using a StreamReader, I can override the encoding specified in the XML header. Setting XmlReaderSettings.IgnoreProcessingInstructions made no difference either way. Interestingly, the StreamReader ignores the encoding passed to it if it finds a Unicode byte-order mark.

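Recap:

  • If the XmlReader is initialized with a TextReader, the encoding in the XML header is ignored.
  • If a StringReader is used, the XmlReader fails if a Unicode byte-order mark is present.
  • If a StreamReader is used, a Unicode byte-order mark overrides the encoding passed to the StreamReader.
  • XmlReaderSettings.IgnoreProcessingInstructions = true makes no difference when a TextReader is used.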
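The StreamReader points in the recap can be checked with a small sketch like the following. The sample document, the `EncodingRecapDemo` class, and its method names are made up for illustration; the sketch relies on `StreamReader(Stream, Encoding)` detecting byte-order marks by default.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

public static class EncodingRecapDemo
{
    // Reads the text of the root element through a StreamReader that is
    // told the bytes are UTF-16BE, regardless of what the header claims.
    public static string ReadRootText(byte[] xmlData)
    {
        using (var stream = new MemoryStream(xmlData))
        using (var reader = new StreamReader(stream, Encoding.BigEndianUnicode))
        using (var xmlReader = XmlReader.Create(reader))
        {
            xmlReader.MoveToContent();
            return xmlReader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        // The header deliberately claims the wrong encoding.
        const string xml =
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<msg>h\u00e5ndtak</msg>";

        // Case 1: UTF-16BE bytes without a byte-order mark.
        byte[] bigEndianNoBom = Encoding.BigEndianUnicode.GetBytes(xml);
        Console.WriteLine(ReadRootText(bigEndianNoBom));

        // Case 2: UTF-16LE bytes with a byte-order mark. The StreamReader
        // was constructed for UTF-16BE, but the mark takes precedence.
        byte[] littleEndianWithBom = Encoding.Unicode.GetPreamble()
            .Concat(Encoding.Unicode.GetBytes(xml))
            .ToArray();
        Console.WriteLine(ReadRootText(littleEndianWithBom));
    }
}
```

Both calls should yield the same element text, which matches the observation that the header encoding is ignored and that a byte-order mark, when present, decides how the bytes are decoded.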

In conclusion, the most robust solution seems to be a StreamReader, since it honors the byte-order mark if one is present.

    private static T DeserializeXmlData<T>(byte[] xmlData)
    {
        using (var xmlDataStream = new MemoryStream(xmlData))
        {
            using (var reader = new StreamReader(xmlDataStream, Encoding.BigEndianUnicode))
            {
                using (var xmlReader = XmlReader.Create(reader))
                {
                    var serializer = new XmlSerializer(typeof (T));
                    return (T) serializer.Deserialize(xmlReader);
                }
            }
        }
    }

Recommended answer

I think I'd just use a StreamReader constructed with the right encoding and pass that to an XmlReader.Create overload that takes a TextReader:

    using (var sr = new StreamReader(@"c:\temp\bad.xml", Encoding.BigEndianUnicode)) {
        using (var xr = XmlReader.Create(sr, new XmlReaderSettings())) {
            // etc...
        }
    }
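For a byte array received over a socket rather than a file, the same idea might look like the end-to-end sketch below. The `Message` type, the sample payload, and the extra `actualEncoding` parameter are assumptions added for illustration; they do not appear in the original question or answer.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical payload type, used only for this illustration.
public class Message
{
    public string Text { get; set; }
}

public static class DeserializeDemo
{
    // Decodes the bytes with the encoding the caller knows to be correct;
    // the encoding named in the XML header is not consulted.
    public static T DeserializeXmlData<T>(byte[] xmlData, Encoding actualEncoding)
    {
        using (var stream = new MemoryStream(xmlData))
        using (var reader = new StreamReader(stream, actualEncoding))
        using (var xmlReader = XmlReader.Create(reader))
        {
            var serializer = new XmlSerializer(typeof(T));
            return (T)serializer.Deserialize(xmlReader);
        }
    }

    public static void Main()
    {
        // Simulated wire data: UTF-16BE bytes whose header claims iso-8859-1.
        const string xml =
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" +
            "<Message><Text>gr\u00f8nn</Text></Message>";
        byte[] wire = Encoding.BigEndianUnicode.GetBytes(xml);

        Message message = DeserializeXmlData<Message>(wire, Encoding.BigEndianUnicode);
        Console.WriteLine(message.Text);
    }
}
```

Passing the encoding as a parameter keeps the workaround in one place, so it can be changed if the sender later fixes the header or switches encodings.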
