XmlReader breaks on UTF-8 BOM

Views: 97 Published: 2020/7/13 2:48:42 Tags: c# utf-8 xmlreader byte-order-mark


Problem description

I have the following XML parsing code in my application:

    public static XElement Parse(string xml, string xsdFilename)
    {
        var readerSettings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = new XmlSchemaSet()
        };
        readerSettings.Schemas.Add(null, xsdFilename);
        readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessInlineSchema;
        readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
        readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        readerSettings.ValidationEventHandler +=
            (o, e) => { throw new Exception("The provided XML does not validate against the request's schema."); };

        var readerContext = new XmlParserContext(null, null, null, XmlSpace.Default, Encoding.UTF8);

        return XElement.Load(XmlReader.Create(new StringReader(xml), readerSettings, readerContext));
    }

I am using it to parse strings sent to my WCF service into XML documents, for custom deserialization.

It works fine when I read in files and send them over the wire (the request); I've verified that the BOM is not sent across. In my request handler I serialize a response object and send it back as a string. The serialization process adds a UTF-8 BOM to the front of the string, which causes the same code to break when parsing the response:

System.Xml.XmlException : Data at the root level is invalid. Line 1, position 1.

From the research I've done over the last hour or so, it appears that XmlReader should honor the BOM. If I manually remove the BOM from the front of the string, the response XML parses fine.

Am I missing something obvious, or at least something insidious?

Here is the serialization code I'm using to return the response:

private static string SerializeResponse(Response response)
{
    var output = new MemoryStream();
    var writer = XmlWriter.Create(output);
    new XmlSerializer(typeof(Response)).Serialize(writer, response);
    var bytes = output.ToArray();
    var responseXml = Encoding.UTF8.GetString(bytes);
    return responseXml;
}

If it's just a matter of the XML incorrectly containing the BOM, then I'll switch to

var responseXml = new UTF8Encoding(false).GetString(bytes);

but it was not at all clear from my research that the BOM is illegal in the actual XML string; see e.g. "c# Detect xml encoding from Byte Array?"

Recommended answer

The XML string must not (!) contain the BOM; the BOM is only allowed in byte data (e.g. streams) that is encoded as UTF-8. This is because a string is not an encoded representation: it is already a sequence of Unicode characters.
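To make the byte/string distinction concrete, here is a minimal sketch. The class name `BomDemo` and the `<r/>` payload are made up for illustration. It shows that `Encoding.GetString` decodes the BOM bytes into the character U+FEFF instead of dropping them, and that the `UTF8Encoding(false)` variant considered in the question behaves the same way, since that constructor flag only controls whether a BOM is emitted when writing. A `StreamReader`, by contrast, treats the BOM as an encoding signature and skips it.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    public static void Main()
    {
        // UTF-8 bytes of "<r/>" preceded by the three-byte BOM (EF BB BF).
        byte[] bytes = new byte[] { 0xEF, 0xBB, 0xBF }
            .Concat(Encoding.UTF8.GetBytes("<r/>"))
            .ToArray();

        // GetString decodes every byte, so the BOM survives as the character U+FEFF.
        string decoded = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(decoded[0] == '\uFEFF'); // True

        // The constructor flag affects GetPreamble (writing), not decoding.
        string decodedNoEmit = new UTF8Encoding(false).GetString(bytes);
        Console.WriteLine(decodedNoEmit[0] == '\uFEFF'); // True

        // A StreamReader consumes the BOM as an encoding signature.
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            Console.WriteLine(reader.ReadToEnd()); // <r/>
        }
    }
}
```

So if the bytes must be turned into a string by hand, reading them through a `StreamReader` is one way to drop the BOM; avoiding the byte representation altogether is simpler still.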

It therefore seems that you are loading the string incorrectly, in code that you unfortunately didn't provide.

Thanks for posting the serialization code.

You should not write the data to a MemoryStream, but rather to a StringWriter, which you can then convert to a string with ToString. Since this avoids passing through a byte representation, it is not only faster but also avoids such problems.

Something like this:

private static string SerializeResponse(Response response)
{
    // Write straight to a string: no byte encoding is involved, so no BOM can appear.
    var output = new StringWriter();
    using (var writer = XmlWriter.Create(output))
    {
        new XmlSerializer(typeof(Response)).Serialize(writer, response);
    }
    return output.ToString();
}
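As a rough end-to-end check, the sketch below serializes through a StringWriter and parses the result back through a StringReader, the same way the question's Parse method reads its input. The `Response` class with a single `Message` property is a hypothetical stand-in for the real type, and schema validation is left out to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the Response type from the question.
public class Response
{
    public string Message { get; set; }
}

public static class RoundTripDemo
{
    private static string SerializeResponse(Response response)
    {
        var output = new StringWriter();
        using (var writer = XmlWriter.Create(output))
        {
            new XmlSerializer(typeof(Response)).Serialize(writer, response);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml = SerializeResponse(new Response { Message = "ok" });

        // No bytes were decoded, so the string starts with '<' rather than U+FEFF.
        Console.WriteLine(xml[0] == '<'); // True

        // Parse from a StringReader (schema validation omitted in this sketch).
        XElement root = XElement.Load(XmlReader.Create(new StringReader(xml)));
        Console.WriteLine(root.Element("Message").Value); // ok
    }
}
```

One thing to keep in mind: an XmlWriter created over a StringWriter declares `encoding="utf-16"` in the XML declaration, because .NET strings are UTF-16. That is harmless when the string is parsed directly, as here, but the declaration may need attention if the string is later written out as UTF-8 bytes.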
