What's the correct way to encode CR-LF line breaks in text/xml values?

Question

As opposed to application/xml files, which could do anything, or normalizedString values, which convert all whitespace sequences to a single space character, I'm asking here specifically in the context of text/xml files with string values. For the sake of simplicity, let's say I'm only using ASCII characters in a UTF-8 encoded file.

Given the following two-line text string I wish to represent in XML:

Hello
World!

Which is the following bytes in memory:

0000: 48 65 6c 6c 6f 0d 0a 57 6f 72 6c 64 21 Hello..World!

According to RFC 2046, any text/* MIME type MUST (not should) represent a line break using Carriage Return followed by Linefeed character sequence. In that light, the following XML fragment should be right:

<tag>Hello
World!</tag>

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 0d 0a 57 6f 72 6c <tag>Hello..Worl
0010: 64 21 3c 2f 74 61 67 3e                         d!</tag>
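
Byte listings like these are easy to get wrong by hand, so here is a minimal sketch for checking them. It is not part of the original question; the class name HexBytes and the method toHex are hypothetical names chosen for illustration. It encodes a string as UTF-8 and prints each byte as two lowercase hex digits.

```java
import java.nio.charset.StandardCharsets;

class HexBytes {
    // Returns the UTF-8 bytes of the text as space-separated lowercase hex.
    public static String toHex(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffxx.
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two-line string itself: CR is 0d, LF is 0a.
        System.out.println(toHex("Hello\r\nWorld!"));
        // The same string wrapped in an element.
        System.out.println(toHex("<tag>Hello\r\nWorld!</tag>"));
    }
}
```

This only shows what is written to the file. It says nothing about what an XML parser hands back after reading those bytes.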

But I regularly see files like the following:

<tag><![CDATA[Hello
World!]]></tag>

Or, even stranger:

<tag>Hello&#xD;
World!</tag>

Where the &#xD; sequence is followed by a single Linefeed character:

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 26 23 78 44 3b 0a <tag>Hello&#xD;.
0010: 57 6f 72 6c 64 21 3c 2f 74 61 67 3e             World!</tag>

What am I missing here? What's the correct way to represent multiple lines of text in an XML string value so that it can come out the other end unmolested?

Recommended answer

After writing NUnit tests in Mono and JUnit tests in Java, the answer would appear to be to use either <tag>Hello&#13;\nWorld!</tag> or <tag>Hello&#xd;\nWorld!</tag> as below...

Foo.cs:

using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XmlStringTests
{
    public class Foo
    {
        public string greeting;

        public static Foo DeserializeFromXmlString (string xml)
        {
            Foo result;
            using (MemoryStream memoryStream = new MemoryStream()) {
                byte[] buffer = Encoding.UTF8.GetBytes (xml);
                memoryStream.Write (buffer, 0, buffer.Length);
                memoryStream.Seek (0, SeekOrigin.Begin);
                XmlSerializer xs = new XmlSerializer (typeof(Foo));
                result = (Foo)xs.Deserialize (memoryStream);
            }
            return result;
        }
    }
}

XmlStringTests.cs:

using NUnit.Framework;

namespace XmlStringTests
{
    [TestFixture]
    public class XmlStringTests
    {
        const string expected = "Hello\u000d\u000aWorld!";

        [Test(Description="Fails")]
        public void Cdata ()
        {
            const string test = "<Foo><greeting><![CDATA[Hello\u000d\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Fails")]
        public void CdataWithHash13 ()
        {
            const string test = "<Foo><greeting><![CDATA[Hello&#13;\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Fails")]
        public void CdataWithHashxD ()
        {
            const string test = "<Foo><greeting><![CDATA[Hello&#xd;\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Fails")]
        public void Simple ()
        {
            const string test = "<Foo><greeting>Hello\u000d\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Passes")]
        public void SimpleWithHash13 ()
        {
            const string test = "<Foo><greeting>Hello&#13;\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }

        [Test(Description="Passes")]
        public void SimpleWithHashxD ()
        {
            const string test = "<Foo><greeting>Hello&#xd;\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString (test);
            Assert.AreEqual (expected, bar.greeting);
        }
    }
}

Foo.java:

import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement(name = "Foo")
@XmlType(propOrder = { "greeting" })
public class Foo {
    public String greeting;

    public static Foo DeserializeFromXmlString(String xml) {
        try {
            JAXBContext context = JAXBContext.newInstance(Foo.class);
            Unmarshaller unmarshaller = context.createUnmarshaller();
            Foo foo = (Foo) unmarshaller.unmarshal(new StringReader(xml));
            return foo;
        } catch (JAXBException e) {
            e.printStackTrace();
            return null;
        }
    }
}

XmlStringTests.java:

import static org.junit.Assert.*;
import org.junit.Test;


public class XmlStringTests {
    String expected = "Hello\r\nWorld!";

    @Test //Fails
    public void testCdata ()
    {
        String test = "<Foo><greeting><![CDATA[Hello\r\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Fails
    public void testCdataWithHash13 ()
    {
        String test = "<Foo><greeting><![CDATA[Hello&#13;\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Fails
    public void testCdataWithHashxD ()
    {
        String test = "<Foo><greeting><![CDATA[Hello&#xd;\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Fails
    public void testSimple ()
    {
        String test = "<Foo><greeting>Hello\r\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Passes
    public void testSimpleWithHash13 ()
    {
        String test = "<Foo><greeting>Hello&#13;\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }

    @Test //Passes
    public void testSimpleWithHashxD ()
    {
        String test = "<Foo><greeting>Hello&#xd;\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString (test);
        assertEquals (expected, bar.greeting);
    }
}
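
The pattern in these results is consistent with the end-of-line handling in XML 1.0 (section 2.11): before parsing, a conforming processor translates every literal CR-LF pair, and every lone CR, into a single LF. A character reference such as &#13; is expanded only after that step, so the CR it produces survives. Inside a CDATA section, references are not expanded at all, which is why the CDATA variants fail. The sketch below tries to show this with the JDK's built-in DOM parser instead of JAXB. The class name CrLfRoundTrip and the helpers escapeText, parseText and show are hypothetical names for illustration, not part of the answer's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CrLfRoundTrip {
    // Hypothetical writer-side helper: escapes markup characters and writes
    // CR as a character reference so line-end normalization cannot remove it.
    public static String escapeText(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '\r': out.append("&#13;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Parses the document and returns the text content of its root element.
    public static String parseText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Makes CR and LF visible when printing.
    public static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        String original = "Hello\r\nWorld!";

        // Literal CR-LF in element content: the parser reports only LF.
        System.out.println(show(parseText("<tag>Hello\r\nWorld!</tag>")));

        // Literal CR-LF inside CDATA: normalized the same way.
        System.out.println(show(parseText("<tag><![CDATA[Hello\r\nWorld!]]></tag>")));

        // CR written as &#13; followed by a literal LF: both characters survive.
        String escaped = "<tag>" + escapeText(original) + "</tag>";
        System.out.println(show(parseText(escaped)));
    }
}
```

With a conforming parser, the first two lines should print Hello\nWorld! and the third Hello\r\nWorld!. The practical consequence is on the writing side: if the CR must survive, the serializer has to emit it as &#13; or &#xD;, as escapeText does here.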

I hope this saves some people some time.
