Retaining text formatting in XML

Question

I have some data that I would like to store in an XML file. (It doesn't have to be XML, but XML is a nice, open format.)

The data consists of nodes and child nodes (no limit on depth), and every single node can have some text.

My data might look something like this:

<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">
      Here is some text for child1.
    </node>
    <node title="child2">
      Here is some text for child2.
    </node>
    <node title="child3">
      Here is some text for child3.
    </node>
    Here is some text for root.
  </node>
</nodes>

But the problem with this approach is that I'm ending up with a lot of whitespace that wasn't in the original text. For example, the text for my root node has 10 newlines and a bunch of tabs (or spaces) in order to format the child nodes nicely.
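A quick way to see where the extra whitespace lands is to parse the indented version and look at the text the parser reports. The sketch below assumes Python's standard-library xml.etree.ElementTree (the question names no language or parser). ElementTree stores the text before an element's first child in `.text` and the text after a child's end tag in that child's `.tail`, so the root's own text is the concatenation of those pieces:

```python
import xml.etree.ElementTree as ET

# The indented document from the question (XML declaration omitted).
PRETTY = """<nodes>
  <node title="root">
    <node title="child1">
      Here is some text for child1.
    </node>
    <node title="child2">
      Here is some text for child2.
    </node>
    <node title="child3">
      Here is some text for child3.
    </node>
    Here is some text for root.
  </node>
</nodes>"""

root = ET.fromstring(PRETTY).find("node")

# Text directly inside <node title="root"> is split across root.text (before
# the first child) and each child's .tail (after that child's end tag).
own_text = (root.text or "") + "".join(child.tail or "" for child in root)

print(repr(own_text))          # indentation newlines and spaces included
print(repr(own_text.strip()))  # the text the author actually meant
print(repr(root[0].text))      # leaf text picks up indentation too
```

The stripped value is the intended string, but the raw value carries the newlines and indentation that were only added to lay out the child elements.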

What's a good way to use XML to store data this way while retaining the original text exactly, without adding any extra whitespace characters?

Note: I assume I could just have all the data without newlines or indents like this:

<?xml version="1.0" ?>
<nodes>
  <node title="root"><node title="child1">Here is some text for child1.
</node><node title="child2">Here is some text for child2.
</node><node title="child3">Here is some text for child3.
</node>Here is some text for root.
</node>
</nodes>

I guess that eliminates any new whitespace. But is that the best way? It's about as ugly as it could be. And some XML viewers might format the tags by adding whitespace.

Answer

Let's separately consider unmixed and mixed content:

When no text can be mixed between your elements, simply manage whitespace within elements as you wish, and allow XML serializers and editors to manage the whitespace between elements:

<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">Here is some text for child1.</node>
    <node title="child2">Here is some text for child2.</node>
    <node title="child3">Here is some text for child3.</node>
  </node>
</nodes>
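As a sketch of this division of labor, again assuming Python's xml.etree.ElementTree: `ElementTree.indent` (Python 3.9 and later) pretty-prints a tree by writing only into whitespace-only `.text` and `.tail` slots, so the text of leaf elements in unmixed content survives a pretty-print and re-parse unchanged:

```python
import xml.etree.ElementTree as ET

# Build the unmixed tree: only leaf elements carry text.
nodes = ET.Element("nodes")
root = ET.SubElement(nodes, "node", title="root")
for i in range(1, 4):
    child = ET.SubElement(root, "node", title=f"child{i}")
    child.text = f"Here is some text for child{i}."

# Let the serializer manage the whitespace *between* elements.
# ET.indent (Python 3.9+) only overwrites whitespace-only .text/.tail,
# so the leaf text is left untouched.
ET.indent(nodes, space="  ")
xml_text = ET.tostring(nodes, encoding="unicode")
print(xml_text)

# Round trip: every leaf's text comes back exactly as it was stored.
reparsed = ET.fromstring(xml_text)
leaf_texts = [n.text for n in reparsed.iter("node") if len(n) == 0]
print(leaf_texts)
```

The layout whitespace lives only in the gaps between tags, so any tool is free to rewrite it without touching the data.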

This works fine for both data-oriented and document-oriented XML. (OOXML is an example of document-oriented XML that doesn't need mixed content.)

When text can be mixed between your elements, decide how to manage whitespace depending upon the semantics of your data. For example, if your data is like HTML, multiple consecutive spaces mean nothing different than a single space, so allowing XML serializers and editors to manage the whitespace is fine:

<?xml version="1.0" ?>
<nodes>
  <node title="root">
    <node title="child1">Here is some text for child1. </node>
    <node title="child2">Here is some text for child2. </node>
    <node title="child3">Here is some text for child3. </node>
    Here is some text for root.
  </node>
</nodes>
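With HTML-like semantics, the consumer normalizes whitespace when it reads the text, so whatever indentation a serializer adds is harmless. A minimal sketch, assuming Python's xml.etree.ElementTree (3.9+ for `ET.indent`) and assuming that "any run of whitespace equals one space" is the intended rule; `normalize` and `own_text` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET

MIXED = """<nodes>
  <node title="root">
    <node title="child1">Here is some text for child1. </node>
    <node title="child2">Here is some text for child2. </node>
    <node title="child3">Here is some text for child3. </node>
    Here is some text for root.
  </node>
</nodes>"""


def normalize(text):
    """Collapse every run of whitespace to a single space (HTML-like rule)."""
    return " ".join(text.split())


def own_text(elem):
    """Normalized text directly inside elem: its .text plus each child's .tail."""
    parts = [elem.text or ""] + [child.tail or "" for child in elem]
    return normalize("".join(parts))


root = ET.fromstring(MIXED).find("node")
print(own_text(root))
for child in root:
    print(child.get("title"), "->", own_text(child))

# A pretty-printer may rewrite the whitespace-only gaps (here with tabs);
# the normalized reading does not change.
tree = ET.fromstring(MIXED)
ET.indent(tree, space="\t")
reindented = ET.fromstring(ET.tostring(tree, encoding="unicode")).find("node")
print(own_text(reindented) == own_text(root))
```

The final comparison shows that re-indenting the document with a different indent string leaves the normalized text unchanged.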

xml:space

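If certain parts of your XML associate significance with embedded whitespace, you can indicate this by adding the special xml:space="preserve" attribute to the containing element: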

From the XML 1.0 specification, section 2.10 White Space Handling:

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose values are one or both of "default" and "preserve".

You should take care to use xml:space="preserve" conservatively, however. Placing it on the root element of a complex XML format such as OOXML is likely to make consumers of your data justifiably unhappy.
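Note that a parser such as Python's xml.etree.ElementTree keeps all whitespace regardless; xml:space is only a signal that the application acts on. Per the specification, the setting applies to everything inside the element that carries it unless a nested xml:space overrides it, so a consumer has to track it down the tree. A minimal sketch, assuming ElementTree (which reports the attribute under the XML namespace URI) and a hypothetical application whose default mode collapses whitespace; `collect` is an illustrative helper name. The attribute sits on the one element that needs it, not on the root:

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to its namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# xml:space="preserve" is scoped to the single node whose whitespace matters.
DOC = """<nodes>
  <node title="poem" xml:space="preserve">Roses are red,
    violets are blue.</node>
  <node title="prose">Collapse   this
    text.</node>
</nodes>"""


def collect(elem, preserve=False, out=None):
    """Map each titled node to its .text, honoring inherited xml:space.

    "preserve" keeps the text as written; "default" (or no setting on any
    ancestor) falls back to this hypothetical app's default: collapse runs
    of whitespace to single spaces.
    """
    if out is None:
        out = {}
    mode = elem.get(XML_SPACE)
    if mode is not None:
        # A nested xml:space overrides whatever was inherited.
        preserve = mode == "preserve"
    text = elem.text or ""
    title = elem.get("title")
    if title is not None:
        out[title] = text if preserve else " ".join(text.split())
    for child in elem:
        collect(child, preserve, out)
    return out


texts = collect(ET.fromstring(DOC))
print(repr(texts["poem"]))
print(repr(texts["prose"]))
```

Keeping the attribute on the narrowest element lets every other part of the document stay free for serializers and editors to reformat.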
