Java XML parser adding unnecessary xmlns and xml:space attributes

Views: 194 | Published: 2020/10/27 0:21:44 | Tags: java, xml, xml-parsing, xml-namespaces, dtd


Problem description


I'm using Java 11 (AdoptOpenJDK 11.0.5 2019-10-15) on Windows 10. I'm parsing some legacy XHTML 1.1 files, which take the following general form:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/MarkUp/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <title>XHTML 1.1 Skeleton</title>
</head>
<body>
</body>
</html>


I'm using a simple non-validating parser:

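import java.io.BufferedInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Non-validating (the JAXP default), but namespace-aware
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
final Document document;
try (InputStream inputStream = new BufferedInputStream(getClass().getResourceAsStream("xhtml-1.1-test.xhtml"))) {
  document = documentBuilder.parse(inputStream);
}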


For some reason it's adding extra attributes such as xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" and xml:space="preserve" all over the place:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" version="-//W3C//DTD XHTML 1.1//EN" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="en">
<head xmlns="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <title xmlns="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">XHTML 1.1 Skeleton</title>
</head>
<body xmlns="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:space="preserve"></body>
</html>


I know that DTDs can provide default attribute values, but I don't understand why the xmlns:xsi attribute was added, when there appear to be no elements or attributes in that namespace.
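For what it's worth, the DOM does record which attributes came from the DTD rather than from the document: `Attr.getSpecified()` returns false for attributes filled in from a DTD default or #FIXED declaration, and a non-validating parser still applies such defaults. A minimal sketch, assuming a one-line internal DTD subset as a stand-in for the XHTML 1.1 DTD (so no network access is needed); the class name and the `defaultedAttributeNames` helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class DefaultedAttributes {
    // Hypothetical test document: the internal DTD subset stands in for the
    // XHTML 1.1 DTD and gives <html> a #FIXED version attribute.
    static final String XML = "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE html [<!ATTLIST html version CDATA #FIXED '-//W3C//DTD XHTML 1.1//EN'>]>"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\"/>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // same settings as above: non-validating, namespace-aware
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Names of the attributes on e that were supplied by the DTD, not the document. */
    static List<String> defaultedAttributeNames(Element e) {
        List<String> names = new ArrayList<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (!a.getSpecified()) { // false => value came from a DTD default or #FIXED declaration
                names.add(a.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Element html = parse(XML).getDocumentElement();
        System.out.println("version = " + html.getAttribute("version"));
        System.out.println("defaulted: " + defaultedAttributeNames(html));
    }
}
```

A serializer could apply the same `getSpecified()` check to omit every DTD-supplied attribute, not only specific ones.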


Furthermore, xml:space="preserve" seems to be incorrect altogether; only elements like <pre> should have xml:space="preserve" set, I would think. (Update: The HTML5 specification indicates that HTML by default preserves space, and that xml:space must not be serialized in HTML, so maybe that was part of the reasoning here. I will improve my HTML serializer to ignore the xml:space attribute, which will partially mitigate this issue.)
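Filtering on output, rather than deleting from the DOM, matters here: per the DOM specification, removing an attribute that has a DTD default makes a new attribute with the default value appear immediately. A minimal sketch of such a serializer-side filter, assuming a hand-rolled start-tag writer (`XmlSpaceFilter`, `startTag`, and `parseRoot` are hypothetical helpers, not library API): it skips any attribute whose namespace is `XMLConstants.XML_NS_URI` and whose local name is `space`.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class XmlSpaceFilter {
    /** True for xml:space, which the HTML serializer should not emit. */
    static boolean isXmlSpace(Attr a) {
        return XMLConstants.XML_NS_URI.equals(a.getNamespaceURI())
            && "space".equals(a.getLocalName());
    }

    /** Hypothetical helper: writes an element's start tag, skipping xml:space. */
    static String startTag(Element e) {
        StringBuilder sb = new StringBuilder("<").append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (isXmlSpace(a)) {
                continue; // dropped on output only; the DOM itself is left untouched
            }
            sb.append(' ').append(a.getName()).append("=\"")
              .append(a.getValue().replace("&", "&amp;").replace("\"", "&quot;")) // escape & and "
              .append('"');
        }
        return sb.append('>').toString();
    }

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed so getNamespaceURI()/getLocalName() are populated
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element body = parseRoot(
            "<body xmlns='http://www.w3.org/1999/xhtml' xml:space='preserve' class='x'/>");
        System.out.println(startTag(body));
    }
}
```

Attribute order in the output follows the DOM implementation's `NamedNodeMap` order, which is not guaranteed to match the source document.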


Also note the version="-//W3C//DTD XHTML 1.1//EN" attribute; that's something I don't need or want.

Am I doing something wrong?


Interestingly, this is not a problem with XHTML 1.0 Strict.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.0 Skeleton</title>
</head>
<body>
</body>
</html>

Parsing it produces:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.0 Skeleton</title>
</head>
<body>
</body>
</html>


But it is a problem with -//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN. So this seems to be just an XHTML 1.1 problem.


Update: I have some potentially helpful news: if I create a new document without a DTD and import the entire document tree into the new document, all this cruft (which apparently comes from default attribute values declared in the DTD) goes away, because the destination document doesn't have a DTD at all. (See "How to force removal of attributes with implied default values from DTD in Java XML DOM".) But this is very inefficient; it would be nice to turn this off altogether when parsing.
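One possible way to turn it off at parse time, instead of copying the tree afterwards, is to stop the parser from reading the external DTD subset at all, so there are no declared defaults to apply. A minimal sketch, assuming the JDK's built-in parser (derived from Apache Xerces) recognizes the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`; other JAXP implementations may reject it, in which case `setFeature` throws `ParserConfigurationException`. The feature only affects non-validating parsing, defaults from an internal DTD subset are still applied, and entities declared only in the external DTD (such as `&nbsp;`) will no longer be expanded. The class name is hypothetical. A more portable variant is a `DocumentBuilder.setEntityResolver` callback that returns an empty `InputSource` for the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NoExternalDtd {
    // Apache Xerces feature name; assumed to be recognized by the JDK's built-in parser.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        f.setValidating(false);                 // the feature only applies when not validating
        f.setFeature(LOAD_EXTERNAL_DTD, false); // do not read the external DTD subset at all
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\""
            + " \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">"
            + "<head><title>XHTML 1.1 Skeleton</title></head><body></body></html>";
        Element html = parse(xml).getDocumentElement();
        // With no DTD read there are no defaults to add, so both lines should print false.
        System.out.println("xmlns:xsi present: " + html.hasAttribute("xmlns:xsi"));
        System.out.println("version present: " + html.hasAttribute("version"));
    }
}
```

The DOCTYPE itself is still kept as a `DocumentType` node, so a serializer can write it back out unchanged.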
