Why is MarkLogic Definitely Storing Invalid Characters in XML Documents?

Question

I have found that it is possible to store invalid XML characters in XML documents in a MarkLogic database. This causes problems when I try to update text in a document and the update involves quoting and unquoting the XML data.

I now have example code that proves invalid data can be stored. You can run it from Query Console; you will get an error when it tries to unquote the quoted string, because that string, produced from the XML stored in the database, contains the invalid character.

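(: Store a document whose text contains code point 14 (0xE), which XML 1.0 does not allow :)
let $Doc := <TEST>Here is invalid character 14: {fn:codepoints-to-string((14))}</TEST>
return
  xdmp:document-insert("/Test.xml", $Doc)

;

(: In a second statement, serialize the stored element and parse it again; the unquote step raises the error :)
let $Quoted := xdmp:quote(/TEST)
let $Unquoted := xdmp:unquote($Quoted)
return
  $Unquoted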

Recommended Answer

MarkLogic is a document database, not just an XML database, so it makes no assumptions about the data you are inserting, even if the document URI has an xml extension or you are doing a node insert into an existing XML document.

This also means that it will accept XML with invalid characters. xdmp:node-insert-child() can be used with both XML and JSON, so it is up to you either to clean up or validate the data on ingest, or to handle exceptions on retrieval.
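As a rough illustration of both options, the sketch below tries to unquote the stored element and, if that fails, retries after dropping every code point outside the XML 1.0 character set. The function names local:is-xml10-char and local:strip-invalid are hypothetical helpers, not MarkLogic APIs, and the URI /Test.xml is the one from the question. The same helper could be applied to text before calling xdmp:document-insert() if you prefer to clean on ingest.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: true if the code point is in the XML 1.0 Char production :)
declare function local:is-xml10-char($cp as xs:integer) as xs:boolean {
  $cp = (9, 10, 13)
  or ($cp ge 32 and $cp le 55295)          (: 0x20 to 0xD7FF :)
  or ($cp ge 57344 and $cp le 65533)       (: 0xE000 to 0xFFFD :)
  or ($cp ge 65536 and $cp le 1114111)     (: 0x10000 to 0x10FFFF :)
};

(: Hypothetical helper: remove every character that XML 1.0 does not allow :)
declare function local:strip-invalid($s as xs:string) as xs:string {
  fn:codepoints-to-string(
    fn:string-to-codepoints($s)[local:is-xml10-char(.)]
  )
};

let $quoted := xdmp:quote(fn:doc("/Test.xml")/TEST)
return
  try {
    (: Works as-is when the stored text is clean :)
    xdmp:unquote($quoted)
  } catch ($e) {
    (: Assumption: the failure was caused by invalid characters, so retry on a cleaned copy :)
    xdmp:unquote(local:strip-invalid($quoted))
  }
```

Silently dropping characters changes the data, so in practice you may prefer to log the error or replace the characters with a visible placeholder instead.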

Schemas are one approach that can be used for document validation.

Alternatively, you can explicitly specify the XML version in the document:

Changes to the Accepted XML Character Set

As of MarkLogic 9.0-6, parsing of XML documents with an XML declaration that explicitly specifies XML version 1.1 (version="1.1") enforces the XML 1.1 character set. Consequently, you can now create content containing characters not accepted by XML 1.0.

Characters in the XML 1.1 restricted character ranges must be given as character entities. This enforcement applies to the following character ranges:

0x1-0x8
0xB-0xC
0xE-0x1F
0x7F-0x84
0x86-0x9F

The following character ranges that were previously disallowed are now accepted:

0x1-0x8
0xB-0xC
0xE-0x1F
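A minimal sketch of the XML 1.1 route, assuming MarkLogic 9.0-6 or later and that xdmp:unquote() applies the rule quoted above when the string carries a version="1.1" declaration. This has not been verified against a server, so treat it as an experiment to run in Query Console rather than a confirmed fix. Character 14 is written as a character reference, as the note requires for the restricted ranges.

```xquery
xquery version "1.0-ml";

(: &amp;#14; in the XQuery literal yields the text &#14; in the string,
   so the XML parser, not the XQuery parser, resolves the reference :)
let $xml11 := '<?xml version="1.1"?><TEST>Here is character 14: &amp;#14;</TEST>'
return
  (: Assumption: parsing succeeds because the declaration selects the XML 1.1 character set :)
  xdmp:unquote($xml11)
```

Whether xdmp:quote() later writes the character back out as a reference with a 1.1 declaration is not covered by the quoted note, so the round trip from the question should be checked separately.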
