Why Do Invalid Characters Get Into a MarkLogic Database?


Question

I have discovered that it is possible to insert invalid XML characters into a MarkLogic database. This only becomes apparent when I extract an XML document, serialize it with xdmp:quote, and later parse it again with xdmp:unquote, whereupon I get a message such as "Invalid character entity '14'".

The character got into the database via an XQuery-generated HTML form submission. I think the user pasted text in from Excel, which includes such hidden nasties.

Clearly I am going to need to check what is being input in future, but surely this is a bug that should be fixed. If the characters are illegal, why isn't MarkLogic stripping them out when saving data to the database?

Neil.

Recommended answer

MarkLogic uses a parsed representation for XML both in memory and when persisting an XML document. Invalid characters would cause parse failures, preventing MarkLogic from storing a document as XML.

However, MarkLogic can store an invalid serialization of XML as a text or binary document. The bytes may be invalid for XML, but they aren't invalid for text or binary.
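To make "invalid for XML" concrete: XML 1.0 allows only tab, line feed, carriage return, and the code points in the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. The sketch below filters a submitted string down to those code points before it is used to build a document. It is only an illustration, not something MarkLogic does for you: `local:strip-invalid-xml-chars` is a hypothetical helper, and the form field name `comment` is an assumption about the form.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not a MarkLogic built-in: keeps only the code
   points permitted by the XML 1.0 Char production and drops the rest. :)
declare function local:strip-invalid-xml-chars($text as xs:string?) as xs:string
{
  fn:codepoints-to-string(
    for $cp in fn:string-to-codepoints($text)
    where $cp = (9, 10, 13)                      (: tab, LF, CR :)
       or ($cp ge 32 and $cp le 55295)           (: #x20-#xD7FF :)
       or ($cp ge 57344 and $cp le 65533)        (: #xE000-#xFFFD :)
       or ($cp ge 65536 and $cp le 1114111)      (: #x10000-#x10FFFF :)
    return $cp
  )
};

(: "comment" is an assumed field name; "" is the default when it is absent. :)
local:strip-invalid-xml-chars(xdmp:get-request-field("comment", ""))
```

Filtering by code point rather than with a regular expression avoids having to write the forbidden characters into the query text itself. Whether to drop such characters silently or reject the submission is a design choice for the application.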

Is it possible that the HTML form submission submits the documents as text or binary instead of as XML? What does xdmp:node-kind() report about the form submission and about the document when retrieved with fn:doc()?
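One way to run that check on the stored document is sketched below. The URI is a placeholder, not taken from the question. Note that calling xdmp:node-kind() directly on the result of fn:doc() would describe the document node itself, so the sketch looks at the document's children instead.

```xquery
xquery version "1.0-ml";

(: Placeholder URI: substitute the document that triggered the error. :)
let $doc := fn:doc("/example/submission.xml")
(: Report the kind of each child of the document node. An XML document
   normally has an element child; a document stored as text or binary
   would be expected to show a text or binary node here instead. :)
for $child in $doc/node()
return xdmp:node-kind($child)
```

If the result shows a text node rather than an element, the form handler is probably inserting the submitted string as a text document, which would explain why the invalid character was never rejected.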

Hoping that helps with the investigation,
