Best practice for handling vertical tabs and other invalid XML characters

Question

I have an application which (like many others) takes in user input, stores it in a database, and later processes it using (amongst other things) XML tools. The application takes in free-text input and, like many other developers, I am very careful with escaping and quoting, so it can handle input containing different types of whitespace, quote characters, reserved XML characters, etc.

However, occasionally a user will manage to enter a string containing a vertical tab character (hex 0B) or a form feed (hex 0C). This cannot be processed by XML tools at all and causes the app to barf.

In my application it's quite important to preserve the original input during the 'round trip' process, so I'm loath to just strip out any characters I don't like, especially things like form feed, which are still occasionally used in plain text files.

Is there any accepted best practice or general strategy for handling these characters when XML processing is involved?

Recommended answer

Yes. Unfortunately, some characters are illegal in XML 1.0 and have no entity equivalent. For one example, see:

http://www.jdom.org/docs/apidocs.1.1/org/jdom/Element.html#setText(java.lang.String)

which is a String setter... that can throw an exception! Vertical tab is exactly one of those characters: there is no XML entity for it, and in XML 1.0 even a numeric character reference for it is rejected, so there is no way to "escape" it with XML alone.
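To make the restriction concrete, here is a minimal Python sketch (not from the original answer) that flags characters falling outside the XML 1.0 Char production and shows a parser rejecting a vertical tab, both raw and as a numeric character reference. The helper names find_invalid_xml_chars and parses are hypothetical, and the standard-library ElementTree parser stands in for whatever XML tool the application actually uses.

```python
import re
import xml.etree.ElementTree as ET

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (index, hex code point) for each character XML 1.0 forbids.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), hex(ord(m.group()))) for m in _INVALID_XML10.finditer(text)]


def parses(xml_text):
    """Return True if ElementTree accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Vertical tab (0x0B) and form feed (0x0C) are both reported.
    print(find_invalid_xml_chars("a\x0bb\x0c"))
    # A tab reference is fine; a vertical tab is rejected either way.
    print(parses("<a>&#x9;</a>"), parses("<a>&#x0B;</a>"), parses("<a>\x0b</a>"))
```

A check like this can run at input time to decide which values need special handling, without altering the stored text.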

I'm working around this myself by using base64 encoding to sanitize strings that might harbor those characters. It's a bit silly, since I have to base64-encode and decode all the time, but I don't think there's a good alternative.
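The base64 workaround described above might look roughly like the following Python sketch. The element name "note", the "encoding" attribute, and the function names to_xml and from_xml are assumptions made for illustration; the original answer does not say how the encoded value is stored or marked.

```python
import base64
import xml.etree.ElementTree as ET


def to_xml(text):
    """Wrap arbitrary text in an element, base64-encoding its UTF-8 bytes.

    The "note" element and "encoding" attribute are hypothetical names.
    """
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    elem = ET.Element("note", {"encoding": "base64"})
    elem.text = payload
    return ET.tostring(elem, encoding="unicode")


def from_xml(xml_text):
    """Parse the element and decode its base64 payload back to the original text."""
    elem = ET.fromstring(xml_text)
    # An empty payload parses back as None, so fall back to an empty string.
    return base64.b64decode(elem.text or "").decode("utf-8")


if __name__ == "__main__":
    original = "page one\x0cpage two\x0bend"
    xml_text = to_xml(original)
    print(xml_text)
    print(from_xml(xml_text) == original)
```

The payload contains only base64 characters, so any XML tool can carry it, and decoding restores the form feed and vertical tab exactly. The cost is the one the answer mentions: every reader and writer of the field has to encode and decode, and the text is no longer human-readable in the XML.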
