Stripping Invalid XML characters in Java


Problem description

I have an XML file that's the output from a database. I'm using the Java SAX parser to parse the XML and output it in a different format. The XML contains some invalid characters and the parser is throwing errors like 'Invalid Unicode character (0x5)'

Is there a good way to strip all these characters out besides pre-processing the file line-by-line and replacing them? So far I've run into 3 different invalid characters (0x5, 0x6 and 0x7). It's a ~4gb database dump and we're going to be processing it a bunch of times, so having to wait an extra 30 minutes each time we get a new dump to run a pre-processor on it is going to be a pain, and this isn't the first time I've run into this issue.

Recommended answer

I haven't used this personally, but Atlassian made a command-line XML cleaner that may suit your needs (it was made mainly for JIRA, but XML is XML):


Download atlassian-xml-cleaner-0.1.jar

Open a DOS console or shell, and locate the XML or ZIP backup file on your computer, here assumed to be called data.xml

Run: java -jar atlassian-xml-cleaner-0.1.jar data.xml > data-clean.xml

This will write a copy of data.xml to data-clean.xml, with invalid characters removed.
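
If the separate cleaning pass is too slow for a ~4gb dump, another option is to drop the offending characters while the parser reads the file. The following is only a minimal sketch of that idea, not part of the Atlassian tool: XmlCharFilterReader is a hypothetical class name, and the allowed ranges are the XML 1.0 character ranges (0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD, plus supplementary characters). As a simplification it lets every UTF-16 surrogate through without checking that surrogates are correctly paired.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper (not from the original answer): wraps another Reader
// and silently drops characters that XML 1.0 does not allow.
class XmlCharFilterReader extends FilterReader {

    public XmlCharFilterReader(Reader in) {
        super(in);
    }

    // XML 1.0 allows #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and
    // #x10000-#x10FFFF. Supplementary characters arrive as surrogate pairs,
    // so surrogates are passed through here without validating the pairing.
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || Character.isSurrogate(c)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip disallowed characters until a valid one or end of stream.
        while ((c = super.read()) != -1) {
            if (isAllowed((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n;
        while ((n = super.read(buf, off, len)) != -1) {
            // Compact the allowed characters to the front of the chunk.
            int kept = off;
            for (int i = off; i < off + n; i++) {
                if (isAllowed(buf[i])) {
                    buf[kept++] = buf[i];
                }
            }
            if (kept > off) {
                return kept - off;
            }
            // The whole chunk was invalid: read the next chunk instead of
            // returning 0, which callers could misinterpret.
        }
        return -1;
    }
}
```

To use it with SAX, wrap the file's reader and hand it to the parser, for example parser.parse(new InputSource(new XmlCharFilterReader(new InputStreamReader(new FileInputStream("data.xml"), StandardCharsets.UTF_8))), handler). The UTF-8 encoding is an assumption about the dump. Note that this only removes raw control characters; numeric character references such as &#5; would still be rejected by the parser.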
