How to remove < and > in XML that is part of the XML message


Question

I have XML that looks as follows:

<StartTag>
    <MyValueTag>And the value itself contains a < bracket that makes the XML invalid</MyValueTag>
</StartTag>

The XML contains a '<' character that makes the XML invalid.

The easiest fix would be to correct the source of the XML, but unfortunately I don't have control over how the XML is created. It contains messages like "The value is < than 10", where the text was supposed to read "less than".

Is there any way I can check the XML for things like this and escape those characters?

I tried looking at this post, where the author suggested using JTidy. But when I tried it, it did not remove the <:

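import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.w3c.tidy.Tidy;

// data is the String holding the received XML message
Tidy tidy = new Tidy();
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setWraplen(Integer.MAX_VALUE);  // effectively disable line wrapping
tidy.setPrintBodyOnly(true);
tidy.setXmlOut(true);                // ask for XML output
tidy.setSmartIndent(true);
ByteArrayInputStream inputStream = new ByteArrayInputStream(data.getBytes("UTF-8"));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
// Parse the input and write the tidied result to outputStream
tidy.parseDOM(inputStream, outputStream);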

Recommended answer

The fact that the XML is invalid (strictly speaking, not well-formed) means you aren't going to be able to use a conforming XML parser to read it and fix it. If you can't get the authors of the software that writes the file to fix the bug, then you will have to come up with some application-specific solution.
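To see why a parser cannot help here, the following minimal sketch feeds a message to the JDK's built-in DOM parser and reports whether it is accepted. The class and method names (WellFormedCheck, isWellFormed) are made up for this illustration; only the standard javax.xml.parsers and org.xml.sax APIs are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original question or answer.
final class WellFormedCheck {

    /** Returns true if the JDK XML parser accepts the text as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A stray '<' in text is a fatal well-formedness error.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not about the input text.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<MyValueTag>The value is < than 10</MyValueTag>"));
        System.out.println(isWellFormed("<MyValueTag>The value is &#60; than 10</MyValueTag>"));
    }
}
```

A check like this is useful mainly as a gate: run the repair step only on messages the parser rejects, and re-check the result afterwards.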

For example, if you knew that the stray < character only occurs in the text of a <MyValueTag> element, and if you knew that no other elements could occur as children of <MyValueTag>, then it would be pretty easy to write a program that recognizes the start and end tags and replaces any < characters that occur between them with &#60; (or the equivalent &lt;).
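The approach described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a general repair tool: it assumes the element is named MyValueTag as in the question, that its start tag carries no attributes, that it never has child elements, CDATA sections, or comments, and that its text never contains the literal string </MyValueTag>. The class name StrayLtEscaper is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
final class StrayLtEscaper {

    // Start tag, text up to the first matching end tag, end tag.
    // DOTALL lets the text span several lines.
    private static final Pattern VALUE =
            Pattern.compile("(<MyValueTag>)(.*?)(</MyValueTag>)", Pattern.DOTALL);

    /** Escapes every '<' found in the text of a MyValueTag element. */
    static String escape(String xml) {
        Matcher m = VALUE.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Assumption: no child elements, so any '<' here is plain text.
            String text = m.group(2).replace("<", "&#60;");
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + text + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<StartTag>\n"
                + "    <MyValueTag>The value is < than 10</MyValueTag>\n"
                + "</StartTag>";
        System.out.println(escape(broken));
    }
}
```

Everything outside the MyValueTag elements is copied through unchanged, so the rest of the message is never touched. If the element can carry attributes or nested markup, the pattern would need to be adapted to that knowledge.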

Of course, if the problem isn't that simple, then the solution won't be that simple either; but hopefully you can make it simpler than solving the general problem for XML.

After you've fixed a few files "by hand," stop and ask yourself, "How did I know that this < character needed to be escaped?" Then write a program that operates on that same knowledge.
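As one possible example of such knowledge: in well-formed XML, a < that starts markup is always followed by a name-start character, '/', '!' or '?'. In a message like "The value is < than 10" the < is followed by a space, so it cannot be markup. The sketch below encodes only that rule. It is a heuristic, with a made-up class name (HeuristicLtEscaper), and it assumes the messages contain no CDATA sections or comments in which such a < would be legitimate.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
final class HeuristicLtEscaper {

    // A '<' NOT followed by a letter, '_', ':', '/', '!' or '?' cannot
    // begin a tag, comment, CDATA section or processing instruction.
    private static final Pattern STRAY_LT =
            Pattern.compile("<(?![\\p{L}_:/!?])");

    /** Escapes every '<' that cannot be the start of markup. */
    static String escape(String xml) {
        return STRAY_LT.matcher(xml).replaceAll("&#60;");
    }

    public static void main(String[] args) {
        System.out.println(escape("<MyValueTag>The value is < than 10</MyValueTag>"));
    }
}
```

This rule leaves a stray < alone when it happens to be followed by a letter (for example "x <y"), so it complements rather than replaces an element-specific repair.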
