Escape XML characters within nodes of string XML in Java
Problem description
I have a string of XML data. I need to escape the values within the nodes, but not the nodes themselves.
Ex:
<node1>R&R</node1>
Should be escaped to:
<node1>R&amp;R</node1>
Should NOT be escaped to:
&lt;node1&gt;R&amp;R&lt;/node1&gt;
I have been working on this for the last couple of days, but haven't had much success. I'm not an expert with Java, but the following are things that I have tried that will not work:
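- Parsing the string XML into a document. This does not work because the data within the nodes contains invalid XML.
- Escaping all of the characters. This does not work because the program receiving the data will not accept it in that format.
- Escaping all of the characters and then parsing into a document. This throws all sorts of errors.
Any help would be greatly appreciated.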
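To illustrate the first point, here is a minimal sketch of why parsing fails before escaping. It uses only the JDK's built-in parser; the class name ParseFailureDemo and the helper isWellFormed are made up for this illustration and are not part of the original question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether a string parses as well-formed XML.
final class ParseFailureDemo {

    /** Returns true if the JDK parser accepts the string, false if it rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // SAXParseException (malformed input), plus configuration or I/O failures
            return false;
        }
    }

    public static void main(String[] args) {
        // A bare '&' inside a node is not well-formed, so the parser rejects it
        System.out.println(isWellFormed("<node1>R&R</node1>"));
        // Once the value is escaped, the same document parses
        System.out.println(isWellFormed("<node1>R&amp;R</node1>"));
        // Escaping everything leaves no root element at all
        System.out.println(isWellFormed("&lt;node1&gt;R&amp;R&lt;/node1&gt;"));
    }
}
```

This is why the values have to be repaired at the string level first: a DOM parser cannot be used to locate the text nodes until the text inside them is already valid.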
Recommended answer
You could use regular expression matching to find all the strings between angled brackets, and loop through/process each of those. In this example I've used the Apache Commons Lang to do the XML escaping.
// Imports needed by the method below.
// Note: StringEscapeUtils in commons-lang3 is deprecated since 3.6;
// org.apache.commons.text.StringEscapeUtils offers the same escapeXml10 method.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringEscapeUtils;

public String sanitiseXml(String xml)
{
    // Match the pattern <something>text</something>
    Pattern xmlCleanerPattern = Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");

    StringBuilder xmlStringBuilder = new StringBuilder();

    Matcher matcher = xmlCleanerPattern.matcher(xml);
    int lastEnd = 0;
    while (matcher.find())
    {
        // Include any non-matching text between this result and the previous result
        if (matcher.start() > lastEnd) {
            xmlStringBuilder.append(xml.substring(lastEnd, matcher.start()));
        }
        lastEnd = matcher.end();

        // Sanitise the characters inside the tags and append the sanitised version
        String cleanText = StringEscapeUtils.escapeXml10(matcher.group(2));
        xmlStringBuilder.append(matcher.group(1)).append(cleanText).append(matcher.group(3));
    }
    // Include any leftover text after the last result
    xmlStringBuilder.append(xml.substring(lastEnd));

    return xmlStringBuilder.toString();
}
This looks for matches of <something>text</something>, captures the tag names and contained text, sanitises the contained text, and then puts it back together.
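For anyone who wants to try the idea without adding Apache Commons to the classpath, here is a rough, dependency-free sketch of the same approach. The class name NodeTextEscaper and the helper escapeText are invented for this example, and the hand-written escaping is a stand-in for escapeXml10, not an equivalent: it does not strip characters that are invalid in XML 1.0. It assumes Java 9 or later for Matcher.replaceAll with a function argument.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: same regex as the answer, with a hand-rolled escaper.
final class NodeTextEscaper {

    // <something>text</something>, where text contains no '<' or '>'
    private static final Pattern LEAF_NODE =
            Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");

    /** Escapes '&', '"' and '\''. The pattern never captures '<' or '>'. */
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Escapes the text of each matched node and leaves everything else untouched. */
    static String sanitiseXml(String xml) {
        Matcher matcher = LEAF_NODE.matcher(xml);
        // quoteReplacement keeps '$' and '\' in the data from being read as group references
        return matcher.replaceAll(result -> Matcher.quoteReplacement(
                result.group(1) + escapeText(result.group(2)) + result.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(sanitiseXml("<node1>R&R</node1>")); // prints <node1>R&amp;R</node1>
    }
}
```

Two caveats apply to this sketch and to the original method alike. Because the middle group is [^<>]*, a value containing a raw < or > is never matched and is left as it is. And text that is already escaped gets escaped again (R&amp;R becomes R&amp;amp;R), so this should only be run on input known to be unescaped.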