Parsing XML with no closing tags in Java


Question

I am having trouble parsing XML whose elements have no closing tags. Please see the XML snippet below.

I have tried both the SAX and StAX parsers, but both require well-formed XML in which a value such as XXYY is followed by its closing tag. As you can see below, my XML is formatted a little differently. Is there an API that can parse this, or a way to make SAX or StAX handle it?

<Employees>
 <Employee>
  <Detail>
    <Date>2018014
    <Name>XXYY
    <Age>0
    <LANGUAGE>ENG
    <Manager>
    <MName>YYXX
    <MID>5959
    </Manager>
    <EmployeeID>1234
  </Detail>
 </Employee>
</Employees>

Recommended answer

You could "fix" the XML by adding all the missing end-tags before handing it to a parser.

Any start-tag that is followed by text on the same line can be fixed by appending the matching end-tag at the end of that line.

The "contains text" condition ensures that a container tag such as <Manager> is not closed on its own line, since its end-tag already appears three lines further down.
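The rule can be tried on single lines before applying it to a whole file. The following is a minimal sketch, assuming every element sits on its own line as in the snippet; the class name EndTagRule and the helper fixLine are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndTagRule {
    // Leading whitespace, a start-tag, then at least one character of text with no '<'.
    private static final Pattern OPEN_WITH_TEXT =
            Pattern.compile("(\\s*)<(\\w+)>([^<]+)");

    // Hypothetical helper: appends the end-tag only when the whole line matches the rule.
    static String fixLine(String line) {
        Matcher m = OPEN_WITH_TEXT.matcher(line);
        if (!m.matches()) {
            return line; // container start-tags and end-tags are left untouched
        }
        return m.group(1) + "<" + m.group(2) + ">" + m.group(3) + "</" + m.group(2) + ">";
    }

    public static void main(String[] args) {
        System.out.println(fixLine("    <Date>2018014")); // text after the tag: end-tag is added
        System.out.println(fixLine("    <Manager>"));     // no text: unchanged
        System.out.println(fixLine("    </Manager>"));    // already an end-tag: unchanged
    }
}
```

Only the first line is changed, because it is the only one where text follows the start-tag.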

Working example code:

// Load file into memory
String xml = new String(Files.readAllBytes(Paths.get("test.xml")), StandardCharsets.UTF_8);

// Add the missing end-tag to every line that has text after its start-tag
xml = xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<]+)$", "$1<$2>$3</$2>");

// Parse then print the XML, to ensure there are no errors
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                                          .parse(new InputSource(new StringReader(xml)));
TransformerFactory.newInstance().newTransformer()
                  .transform(new DOMSource(document), new StreamResult(System.out));
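
Once the end-tags are in place, the ordinary DOM API can read individual values. The sketch below is one possible self-contained variant, not the answer's exact code: the class name RepairAndRead and the methods repair and parse are hypothetical. It also assumes a slightly stricter pattern, in which the text may not cross a line break and must contain a non-whitespace character, so that a container tag followed by a blank line is not closed by mistake.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RepairAndRead {
    // Assumed stricter variant of the pattern above: the text stays on one line
    // and must contain at least one non-whitespace character.
    private static final Pattern MISSING_END_TAG =
            Pattern.compile("(?m)^([ \\t]*)<(\\w+)>([^<\\r\\n]*[^<\\s][^<\\r\\n]*)$");

    // Appends the matching end-tag to every line that has text after its start-tag.
    static String repair(String xml) {
        return MISSING_END_TAG.matcher(xml).replaceAll("$1<$2>$3</$2>");
    }

    // Repairs the input, then parses it with the standard DOM parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(repair(xml))));
    }

    public static void main(String[] args) throws Exception {
        // Shortened copy of the XML from the question
        String broken = String.join("\n",
                "<Employees>",
                " <Employee>",
                "  <Detail>",
                "    <Name>XXYY",
                "    <Manager>",
                "    <MName>YYXX",
                "    <MID>5959",
                "    </Manager>",
                "    <EmployeeID>1234",
                "  </Detail>",
                " </Employee>",
                "</Employees>");

        Document doc = parse(broken);
        System.out.println(doc.getElementsByTagName("Name").item(0).getTextContent()); // prints XXYY
        System.out.println(doc.getElementsByTagName("MID").item(0).getTextContent());  // prints 5959
    }
}
```

This approach still relies on each leaf element occupying its own line; input that puts several unclosed tags on one line would need a different repair step.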
