Using an XML schema to fix an XML document in Java


Question

Does anyone know of a tool that would let me take an XML string in Java, check it against a schema, and fix it if it is malformed?
For example, given the following schema and XML:

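<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <!-- A child element declaration must sit inside a complexType/sequence;
       xs:element cannot directly contain another xs:element. -->
  <xs:element name="tag">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="subtag" type="xs:token" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>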


<tag>
<subtag>content
</tag>

I am looking for a tool that can read the schema, parse the XML, notice the missing tag, and add it. For purposes of this particular program, I don't need any correction other than missing tags. (btw, a tool that can locate and add missing tags without using the schema is fine also).
Any suggestions?

Recommended answer

The trouble is, of course, that for any instance that doesn't conform to the schema, there are an infinite number of "similar" instances that do conform to the schema, and your challenge is to choose the one that is "most similar" on some measure.
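
To make that ambiguity concrete, here is a minimal sketch using the JDK's javax.xml.validation API. The class name CandidateRepairs and its helper methods are hypothetical, and the embedded schema is an assumption: it is the question's schema with the complexType/sequence wrapper that XSD requires around a child element. The sketch checks the original input and two different candidate repairs against that schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: validates candidate repairs against the schema.
class CandidateRepairs {
    // Assumed schema: the question's schema with a complexType/sequence wrapper.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='tag'><xs:complexType><xs:sequence>"
        + "<xs:element name='subtag' type='xs:token'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is broken", e);
        }
    }

    // Returns true only if xml is well-formed AND valid against the schema.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers both well-formedness errors and validity errors.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = loadSchema();
        String broken = "<tag>\n<subtag>content\n</tag>";
        // Two different "repairs" of the same broken input:
        String keepText = "<tag>\n<subtag>content\n</subtag></tag>";
        String dropText = "<tag><subtag/></tag>";
        System.out.println("broken:   " + isValid(schema, broken));
        System.out.println("keepText: " + isValid(schema, keepText));
        System.out.println("dropText: " + isValid(schema, dropText));
    }
}
```

The original input is rejected because it is not even well-formed, so the parser fails before any schema rule is consulted. Both candidates pass validation, so the schema alone cannot say which repair the author intended; that choice has to come from some other similarity measure.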

HTML5 tries to do this with an elaborate set of rules. These rules embed a lot of knowledge of the specific vocabulary: for example, if a tr is found as a direct child of a table, the tr is wrapped in a tbody. You could try to do the same for your schema/vocabulary, but be prepared for a lot of work.
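
For the narrow case in the question (only missing end tags, no schema needed), one fixed rule may be enough: when an end tag does not match the innermost open element, close the open elements in between. The following is a naive sketch of that single rule, not an existing library; the class MissingEndTagFixer and its method are hypothetical names, and the regex-based tokenizer is an assumption that ignores comments, CDATA sections, processing instructions, and '>' characters inside attribute values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: inserts end tags that are missing from an XML string.
class MissingEndTagFixer {
    // Groups: 1 = "/" for end tags, 2 = element name,
    //         3 = attributes (ignored), 4 = "/" for self-closing tags.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)([^<>]*?)(/?)>");

    static String closeMissingTags(String xml) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // innermost element on top
        Matcher m = TAG.matcher(xml);
        int pos = 0;
        while (m.find()) {
            out.append(xml, pos, m.start()); // copy text before this tag
            pos = m.end();
            String name = m.group(2);
            boolean isEnd = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(4).isEmpty();
            if (!isEnd) {
                if (!selfClosing) {
                    open.push(name);
                }
                out.append(m.group());
            } else if (open.contains(name)) {
                // Close every element still open inside the one being closed.
                while (!open.peek().equals(name)) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.pop();
                out.append(m.group());
            }
            // An end tag with no matching start tag is silently dropped.
        }
        out.append(xml.substring(pos));
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(closeMissingTags("<tag>\n<subtag>content\n</tag>"));
    }
}
```

For the question's input this inserts </subtag> immediately before </tag>. The rule always picks the latest possible position for the missing end tag, which is only one of the many possible repairs; a vocabulary-aware fixer in the HTML5 style would need additional rules to choose better positions.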

Doing the same thing for an arbitrary schema sounds like an interesting PhD project. Doing it successfully would probably require some research into the causes of deviations from the schema (just as spelling correction should take into account whether the input was typed by the user, obtained by voice recognition, or obtained using OCR scanning - each introduces different kinds of errors.)
