Add new node in XML file
Problem description
I have an XML file with a structure like this:
<?xml version="1.0"?>
<entries>
<entry accente="one">
<list>Word</list>
<sense class="0" value="B">
<definition>
<MorfDef>s. m.</MorfDef>
<RegDef>This <i>text</i> have i node.</RegDef>
<ItalMarker>Text.</ItalMarker>
</definition>
</sense>
</entry>
<entry accente="two">
<list>B n-1</list>
<sense class="0" value="B">
<definition>
<MorfDef>s. m.</MorfDef>
<RegDef>This text doesn't have i atribute.</RegDef>
<ItalMarker>Word.</ItalMarker>
</definition>
</sense>
</entry>
</entries>
I want to add a new node for each word in the RegDef element, so the result could be:
<?xml version="1.0"?>
<entries>
<entry accente="one">
<list>Word</list>
<sense class="0" value="B">
<definition>
<MorfDef>s. m.</MorfDef>
<RegDef><w lemma="A1">This</w> <i><w lemma="A2">text</w></i> <w lemma="A3">have</w> <w lemma="A4">i</w> <w lemma="A5">node</w> <w lemma="A6">.</w></RegDef>
<ItalMarker>Text.</ItalMarker>
</definition>
</sense>
</entry>
<entry accente="two">
<list>B n-1</list>
<sense class="0" value="B">
<definition>
<MorfDef>s. m.</MorfDef>
<RegDef><w lemma="A7">This</w> <w lemma="A8">text</w> <w lemma="A9">doesn't</w> <w lemma="A10">have</w> <w lemma="A11">i</w> <w lemma="A12">atribute</w> <w lemma="A13">.</w></RegDef>
<ItalMarker>Word.</ItalMarker>
</definition>
</sense>
</entry>
</entries>
If the RegDef node has an <i> node, I want to read the text from the <i> node and write a <w> node for each word. I tried something like the code below:
Element rootElement = document.getDocumentElement();
Element element = document.createElement("w");
rootElement.appendChild(element);
but it appends the new node as the last child of the root node. How can I write a node for each word in the RegDef tag and then add an attribute to that node? Thank you.
Recommended answer
You selected the root node of your file, <entries>. If you call appendChild on that node, your new node is appended as the last child of the root node, which is the expected behaviour.
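To make the difference concrete, here is a minimal sketch that appends a <w> element first to the document element and then to a selected RegDef. The class name AppendChildDemo, the helper appendW, and the one-line test document are assumptions made for illustration; they are not part of the original question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AppendChildDemo {
    // Hypothetical helper: creates <w lemma="..."> and appends it
    // as the last child of the given parent element.
    static Element appendW(Element parent, String lemma) {
        Element w = parent.getOwnerDocument().createElement("w");
        w.setAttribute("lemma", lemma);
        parent.appendChild(w);
        return w;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entries><entry><RegDef>Word</RegDef></entry></entries>";
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Appending to the document element: <w> ends up directly under <entries>.
        Element atRoot = appendW(document.getDocumentElement(), "A1");
        System.out.println(atRoot.getParentNode().getNodeName());   // prints "entries"

        // Appending to a selected element: <w> ends up inside that <RegDef>.
        Element regDef = (Element) document.getElementsByTagName("RegDef").item(0);
        Element inRegDef = appendW(regDef, "A2");
        System.out.println(inRegDef.getParentNode().getNodeName()); // prints "RegDef"
    }
}
```

The node you call appendChild on decides where the new element lands, so the first step is always to select the right parent.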
What you actually want is to wrap the words inside the RegDef node with the w element, which is not as simple a task as the three lines of code in your example.
To do that, you will need to:
- Select that node. There are many ways to do this in the DOM: document.getElementsByTagName("RegDef") will give you a NodeList containing all of them. You can also use XPath.
- For each RegDef, select all of its descendant text nodes. If you use XPath, an expression such as .//text() evaluated in the context of each RegDef will give you a list of those nodes. Each one may contain one or more "words", or only spaces and newlines.
- Extract the words by splitting on spaces, punctuation marks, or other characters that can act as word delimiters. Java has several tools for that, including regular expressions.
- Finally, once you have isolated each individual "word" and eliminated the nodes you want to ignore, create a w element for each one, create a new text node containing the word, and append the text node as a child of that element. You will also have to set attributes.
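The selection and splitting steps can be sketched as follows. The class name TextNodeTokens and the token rule are assumptions: a token is taken to be either a run of letters, digits, and apostrophes (so "doesn't" stays whole) or a single punctuation character. Adjust the regular expression to your own definition of a word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TextNodeTokens {
    // Assumed token rule (not from the original answer): a run of
    // letters/digits/apostrophes, or one other non-whitespace character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}']+|[^\\s\\p{L}\\p{N}']");

    // Splits one text node's content into tokens, dropping whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Collects the tokens of every descendant text node of the given element.
    static List<String> tokensOf(Element regDef) throws XPathExpressionException {
        NodeList textNodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(".//text()", regDef, XPathConstants.NODESET);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < textNodes.getLength(); i++) {
            tokens.addAll(tokenize(textNodes.item(i).getNodeValue()));
        }
        return tokens;
    }
}
```

Under this rule, the first RegDef in the question is split into the tokens This, text, have, i, node and a final full stop. The word inside <i> is included because .//text() also reaches text nodes nested in child elements.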
Perhaps you should use a smaller XML file to focus on your specific problem, and adapt it to your real-world example later. You could start with something like this:
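import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Small test document containing only the elements of interest.
String xml = "<nodes>\n"
           + " <RegDef>This <i>text</i> have i node.</RegDef>\n"
           + " <RegDef>This text doesn't have i atribute.</RegDef>\n"
           + "</nodes>";

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));

// Replace each RegDef with a copy whose words are wrapped in w elements.
NodeList regDefNodes = document.getElementsByTagName("RegDef");
int size = regDefNodes.getLength();
for (int i = 0; i < size; i++) {
    Element regDef = (Element) regDefNodes.item(i);
    Element newRegDef = wrapWordsInContents(regDef, document);
    Element parent = (Element) regDef.getParentNode();
    parent.replaceChild(newRegDef, regDef);
}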
Now you can use the steps above as a guide and write the wrapWordsInContents(Element e, Document doc) method.
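For orientation only, one possible shape for that method is sketched below. This is not the implementation from the followup question. The class name WordWrapper, the token rule, the static counter, and the "A" + number format of the lemma attribute are all assumptions inferred from the expected output in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class WordWrapper {
    // Assumed token rule: a run of letters/digits/apostrophes,
    // or one other non-whitespace character (e.g. a full stop).
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}']+|[^\\s\\p{L}\\p{N}']");

    // Running number for the lemma attribute; the "A<n>" format is a guess.
    private static int lemmaCounter = 0;

    // Returns a copy of e in which every word of every text node is wrapped in <w>.
    static Element wrapWordsInContents(Element e, Document doc) {
        // A shallow clone keeps the tag name and attributes but no children.
        Element copy = (Element) e.cloneNode(false);
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                appendWrappedWords(copy, child.getNodeValue(), doc);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Recurse so that words inside <i> are wrapped as well.
                copy.appendChild(wrapWordsInContents((Element) child, doc));
            } else {
                // Comments, CDATA sections, etc. are copied unchanged.
                copy.appendChild(child.cloneNode(true));
            }
        }
        return copy;
    }

    private static void appendWrappedWords(Element target, String text, Document doc) {
        Matcher m = TOKEN.matcher(text);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                // Keep the whitespace between tokens as plain text.
                target.appendChild(doc.createTextNode(text.substring(last, m.start())));
            }
            Element w = doc.createElement("w");
            w.setAttribute("lemma", "A" + (++lemmaCounter));
            w.appendChild(doc.createTextNode(m.group()));
            target.appendChild(w);
            last = m.end();
        }
        if (last < text.length()) {
            target.appendChild(doc.createTextNode(text.substring(last)));
        }
    }
}
```

The sketch builds a new element instead of editing the original in place, which matches the replaceChild call in the loop above. Original whitespace is preserved, so no space is inserted before the final full stop, unlike in the expected output shown in the question.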
UPDATE: You asked about tokenizing the content in a followup question, which contains the wrapWordsInContents(Element e, Document doc) method. After you call that method and serialize the document built above with:
Transformer t = TransformerFactory.newInstance().newTransformer();
t.transform(new DOMSource(document), new StreamResult(System.out));
you will have a result similar to the one you expect. See your followup question: Modify the text content of XML tag