How to get MATLAB to read the correct number of XML nodes


Question

I'm reading a simple XML file using MATLAB's built-in xmlread function.

<root>
    <ref>
        <requestor>John Doe</requestor>
        <project>X</project>
    </ref>
</root>

But when I call getChildNodes() on the ref element, it tells me that it has 5 children.

It works fine IF I put all the XML on ONE line. MATLAB then tells me that the ref element has 2 children.

It doesn't seem to like the whitespace between elements.

Even if I run Canonicalize in the oXygen XML editor, I still get the same result, because Canonicalize still leaves the whitespace in place.

MATLAB uses Java and Xerces for XML handling.

What can I do so that I can keep my XML file in a human-readable format (not all on one line) but still have MATLAB parse it correctly?

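filename = 'example01.xml';
docNode = xmlread(filename);             % parse the file into a DOM document
rootNode = docNode.getDocumentElement;   % the <root> element
entries = rootNode.getChildNodes;        % all child nodes, including whitespace-only #text nodes
nEnt = entries.getLength                 % no semicolon, so the count is displayed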

Recommended answer

The XML parser behind the scenes creates #text nodes for all whitespace between elements. Wherever there is a newline or indentation, it creates a #text node whose data portion holds the newline and the indentation spaces that follow it. So for the XML example you provided, parsing the child nodes of the "ref" element returns 5 nodes:

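  1. Node 1: #text containing a newline and indentation spaces
  2. Node 2: the "requestor" node, which in turn has a #text child with "John Doe" in its data portion
  3. Node 3: #text containing a newline and indentation spaces
  4. Node 4: the "project" node, which in turn has a #text child with "X" in its data portion
  5. Node 5: #text containing a newline and indentation spaces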
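To see these nodes for yourself, the child list of "ref" can be printed by node name. This is only an illustrative sketch: it writes the question's XML to a temporary file (the temporary file and the variable names are assumptions made for the example) and then walks the children with the standard DOM calls getChildNodes, item and getNodeName.

```matlab
% Write the question's XML to a temporary file so the sketch is self-contained.
xmlText = sprintf(['<root>\n' ...
                   '    <ref>\n' ...
                   '        <requestor>John Doe</requestor>\n' ...
                   '        <project>X</project>\n' ...
                   '    </ref>\n' ...
                   '</root>\n']);
fname = [tempname '.xml'];
fid = fopen(fname, 'w');
fwrite(fid, xmlText);
fclose(fid);

% Parse the file and fetch the first (and only) <ref> element.
docNode = xmlread(fname);
refNode = docNode.getElementsByTagName('ref').item(0);
kids = refNode.getChildNodes;

% Collect and print the name of every child node (Java indexing is zero-based).
names = cell(1, kids.getLength);
for k = 1:kids.getLength
    names{k} = char(kids.item(k-1).getNodeName);
    fprintf('%d: %s\n', k, names{k});
end
delete(fname);
```

The printed names should alternate between #text and the element names, matching the five nodes listed above.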

This function removes all of these useless #text nodes for you. Note that if you intentionally have an XML element whose content is nothing but whitespace, this function will remove that whitespace too, but for 99.99% of XML cases this should work just fine.

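function removeIndentNodes( childNodes )
% Recursively remove whitespace-only #text nodes (the newlines and
% indentation between elements) from a DOM node list returned by xmlread.

% Walk backwards so that a removal never shifts an index still to be visited.
for i = childNodes.getLength:-1:1
   theChild = childNodes.item(i-1);   % java indexing is zero-based
   if theChild.hasChildNodes
      removeIndentNodes(theChild.getChildNodes);
   elseif theChild.getNodeType == theChild.TEXT_NODE
      txt = char(theChild.getData());
      if ~isempty(txt) && all(isspace(txt))
         % Remove through the parent node rather than through the node list,
         % so the code does not depend on the list itself offering removeChild.
         theChild.getParentNode.removeChild(theChild);
      end
   end
end

end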

Call it like this:

tree = xmlread( xmlfile );
removeIndentNodes( tree.getChildNodes );
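If you only need to count or visit the element children and would rather leave the parsed tree untouched, a non-destructive alternative is to skip every node that is not an element. This is a different technique from the function above, not part of the original answer, and only a minimal sketch: the temporary file and the variable names are assumptions made for the example.

```matlab
% Write the question's XML to a temporary file so the sketch is self-contained.
xmlText = sprintf(['<root>\n' ...
                   '    <ref>\n' ...
                   '        <requestor>John Doe</requestor>\n' ...
                   '        <project>X</project>\n' ...
                   '    </ref>\n' ...
                   '</root>\n']);
fname = [tempname '.xml'];
fid = fopen(fname, 'w');
fwrite(fid, xmlText);
fclose(fid);

docNode = xmlread(fname);
refNode = docNode.getElementsByTagName('ref').item(0);
kids = refNode.getChildNodes;

% Count only ELEMENT_NODE children; whitespace #text nodes are skipped,
% and the DOM tree itself is not modified.
nElem = 0;
for k = 1:kids.getLength
    node = kids.item(k-1);            % Java indexing is zero-based
    if node.getNodeType == node.ELEMENT_NODE
        nElem = nElem + 1;
    end
end
fprintf('ref has %d element children\n', nElem);
delete(fname);
```

The trade-off is that every loop over child nodes has to repeat the node-type check, whereas removing the whitespace nodes once lets the rest of the code treat the child list as elements only.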
