How to get MATLAB to read the correct number of XML nodes


Question

I'm reading a simple XML file using MATLAB's built-in xmlread function.

<root>
    <ref>
        <requestor>John Doe</requestor>
        <project>X</project>
    </ref>
</root>

But when I call getChildNodes() on the ref element, it tells me that it has 5 children.

It works fine if I put all of the XML on one line: MATLAB then tells me that the ref element has 2 children.

It doesn't seem to like the whitespace between elements.

Running Canonicalize in the oXygen XML editor does not help either: I get the same result, because Canonicalize leaves the whitespace in place.

MATLAB uses Java and Xerces for its XML handling.

What can I do so that I can keep my XML file in a human-readable format (not all on one line) but still have MATLAB parse it correctly?

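filename = 'example01.xml';
docNode  = xmlread(filename);
rootNode = docNode.getDocumentElement;                    % the <root> element
refNode  = rootNode.getElementsByTagName('ref').item(0);  % first <ref> element
entries  = refNode.getChildNodes;                         % children of <ref>
nEnt     = entries.getLength                              % 5 for the indented file above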

Recommended answer

Behind the scenes, the XML parser creates a #text node for every run of whitespace between elements. Wherever there is a newline or indentation, it creates a #text node whose data holds the newline and the indentation spaces that follow it. So for the XML example you provided, the "ref" element has 5 child nodes:

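  1. Node 1: #text with a newline and indentation spaces
  2. Node 2: the "requestor" node, which in turn has a #text child with "John Doe" in its data
  3. Node 3: #text with a newline and indentation spaces
  4. Node 4: the "project" node, which in turn has a #text child with "X" in its data
  5. Node 5: #text with a newline and indentation spaces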
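The breakdown above can be reproduced with a short sketch along these lines. It writes the XML from the question to a temporary file (the file and variable names are only for illustration) and prints each child of "ref", showing newlines inside #text nodes as a literal \n.

```matlab
% Write the question's XML to a temporary file so the sketch is self-contained.
xmlText = sprintf(['<root>\n' ...
                   '    <ref>\n' ...
                   '        <requestor>John Doe</requestor>\n' ...
                   '        <project>X</project>\n' ...
                   '    </ref>\n' ...
                   '</root>\n']);
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'w');
fwrite(fid, xmlText);
fclose(fid);

docNode = xmlread(tmpFile);
refNode = docNode.getElementsByTagName('ref').item(0);   % first <ref> element
kids    = refNode.getChildNodes();

% Print one line per child; DOM NodeLists are 0-based.
for k = 0:kids.getLength()-1
    node = kids.item(k);
    name = char(node.getNodeName());
    if node.getNodeType() == node.TEXT_NODE
        % Make the newline visible as the two characters \n.
        shown = strrep(char(node.getData()), char(10), '\n');
        fprintf('%d: %s  "%s"\n', k+1, name, shown);
    else
        fprintf('%d: %s\n', k+1, name);
    end
end

delete(tmpFile);
```

The loop reports five entries for this file: three #text nodes and the two element nodes.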

The function below removes all of these whitespace-only #text nodes for you. Note that if an element's content is intentionally nothing but whitespace, this function removes that text as well, but for the vast majority of XML files it should work just fine.

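function removeIndentNodes( childNodes )
% Recursively remove whitespace-only #text nodes (newlines and indentation)
% from a DOM node list such as the one returned by getChildNodes.

numNodes = childNodes.getLength;
remList = [];
for i = numNodes:-1:1                  % walk backwards so remList is in descending order
   theChild = childNodes.item(i-1);    % Java node lists are 0-based
   if (theChild.hasChildNodes)
      removeIndentNodes(theChild.getChildNodes);
   else
      if ( theChild.getNodeType == theChild.TEXT_NODE && ...
           ~isempty(char(theChild.getData()))         && ...
           all(isspace(char(theChild.getData()))))
         remList(end+1) = i-1; % java indexing
      end
   end
end
% Remove from the highest index down, so earlier removals do not shift later ones.
for i = 1:length(remList)
   theNode = childNodes.item(remList(i));
   theNode.getParentNode.removeChild(theNode);   % removeChild belongs to the parent node
end

end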

Call it like this:

tree = xmlread( xmlfile );
removeIndentNodes( tree.getChildNodes );
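As a quick check, the following sketch applies the function to the XML from the question and compares the child count of "ref" before and after the call. It assumes that removeIndentNodes has been saved as removeIndentNodes.m on the MATLAB path; the temporary file and the variable names are only for illustration.

```matlab
% Assumes removeIndentNodes.m (the function above) is on the MATLAB path.
xmlText = sprintf(['<root>\n' ...
                   '    <ref>\n' ...
                   '        <requestor>John Doe</requestor>\n' ...
                   '        <project>X</project>\n' ...
                   '    </ref>\n' ...
                   '</root>\n']);
tmpFile = [tempname '.xml'];
fid = fopen(tmpFile, 'w');
fwrite(fid, xmlText);
fclose(fid);

tree    = xmlread(tmpFile);
refNode = tree.getElementsByTagName('ref').item(0);   % first <ref> element

nBefore = refNode.getChildNodes().getLength();   % counts whitespace #text nodes too
removeIndentNodes(tree.getChildNodes());
nAfter  = refNode.getChildNodes().getLength();   % whitespace-only #text nodes are gone

fprintf('children of <ref>: %d before, %d after\n', nBefore, nAfter);
delete(tmpFile);
```

Because the function edits the DOM in memory, the file on disk keeps its indentation and stays human-readable.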
