XPath and namespace specification for XML documents with an explicit default namespace


Question

I'm struggling to find the correct combination of an XPath expression and the namespace specification required by the XML package (argument namespaces) for an XML document that has an explicit xmlns namespace defined on its top element.

Thanks to har07, I was able to put it together:

Once you query the namespaces, the first entry of ns has no name yet, and that's the problem:

nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

> ns
                                             omegahat                          r 
    "http://something.org"  "http://www.omegahat.org" "http://www.r-project.org" 

So we just assign a name that serves as a prefix (this can be any valid R name):

names(ns)[1] <- "xmlns"

Now all we have to do is use that default namespace prefix everywhere in our XPath expressions:

getNodeSet(doc, "/xmlns:doc//xmlns:b[@omegahat:status='foo']", ns)
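
Putting the steps together, a minimal self-contained sketch might look like the following. It assumes the XML package is installed, parses the document from a string instead of a file, and uses the prefix d in place of xmlns. Both the string-based parsing and the name d are illustrative choices, not part of the original solution.

```r
library(XML)

# Same structure as the example document, with an explicit default namespace.
xml_text <- paste(c(
  '<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">',
  '  <a>',
  '    <b><c><b/></c></b>',
  '    <b omegahat:status="foo">',
  '      <r:d><a status="xyz"/><a/><a status="1"/></r:d>',
  '    </b>',
  '  </a>',
  '</doc>'
), collapse = "\n")
doc <- xmlParse(xml_text, asText = TRUE)

# Collect the prefix/URI pairs declared on the root element.
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

# The default namespace has no usable name; give it a prefix.
# "d" is an arbitrary, illustrative choice.
unnamed <- is.na(names(ns)) | names(ns) == ""
names(ns)[unnamed] <- "d"

# Every element name in the path carries the prefix; the attribute keeps its own.
hits <- getNodeSet(doc, "/d:doc//d:b[@omegahat:status='foo']", namespaces = ns)
length(hits)
xmlName(hits[[1]])
```

Selecting the unnamed entry by testing its name rather than by position is a small hedge in case the default namespace is not the first declaration.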

Those interested in alternative solutions based on name() and namespace-uri() (amongst others) might find this post helpful.
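
As a rough illustration of that alternative, the sketch below matches the element through local-name() and namespace-uri() instead of a registered prefix for the default namespace. It assumes the XML package is installed and uses a shortened, hypothetical document; only the omegahat prefix is registered, because the attribute test still uses it.

```r
library(XML)

# Shortened, hypothetical document with the same default namespace.
xml_text <- paste(c(
  '<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org">',
  '  <a><b/><b omegahat:status="foo"/></a>',
  '</doc>'
), collapse = "\n")
doc <- xmlParse(xml_text, asText = TRUE)

# Match on local name and namespace URI, so no prefix is needed
# for the default namespace.
xpath <- paste0(
  "//*[local-name()='b' and namespace-uri()='http://something.org']",
  "[@omegahat:status='foo']"
)
hits <- getNodeSet(doc, xpath,
                   namespaces = c(omegahat = "http://www.omegahat.org"))
length(hits)
```

The expression is more verbose than the prefixed form, which is the usual trade-off of this approach.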

Just for the sake of reference, this was the trial-and-error code before we arrived at the solution.

Consider the example from ?xmlParse:

require("XML")

doc <- xmlParse(system.file("exampleData", "tagnames.xml", package = "XML"))

> doc
<?xml version="1.0"?>
<doc>
  <!-- A comment -->
  <a xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
    <b>
      <c>
        <b/>
      </c>
    </b>
    <b omegahat:status="foo">
      <r:d>
        <a status="xyz"/>
        <a/>
        <a status="1"/>
      </r:d>
    </b>
  </a>
</doc>

nsDefs <- xmlNamespaceDefinitions(getNodeSet(doc, "/doc/a")[[1]])
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", ns)[[1]]

In my document, however, the namespaces are already defined in the <doc> tag, so I adapted the example XML code accordingly:

# Single-quoted R strings so the double quotes inside the XML need no escaping
xml_source <- c(
  '<?xml version="1.0"?>',
  '<doc xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">',
  '<!-- A comment -->',
  '<a>',
  '<b>',
  '<c>',
  '<b/>',
  '</c>',
  '</b>',
  '<b omegahat:status="foo">',
  '<r:d>',
  '<a status="xyz"/>',
  '<a/>',
  '<a status="1"/>',
  '</r:d>',
  '</b>',
  '</a>',
  '</doc>'
)
write(xml_source, file="exampleData_2.xml")  
doc <- xmlParse("exampleData_2.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))    
getNodeSet(doc, "/doc", namespaces = ns)
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", namespaces = ns)[[1]]  

Everything still works fine. However, my XML code additionally has an explicit definition of the default namespace (xmlns):

# Single-quoted R strings so the double quotes inside the XML need no escaping
xml_source <- c(
  '<?xml version="1.0"?>',
  '<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">',
  '<!-- A comment -->',
  '<a>',
  '<b>',
  '<c>',
  '<b/>',
  '</c>',
  '</b>',
  '<b omegahat:status="foo">',
  '<r:d>',
  '<a status="xyz"/>',
  '<a/>',
  '<a status="1"/>',
  '</r:d>',
  '</b>',
  '</a>',
  '</doc>'
)
write(xml_source, file="exampleData_3.xml")  
doc <- xmlParse("exampleData_3.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

What used to work now fails:

> getNodeSet(doc, "/doc", namespaces = ns)
list()
attr(,"class")
[1] "XMLNodeSet"
Warning message:
using http://something.org as prefix for default namespace http://something.org 

> getNodeSet(doc, "/xmlns:doc", namespaces = ns)
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression /xmlns:doc
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org 

Using matchNamespaces() seems to get me closer:

> getNodeSet(doc, "/xmlns:doc",
+ namespaces = matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs)
+ )[[1]]
<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
  <!-- A comment -->
  <a>
    <b>
      <c>
        <b/>
      </c>
    </b>
    <b omegahat:status="foo">
      <r:d>
        <a status="xyz"/>
        <a/>
        <a status="1"/>
      </r:d>
    </b>
  </a>
</doc> 

attr(,"class")
[1] "XMLNodeSet"

Yet now I don't know how to proceed in order to get to the child nodes:

> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']", ns)[[1]]
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression /xmlns:doc//b[@omegahat:status='foo']
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org 

> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']",
+ namespaces = c(
+ matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs),
+ matchNamespaces(doc, namespaces="omegahat", nsDefs = nsDefs)
+ )
+ )
list()
attr(,"class")
[1] "XMLNodeSet"

Recommended answer

A namespace definition without a prefix (xmlns="...") declares a default namespace. When an XML document has a default namespace, the element on which it is declared, and all of its descendants that have no prefix and no different default namespace declaration, belong to that default namespace.

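In XPath 1.0, an unprefixed element name in an expression always refers to an element in no namespace, so it never matches elements that are in a default namespace. Therefore, in your case you need to put the prefix registered for the default namespace in front of every element name in the XPath, for example: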

/xmlns:doc//xmlns:b[@omegahat:status='foo']
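
To make the rule concrete, here is a small sketch in R, assuming the XML package is installed. The prefix d and the inline document string are illustrative assumptions; the document mirrors the one in the question.

```r
library(XML)

xml_text <- paste(c(
  '<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">',
  '  <a>',
  '    <b><c><b/></c></b>',
  '    <b omegahat:status="foo">',
  '      <r:d><a status="xyz"/><a/><a status="1"/></r:d>',
  '    </b>',
  '  </a>',
  '</doc>'
), collapse = "\n")
doc <- xmlParse(xml_text, asText = TRUE)

# "d" is an arbitrary prefix bound to the document's default namespace URI.
ns <- c(d = "http://something.org", r = "http://www.r-project.org")

# Unprefixed name: looks for <b> in no namespace, so nothing matches.
n_unprefixed <- length(getNodeSet(doc, "//b", namespaces = ns))    # 0

# Prefixed name: finds all three <b> elements in the default namespace.
n_prefixed <- length(getNodeSet(doc, "//d:b", namespaces = ns))    # 3

# Unprefixed children of the prefixed <r:d> still inherit the default namespace.
n_children <- length(getNodeSet(doc, "//r:d/d:a", namespaces = ns)) # 3
```

The last query shows that the default namespace applies to unprefixed descendants even when an intermediate ancestor uses a different prefix.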

Update:

I'm not actually an R user, but judging from some references on the net, something like this may work:

getNodeSet(doc, "/ns:doc//ns:b[@omegahat:status='foo']", c(ns = "http://something.org", omegahat = "http://www.omegahat.org"))
