XPath and namespace specification for XML documents with an explicit default namespace

Problem description

I'm struggling to find the correct combination of XPath expression and namespace specification required by the XML package (argument namespaces) for an XML document that has an explicit xmlns namespace defined on the top element.

Thanks to har07 I was able to put it together:

Once you query the namespaces, the first entry of ns has no name yet and that's the problem:

nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

> ns
                                             omegahat                          r 
    "http://something.org"  "http://www.omegahat.org" "http://www.r-project.org" 

So we'll just assign a name that serves as a prefix (this can be any valid R name):

names(ns)[1] <- "xmlns"

Now all we have to do is use that default namespace prefix everywhere in our XPath expressions:

getNodeSet(doc, "/xmlns:doc//xmlns:b[@omegahat:status='foo']", ns)
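
The steps above can be wrapped into a small helper. This is only a sketch: named_namespaces() and its default_prefix argument are hypothetical names, not part of the XML package, and the document is a shortened copy of the example held in a string. It assumes the default namespace shows up with an empty name, as in the printed ns above.

```r
library(XML)

# Shortened copy of the example document, with an explicit default namespace
xml_text <- paste(
  '<doc xmlns="http://something.org"',
  '     xmlns:omegahat="http://www.omegahat.org"',
  '     xmlns:r="http://www.r-project.org">',
  '  <a><b><c><b/></c></b>',
  '  <b omegahat:status="foo"><r:d><a status="xyz"/><a/></r:d></b></a>',
  '</doc>',
  sep = "\n")
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: collect the namespace definitions of the top element
# and give the unnamed (default) entry a prefix that XPath can refer to
named_namespaces <- function(doc, default_prefix = "d") {
  nsDefs <- xmlNamespaceDefinitions(doc)
  ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
  names(ns)[names(ns) == ""] <- default_prefix
  ns
}

ns <- named_namespaces(doc)

# Every element step carries the prefix chosen for the default namespace
hits <- getNodeSet(doc, "/d:doc//d:b[@omegahat:status='foo']", ns)
```

The prefix "d" is arbitrary; it only has to match between the names of ns and the XPath expression.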

Those interested in alternative solutions based on name() and namespace-uri() (among others) might find this post helpful.
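
As a rough illustration of that kind of alternative (a sketch, not taken from the linked post), elements can be matched by local-name() and namespace-uri() so that no prefix has to be invented for the default namespace. The miniature document below is an assumption made for the example; the omegahat prefix is still registered explicitly because the attribute test uses it.

```r
library(XML)

# Hypothetical miniature of the document discussed above
xml_text <- paste(
  '<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org">',
  '  <a><b/><b omegahat:status="foo"/></a>',
  '</doc>',
  sep = "\n")
doc <- xmlParse(xml_text, asText = TRUE)

# Select <b> elements by local name and namespace URI instead of by prefix
path <- paste0(
  "//*[local-name()='b' and namespace-uri()='http://something.org']",
  "[@omegahat:status='foo']")

hits <- getNodeSet(doc, path,
                   namespaces = c(omegahat = "http://www.omegahat.org"))
```

The expression is more verbose than the prefixed form, which is the usual trade-off of this approach.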

Just for the sake of reference: this was the trial-and-error code before we came to the solution:

Consider the example from ?xmlParse:

require("XML")

doc <- xmlParse(system.file("exampleData", "tagnames.xml", package = "XML"))

> doc
<?xml version="1.0"?>
<doc>
  <!-- A comment -->
  <a xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
    <b>
      <c>
        <b/>
      </c>
    </b>
    <b omegahat:status="foo">
      <r:d>
        <a status="xyz"/>
        <a/>
        <a status="1"/>
      </r:d>
    </b>
  </a>
</doc>

nsDefs <- xmlNamespaceDefinitions(getNodeSet(doc, "/doc/a")[[1]])
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", ns)[[1]]

In my document, however, the namespaces are already defined in <doc> tag, so I adapted the example XML code accordingly:

xml_source <- c(
  "<?xml version=\"1.0\"?>",
  "<doc xmlns:omegahat=\"http://www.omegahat.org\" xmlns:r=\"http://www.r-project.org\">",
  "<!-- A comment -->",
  "<a>",
  "<b>",
  "<c>",
  "<b/>",
  "</c>",
  "</b>",
  "<b omegahat:status=\"foo\">",
  "<r:d>",
  "<a status=\"xyz\"/>",
  "<a/>",
  "<a status=\"1\"/>",
  "</r:d>",
  "</b>",
  "</a>",
  "</doc>"
)
write(xml_source, file="exampleData_2.xml")  
doc <- xmlParse("exampleData_2.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))    
getNodeSet(doc, "/doc", namespaces = ns)
getNodeSet(doc, "/doc//b[@omegahat:status='foo']", namespaces = ns)[[1]]  

Everything still works fine. What's more, though, is that my XML code additionally has an explicit definition of the default namespace (xmlns):

xml_source <- c(
  "<?xml version=\"1.0\"?>",
  "<doc xmlns=\"http://something.org\" xmlns:omegahat=\"http://www.omegahat.org\" xmlns:r=\"http://www.r-project.org\">",
  "<!-- A comment -->",
  "<a>",
  "<b>",
  "<c>",
  "<b/>",
  "</c>",
  "</b>",
  "<b omegahat:status=\"foo\">",
  "<r:d>",
  "<a status=\"xyz\"/>",
  "<a/>",
  "<a status=\"1\"/>",
  "</r:d>",
  "</b>",
  "</a>",
  "</doc>"  
)
write(xml_source, file="exampleData_3.xml")  
doc <- xmlParse("exampleData_3.xml")
nsDefs <- xmlNamespaceDefinitions(doc)
ns <- structure(sapply(nsDefs, function(x) x$uri), names = names(nsDefs))

What used to work fails now:

> getNodeSet(doc, "/doc", namespaces = ns)
list()
attr(,"class")
[1] "XMLNodeSet"
Warning message:
using http://something.org as prefix for default namespace http://something.org 

> getNodeSet(doc, "/xmlns:doc", namespaces = ns)
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression /xmlns:doc
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org 

This seems to get me closer:

> getNodeSet(doc, "/xmlns:doc",
+ namespaces = matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs)
+ )[[1]]
<doc xmlns="http://something.org" xmlns:omegahat="http://www.omegahat.org" xmlns:r="http://www.r-project.org">
  <!-- A comment -->
  <a>
    <b>
      <c>
        <b/>
      </c>
    </b>
    <b omegahat:status="foo">
      <r:d>
        <a status="xyz"/>
        <a/>
        <a status="1"/>
      </r:d>
    </b>
  </a>
</doc> 

attr(,"class")
[1] "XMLNodeSet"

Yet now I don't know how to proceed to reach the child nodes:

> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']", ns)[[1]]
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression /xmlns:doc//b[@omegahat:status='foo']
In addition: Warning message:
using http://something.org as prefix for default namespace http://something.org 

> getNodeSet(doc, "/xmlns:doc//b[@omegahat:status='foo']",
+ namespaces = c(
+ matchNamespaces(doc, namespaces="xmlns", nsDefs = nsDefs),
+ matchNamespaces(doc, namespaces="omegahat", nsDefs = nsDefs)
+ )
+ )
list()
attr(,"class")
[1] "XMLNodeSet"

Recommended answer

A namespace definition without a prefix (xmlns="...") is the default namespace. When an XML document has a default namespace, the element on which it is declared and all of its descendants that have no prefix and no different default namespace declaration are considered to be in that default namespace. XPath 1.0 has no notion of a default namespace, so an unprefixed element name in an XPath expression only matches elements that are in no namespace.
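
A small sketch of that scoping rule, using a made-up miniature document and the arbitrary prefix d for the default namespace (both are assumptions for illustration only):

```r
library(XML)

# Hypothetical miniature: <a> appears both directly under <doc> and inside
# the prefixed element <r:d>
xml_text <- paste(
  '<doc xmlns="http://something.org" xmlns:r="http://www.r-project.org">',
  '  <a><b/><r:d><a/></r:d></a>',
  '</doc>',
  sep = "\n")
doc <- xmlParse(xml_text, asText = TRUE)

ns <- c(d = "http://something.org", r = "http://www.r-project.org")

# Unprefixed step: asks for <b> in no namespace
n_plain <- length(getNodeSet(doc, "//b", ns))

# Prefixed step: asks for <b> in the default namespace
n_b <- length(getNodeSet(doc, "//d:b", ns))

# The unprefixed <a> nested inside <r:d> still inherits the default namespace
n_a <- length(getNodeSet(doc, "//d:a", ns))

# <r:d> itself belongs to the namespace bound to the r prefix
n_d <- length(getNodeSet(doc, "//r:d", ns))
```

Here n_plain should be 0 while n_b is 1, and n_a counts both <a> elements, including the one nested under the prefixed <r:d>.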

Therefore, in your case you need to put the prefix registered for the default namespace in front of every element name in the XPath, for example:

/xmlns:doc//xmlns:b[@omegahat:status='foo']

Update:

I'm not actually an R user, but judging by some references online, something like this may work:

getNodeSet(doc, "/ns:doc//ns:b[@omegahat:status='foo']", c(ns="http://something.org", omegahat="http://www.omegahat.org"))
