XPath querying when XML format varies

Question

I have a series of variable types like:

abc1A, abc1B, abc3B, ...
xyz1A, xyz2A, xyz3C, ...
data1C, data2A, ...

Stored in a variety of xml formats:

<area name="DataMap">
    <int name="number" nullable="true">
        <case var="abc2,abc3,abc5">11</case>
        <case var="abc4,abc6*">8</case>
        <case var="data1,xyz7,xyz8">22</case>
        <case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case>
        <case var="xyz{6,4A,4B,4C}">20</case>
        <case var="other01">15</case>
    </int>
</area>

I'm hoping to query what an instance such as xyz5A maps to. The query should return 24, but I don't know ahead of time whether its reference in the XML node is explicit (as in "xyz4A"), a wildcard (as in "xyz4*"), or inside curly braces as above.

This queries for strings on that line and will return a hit successfully:

xpath '/area[@name="DataMap"]/int[@name="number"]/case[contains(@var,"xyz")][contains(@var,"5A")]'

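But it also returns a hit for data5A, which is incorrect (the data3A,xyz{9},xyz{5A,5B,5C} line contains both "data" and "5A", although data5A itself is never listed):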

xpath '/area[@name="DataMap"]/int[@name="number"]/case[contains(@var,"data")][contains(@var,"5A")]'
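
To see where the false positive comes from, the two contains() predicates can be mimicked on the sample document. This is an illustrative Python sketch, not the xpath tool itself: xml.etree.ElementTree supports only a small XPath subset (no contains()), so the substring tests are done in Python, and the helper name naive_lookup is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample document from the question.
XML = """<area name="DataMap">
    <int name="number" nullable="true">
        <case var="abc2,abc3,abc5">11</case>
        <case var="abc4,abc6*">8</case>
        <case var="data1,xyz7,xyz8">22</case>
        <case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case>
        <case var="xyz{6,4A,4B,4C}">20</case>
        <case var="other01">15</case>
    </int>
</area>"""

def naive_lookup(prefix, suffix):
    """Hypothetical helper mimicking case[contains(@var, prefix)][contains(@var, suffix)]."""
    root = ET.fromstring(XML)
    return [case.text
            for case in root.findall("./int[@name='number']/case")
            if prefix in case.get("var") and suffix in case.get("var")]

print(naive_lookup("xyz", "5A"))   # ['24'] (correct)
print(naive_lookup("data", "5A"))  # ['24'] (false positive: data5A is never listed)
```

Both substrings only have to appear somewhere in the attribute, so "data" from data3A and "5A" from the brace group combine into a match.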

Are there XPath (or other) query constructs that can parse the inconsistent (but, I assume, valid) XML above? I only seem to be able to query against explicit string matches, not the wildcard and curly-brace formats.

Answer

Since you are working in bash/Perl, you are likely bound to libxml, and libxml doesn't support XPath 2.0. There are many questions on Stack Overflow about XPath/XSLT 2.0 with libxml/libxslt and Perl.

XPath 1.0 has a variety (admittedly a small one) of string functions, and you could try to stack them together. I experimented for a bit, but I neither liked the result nor managed to cover all possible cases. You would end up with "ugly" constructs like:

...
or
(contains(@var, ',xyz{') and 
 contains(substring-before(substring-after(@var, ',xyz{'), '}'), '5A') and
     (contains(substring-before(substring-after(@var, ',xyz{'), '}'), ',5A,') or
      starts-with(substring-after(@var, ',xyz{'), '5A,') or
      starts-with(substring-after(@var, ',xyz{'), '5A}') or
      substring-after(substring-before(substring-after(@var, ',xyz{'), '}'), ',5A') = ''))

or
...

And then you would realize that substring-* functions work off of the first occurrence of the matching string and you need even more layers of ands and ors to handle cases like yours:

<case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case>

where there are multiple xyz{ groups and the one you need is not necessarily the first.
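
The first-occurrence problem can be checked by transcribing the brace clause above into Python. The helpers below are hypothetical stand-ins that follow the XPath 1.0 semantics of contains(), starts-with(), substring-before() and substring-after(): the last two return the empty string when the search string is absent and otherwise split at its first occurrence. The attribute values other than the question's own are made up for illustration.

```python
# Python stand-ins for the XPath 1.0 string functions used in the clause.
def contains(s, t):
    return t in s

def starts_with(s, t):
    return s.startswith(t)

def substring_before(s, t):
    # '' if t is absent, else the text before the FIRST occurrence of t.
    i = s.find(t)
    return s[:i] if i >= 0 else ""

def substring_after(s, t):
    # '' if t is absent, else the text after the FIRST occurrence of t.
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""

def brace_clause(var):
    """The ',xyz{ ... 5A ... }' clause from the answer, term by term."""
    after = substring_after(var, ",xyz{")
    inside = substring_before(after, "}")
    return (contains(var, ",xyz{")
            and contains(inside, "5A")
            and (contains(inside, ",5A,")
                 or starts_with(after, "5A,")
                 or starts_with(after, "5A}")
                 or substring_after(inside, ",5A") == ""))

print(brace_clause("data1,xyz{5A,5B}"))              # True  (hypothetical value)
print(brace_clause("data3A,xyz{9},xyz{5A,5B,5C}"))   # False (the question's value)
print(brace_clause("xyz{5A,5B}"))                    # False (hypothetical value)
```

On the question's value the clause is false because substring-after(@var, ',xyz{') stops at xyz{9}. It also fails when xyz{ starts the attribute, since the pattern requires a leading comma.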

I think this is a case where you forget that you have XML, do what Perl is good at, and treat it as text. As much as I like XML-aware tools for XML processing and data extraction, you will likely be better off with regular expressions and string manipulation in a language designed for them.
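
The answer suggests Perl. As a rough illustration of the same treat-it-as-text idea, here is a sketch in Python rather than Perl. It assumes only the three formats seen in the question: explicit names, a `*` wildcard, and brace groups with no nesting. It treats a bare entry such as abc5 as an exact name, because the question does not say whether it should also cover abc5A, abc5B, and so on. The function names are hypothetical, and ElementTree is used only to locate the case elements; the matching itself is plain regular expressions.

```python
import re
import xml.etree.ElementTree as ET

XML = """<area name="DataMap">
    <int name="number" nullable="true">
        <case var="abc2,abc3,abc5">11</case>
        <case var="abc4,abc6*">8</case>
        <case var="data1,xyz7,xyz8">22</case>
        <case var="data3A,xyz{9},xyz{5A,5B,5C}">24</case>
        <case var="xyz{6,4A,4B,4C}">20</case>
        <case var="other01">15</case>
    </int>
</area>"""

def split_tokens(var):
    """Split a var attribute on commas that are not inside {...} (no nesting assumed)."""
    return re.split(r",(?![^{]*\})", var)

def token_to_regex(token):
    """Translate one token such as 'abc6*' or 'xyz{5A,5B}' into a regex string."""
    out = []
    # The capture group keeps brace groups and '*' as separate pieces.
    for part in re.split(r"(\{[^}]*\}|\*)", token):
        if part.startswith("{"):
            alternatives = part[1:-1].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
        elif part == "*":
            out.append(".*")  # assumed: '*' matches any remaining characters
        else:
            out.append(re.escape(part))
    return "".join(out)

def lookup(xml_text, name):
    """Return the value of the first case whose var list matches name, else None."""
    root = ET.fromstring(xml_text)
    for case in root.findall("./int[@name='number']/case"):
        for token in split_tokens(case.get("var", "")):
            if re.fullmatch(token_to_regex(token.strip()), name):
                return int(case.text)
    return None

print(lookup(XML, "xyz5A"))   # 24
print(lookup(XML, "data5A"))  # None
print(lookup(XML, "abc6B"))   # 8
```

Because every top-level token is tested on its own with re.fullmatch, neither the data5A false positive nor the first-occurrence problem can arise.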