Using OR operator in XPath

Question

I'm using the OR operator (more than once) in my XPath expression to extract the content that comes before a specific string, such as 'Reference', 'For more information', etc. Any of these terms should return the same result, but they may not appear in that order. For example, 'Reference' might not come first, or might not be in the content at all, and one of the matches is inside a table: 'About the data'. I want all content before any one of these strings appears.

Any help would be appreciated.

$expression =
    "//p[
        starts-with(normalize-space(), 'Reference') or 
        starts-with(normalize-space(), 'For more')
    ]/preceding-sibling::p";

The table also needs to be taken into account:

$expression =
    "//article/table/tbody/tr/td[
        starts-with(normalize-space(), 'About the data used')
    ]/preceding-sibling::p";

Here is an example:

<root>
    <main>
        <article>
            <p>
                The stunning increase in homelessness announced in Los Angeles
                this week — up 16% over last year citywide — was an almost an
                incomprehensible conundrum.
            </p>
            <p>
                "We cannot let a set of difficult numbers discourage us
                or weaken our resolve" Garcetti said.
            </p>
            <p>
                References
                By Jeremy Herb, Caroline Kelly and Manu Raju, CNN
            </p>
            <p>
                For more information: Maeve Reston, CNN
            </p>
            <p>Maeve Reston, CNN</p>
            <table>
                <tbody>
                    <tr>
                        <td>
                            <strong>About the data used</strong>
                        </td>
                    </tr>
                    <tr>
                        <td>From
                        </td>
                        <td>Washington, CNN</td>
                    </tr>
                </tbody>
            </table>
        </article>
    </main>
</root>

The result I'm looking for would be the following:

<p>
    The stunning increase in homelessness announced in Los Angeles
    this week — up 16% over last year citywide — was an almost an
    incomprehensible conundrum.
</p>
<p>
    "We cannot let a set of difficult numbers discourage us
    or weaken our resolve" Garcetti said.
</p>

Recommended answer

"I want all content before any one of these strings appears."

That is, you want the content before the first paragraph that contains one of these strings.

The paragraphs that contain one of these strings are:

p[starts-with(normalize-space(), 'References') or starts-with(....)]

The first such paragraph is:

p[starts-with(normalize-space(), 'References') or starts-with(....)][1]

The paragraphs before it are:

p[starts-with(normalize-space(), 'References') or starts-with(....)][1]
/preceding-sibling::p
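
As a rough sketch of how the steps above might be run from PHP (the language of the $expression snippets in the question), the following uses DOMDocument and DOMXPath on a trimmed copy of the sample document. The variable names and the shortened sample text are illustrative only. The second expression is an assumption that goes beyond the answer: it also treats the table containing 'About the data used' as a marker, so it still selects the leading paragraphs when none of the paragraph markers is present.

```php
<?php
// Trimmed, illustrative copy of the sample document from the question.
$xml = <<<'XML'
<root><main><article>
    <p>The stunning increase in homelessness announced in Los Angeles this week.</p>
    <p>"We cannot let a set of difficult numbers discourage us" Garcetti said.</p>
    <p>
        References
        By Jeremy Herb, Caroline Kelly and Manu Raju, CNN
    </p>
    <p>For more information: Maeve Reston, CNN</p>
    <p>Maeve Reston, CNN</p>
    <table><tbody><tr><td><strong>About the data used</strong></td></tr></tbody></table>
</article></main></root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Paragraph markers only: take the first marker paragraph,
// then every <p> sibling that precedes it.
$paragraphsOnly =
    "//article/p[
        starts-with(normalize-space(), 'References') or
        starts-with(normalize-space(), 'For more information')
    ][1]/preceding-sibling::p";

// Assumed extension (not part of the answer): the first child of <article>
// that is either a marker paragraph or the table holding the marker cell.
$withTable =
    "//article/*[
        self::p[
            starts-with(normalize-space(), 'References') or
            starts-with(normalize-space(), 'For more information')
        ] or
        self::table[.//td[starts-with(normalize-space(), 'About the data used')]]
    ][1]/preceding-sibling::p";

foreach ([$paragraphsOnly, $withTable] as $expression) {
    foreach ($xpath->query($expression) as $p) {
        echo trim($p->textContent), PHP_EOL;
    }
    echo '---', PHP_EOL;
}
```

The [1] predicate is what stops the selection at the earliest marker: without it, preceding-sibling::p is evaluated from every marker paragraph, so paragraphs that sit between two markers are selected as well.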

In XPath 2.0 I would probably use a regular expression:

p[matches(., '^\s*(References|For more information)')]

This avoids the repeated calls to normalize-space().
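
Note that PHP's DOMXPath is built on libxml2, which implements XPath 1.0, so matches() is not available there. One possible workaround, sketched below under that assumption, is to expose PHP's preg_match() to the expression through registerPhpFunctions() and call it with php:functionString. The sample XML and variable names are illustrative only.

```php
<?php
// Small illustrative document; the marker paragraph has leading whitespace.
$xml = <<<'XML'
<article>
    <p>First paragraph of the story.</p>
    <p>Second paragraph of the story.</p>
    <p>
        References
        By Jeremy Herb, CNN
    </p>
    <p>For more information: Maeve Reston, CNN</p>
</article>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// Make PHP's preg_match() callable from XPath under the php: prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');

// php:functionString passes the string value of "." to preg_match(),
// which returns 1 when the pattern matches.
$expression =
    "//article/p[
        php:functionString('preg_match', '/^\\s*(References|For more information)/', .) = 1
    ][1]/preceding-sibling::p";

foreach ($xpath->query($expression) as $p) {
    echo trim($p->textContent), PHP_EOL;
}
```

Like the XPath 2.0 version, a single pattern replaces the chain of starts-with(normalize-space(), ...) tests; the cost is a call back into PHP for every candidate paragraph.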
