Correct syntax of the XPath function 'substring-after' for HTML that selects only a substring of all nodes?


Problem description

I need an XPath expression that selects only a substring of each matching node. I have been using the XPath below, but it selects the whole text instead of the substring.

//span[@class="feed-date"]/text()[substring-after(., "on ")]

This is the HTML I have; I want to extract only the date after 'Published on':

<span class="feed-date">Published on 2016-07-07</span>
<span class="feed-date">Published on 2015-02-23</span>
<span class="feed-date">Published on 2014-11-13</span>
<span class="feed-date">Published on 2014-04-28</span>

I found this link, which says you can do this in XML.

But I can't do it with HTML. Is there any way to achieve this?

Recommended answer

In XPath 2.0 and later (and likewise in XQuery 1.0 and later, or XSLT 2.0 and later) you can use //span[@class = 'feed-date']/substring-after(., 'on ') to get a sequence of string values. XPath 1.0 does not allow a function call as a path step, so that functionality does not exist there; you would need to iterate over all your span elements in a host language and extract the string for each span.
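For the XPath 1.0 case, the host-language iteration could look roughly like the sketch below. It assumes well-formed input and uses only System.Xml.XPath from .NET; the class name FeedDates and the helper ExtractDates are hypothetical names chosen for illustration. It also hints at why the predicate in the question returns the whole text: inside [...] the result of substring-after is only tested as a boolean (a non-empty string is true), so the predicate filters nodes and does not transform them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class FeedDates
{
    // Hypothetical helper: returns the text after "on " for every matching span.
    public static List<string> ExtractDates(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        var dates = new List<string>();

        // XPath 1.0 can only select the nodes; the substring is cut in C#.
        XPathNodeIterator spans = nav.Select("//span[@class = 'feed-date']");
        while (spans.MoveNext())
        {
            string text = spans.Current.Value; // e.g. "Published on 2016-07-07"
            int i = text.IndexOf("on ", StringComparison.Ordinal);
            // Mirror substring-after: empty string when the marker is absent.
            dates.Add(i >= 0 ? text.Substring(i + 3) : string.Empty);
        }
        return dates;
    }

    static void Main()
    {
        string xml =
            "<html><body>" +
            "<span class=\"feed-date\">Published on 2016-07-07</span>" +
            "<span class=\"feed-date\">Published on 2015-02-23</span>" +
            "</body></html>";

        Console.WriteLine(string.Join(" ", ExtractDates(xml)));
    }
}
```

The same loop should carry over to HTMLAgilityPack by creating the navigator from an HtmlDocument instead of an XPathDocument, as the example further down does for the XPath 2.0 variant.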

As for using XPath 2.0 with HTMLAgilityPack, this looks possible with https://github.com/StefH/XPath2.Net, which is also available on NuGet. It gives the Microsoft XPathNavigator various extension methods, such as XPath2Evaluate, which then allow you to use XPath 2.0 functions both on an XPathNavigator created from Microsoft's XPathDocument and on one created from HTMLAgilityPack's HtmlDocument.

Here is an example:

using System;
using System.Xml.XPath;
using Wmhelp.XPath2;
using HtmlAgilityPack;

namespace XPath20Net1
{
    class Program
    {
        static void Main(string[] args)
        {
            XPathNavigator nav = new XPathDocument("XMLFile1.xml").CreateNavigator();
            Console.WriteLine(nav.XPath2Evaluate("string-join(//span[@class = 'feed-date']/substring-after(., 'on '), ' ')"));

            HtmlDocument doc = new HtmlDocument();
            doc.Load("HTMLPage1.html");

            Console.WriteLine(doc.CreateNavigator().XPath2Evaluate("string-join(//span[@class = 'feed-date']/substring-after(., 'on '), ' ')"));
        }
    }
}

with the XML file being

<?xml version="1.0" encoding="utf-8" ?>
<html>
  <body>
    <span class="feed-date">Published on 2016-07-07</span>
    <span class="feed-date">Published on 2015-02-23</span>
    <span class="feed-date">Published on 2014-11-13</span>
    <span class="feed-date">Published on 2014-04-28</span>
  </body>
</html>

and the HTML document being

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Test</title>
</head>
<body>
 <p id=test>

         <span class="feed-date">Published on 2016-07-07</span>
         <span class="feed-date">Published on 2015-02-23</span>
         <span class="feed-date">Published on 2014-11-13</span>
         <span class="feed-date">Published on 2014-04-28</span>

</body>
</html>

The output is then

2016-07-07 2015-02-23 2014-11-13 2014-04-28
2016-07-07 2015-02-23 2014-11-13 2014-04-28
