XPath to first occurrence of element with text length >= 200 characters

Question

How do I get the first element that has an inner text (plain text, discarding other children) of 200 or more characters in length?

I'm trying to create an HTML parser like Embed.ly and I've set up a system of fallbacks where I first check for og:description, then I would search for this occurrence and only then for the description meta tag.

This is because most sites that include a meta description at all use that tag to describe the site as a whole, rather than the contents of the current page.

For example:

<html>
    <body>
        <div>some characters
            <p>200 characters <span>some more stuff</span></p>
        </div>
    </body>
</html>



What selector could I use to get the "200 characters" portion of that HTML fragment? I don't want the "some more stuff" text either, and I don't care what element it is (except for <script> or <style>), as long as it's the first plain text to contain at least 200 characters.

What should the XPath query look like?

Recommended answer

Use:

(//*[not(self::script or self::style)]/text()[string-length() >= 200])[1]
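As a rough illustration of how an expression of this shape might be evaluated, here is a minimal sketch using the XPath 1.0 engine that ships with the JDK (javax.xml.xpath). It assumes the input is well-formed XML with elements in no namespace, since real-world HTML would first need an HTML-tolerant parser. The class name FirstLongText, the method find, and the minLength parameter are hypothetical names chosen for this example. The comparison uses >= so that a text node of exactly 200 characters also qualifies, matching the "200 or more" wording of the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (uses String.repeat, so Java 11+).
final class FirstLongText {

    // Returns the first text node (in document order) outside <script>/<style>
    // whose length is at least minLength, or null if there is none.
    static String find(String xml, int minLength) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "(//*[not(self::script or self::style)]/text()"
                + "[string-length() >= " + minLength + "])[1]";

        Node node = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
        return node == null ? null : node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the fragment in the question, with a 200-character text in <p>.
        String xml = "<html><body><script>" + "s".repeat(300) + "</script>"
                + "<div>some characters<p>" + "x".repeat(200)
                + "<span>some more stuff</span></p></div></body></html>";
        // The <script> text is skipped and "some characters" is too short,
        // so the <p> text node is selected.
        System.out.println(find(xml, 200).length()); // prints 200
    }
}
```

Because the expression selects text() nodes rather than elements, the text of the nested span is never part of the result. Note that string-length() counts whitespace too, so an indented, mostly empty text node could qualify in a pretty-printed document.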

Note: If the document is an XHTML document (which means that all elements are in the XHTML namespace), the above expression should be written as:

(//*[not(self::x:script or self::x:style)]/text()[string-length() >= 200])[1]

where the prefix "x:" must be bound to the XHTML namespace, "http://www.w3.org/1999/xhtml" (or, as many XPath APIs put it, the namespace must be "registered" with this prefix).
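How the prefix gets bound depends on the XPath API in use. As one possible sketch, the JDK's javax.xml.xpath API takes a NamespaceContext; the example below registers "x" for the XHTML namespace and evaluates the namespaced form of the expression with a >= comparison. The names FirstLongXhtmlText, XhtmlContext, find, and minLength are hypothetical, and the input is assumed to be well-formed XHTML without a DOCTYPE.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (uses String.repeat, so Java 11+).
final class FirstLongXhtmlText {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Binds the prefix "x" to the XHTML namespace for XPath evaluation.
    static final class XhtmlContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "x" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            if (XHTML_NS.equals(uri)) {
                return Collections.singletonList("x").iterator();
            }
            return Collections.<String>emptyList().iterator();
        }
    }

    // Returns the first qualifying text node of an XHTML document, or null.
    static String find(String xhtml, int minLength) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespaced name tests
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new XhtmlContext());
        String expr = "(//*[not(self::x:script or self::x:style)]/text()"
                + "[string-length() >= " + minLength + "])[1]";

        Node node = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
        return node == null ? null : node.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"" + XHTML_NS + "\"><body>"
                + "<style>" + "s".repeat(300) + "</style>"
                + "<p>" + "x".repeat(200) + "</p></body></html>";
        System.out.println(find(xhtml, 200).length()); // prints 200
    }
}
```

The prefix used in the expression does not have to match any prefix in the document itself; only the namespace URI it is bound to matters.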
