xpath: string manipulation

Question

In my Scrapy project I was able to isolate some particular fields. One of the fields returns something like:

[Rank Info] on 2013-06-27 14:26 Read 174 Times

selected by the expression:

(//td[@class="show_content"]/text())[4]

I usually post-process the result to extract the datetime information, i.e., 2013-06-27 14:26. Now that I've learned a little more about XPath substring manipulation, I am wondering whether it is possible to extract that piece of information in the first place, i.e., in the XPath expression itself.

Thanks,

Recommended answer

Scrapy uses XPath 1.0, which has very limited string manipulation capabilities; in particular, it does not support regular expressions. There are two ways to cut down a string. I demonstrate both with an example that strips the text down to the substring you're looking for.

By Character Index

This works if the character positions do not change (though the contents may).

substring($string, $start, $len)
substring((//td[@class="show_content"]/text())[4], 16, 16)
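
To double-check the arguments 16 and 16, the offsets can be computed in plain Python. This is only a sketch: it assumes the text node is exactly the sample string from the question, with no leading whitespace, and `xpath_substring` is a hypothetical helper that mimics XPath's 1-based `substring()` for integer arguments.

```python
# Sample text node from the question; assumed to have no leading whitespace.
text = "[Rank Info] on 2013-06-27 14:26 Read 174 Times"
wanted = "2013-06-27 14:26"

# XPath counts characters from 1, Python from 0.
start = text.index(wanted) + 1
length = len(wanted)


def xpath_substring(s, start, length):
    """Hypothetical helper mimicking XPath 1.0 substring() for integer arguments."""
    begin = max(start - 1, 0)
    end = max(start - 1 + length, 0)
    return s[begin:end]


print(start, length)
print(xpath_substring(text, start, length))
```

In a Scrapy callback, the XPath expression itself would typically be passed to `response.xpath(...)` and read back with `.get()`. If the page adds or removes even one character before the date, the fixed offsets stop matching.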

By Prefix/Suffix Search

This works if the index can change, but the contents immediately before and after the string stay the same:

substring-before($string, $needle)
substring-after($string, $needle)
substring-before(
  substring-after((//td[@class="show_content"]/text())[4], 'on '), ' Read')
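
The behaviour of the two functions can be mimicked in Python to see what the nested expression returns. The helpers `substring_before` and `substring_after` below are hypothetical names used only for this sketch, not Scrapy or lxml APIs. Like their XPath counterparts, they split on the first occurrence of the needle and return an empty string when it is absent; a non-empty needle is assumed.

```python
def substring_after(s, needle):
    """Mimic XPath 1.0 substring-after() for a non-empty needle."""
    _, sep, tail = s.partition(needle)
    return tail if sep else ""


def substring_before(s, needle):
    """Mimic XPath 1.0 substring-before() for a non-empty needle."""
    head, sep, _ = s.partition(needle)
    return head if sep else ""


# Sample text node from the question.
text = "[Rank Info] on 2013-06-27 14:26 Read 174 Times"

# Same nesting as the XPath expression: cut after 'on ', then before ' Read'.
result = substring_before(substring_after(text, "on "), " Read")
print(result)
```

Because a missing marker yields an empty string rather than an error, it is probably worth checking for an empty result before parsing it as a date.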
