XPath: string manipulation
Question
In my Scrapy project I was able to isolate some particular fields. One of them returns something like:
[Rank Info] on 2013-06-27 14:26 Read 174 Times
selected by the expression:
(//td[@class="show_content"]/text())[4]
I usually do post-processing to extract the datetime information, i.e., 2013-06-27 14:26.
Now that I've learned a little more about XPath substring manipulation, I am wondering whether it is possible to extract that piece of information in the first place, i.e., in the XPath expression itself.
Thanks,
Recommended answer
Scrapy uses XPath 1.0, which has very limited string manipulation capabilities; in particular, it does not support regular expressions. There are two ways to cut down a string. I demonstrate both with an example that strips the text down to the substring you're looking for.
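For comparison, the post-processing route mentioned in the question could look roughly like the following in Python. This is only a sketch: the helper name extract_datetime and the exact pattern are assumptions made for illustration, not something taken from the original project.

```python
import re

# Assumed format: "YYYY-MM-DD HH:MM", as in the sample string from the question.
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def extract_datetime(text):
    """Return the first datetime-looking substring, or None if there is none."""
    match = DATETIME_RE.search(text)
    return match.group(0) if match else None


sample = "[Rank Info] on 2013-06-27 14:26 Read 174 Times"
print(extract_datetime(sample))
```

The regular expression tolerates changes in both the position and the surrounding words, which neither of the two XPath 1.0 approaches does on its own.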
By Character Index
This is fine if the character indices do not change (but the contents could).
substring($string, $start, $len)
substring((//td[@class="show_content"]/text())[4], 16, 16)
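To check the numbers 16 and 16 without running a scraper, the index arithmetic can be reproduced in plain Python. The helper xpath_substring below is a hypothetical stand-in that mimics XPath 1.0 substring() for integer arguments only; the point it illustrates is that XPath counts characters from 1, while Python slices count from 0.

```python
def xpath_substring(s, start, length):
    """Mimic XPath 1.0 substring() for integer start/length (1-based positions)."""
    # XPath keeps the characters at positions p with start <= p < start + length.
    begin = max(start, 1) - 1        # first kept position, as a 0-based index
    end = max(start + length - 1, 0)  # one past the last kept 0-based index
    return s[begin:end]


sample = "[Rank Info] on 2013-06-27 14:26 Read 174 Times"
print(xpath_substring(sample, 16, 16))
```

If the real text node carries leading whitespace or a longer prefix, the start index has to be adjusted accordingly, which is the main weakness of the index-based approach.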
By Prefix/Suffix Search
This is fine if the index can change, but the contents immediately before and after the string stay the same:
substring-before($string, $needle)
substring-after($string, $needle)
substring-before(
  substring-after((//td[@class="show_content"]/text())[4], 'on '),
  ' Read'
)
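The same prefix/suffix logic can be tried out in plain Python before putting it into a selector. The two helpers below are hypothetical equivalents of substring-after() and substring-before(), written as a sketch; like the XPath functions, they return an empty string when the needle is not found.

```python
def substring_after(s, needle):
    """Text after the first occurrence of needle, or '' if it is absent."""
    if needle == "":
        return s  # XPath returns the whole string for an empty needle
    _, sep, after = s.partition(needle)
    return after if sep else ""


def substring_before(s, needle):
    """Text before the first occurrence of needle, or '' if it is absent."""
    if needle == "":
        return ""  # XPath returns the empty string for an empty needle
    before, sep, _ = s.partition(needle)
    return before if sep else ""


sample = "[Rank Info] on 2013-06-27 14:26 Read 174 Times"
print(substring_before(substring_after(sample, "on "), " Read"))
```

Both functions match the first occurrence of the needle, so a prefix such as 'on ' only works as long as it does not also appear earlier in the text.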