xpath: string manipulation

Published: 2021/7/16 21:53:05 · Tags: python, xpath, scrapy


Question

In my Scrapy project I was able to isolate some particular fields; one of the fields returns something like:

[Rank Info] on 2013-06-27 14:26 Read 174 Times

selected by the expression:

(//td[@class="show_content"]/text())[4]

I usually do post-processing to extract the date-time information, i.e., 2013-06-27 14:26. Now that I've learned a little more about XPath substring manipulation, I am wondering whether it is possible to extract that piece of information in the first place, i.e., in the XPath expression itself?
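For comparison, the post-processing route mentioned in the question might look like the following minimal sketch in plain Python. The regular expression and the helper name `extract_datetime` are illustrative assumptions, not code from the original project.

```python
import re

# Hypothetical pattern for "YYYY-MM-DD HH:MM" as it appears in the scraped text.
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}")


def extract_datetime(text):
    """Return the first date-time found in text, or None if there is none."""
    match = DATETIME_RE.search(text)
    return match.group(0) if match else None


print(extract_datetime("[Rank Info] on 2013-06-27 14:26 Read 174 Times"))
# prints 2013-06-27 14:26
```

Inside Scrapy the same pattern could instead be passed to a selector's `.re_first()` method, which applies the regex in Python after the XPath query has run.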

Thanks,

Recommended answer

Scrapy uses XPath 1.0, which has very limited string-manipulation capabilities and, in particular, does not support regular expressions. There are two ways to cut a string down; I demonstrate both with an example that extracts the substring you're looking for.

By Index

This is fine if the character indices do not change (but the contents could):

substring($string, $start, $len)
substring((//td[@class="show_content"]/text())[4], 16, 16)
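The numbers 16 and 16 follow from XPath's 1-based character positions: the date starts at character 16 of the sample string and is 16 characters long. The pure-Python model below is a simplified sketch (it assumes integer arguments and ignores the rounding and NaN rules the XPath 1.0 spec defines) that makes this arithmetic easy to check.

```python
def xpath_substring(s, start, length):
    """Simplified model of XPath 1.0 substring() for integer arguments.

    XPath numbers characters from 1 and keeps position p when
    start <= p < start + length.
    """
    return "".join(
        ch for p, ch in enumerate(s, start=1) if start <= p < start + length
    )


text = "[Rank Info] on 2013-06-27 14:26 Read 174 Times"

# 1-based start position of the date inside the text node.
start = text.index("2013") + 1
print(start)                          # prints 16
print(xpath_substring(text, 16, 16))  # prints 2013-06-27 14:26
```

The same characters correspond to the 0-based Python slice `text[15:31]`, which is why the index approach breaks as soon as the prefix before the date changes length.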

By Pre-/Suffix Search

This is fine if the index can change but the contents immediately before and after the substring stay the same:

substring-before($string, $needle)
substring-after($string, $needle)
substring-before(
  substring-after((//td[@class="show_content"]/text())[4], 'on '), ' Read')
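To see what the nested call computes, here is a small pure-Python model of the two XPath 1.0 functions (an illustrative sketch, not a library API). Both look for the first occurrence of the needle and, as in XPath, return an empty string when the needle is absent.

```python
def substring_before(s, needle):
    """Model of XPath 1.0 substring-before(): text before the first
    occurrence of needle, or "" when needle does not occur."""
    i = s.find(needle)
    return s[:i] if i != -1 else ""


def substring_after(s, needle):
    """Model of XPath 1.0 substring-after(): text after the first
    occurrence of needle, or "" when needle does not occur."""
    i = s.find(needle)
    return s[i + len(needle):] if i != -1 else ""


text = "[Rank Info] on 2013-06-27 14:26 Read 174 Times"

# Inner call drops everything up to "on ", outer call cuts at " Read".
print(substring_before(substring_after(text, "on "), " Read"))
# prints 2013-06-27 14:26
```

Note the silent failure mode: if a page lacks the `on ` or ` Read` marker, the expression yields an empty string rather than an error. In a Scrapy callback the XPath expression itself would be passed to `response.xpath(...)` and the resulting string read with `.get()`.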
