How to get title from class attribute in XPath (Python/Scrapy)


Question

I'm working on getting data from TripAdvisor. Most of the first reviews show a relative date, while the rest use the normal MM/DD/YYYY format. On closer inspection, I see that the relative-date element looks like this:

<span class="ratingDate relativeDate" title="20 June 2015">Reviewed 4 weeks ago
</span>

I am using this XPath to get the data:

response.xpath('//div[@class="col2of2"]//span[@class="ratingDate relativeDate" or @class="ratingDate"]/text()').extract()

My question is: how do I add @title so that I get the title attribute, which holds the date in a normal format?

I tried:

response.xpath('//div[@class="col2of2"]//span[@class="ratingDate relativeDate"/@title or @class="ratingDate"]/text()').extract()

response.xpath('//div[@class="col2of2"]//span[@class="ratingDate relativeDate" or @class="ratingDate"]/@title/text()').extract()

Recommended answer

Figured it out: in the spider, you have to add a conditional step that dynamically checks whether that XPath returns any values.

Here is my rendition:

item['date'] = sel.xpath('//*[@class="ratingDate relativeDate"]/@title').extract()
item['date'] += sel.xpath('//div[@class="col2of2"]//span[@class="ratingDate"]/text()').extract()
