scrapy HtmlXPathSelector: determine XPath by searching for a keyword

Views: 75 | Published: 2020/5/4 8:39:01 | Tags: xpath, lxml, scrapy

Question

I have a portion of HTML like the following:

<li><label>The Keyword:</label><span><a href="../../..">The text</a></span></li>

I want to get the string "The Keyword: The text".

I know I can get the XPath of the above HTML using Chrome's Inspect or Firefox's Firebug, run hxs.select(xpath).extract(), and then strip the HTML tags to get the string. However, this approach is not generic enough, because the XPath differs from page to page.

So I'm considering the following approach. First, search for "The Keyword:" using:

from scrapy.selector import HtmlXPathSelector

hxs = HtmlXPathSelector(response)
hxs.select('//*[contains(text(), "The Keyword:")]')

When I pprint the result, I get:

>>> pprint( hxs.select('//*[contains(text(), "The Keyword:")]') )
<HtmlXPathSelector xpath='//*[contains(text(), "The Keyword:")]' data=u'<label>The Keyword:</label>'>
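The selector above matches the `<label>` element itself, so the wanted text sits in its neighbouring `<span>`. One way to avoid hard-coding an absolute XPath is to navigate relative to the keyword match; in XPath 1.0 terms that is roughly `//label[contains(text(), "The Keyword:")]/following-sibling::span//text()`. Python's built-in `xml.etree.ElementTree` supports only a small XPath subset (no `contains()` and no sibling axes), so the sketch below performs the same steps by hand. It assumes the fragment is well-formed XML, which real pages often are not, and `text_after_label` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The fragment from the question (ElementTree requires well-formed XML).
FRAGMENT = ('<li><label>The Keyword:</label>'
            '<span><a href="../../..">The text</a></span></li>')


def text_after_label(root: ET.Element, keyword: str) -> Optional[str]:
    """Hypothetical helper: for the first <label> whose own text contains
    `keyword`, return "<label text> <text of the next <span> sibling>"."""
    # ElementTree has no parent pointers, so map every child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for label in root.iter("label"):
        if not (label.text and keyword in label.text):
            continue
        parent = parent_of.get(label)
        if parent is None:  # the label is the root, so it has no siblings
            continue
        siblings = list(parent)
        # Rough equivalent of following-sibling::span[1]
        for sibling in siblings[siblings.index(label) + 1:]:
            if sibling.tag == "span":
                # Rough equivalent of //text(): text of all descendants
                value = "".join(sibling.itertext()).strip()
                return f"{label.text.strip()} {value}"
    return None


root = ET.fromstring(FRAGMENT)
print(text_after_label(root, "The Keyword:"))  # prints: The Keyword: The text
```

Because the lookup starts from the keyword rather than from the document root, it does not care how deeply the `<li>` is nested on a given page.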

My question is how to get the desired string "The Keyword: The text". My idea is to determine the XPath first; once the XPath is known, getting the string is straightforward.

I am also open to solutions other than Scrapy's HtmlXPathSelector (e.g. lxml.html might have more features, but I am very new to it).
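lxml.html is one option: it supports full XPath 1.0, so a keyword-relative expression built from `contains()` and the `following-sibling` axis would work there. If the target is only the "label: value" pairs, though, even the standard library's lenient `html.parser` can collect them without any XPath. This is a minimal sketch, assuming every pair sits in an `<li>` holding a `<label>` followed by a `<span>` as in the fragment above; `LabelValueParser` and `find_by_keyword` are hypothetical names, and nested `<span>`s inside a value are not handled.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class LabelValueParser(HTMLParser):
    """Collect {label text: span text} from <li><label>..</label><span>..</span></li>."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: Dict[str, str] = {}
        self._inside: Optional[str] = None  # "label" or "span" while inside one
        self._label: List[str] = []
        self._value: List[str] = []

    def _flush(self) -> None:
        # Record the pair gathered for the current <li>, then reset.
        label = "".join(self._label).strip()
        if label:
            self.pairs[label] = "".join(self._value).strip()
        self._label, self._value = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()  # covers a previous <li> whose </li> was omitted
        elif tag in ("label", "span"):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag in ("label", "span"):
            self._inside = None
        elif tag == "li":
            self._flush()

    def handle_data(self, data):
        # Text inside <a> etc. still counts, because only label/span change state.
        if self._inside == "label":
            self._label.append(data)
        elif self._inside == "span":
            self._value.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # record a trailing <li> that was never closed


def find_by_keyword(html: str, keyword: str) -> Optional[str]:
    """Return "<label> <value>" for the first label containing `keyword`."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    for label, value in parser.pairs.items():
        if keyword in label:
            return f"{label} {value}"
    return None


page = ('<ul><li><label>The Keyword:</label><span><a href="../../..">The text</a></span></li>'
        '<li><label>Other:</label><span>value</span>')  # last </li> omitted on purpose
print(find_by_keyword(page, "The Keyword:"))  # prints: The Keyword: The text
```

Flushing on `<li>`, `</li>` and `close()` means a pair is still recorded when the page omits `</li>`, which HTML permits.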

Thanks.
