Scrapy - Select specific link based on text
Problem description
This should be easy but I'm stuck.
<div class="paginationControl">
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text 2</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=3&powerunit=2">Link Text 3</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=4&powerunit=2">Link Text 4</a> |
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=5&powerunit=2">Link Text 5</a> |
<!-- Next page link -->
<a href="/en/overview/0-All_manufactures/0-All_models.html?page=2&powerunit=2">Link Text Next ></a>
</div>
I'm trying to use Scrapy (BaseSpider) to select a link based on its link text using:
nextPage = HtmlXPathSelector(response).select("//div[@class='paginationControl']/a/@href").re("(.+)*?Next")
For example, I want to select the next page link based on the fact that its text is "Link Text Next". Any ideas?
Use a[contains(text(),'Link Text Next')]:
nextPage = HtmlXPathSelector(response).select(
"//div[@class='paginationControl']/a[contains(text(),'Link Text Next')]/@href")
Reference: Documentation on the XPath contains function
PS. Your text "Link Text Next " has a space at the end. To avoid having to include that space in the code, as in text()="Link Text Next ", I think using contains is a bit more general while still being specific enough.
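To see what the contains() predicate is doing without running a spider, here is a minimal sketch that uses only Python's standard library. It is not Scrapy's API: the class LinkTextFinder, the helper find_links, and the shortened hrefs in SAMPLE_HTML are all made up for illustration. It also joins all text inside each link, whereas contains(text(), ...) in XPath 1.0 tests only the first text node, so treat it as an approximation. In newer Scrapy versions the same XPath expression is usually passed to response.xpath(...) instead of HtmlXPathSelector.

```python
from html.parser import HTMLParser

# Hypothetical, shortened version of the pagination markup from the question.
SAMPLE_HTML = """
<div class="paginationControl">
<a href="/models.html?page=2">Link Text 2</a> |
<a href="/models.html?page=3">Link Text 3</a> |
<!-- Next page link -->
<a href="/models.html?page=2">Link Text Next ></a>
</div>
"""


class LinkTextFinder(HTMLParser):
    """Collect the href of every <a> whose text contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []   # hrefs of matching links, in document order
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Substring test, playing the role of contains() in the XPath.
            if self.needle in "".join(self._text):
                self.matches.append(self._href)
            self._href = None


def find_links(html, needle):
    """Return the hrefs of all links whose text contains `needle`."""
    finder = LinkTextFinder(needle)
    finder.feed(html)
    return finder.matches


if __name__ == "__main__":
    print(find_links(SAMPLE_HTML, "Link Text Next"))  # -> ['/models.html?page=2']
```

Because the match is a substring test, the trailing space and the ">" after "Next" do not need to appear in the search string, which is the same reason contains is more forgiving than an exact text()= comparison.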