XPath could not recognize a tag

Problem description

I am trying to use XPath to scrape Reddit posts from a forum. One of the things I want the spider to do is go to the next page automatically as soon as it finishes scraping the current page. The page's HTML looks like this:

<span class="next-button"><a href="https://www.reddit.com/r/InteriorDesign/?count=975&amp;after=t3_8ol7yp" rel="nofollow next" >next &rsaquo;</a></span>

I used the XPath selector response.xpath("//a[@class = 'next-button']"), but it didn't return anything. Can someone help me figure out why?
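One way to see why the selector comes back empty is to run the question's predicate against the snippet above, once on a and once on span. The sketch below swaps Scrapy for Python's standard-library xml.etree.ElementTree, whose limited XPath subset supports [@attr='value'] predicates, so it runs without extra dependencies. It assumes the &rsaquo; entity is written as the literal › character, because ElementTree is a strict XML parser and does not know HTML named entities.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, wrapped in a <div> so the document has one root.
# Assumption: &rsaquo; is written as the literal "›" (\u203a), because ElementTree
# parses strict XML and rejects HTML-only named entities.
SNIPPET = (
    '<div><span class="next-button">'
    '<a href="https://www.reddit.com/r/InteriorDesign/?count=975&amp;after=t3_8ol7yp"'
    ' rel="nofollow next" >next \u203a</a>'
    '</span></div>'
)

root = ET.fromstring(SNIPPET)

# The question's predicate applied to <a>: the <a> has no class attribute at all.
a_matches = root.findall(".//a[@class='next-button']")
# The same predicate applied to <span> matches, because that is where the class lives.
span_matches = root.findall(".//span[@class='next-button']")

print(len(a_matches), len(span_matches))  # 0 1
print(sorted(span_matches[0][0].attrib))  # ['href', 'rel']  (no 'class' on the <a>)
```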

Thanks! Hao

Solution

The class attribute is on the span element, not on the a (link) element, so the [@class = 'next-button'] predicate has to be applied to span. Change your XPath to

response.xpath("//span[@class = 'next-button']/a")

to select the a element, or to

response.xpath("//span[@class = 'next-button']/a/@href")

to get the link address.
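For the pagination the question describes, the fixed selector can be wrapped in a small helper that returns the next link, or None when there is none. This is a minimal sketch: next_page_url is a hypothetical name, ElementTree again stands in for Scrapy's response.xpath(), and it assumes the last listing page simply has no next-button span.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def next_page_url(markup: str) -> Optional[str]:
    """Hypothetical helper (not part of Scrapy): return the 'next' link or None."""
    root = ET.fromstring(markup)
    # ElementTree equivalent of //span[@class = 'next-button']/a
    link = root.find(".//span[@class='next-button']/a")
    if link is None:
        # Assumption: the last listing page has no next-button span.
        return None
    # ElementTree's XPath subset cannot select attributes (no /@href),
    # so the href is read from the matched <a> element instead.
    return link.get("href")


# Assumption: &rsaquo; replaced by the literal "›" so the strict XML parser accepts it.
PAGE = (
    '<div><span class="next-button">'
    '<a href="https://www.reddit.com/r/InteriorDesign/?count=975&amp;after=t3_8ol7yp"'
    ' rel="nofollow next" >next \u203a</a>'
    '</span></div>'
)
LAST_PAGE = "<div><p>no more posts</p></div>"

print(next_page_url(PAGE))
# https://www.reddit.com/r/InteriorDesign/?count=975&after=t3_8ol7yp
print(next_page_url(LAST_PAGE))  # None
```

Inside a Scrapy spider, the same idea is usually written as response.xpath("//span[@class = 'next-button']/a/@href").get(), which also returns None when nothing matches; a non-None result can be passed to response.follow() with the spider's parse callback, and None is the natural signal to stop.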
