Cannot find the correct XPath in the Scrapy shell

Problem description

I am fairly new to Scrapy, so please bear with me for a moment.

I want to scrape this page for the following information:

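  • Project acronym (PROTECTRAIL)
  • Project short description (The Railway-Industry Partnership for Integrated Security of Rail Transport)
  • Project detailed description (facing the problem of enhancing [...] protection of buildings and infrastructure)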

Using Google Scraper, I have inspected these elements and determined their XPath in the HTML page:

  • Acronym: //*[@id='recorddetails']/div/div[1]/h1
  • Short description: //*[@id='recorddetails']/div/div[1]/h2
  • Detailed description: //*[@id='recorddetails']/div/div[4]/div[2]/div[1]/p/text()

I then tested the following XPath queries in the Scrapy shell:

  • Acronym: sel.xpath("//*[@id='recorddetails']/div/div[1]/h1").extract()
  • Short description: sel.xpath("///*[@id='recorddetails']/div/div[1]/h2")
  • Long description: sel.xpath("///*[@id='recorddetails']/div/div[4]/div[2]/div[1]/p/text()").extract()

But the shell yields an empty result ([]) for each of these XPath queries, even though they seem to be properly written (no syntax error) and accurate enough.

How can I find the proper selector, with the proper XPath, to fetch this information?

Recommended answer

Looking at the Firebug Net tab and filtering by XHR requests, it seems that the data you are after is loaded by a later AJAX call to:

$ scrapy shell "http://cordis.europa.eu/projects/index.cfm?fuseaction=app.csa&action=read&xslt-template=projects/xsl/projectdet_en.xslt&rcn=95607"
....
>>> sel.xpath("//div[@class='projttl']/h1/text()").extract()
[u'PROTECTRAIL']
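
To illustrate why the original queries came back empty, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selector. Both markup snippets are made-up simplifications (assumptions), not the real CORDIS HTML: the first mimics an initial page whose container has not been filled in yet, the second mimics the fragment returned by the AJAX call. The helper name extract_titles is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical initial page: the container exists, but JavaScript fills it in later.
INITIAL_PAGE = "<html><body><div id='recorddetails'></div></body></html>"

# Hypothetical fragment returned by the later AJAX request.
AJAX_FRAGMENT = (
    "<div id='recorddetails'>"
    "<div class='projttl'><h1>PROTECTRAIL</h1><h2>Short description</h2></div>"
    "</div>"
)


def extract_titles(markup):
    """Return the text of every <h1> inside a div with class 'projttl'."""
    root = ET.fromstring(markup)
    return [h1.text for h1 in root.findall(".//div[@class='projttl']/h1")]


initial_result = extract_titles(INITIAL_PAGE)  # the data is not in this document
ajax_result = extract_titles(AJAX_FRAGMENT)    # the data lives in this response

print(initial_result)
print(ajax_result)
```

The same XPath matches nothing in the first document and finds the title in the second, which is the situation described above: the query is fine, but it is run against a response that does not contain the data.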

Also, it is better to familiarize yourself with XPath syntax than to rely on those automatic XPath tools.
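
As a rough illustration of that advice, the sketch below compares a positional path of the kind a point-and-click tool tends to generate with a hand-written, attribute-based one. It again uses the standard-library xml.etree.ElementTree (which supports only a subset of XPath) instead of Scrapy, and the markup, including the extra 'banner' div, is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page structure may differ.
PAGE = (
    "<html><body>"
    "<div id='recorddetails'>"
    "<div class='banner'>Notice</div>"
    "<div class='projttl'><h1>PROTECTRAIL</h1></div>"
    "</div>"
    "</body></html>"
)

root = ET.fromstring(PAGE)

# Positional path: assumes the title sits in the first child div.
positional = [e.text for e in root.findall(".//*[@id='recorddetails']/div[1]/h1")]

# Attribute-based path: selects the title div by its class, wherever it sits.
by_class = [e.text for e in root.findall(".//div[@class='projttl']/h1")]

print(positional)
print(by_class)
```

Because an extra element precedes the title div here, the positional path selects the wrong div and finds no h1, while the class-based path still matches. Paths written by hand around stable attributes tend to survive such layout changes better.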
