Scrapy xpath returns an empty list although tag and syntax are correct

Problem description

Here is the code I have written in my parse function:

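from scrapy.selector import Selector

# CrawlsiteItem is the asker's own Item class; its definition is not shown.
def parse(self, response):
    hs = Selector(response)
    # Select the element that carries the listing id.
    links = hs.xpath(".//*[@id='requisitionListInterface.listRequisition']")
    items = []
    for x in links:
        item = CrawlsiteItem()
        # Text of every descendant whose title contains the phrase.
        item["title"] = x.xpath('.//*[contains(@title, "View this job description")]/text()').extract()
        items.append(item)
    return items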

However, title comes back as an empty list.

In links I select the element by its id, and then, within links, I want a list of the text values of all elements whose title contains "View this job description".

Please help me fix the error in the code.

Recommended answer

If you request the URL you provided with cURL (curl "https://cognizant.taleo.net/careersection/indapac_itbpo_ext_career/moresearch.ftl?lang=en"), you get back a page that is very different from the one you see in your browser. Your search matches the following <a> element, which has no text() node to select:

<a id="requisitionListInterface.reqTitleLinkAction" 
    title="View this job description"
    href="#"
    onclick="javascript:setEvent(event);requisition_openRequisitionDescription('requisitionListInterface','actOpenRequisitionDescription',_ftl_api.lstVal('requisitionListInterface', 'requisitionListInterface.listRequisition', 'requisitionListInterface.ID5645', this),_ftl_api.intVal('requisitionListInterface', 'requisitionListInterface.ID5649', this));return ftlUtil_followLink(this);">
</a>

This is because the site loads the displayed information with an XHR request (you can see it in Chrome's developer tools, for example) and then updates the page dynamically with the returned data.
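One way to check this without a browser is to look at the text inside the matched element in the raw HTML. The following is a minimal sketch that uses only Python's standard library as a rough stand-in for the XPath query; it is not Scrapy's selector, and the class and function names (TitleTextCollector, texts_for_title) are made up for illustration. The sample markup is the <a> element from the raw response with the onclick handler left out.

```python
from html.parser import HTMLParser

# The <a> element from the raw response (onclick handler omitted for brevity).
RAW_HTML = '''
<a id="requisitionListInterface.reqTitleLinkAction"
    title="View this job description"
    href="#">
</a>
'''

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TitleTextCollector(HTMLParser):
    """Collect non-blank text inside every element whose title contains a phrase."""

    def __init__(self, phrase):
        super().__init__()
        self.phrase = phrase
        self.depth = 0   # greater than 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        title = dict(attrs).get("title") or ""
        if self.depth or self.phrase in title:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())


def texts_for_title(html, phrase):
    """Return the text found inside elements whose title contains phrase."""
    collector = TitleTextCollector(phrase)
    collector.feed(html)
    collector.close()
    return collector.texts


raw_texts = texts_for_title(RAW_HTML, "View this job description")
```

On the raw markup the element is matched but contributes no text, which is consistent with the job titles being filled in later by JavaScript.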

To get the information you want, find this XHR request (it is not hard, because it is the only one) and call it from your scraper. You can then extract the required data from the response: write a parser that walks through its pipe-separated format, splits it into job postings, and pulls out the fields you need, such as position, id, date and location.
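The parsing step might look roughly like the sketch below. The exact layout of the XHR response is not shown here, so everything about the format is an assumption: one posting per line, fields separated by a pipe, in the order position, id, date, location. The sample payload and the function name parse_postings are hypothetical and would have to be adapted to the real response.

```python
def parse_postings(payload, field_names, record_sep="\n"):
    """Split a pipe-separated payload into one dict per job posting.

    Assumes one posting per record and one field per pipe-separated column,
    in the order given by field_names. Both are assumptions about the format.
    """
    postings = []
    for record in payload.split(record_sep):
        record = record.strip()
        if not record:
            continue  # skip blank records
        values = [value.strip() for value in record.split("|")]
        if len(values) != len(field_names):
            continue  # skip records that do not match the assumed layout
        postings.append(dict(zip(field_names, values)))
    return postings


# Hypothetical sample payload; the real response will look different.
SAMPLE = (
    "Example Analyst|00001|Jan 1, 2015|Example City\n"
    "Example Engineer|00002|Jan 2, 2015|Another City\n"
)
FIELDS = ["position", "id", "date", "location"]

postings = parse_postings(SAMPLE, FIELDS)
```

In a Scrapy callback for the XHR request, the payload would typically be the response body as text (response.text), and each resulting dict could be copied into an item.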
