Scrapy: Follow link to get additional Item data?

Question

I don't have a specific code issue; I'm just not sure how to approach the following problem logistically with the Scrapy framework:

The data I want to scrape is typically laid out as one table row per item. Straightforward enough, right?

Ultimately I want to scrape the Title, Due Date, and Details for each row. Title and Due Date are immediately available on the page...

BUT the Details themselves aren't in the table. Instead, each row has a link to the page containing the details (if that doesn't make sense, here's a table):

|-------------------------------------------------|
|             Title              |    Due Date    |
|-------------------------------------------------|
| Job Title (Clickable Link)     |    1/1/2012    |
| Other Job (Link)               |    3/2/2012    |
|--------------------------------|----------------|

I'm afraid I still don't know how to logistically pass the item around with callbacks and requests, even after reading through the CrawlSpider section of the Scrapy documentation.

Recommended answer

Please read the docs first so that you understand what I'm saying.

The answer:

To scrape additional fields that live on other pages: in a parse method, extract the URL of the page with the additional info, create a Request object for that URL and return it from that parse method, and pass the data you have already extracted via the Request's meta parameter. The callback attached to that Request can then read the data back from response.meta, add the remaining fields, and return the finished item.

How do I merge results from the target page into the current page in Scrapy?
