Scrapy: Follow link to get additional Item data?

Question

I don't have a specific code issue; I'm just not sure how to approach the following problem logistically with the Scrapy framework:

The structure of the data I want to scrape is typically a table row for each item. Straightforward enough, right?

Ultimately I want to scrape the Title, Due Date, and Details for each row. Title and Due Date are immediately available on the page...

BUT the Details themselves aren't in the table; the table only holds a link to the page containing them (if that doesn't make sense, here's a table):

|-------------------------------------------------|
|             Title              |    Due Date    |
|-------------------------------------------------|
| Job Title (Clickable Link)     |    1/1/2012    |
| Other Job (Link)               |    3/2/2012    |
|--------------------------------|----------------|

I'm afraid I still don't know how to logistically pass the item around with callbacks and requests, even after reading through the CrawlSpider section of the Scrapy documentation.

Recommended answer

Please read the CrawlSpider section of the Scrapy docs first, to understand what I say.

The answer:

To scrape additional fields that live on other pages: in a parse method, extract the URL of the page with the additional info, create a Request object with that URL, return it from that parse method, and pass the already-extracted data along via the Request's meta parameter.

How do I merge results from the target page into the current page in Scrapy?