Start Scrapy from a Flask route


Question

I want to build a crawler that takes the URL of a webpage to scrape and returns the result back to a webpage. Right now I start Scrapy from the terminal and store the response in a file. How can I start the crawler when some input is posted to the Flask app, process it, and return the response?

Answer

You need to create a CrawlerProcess inside your Flask application and run the crawl programmatically. See the docs on running Scrapy from a script.

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()  # the script will block here until the crawl is finished
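One caveat with the snippet above: `process.start()` runs Twisted's reactor, which cannot be restarted within the same process, so calling it directly from a Flask route will only work for the first request. A common workaround is to run each crawl in a child process. Here is a minimal sketch of that pattern using only the standard library; `run_spider` is a hypothetical placeholder standing in for the `CrawlerProcess` code above, and the returned items are stand-ins for real scraped data:

```python
import multiprocessing

def run_spider(url, result_queue):
    # Placeholder for the CrawlerProcess code above: here you would
    # create a CrawlerProcess, call process.crawl(MySpider, start_url=url)
    # and process.start(), then put the collected items on the queue.
    items = [{"url": url, "title": "example"}]  # stand-in for scraped items
    result_queue.put(items)

def crawl_in_subprocess(url):
    # Each crawl runs in a fresh process, so Twisted's one-shot reactor
    # starts in a clean interpreter every time.
    result_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=run_spider, args=(url, result_queue))
    proc.start()
    items = result_queue.get()  # blocks until the spider finishes
    proc.join()
    return items
```

A Flask view could then simply return `jsonify(crawl_in_subprocess(request.json["url"]))`, though the route would still block until the crawl ends, which is why the task-queue suggestion below matters.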

Before moving on with your project, I advise you to look into a Python task queue (like rq). This will let you run Scrapy crawls in the background, and your Flask application will not freeze while the scrapes are running.
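rq itself needs a running Redis server, but the pattern it implements can be sketched with just the standard library: routes only enqueue a job and poll its status, while a background worker does the slow work. A hypothetical minimal version (all names here are illustrative, not rq's API):

```python
import queue
import threading
import uuid

jobs = {}                 # job_id -> {"status": ..., "result": ...}
job_queue = queue.Queue()

def worker():
    # Runs forever in the background, executing one job at a time,
    # so the web process itself never blocks on a crawl.
    while True:
        job_id, func, args = job_queue.get()
        jobs[job_id]["status"] = "running"
        jobs[job_id]["result"] = func(*args)
        jobs[job_id]["status"] = "finished"
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def enqueue(func, *args):
    # What a Flask route would call: returns immediately with an id
    # the client can poll, instead of waiting for the crawl to finish.
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    job_queue.put((job_id, func, args))
    return job_id
```

With this shape, one route calls `enqueue(crawl, url)` and returns the id, and a second route looks up `jobs[job_id]` so the client can poll until the status is `"finished"`. rq adds persistence, retries, and multiple workers on top of the same idea.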
