Scrapy Very Basic Example


Problem Description

I have Python Scrapy installed on my Mac and I was trying to follow the very first example on their website.

They were trying to run the command:

scrapy crawl mininova.org -o scraped_data.json -t json

I don't quite understand what this means. It looks like Scrapy turns out to be a separate program, and I don't think they have a command called crawl. In the example, they have a block of code defining the classes MininovaSpider and TorrentItem. I don't know where these two classes should go: do they belong in the same file, and what should that Python file be named?

Recommended Answer

You may have better luck looking through the tutorial first, as opposed to the "Scrapy at a glance" webpage.

The tutorial implies that Scrapy is, in fact, a separate program.

Running the command scrapy startproject tutorial will create a folder called tutorial with several files already set up for you.

For example, in my case, the modules/packages items, pipelines, settings, and spiders have been added to the root package tutorial.

tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...

The TorrentItem class would be placed inside items.py, and the MininovaSpider class would go inside the spiders folder.
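
For reference, here is a minimal sketch of what those two files might contain, assuming a recent Scrapy release; the field names, the file name mininova_spider.py, and the selector are illustrative placeholders rather than the actual code from the original mininova example:

# tutorial/items.py -- illustrative field names, not the originals
import scrapy

class TorrentItem(scrapy.Item):
    url = scrapy.Field()
    name = scrapy.Field()
    description = scrapy.Field()
    size = scrapy.Field()

# tutorial/spiders/mininova_spider.py -- hypothetical file name and selector
import scrapy
from tutorial.items import TorrentItem

class MininovaSpider(scrapy.Spider):
    name = "mininova.org"  # the <website-name> passed to "scrapy crawl"
    allowed_domains = ["mininova.org"]
    start_urls = ["http://www.mininova.org/today"]

    def parse(self, response):
        # Real parsing logic would extract torrent details from the page;
        # this placeholder just yields one item per downloaded page.
        item = TorrentItem()
        item["url"] = response.url
        item["name"] = response.css("title::text").get()
        yield item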

Once the project is set up, the command-line parameters for Scrapy appear to be fairly straightforward. They take the form:

scrapy crawl <website-name> -o <output-file> -t <output-type>

Alternatively, if you want to run Scrapy without the overhead of creating a project directory, you can use the runspider command:

scrapy runspider my_spider.py
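
In that case my_spider.py has to be a single self-contained spider file. As a rough sketch (the spider name, URL, and selector below are hypothetical stand-ins, not the mininova example), it could look like:

# my_spider.py -- a self-contained spider for "scrapy runspider"
import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # Without a project there is no items.py, so yield plain dicts;
        # Scrapy can still export them, e.g. with -o output.json
        yield {"url": response.url, "title": response.css("title::text").get()}

Output export works the same way as with crawl, for example: scrapy runspider my_spider.py -o output.json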
