Scrapy Very Basic Example
Question
Hi, I have Python Scrapy installed on my Mac, and I was trying to follow the very first example on their website.
In it, they run the command:
scrapy crawl mininova.org -o scraped_data.json -t json
I don't quite understand what this means; it looks like Scrapy is a separate program, and I don't think there is a command called crawl. The example includes a block of code defining the classes MininovaSpider and TorrentItem. I don't know where these two classes should go. Do they belong in the same file, and what should that Python file be named?
Answer
You may have better luck looking through the tutorial first, as opposed to the "Scrapy at a glance" webpage.
The tutorial implies that Scrapy is, in fact, a separate program.
Running the command scrapy startproject tutorial will create a folder called tutorial with several files already set up for you.
For example, in my case, the modules/packages items, pipelines, settings, and spiders were added to the root package tutorial.
tutorial/
    scrapy.cfg
    tutorial/
        __init__.py
        items.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            ...
The TorrentItem class would be placed inside items.py, and the MininovaSpider class would go inside the spiders folder.
Once the project is set up, the command-line parameters for Scrapy appear to be fairly straightforward. They take the form:
scrapy crawl <website-name> -o <output-file> -t <output-type>
Alternatively, if you want to run Scrapy without the overhead of creating a project directory, you can use the runspider command:
scrapy runspider my_spider.py