Port a web scraper (Scrapy 0.24) to Python 3, or use something better
Question
I'm trying to use Scrapy to build a web scraper, but I'm running into many problems because it uses Python 2. Is it possible to run the 2to3 command on all the files in the tarball at once? Would that cause unforeseen errors? Is there an alternative web scraping framework, more up to date and more capable, that you would recommend instead?
I ask because there doesn't seem to be much recent activity on forums about the problems inherent in running Scrapy 0.24, i.e. the fact that it's written in Python 2.
If Scrapy is the best choice and porting is a bad idea, what's the best way to run it on my Python 3 oriented machine? Is there a command to run it only with Python 2, or something I can change in a config file?
Update
If you have the same problem, simply run the setup.py script with python2, i.e.,
python2 setup.py install
and you're good to go; after that it will work.
^ as indicated by @alecxe
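The step above can be sketched in Python for a machine where Python 3 is the default. This is only an illustration: the helper name python2_install_command is hypothetical, and it assumes a Python 2 interpreter is installed under the name python2 and can be found on PATH.

```python
import shutil


def python2_install_command(interpreter="python2"):
    """Return the argv for `<interpreter> setup.py install`, or None.

    `interpreter` is an assumed executable name; adjust it if your
    Python 2 binary is called something else (e.g. python2.7).
    """
    path = shutil.which(interpreter)
    if path is None:
        return None
    return [path, "setup.py", "install"]


command = python2_install_command()
if command is None:
    print("No python2 interpreter found on PATH")
else:
    # Intended to be run from the unpacked Scrapy source directory.
    print("Install command:", " ".join(command))
```

The sketch only builds and prints the command; passing the list to subprocess.run from inside the source directory would execute it.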
Recommended answer
The problem with porting Scrapy to Python 3 is that Scrapy is built on top of the twisted event-driven framework, which has not yet been fully ported to Python 3.
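On the mechanical part of the question (running 2to3 over every file at once), here is a minimal sketch that collects all .py files under an unpacked source tree and builds a single 2to3 invocation. The helper name build_2to3_command is hypothetical, and the sketch assumes the 2to3 tool is installed and on PATH. It only rewrites syntax, so it does nothing about the twisted dependency described above.

```python
from pathlib import Path


def build_2to3_command(source_dir):
    """Build one 2to3 command line covering every .py file under source_dir.

    --write applies the changes in place; --nobackups skips the .bak copies.
    The command is returned rather than executed, so it can be inspected first.
    """
    files = sorted(str(path) for path in Path(source_dir).rglob("*.py"))
    return ["2to3", "--write", "--nobackups", *files]
```

For example, build_2to3_command("Scrapy-0.24") would return the argument list for a directory of that (assumed) name, which could then be passed to subprocess.run. 2to3 also accepts a directory argument and recurses into it, so listing files explicitly is mainly useful when some paths need to be filtered out.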
There is no web-scraping framework on Python 3 as big and mature as Scrapy. pyspider looks promising, though it is a bit different, see:
Also, there are other libraries related to web scraping and HTML parsing that support Python 3:
beautifulsoup4
lxml
requests
MechanicalSoup (built on top of requests and BeautifulSoup)
selenium
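As a rough illustration of what parsing with one of these libraries looks like on Python 3, the sketch below extracts link targets from an HTML string. It uses beautifulsoup4 when it is installed and otherwise falls back to the standard library's html.parser so that it still runs. The function name extract_links and the sample HTML are made up for this example, and no network request is made.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
except ImportError:
    BeautifulSoup = None


class _LinkCollector(HTMLParser):
    """Standard-library fallback that collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html):
    """Return the href of every <a> tag that has one, in document order."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        return [a["href"] for a in soup.find_all("a", href=True)]
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links


# Made-up sample page; the third anchor has no href and is skipped.
SAMPLE_HTML = (
    '<html><body><a href="/page/1">One</a>'
    '<a href="/page/2">Two</a><a>no link</a></body></html>'
)
print(extract_links(SAMPLE_HTML))
```

In a real scraper the HTML string would typically come from requests (for example, the text of a response), with the parsing step unchanged.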