Port web scraper, Scrapy 0.24, to Python 3, or use something better?

Question

I'm trying to use Scrapy to make a web scraper, but I'm running into many problems because it uses Python 2. Is it possible to run the 2to3 command on all the files in the tarball at once? Would that cause unforeseen errors? Is there an alternative web-scraping framework that is more up to date and more functional that you would recommend instead?

I ask because there doesn't seem to be much recent activity on forums about the problems inherent in running Scrapy 0.24, i.e. the fact that it's written in Python 2.

If Scrapy is the best choice and porting is a bad idea, what's the best way to run it on my Python 3-oriented machine? Is there a command to run it only with Python 2, or something I can change in a config file?

Update

If you run into these problems, all you need to do is:

Simply run the setup.py script with Python 2:

python2 setup.py install

and you're good to go; it will work after that.

As indicated by @alecxe.
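If you would rather drive that `python2 setup.py install` step from a Python 3 script (for example, a build script on the Python 3 machine), here is a minimal sketch. It assumes a Python 2 interpreter is reachable on PATH under the name `python2`; the function names `python2_install_command` and `install_with_python2` are hypothetical helpers, not part of Scrapy.

```python
import shutil
import subprocess
from pathlib import Path


def python2_install_command(setup_dir: str, interpreter: str = "python2") -> list:
    """Return the argv for `<interpreter> setup.py install` inside setup_dir.

    Hypothetical helper: assumes the Python 2 interpreter is on PATH.
    Raises FileNotFoundError if the interpreter or setup.py is missing.
    """
    exe = shutil.which(interpreter)  # full path to the interpreter, or None
    if exe is None:
        raise FileNotFoundError(f"{interpreter!r} not found on PATH")
    if not (Path(setup_dir) / "setup.py").is_file():
        raise FileNotFoundError(f"no setup.py in {setup_dir}")
    return [exe, "setup.py", "install"]


def install_with_python2(setup_dir: str, interpreter: str = "python2") -> None:
    """Run the install under Python 2, even though this script runs on Python 3."""
    cmd = python2_install_command(setup_dir, interpreter)
    # cwd=setup_dir so that "setup.py" resolves relative to the unpacked source tree
    subprocess.run(cmd, cwd=setup_dir, check=True)
```

The package ends up installed for the Python 2 interpreter only, so Scrapy itself still has to be run with `python2`.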

Recommended answer

The problem with porting Scrapy to Python 3 is that Scrapy is built on top of the Twisted event-driven framework, which has not yet been fully ported to Python 3.
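On the 2to3 part of the question: the `2to3` tool accepts a directory and recurses into every `.py` file under it, so a whole unpacked tarball can be converted in one run (`-w` writes changes back, `-n` skips `.bak` backups). The sketch below only builds that command; the helper names and paths are hypothetical. Note that 2to3 rewrites syntax only, so it would not solve the Twisted dependency described above.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def extract_tarball(tarball: str, dest: str) -> Path:
    """Unpack a source tarball into dest and return the destination path."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as tar:
        # Newer Pythons support extraction filters; "data" rejects unsafe members.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return dest_path


def two_to_three_command(source_dir: Path) -> list[str]:
    """Build a 2to3 invocation that rewrites every .py file under source_dir."""
    # -w: write the converted code back to disk; -n: don't keep .bak backups
    return ["2to3", "-w", "-n", str(source_dir)]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; expect to fix whatever the tool cannot convert by hand.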

There is no web-scraping framework on Python 3 as big and mature as Scrapy. That said, pyspider looks promising, although it works a bit differently; see:

There are also other web-scraping and HTML-parsing libraries that support Python 3:

  • beautifulsoup4
  • lxml
  • requests
  • MechanicalSoup (built on top of requests and BeautifulSoup)
  • selenium
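As a rough idea of how these libraries fit together on Python 3, here is a minimal sketch that uses beautifulsoup4 to pull every link out of a page. It assumes `beautifulsoup4` is installed; `extract_links` is a hypothetical helper name, and the stdlib `"html.parser"` backend is used so lxml is optional.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str) -> list:
    """Return (link text, absolute URL) pairs for every <a href=...> in the page."""
    soup = BeautifulSoup(html, "html.parser")  # "lxml" also works if it is installed
    links = []
    for a in soup.find_all("a", href=True):  # skip anchors without an href
        links.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return links
```

To scrape a live page, fetch it with requests first, e.g. `extract_links(requests.get(url, timeout=10).text, url)`.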
