Going from Ruby to Python: Crawlers


Question

I started learning Python a couple of days ago. I want to know the equivalent way of writing crawlers in Python.

In Ruby I use:

  1. nokogiri for crawling HTML and getting content through CSS selectors
  2. Net::HTTP and Net::HTTP::Get.new(uri.request_uri).body for getting JSON data from a URL

What are the equivalents of these in Python?

Recommended answer

Well, the main thing is to separate the scraper/crawler (the Python library, program, or function that downloads files or data from the web server) from the parser that reads and interprets that data. In my case I had to scrape some government information that is 'open' but not download- or data-friendly. For this project I used Scrapy [1].

I set the 'start_urls', which are the URLs my robot will crawl, and then use a parser function (the spider's 'parse' callback in Scrapy) to retrieve and parse the data.
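The downloader/parser split and the start-URL idea can be sketched with the standard library alone. This is not Scrapy's API, only a minimal illustration of the same shape; the names `fetch`, `parse`, `crawl`, `LinkExtractor`, and the `max_pages` limit are hypothetical choices made for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Parser side: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Downloader side: return the page body as text (makes a real HTTP request)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse(base_url, html):
    """Parser side: return the absolute URLs of all links found in the page."""
    extractor = LinkExtractor()
    extractor.feed(html)
    return [urljoin(base_url, href) for href in extractor.links]


def crawl(start_urls, downloader=fetch, max_pages=10):
    """Visit pages breadth-first from start_urls; return {url: [links on that page]}."""
    queue = deque(start_urls)
    seen = set(start_urls)
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        links = parse(url, downloader(url))
        results[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results
```

Because the downloader is passed in as a parameter, the parser can be exercised against canned HTML without touching the network, which is one practical benefit of keeping the two parts separate.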

For parsing/retrieving you are going to need an HTML/lxml extractor, since about 90% of your data will be in that form.

Now, focusing on your question:

For data crawling

  1. Scrapy
  2. Requests [2]
  3. Urllib [3]
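As a rough counterpart to the Net::HTTP call in the question, here is a minimal sketch of fetching JSON with urllib from the standard library. The function names `get_json` and `decode_json` are hypothetical, and the `Accept` header is an assumption about what the server expects.

```python
import json
from urllib.request import Request, urlopen


def decode_json(raw, charset="utf-8"):
    """Turn a raw response body (bytes) into Python objects."""
    return json.loads(raw.decode(charset))


def get_json(url, timeout=10):
    """Send a GET request and decode the JSON body (makes a real HTTP request)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return decode_json(response.read(), charset)
```

With the third-party Requests library installed, the same thing is usually written as `requests.get(url).json()`.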

For parsing data

  1. Scrapy/lxml or Scrapy + other
  2. lxml [4]
  3. Beautiful Soup [5]
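To give a feel for the parsing step without installing anything, the sketch below pulls out the text of elements by tag and class using only the standard library's html.parser. It is a deliberately limited stand-in, not a CSS selector engine: lxml and Beautiful Soup cover far more cases. The names `ClassTextExtractor` and `select_text` are hypothetical.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element with a given tag name and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0       # greater than 0 while inside a matching element
        self.current = []    # text pieces of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same tag so the right end tag closes us.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def select_text(html, tag, css_class):
    """Return the text content of each <tag class="... css_class ..."> element."""
    extractor = ClassTextExtractor(tag, css_class)
    extractor.feed(html)
    extractor.close()
    return extractor.results
```

For example, `select_text(page, "li", "item")` plays roughly the role that a `li.item` CSS lookup plays in nokogiri, with `page` being HTML text you have already downloaded.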

And please remember that crawling and scraping are not only for the web; they apply to email too. You can check another question about that here [6].

[1] - http://scrapy.org/

[2] - http://docs.python-requests.org/en/latest/

[3] - http://docs.python.org/library/urllib.html

[4] - http://lxml.de/

[5] - http://www.crummy.com/software/BeautifulSoup/

[6] - Python: reading my Outlook email mailbox and parsing the messages
