Going from Ruby to Python: Crawlers
Problem Description
I've started learning Python over the past couple of days. I want to know the equivalent way of writing crawlers in Python.
So in Ruby I use:
nokogiri
for crawling HTML and getting content through CSS tags
Net::HTTP and Net::HTTP::Get.new(uri.request_uri).body
for getting JSON data from a URL
What are the equivalents of these in Python?
Recommended Answer
Well, mainly you have to separate the 'scraper'/crawler, the Python lib/program/function that will download the files/data from the web server, from the parser that will read and interpret that data. In my case I had to scrape and get some government info that is 'open' but not download/data friendly. For this project I used Scrapy [1].
Mainly I set the 'start_urls', which are the URLs my robot will crawl/get, and then I use a 'parse' function to retrieve/parse that data.
For parsing/retrieving, you are going to need an HTML extractor such as lxml, since 90% of your data will be HTML.
Now, to your question:
For data scraping:
- Scrapy
- Requests [2]
- Urllib [3]
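As a rough equivalent of Ruby's `Net::HTTP::Get.new(uri.request_uri).body` followed by `JSON.parse`, the standard library alone is enough; the URL below is a placeholder, and with Requests the same thing collapses to `requests.get(url).json()`.

```python
import json
import urllib.request


def fetch_json(url):
    # GET the URL and decode the response body as JSON,
    # roughly what Net::HTTP + JSON.parse does in Ruby.
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The decoding step works the same on any JSON text:
data = json.loads('{"name": "scrapy", "stars": 5}')
print(data["name"])  # → scrapy
```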
For parsing data:
- Scrapy/lxml, or Scrapy plus another parser
- lxml [4]
- BeautifulSoup [5]
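For the Nokogiri-style "get content through CSS tags" part, BeautifulSoup exposes CSS selectors through `select()`; a small sketch on inline HTML (lxml's `cssselect` offers the same idea):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <div class="post"><h2>First</h2></div>
  <div class="post"><h2>Second</h2></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selector, like Nokogiri's doc.css('.post h2') in Ruby:
titles = [h2.get_text() for h2 in soup.select(".post h2")]
print(titles)  # → ['First', 'Second']
```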
And please remember, 'crawling' and scraping are not only for the web; they apply to emails too. You can check another question about that here [6].
[1] - http://scrapy.org/
[2] - http://docs.python-requests.org/en/latest/
[3] - http://docs.python.org/library/urllib.html
[4] - http://lxml.de/
[5] - http://www.crummy.com/software/BeautifulSoup/
[6] - Python: reading my Outlook email mailbox and parsing messages