Web Scraping with Scala

Question

Does anyone know of a web-scraping library that takes advantage of Scala's succinct syntax? So far I've found Chafe, but it seems poorly documented and poorly maintained. If you have done scraping with Scala, I'd appreciate your advice. (I'm trying to integrate into an existing Scala framework rather than use a scraper written in, say, Python.)

Recommended Answer

First, there is a plethora of HTML scraping libraries on the JVM; all you need to do is pimp one of them (the "pimp my library" pattern, i.e. wrapping a Java API with Scala implicit conversions so that it gains idiomatic Scala methods).

The four I have used are:

  • HtmlUnit - emulates the browser and can even run JavaScript
  • Jericho - preserves formatting; ideal if you want to edit the scraped HTML
  • NekoHtml
  • JSoup - did not work with Scala for me, though it might work

I have used Selenium, but never for scraping. Scala has a wrapper around Selenium.

I would recommend pimping an existing Java library over using some half-baked Scala library.
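
To make the "pimp my library" suggestion concrete, here is a minimal sketch of the pattern. It is not taken from any of the libraries listed above: as an assumption for illustration, the JDK's built-in `org.w3c.dom` parser stands in for a third-party HTML library, so the input must be well-formed XHTML. The names `DomEnrichment`, `RichNodeList`, `RichDocument`, `select`, `links`, and `parse` are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Document, Element, NodeList}

// Hypothetical helper object: enriches the Java DOM API with Scala-friendly methods.
object DomEnrichment {

  // Lets a Java NodeList be consumed as a Scala List of elements.
  implicit class RichNodeList(nodes: NodeList) {
    def elements: List[Element] =
      (0 until nodes.getLength).toList
        .map(i => nodes.item(i))
        .collect { case e: Element => e }
  }

  // Adds small scraping helpers directly onto the Java Document type.
  implicit class RichDocument(doc: Document) {
    // All elements with the given tag name, in document order.
    def select(tag: String): List[Element] =
      doc.getElementsByTagName(tag).elements

    // (link text, href) pairs for every <a> element.
    def links: List[(String, String)] =
      select("a").map(a => a.getTextContent.trim -> a.getAttribute("href"))
  }

  // Parses a well-formed XHTML string with the JDK parser.
  // Assumption: real-world, malformed HTML would need a lenient parser instead.
  def parse(xhtml: String): Document = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
  }
}
```

After `import DomEnrichment._`, a parsed document can be queried as `parse(page).links` or `parse(page).select("a")`, as if those methods belonged to the Java class. The same approach should carry over to the document and element types of whichever Java scraping library you choose, with the enrichment layer kept thin so the underlying library does the actual parsing.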
