SparkR vs sparklyr

Question

Does anyone have an overview of the advantages and disadvantages of SparkR vs sparklyr? Google does not yield any satisfactory results, and the two seem fairly similar. Having tried both, SparkR appears a lot more cumbersome, whereas sparklyr is pretty straightforward (both to install and to use, especially with the dplyr inputs). Can sparklyr only be used to run dplyr functions in parallel, or also "normal" R code?

Recommended answer

The biggest advantage of SparkR is that it can run arbitrary user-defined functions written in R on Spark:

https://spark.apache.org/docs/2.0.1/sparkr.html#applying-user-defined-function
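As a rough illustration of what such a user-defined function looks like, here is a minimal sketch modeled on the dapply example in the SparkR documentation linked above. It assumes a local Spark installation that SparkR can find; the faithful dataset and the column name waiting_secs are only illustrative choices.

```r
library(SparkR)

# Start (or reuse) a Spark session; assumes Spark is installed locally.
sparkR.session()

# Turn the built-in 'faithful' data set into a Spark DataFrame.
df <- as.DataFrame(faithful)

# dapply needs the schema of the data.frame returned by the R function.
schema <- structType(
  structField("eruptions", "double"),
  structField("waiting", "double"),
  structField("waiting_secs", "double")
)

# The function receives each partition as a plain R data.frame,
# so any ordinary R code can be used inside it.
df_secs <- dapply(df, function(part) {
  cbind(part, part$waiting * 60)
}, schema)

# Bring the result back to the driver as a local data.frame.
local_result <- collect(df_secs)
print(head(local_result))

sparkR.session.stop()
```

Because the function body runs in an R process on each worker, it is not restricted to operations that have a SQL equivalent.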

Since sparklyr translates R to SQL, you can only use a very small set of functions in mutate statements:

http://spark.rstudio.com/dplyr.html#sql_translation

This limitation is somewhat alleviated by Extensions (http://spark.rstudio.com/extensions.html#wrapper_functions).
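To make the R-to-SQL translation concrete, the following is a minimal sketch, assuming sparklyr and dplyr are installed and a local Spark installation is available. The table name "faithful" and the column waiting_secs are illustrative choices, not something taken from the answer.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance; assumes Spark is installed locally.
sc <- spark_connect(master = "local")

# Copy the built-in 'faithful' data set into Spark.
faithful_tbl <- copy_to(sc, faithful, "faithful", overwrite = TRUE)

# Simple arithmetic inside mutate() has a SQL equivalent, so it translates.
converted <- faithful_tbl %>%
  mutate(waiting_secs = waiting * 60)

# Print the SQL that dplyr generated for the pipeline above.
show_query(converted)

# Run the query in Spark and fetch the result as a local data.frame.
local_result <- collect(converted)
print(head(local_result))

spark_disconnect(sc)
```

Only expressions that can be expressed in Spark SQL work this way, which is the restriction the answer describes for mutate statements.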

Other than that, sparklyr is the winner (in my opinion). Aside from the obvious advantage of using familiar dplyr functions, sparklyr has a much more comprehensive API for MLlib (http://spark.rstudio.com/mllib.html) and the Extensions mentioned above.
