Using SparkR and Sparklyr simultaneously


Problem description


As far as I understand, these two packages provide similar but mostly disjoint wrapper functions for Apache Spark. Sparklyr is newer and its functionality is still growing. I therefore think that one currently needs to use both packages to get the full scope of functionality.


As both packages essentially wrap references to Java instances of Scala classes, I assume it should be possible to use them in parallel. But is it actually possible? What are your best practices?

Recommended answer


These two packages use different mechanisms and are not designed for interoperability. Their internals are designed in different ways and do not expose the JVM backend in the same manner.


While one could devise a solution that allows partial data sharing (global temporary views backed by a persistent metastore come to mind), it would have rather limited applications.
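To illustrate why the temporary-view route is limited: a global temporary view is only visible to sessions of the *same* Spark application, while a separate sparklyr `spark_connect()` normally starts its own application. A minimal sketch from the SparkR side (the view names `faithful_tmp` and `faithful_shared` are hypothetical):

```r
library(SparkR)
sparkR.session()

# Register a DataFrame as a session-scoped temporary view
df <- createDataFrame(faithful)
createOrReplaceTempView(df, "faithful_tmp")

# Promote it to a GLOBAL temporary view, which lives in the reserved
# `global_temp` database and is visible across sessions of the SAME
# Spark application:
sql("CREATE GLOBAL TEMPORARY VIEW faithful_shared AS
     SELECT * FROM faithful_tmp")

# Any other session within this application could now query
#   SELECT * FROM global_temp.faithful_shared
# -- but a sparklyr connection running as a separate application cannot,
# which is the limitation referred to above.
```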


If you need both, I would recommend separating your pipeline into multiple steps and passing data between them using persistent storage.
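The recommended pattern can be sketched as two independent steps joined by Parquet files on shared storage; the path `/tmp/pipeline/step1` and table names below are hypothetical, and in practice each step would run as its own script or process:

```r
## Step 1: sparklyr does its part of the pipeline and persists the result
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_tbl")
result <- mtcars_tbl %>% filter(cyl > 4)
spark_write_parquet(result, path = "/tmp/pipeline/step1")
spark_disconnect(sc)

## Step 2 (separate script/process): SparkR picks the data up
## from persistent storage and continues from there
library(SparkR)
sparkR.session()
df <- read.df("/tmp/pipeline/step1", source = "parquet")
head(df)
```

Parquet is used here only as an example; any format or store both front ends can read (ORC, a Hive table, JDBC) works the same way.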

