How to implement Functor[Dataset]
Question
I am struggling to create an instance of Functor[Dataset]. The problem is that when you map from A to B, an Encoder[B] must be in implicit scope, and I am not sure how to arrange that.
implicit val datasetFunctor: Functor[Dataset] = new Functor[Dataset] {
  override def map[A, B](fa: Dataset[A])(f: A => B): Dataset[B] = fa.map(f)
}
Of course this code fails to compile, since no Encoder[B] is available. But I can't add Encoder[B] as an implicit parameter, because that would change the signature of the map method. How can I solve this?
Recommended answer
You cannot apply f right away, because you are missing the Encoder. The only obvious direct solution would be to take cats and re-implement all the interfaces, adding an implicit Encoder argument. I don't see any way to implement a Functor for Dataset directly.
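To make the "re-implement the interfaces" option concrete, here is a minimal sketch of what one such re-implemented interface might look like. EncodedFunctor is a hypothetical name and is not part of cats; Encoder and Dataset are dummies standing in for the Spark types, and ListDataset exists only so the sketch can be run without Spark.

```scala
import scala.language.higherKinds

/** Dummy for Spark's Encoder. */
trait Encoder[X]

/** Dummy for Spark's Dataset. */
trait Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y]
}

/** List-backed dummy implementation, only here so the sketch can be run. */
final case class ListDataset[X](rows: List[X]) extends Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y] =
    ListDataset(rows.map(f))
}

/** Hypothetical functor-like type class (NOT part of cats):
  * same shape as Functor.map, plus an implicit Encoder for the result type.
  */
trait EncodedFunctor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B)(implicit enc: Encoder[B]): F[B]
}

object EncodedFunctor {
  implicit val datasetInstance: EncodedFunctor[Dataset] =
    new EncodedFunctor[Dataset] {
      def map[A, B](fa: Dataset[A])(f: A => B)(implicit enc: Encoder[B]): Dataset[B] =
        fa.map(f) // compiles because `enc` is now in implicit scope
    }
}
```

The cost is that nothing written against cats.Functor can use this instance, which is why every dependent interface would have to be re-implemented as well.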
However, maybe the following substitute solution is good enough.
What you could do is create a wrapper for the dataset that has a map method without the implicit Encoder, but additionally has a method toDataset, which needs the Encoder only at the very end.
For this wrapper, you could apply a construction very similar to the so-called Coyoneda construction (or Coyo? What do they call it today? I don't know...). It is essentially a way to implement a "free functor" for an arbitrary type constructor.
Here is a sketch (it compiles with cats 1.0.1; the Spark traits are replaced by dummies):
import scala.language.higherKinds
import cats.Functor

/** Dummy for spark-Encoder */
trait Encoder[X]

/** Dummy for spark-Dataset */
trait Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y]
}

/** Coyoneda-esque wrapper for `Dataset`
  * that simply stashes all arguments to `map` away
  * until a concrete `Encoder` is supplied during the
  * application of `toDataset`.
  *
  * Essentially: the wrapped original dataset + concatenated
  * list of functions which have been passed to `map`.
  */
abstract class MappedDataset[X] private () { self =>
  type B
  val base: Dataset[B]
  val path: B => X
  def toDataset(implicit enc: Encoder[X]): Dataset[X] = base map path
  def map[Y](f: X => Y): MappedDataset[Y] = new MappedDataset[Y] {
    type B = self.B
    val base = self.base
    val path: B => Y = f compose self.path
  }
}

object MappedDataset {
  /** Constructor for MappedDatasets.
    *
    * Wraps a `Dataset` into a `MappedDataset`
    */
  def apply[X](ds: Dataset[X]): MappedDataset[X] = new MappedDataset[X] {
    type B = X
    val base = ds
    val path = identity
  }
}

object MappedDatasetFunctor extends Functor[MappedDataset] {
  /** Functorial `map` */
  def map[A, B](da: MappedDataset[A])(f: A => B): MappedDataset[B] = da map f
}
Now you can wrap a dataset ds into a MappedDataset(ds), map it using MappedDatasetFunctor (brought into implicit scope where needed) as many times as you want, and then call toDataset at the very end, where you supply a concrete Encoder for the final result.
Note that this will combine all functions passed to map into a single Spark stage: it won't be able to save the intermediate results, because the Encoders for all intermediate steps are missing.
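The wrap, map, toDataset workflow and the single-stage behaviour can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: Encoder is a dummy, ListDataset is a list-backed stand-in for Dataset that counts how often its map is called, and Deferred is a renamed copy of the MappedDataset idea so the snippet runs without Spark or cats.

```scala
/** Dummy stand-in for Spark's Encoder (assumption: it only carries a name). */
final case class Encoder[X](name: String)

/** List-backed stand-in for Dataset that counts calls to `map`. */
final case class ListDataset[X](rows: List[X], mapCalls: Int = 0) {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): ListDataset[Y] =
    ListDataset(rows.map(f), mapCalls + 1)
}

/** Same idea as MappedDataset: the base dataset plus one composed function. */
abstract class Deferred[X] { self =>
  type B
  val base: ListDataset[B]
  val path: B => X

  /** No Encoder needed: the function is only composed, not applied. */
  def map[Y](f: X => Y): Deferred[Y] = new Deferred[Y] {
    type B = self.B
    val base: ListDataset[B] = self.base
    val path: B => Y = f compose self.path
  }

  /** The only place where an Encoder is required. */
  def toDataset(implicit enc: Encoder[X]): ListDataset[X] = base.map(path)
}

object Deferred {
  def apply[X](ds: ListDataset[X]): Deferred[X] = new Deferred[X] {
    type B = X
    val base: ListDataset[X] = ds
    val path: X => X = identity
  }
}

object Demo {
  // Only the final element type needs an Encoder.
  implicit val stringEncoder: Encoder[String] = Encoder("string")

  val input: ListDataset[Int] = ListDataset(List(1, 2, 3))

  // The intermediate type Double never gets an Encoder.
  val result: ListDataset[String] =
    Deferred(input).map(_ * 1.5).map(d => s"value=$d").toDataset
}
```

In this sketch Demo.result.mapCalls is 1 even though two functions were mapped: the underlying dataset sees only the composed function, which mirrors why the intermediate results cannot be materialised.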
I'm not quite there yet with studying cats, so I cannot guarantee that this is the most idiomatic solution. There is probably something Coyoneda-esque already in the library.
Edit: There is a Coyoneda in the cats library, but it requires a natural transformation F ~> G to a functor G. Unfortunately, we don't have a Functor for Dataset (that was the problem in the first place). What my implementation above does is this: instead of a Functor[G], it requires a single morphism of the (non-existent) natural transformation at a fixed X (this is what the Encoder[X] is).