How to implement Functor[Dataset]

Views: 74 | Published: 2020/6/26 18:40:13 | Tags: scala apache-spark scala-cats scala-implicits apache-spark-encoders


Question

I am struggling to create an instance of Functor[Dataset]. The problem is that when you map from A to B, an Encoder[B] must be in implicit scope, and I am not sure how to provide it.

implicit val datasetFunctor: Functor[Dataset] = new Functor[Dataset] {
  override def map[A, B](fa: Dataset[A])(f: A => B): Dataset[B] = fa.map(f)
}

Of course, this code fails to compile because no Encoder[B] is available, but I can't add Encoder[B] as an implicit parameter because that would change the signature of the map method. How can I solve this?

Recommended answer

You cannot apply f right away, because you are missing the Encoder. The only obvious direct solution would be to take cats and re-implement all of its interfaces, adding an implicit Encoder argument. I don't see any way to implement a Functor for Dataset directly.
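To make the "re-implement the interfaces" option concrete, here is a minimal sketch of what a functor-like type class with an extra Encoder constraint could look like. The name EncodedFunctor is hypothetical and not part of cats, and Encoder and Dataset are in-memory stand-ins rather than the real Spark types.

```scala
import scala.language.higherKinds

object EncodedFunctorSketch {
  /** Stand-in for Spark's Encoder (assumption: not the real Spark type). */
  trait Encoder[X]
  object Encoder {
    implicit val intEncoder: Encoder[Int] = new Encoder[Int] {}
    implicit val stringEncoder: Encoder[String] = new Encoder[String] {}
  }

  /** Stand-in for Spark's Dataset, backed by an in-memory List. */
  final case class Dataset[X](rows: List[X]) {
    def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y] =
      Dataset(rows.map(f))
  }

  /** Hypothetical type class: like Functor, but map also demands an Encoder[B]. */
  trait EncodedFunctor[F[_]] {
    def map[A, B: Encoder](fa: F[A])(f: A => B): F[B]
  }

  implicit val datasetEncodedFunctor: EncodedFunctor[Dataset] =
    new EncodedFunctor[Dataset] {
      // The context bound puts Encoder[B] in scope, so Dataset.map compiles.
      def map[A, B: Encoder](fa: Dataset[A])(f: A => B): Dataset[B] = fa.map(f)
    }

  /** Generic code written against the constrained interface. */
  def describe[F[_]](fa: F[Int])(implicit F: EncodedFunctor[F]): F[String] =
    F.map(fa)(n => s"row-$n")

  val result: Dataset[String] = describe(Dataset(List(1, 2, 3)))
}
```

The cost of this route is that EncodedFunctor is not a cats.Functor, so nothing in cats that expects a Functor can use it. That is why the wrapper approach below may be more practical.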

However, maybe the following substitute solution is good enough. You could create a wrapper for the dataset that has a map method without the implicit Encoder, plus a method toDataset that requires the Encoder only at the very end.

For this wrapper, you could apply a construction that is very similar to the so-called Coyoneda construction (or Coyo? What do they call it today? I don't know...). It is essentially a way to implement a "free functor" for an arbitrary type constructor.

Here is a sketch (it compiles with cats 1.0.1; the Spark traits are replaced by dummies):

import scala.language.higherKinds
import cats.Functor

/** Dummy for spark-Encoder */
trait Encoder[X]

/** Dummy for spark-Dataset */
trait Dataset[X] {
  def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y]
}

/** Coyoneda-esque wrapper for `Dataset` 
  * that simply stashes all arguments to `map` away
  * until a concrete `Encoder` is supplied during the
  * application of `toDataset`.
  *
  * Essentially: the wrapped original dataset + concatenated
  * list of functions which have been passed to `map`.
  */
abstract class MappedDataset[X] private () { self =>
  type B
  val base: Dataset[B]
  val path: B => X
  def toDataset(implicit enc: Encoder[X]): Dataset[X] = base map path

  def map[Y](f: X => Y): MappedDataset[Y] = new MappedDataset[Y] {
    type B = self.B
    val base = self.base
    val path: B => Y = f compose self.path
  }
}

object MappedDataset {
  /** Constructor for MappedDatasets.
    * 
    * Wraps a `Dataset` into a `MappedDataset` 
    */
  def apply[X](ds: Dataset[X]): MappedDataset[X] = new MappedDataset[X] {
    type B = X
    val base = ds
    val path: X => X = identity
  }

}

object MappedDatasetFunctor extends Functor[MappedDataset] {
  /** Functorial `map` */
  def map[A, B](da: MappedDataset[A])(f: A => B): MappedDataset[B] = da map f
}

Now you can wrap a dataset ds into MappedDataset(ds), map it through MappedDatasetFunctor as many times as you want, and call toDataset at the very end, where you supply a concrete Encoder for the final result. Note that the object as written is not marked implicit; to have it resolved implicitly, it would need to be exposed as an implicit value, for example from the MappedDataset companion object.

Note that this composes all functions passed to map into a single function, which is applied in one Spark map step: the intermediate results cannot be saved, because the Encoders for all intermediate steps are missing.
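The following sketch shows the wrapper in use, under a few stated assumptions: Dataset is a List-backed stand-in with a call counter (mapCalls is a hypothetical name added only to make the fusion visible), Functor is a minimal local stand-in for cats.Functor, and the wrapper is repeated so the block is self-contained.

```scala
import scala.language.higherKinds

object MappedDatasetUsage {
  /** Stand-in for Spark's Encoder; only the final result type needs one. */
  trait Encoder[X]
  implicit val stringEncoder: Encoder[String] = new Encoder[String] {}

  /** Counts calls to the stand-in Dataset.map (illustration only). */
  var mapCalls: Int = 0

  /** Stand-in for Spark's Dataset, backed by an in-memory List. */
  final case class Dataset[X](rows: List[X]) {
    def map[Y](f: X => Y)(implicit enc: Encoder[Y]): Dataset[Y] = {
      mapCalls += 1
      Dataset(rows.map(f))
    }
  }

  /** The wrapper from the answer: base dataset plus composed function. */
  abstract class MappedDataset[X] private () { self =>
    type B
    val base: Dataset[B]
    val path: B => X
    def toDataset(implicit enc: Encoder[X]): Dataset[X] = base.map(path)
    def map[Y](f: X => Y): MappedDataset[Y] = new MappedDataset[Y] {
      type B = self.B
      val base: Dataset[B] = self.base
      val path: B => Y = f compose self.path
    }
  }
  object MappedDataset {
    def apply[X](ds: Dataset[X]): MappedDataset[X] = new MappedDataset[X] {
      type B = X
      val base: Dataset[X] = ds
      val path: X => X = identity
    }
  }

  /** Minimal local stand-in for cats.Functor. */
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
  implicit val mappedDatasetFunctor: Functor[MappedDataset] =
    new Functor[MappedDataset] {
      def map[A, B](da: MappedDataset[A])(f: A => B): MappedDataset[B] = da.map(f)
    }

  /** Generic code that only knows about Functor, not about Encoders. */
  def bump[F[_]](fa: F[Int])(implicit F: Functor[F]): F[Int] = F.map(fa)(_ + 1)

  val wrapped: MappedDataset[Int] = MappedDataset(Dataset(List(1, 2, 3)))

  // Three logical map steps; no Encoder[Int] is ever requested.
  val result: Dataset[String] =
    bump(wrapped).map(_ * 10).map(n => s"row-$n").toDataset
}
```

In this sketch the three map steps are applied by a single call to the underlying Dataset.map, and only an Encoder[String] is needed, which matches the trade-off described above.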

I am not quite there yet with studying cats, so I cannot guarantee that this is the most idiomatic solution. There is probably something Coyoneda-esque in the library already.

Edit: There is a Coyoneda in the cats library, but it requires a natural transformation F ~> G to a functor G. Unfortunately, we don't have a Functor for Dataset (that was the problem in the first place). What my implementation above does is this: instead of a Functor[G], it requires a single morphism of the (non-existent) natural transformation at a fixed X (this is what the Encoder[X] is).
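For comparison, here is a hand-rolled sketch of the general Coyoneda shape. It is not the cats implementation: Functor is a local stand-in, and the member names Pivot, fi, k, lift and run are chosen for illustration. The point is that mapping needs no Functor, but lowering back to F[A] does, which is exactly what Dataset cannot provide.

```scala
import scala.language.higherKinds

object CoyonedaSketch {
  /** Minimal local stand-in for cats.Functor. */
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }

  /** Hand-rolled Coyoneda: an F[Pivot] plus a pending function Pivot => A. */
  sealed abstract class Coyoneda[F[_], A] { self =>
    type Pivot
    val fi: F[Pivot]
    val k: Pivot => A

    /** Mapping only composes functions; no Functor[F] is needed yet. */
    def map[B](f: A => B): Coyoneda[F, B] = new Coyoneda[F, B] {
      type Pivot = self.Pivot
      val fi: F[Pivot] = self.fi
      val k: Pivot => B = f compose self.k
    }

    /** Lowering back to F[A] is the step that requires a Functor[F]. */
    def run(implicit F: Functor[F]): F[A] = F.map(fi)(k)
  }

  object Coyoneda {
    /** Wraps an F[A] with the identity function as the pending step. */
    def lift[F[_], A](fa: F[A]): Coyoneda[F, A] = new Coyoneda[F, A] {
      type Pivot = A
      val fi: F[A] = fa
      val k: A => A = identity
    }
  }

  // List has a lawful Functor, so lowering works here.
  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }

  val result: List[String] =
    Coyoneda.lift(List(1, 2, 3)).map(_ + 1).map(n => s"n=$n").run
}
```

MappedDataset has the same structure, except that its lowering step (toDataset) asks for an Encoder[X] at the final type instead of a Functor for the whole type constructor.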
