What is the best way to define custom methods on a DataFrame?


Problem description

I need to define custom methods on a DataFrame. What is the best way to do it? The solution should scale, as I intend to define a significant number of custom methods.

My current approach is to create a class (say MyClass) that takes a DataFrame as a parameter, define my custom method (say customMethod) in it, and define an implicit method that converts a DataFrame to MyClass.

implicit def dataFrametoMyClass(df: DataFrame): MyClass = new MyClass(df)

With that in place, I can call:

dataFrame.customMethod()

Is this the correct way to do it? I am open to suggestions.

Recommended answer

Your way is the way to go (see [1]). I solved it a little differently, but the approach is similar:

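import scala.language.implicitConversions

// assuming the DataFrame in question is Spark's
import org.apache.spark.sql.DataFrame

object ExtraDataFrameOperations {
  object implicits {
    // implicit conversion that wraps a DataFrame so the extra methods become available
    implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations =
      DFWithExtraOperations(df)
  }
}

case class DFWithExtraOperations(df: DataFrame) {
  def customMethod(param: String): DataFrame = {
    // do something fancy with the df
    // or delegate to some implementation
    //
    // here, just as an illustrating example: do a select
    df.select(df(param))
  }
}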

Usage

To use the new customMethod method on a DataFrame:

import ExtraDataFrameOperations.implicits._
val df = ...
val otherDF = df.customMethod("hello")

Possibility 2

Instead of using an implicit method (see above), you can also use an implicit class:

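// assuming the DataFrame in question is Spark's
import org.apache.spark.sql.DataFrame

object ExtraDataFrameOperations {
  // an implicit class combines the wrapper class and the implicit conversion
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = {
      // do something fancy with the df
      // or delegate to some implementation
      //
      // here, just as an illustrating example: do a select
      df.select(df(param))
    }
  }
}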

Usage

import ExtraDataFrameOperations._
val df = ...
val otherDF = df.customMethod("hello")
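
To try the implicit-class pattern without Spark on the classpath, here is a minimal sketch. It assumes a made-up stand-in type, Table, in place of DataFrame; Table, ExtraTableOperations, TableWithExtraOperations, columnCount and Demo are hypothetical names used only for illustration, not part of any library.

```scala
// Hypothetical stand-in for a DataFrame: named columns of integers.
final case class Table(columns: Map[String, Vector[Int]]) {
  // Keep only the named column (fails if the column does not exist).
  def select(name: String): Table = Table(Map(name -> columns(name)))
}

object ExtraTableOperations {
  // Importing this implicit class makes its methods callable on any Table.
  implicit class TableWithExtraOperations(t: Table) {
    def customMethod(param: String): Table = t.select(param)

    // Further custom methods are added here in the same way.
    def columnCount: Int = t.columns.size
  }
}

object Demo {
  import ExtraTableOperations._

  val table: Table = Table(Map("hello" -> Vector(1, 2), "world" -> Vector(3, 4)))

  // Resolved by the compiler as new TableWithExtraOperations(table).customMethod("hello")
  val other: Table = table.customMethod("hello")
}
```

Since every new method is just another def inside the wrapper, the number of custom methods can grow without touching the call sites. Splitting them across several implicit classes, one per topic, is one possible way to keep each class small.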

Remark

If you want to avoid the additional import, turn the object ExtraDataFrameOperations into a package object and store it in a file called package.scala within your package.
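
A package object along those lines might look as follows. This is only a sketch: the package name com.example.analytics and the file path are assumptions chosen for illustration, and DataFrame is assumed to be Spark's.

```scala
// File: src/main/scala/com/example/analytics/package.scala
// (package name and path are hypothetical)
package com.example

import org.apache.spark.sql.DataFrame

package object analytics {
  // Visible to all code in package com.example.analytics without an import.
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = df.select(df(param))
  }
}
```

Code declared in package com.example.analytics can then call df.customMethod("hello") directly. Code in other packages still needs import com.example.analytics._ to bring the implicit class into scope.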

[1] The original blog "Pimp my library" by M. Odersky is available at http://www.artima.com/weblogs/viewpost.jsp?thread=179766
