What is the best way to define custom methods on a DataFrame?
Question
I need to define custom methods on a DataFrame. What is the best way to do it? The solution should be scalable, as I intend to define a significant number of custom methods.
My current approach is to create a class (say MyClass) that takes a DataFrame as a parameter, define my custom method (say customMethod) in it, and define an implicit method that converts a DataFrame to MyClass.
implicit def dataFrametoMyClass(df: DataFrame): MyClass = new MyClass(df)
This lets me call:
dataFrame.customMethod()
Is this the correct way to do it? I am open to suggestions.
Recommended answer
Your way is the way to go (see [1]). I solved it slightly differently, but the approach is similar:
import org.apache.spark.sql.DataFrame
import scala.language.implicitConversions

object ExtraDataFrameOperations {
  object implicits {
    // Implicit conversion: wraps a DataFrame so the extra methods become available on it
    implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations =
      DFWithExtraOperations(df)
  }
}

case class DFWithExtraOperations(df: DataFrame) {
  def customMethod(param: String): DataFrame = {
    // do something fancy with the df
    // or delegate to some implementation
    //
    // here, just as an illustrating example: do a select
    df.select(df(param))
  }
}
Usage
To use the new customMethod method on a DataFrame:
import ExtraDataFrameOperations.implicits._
val df = ...
val otherDF = df.customMethod("hello")
Possibility 2
Instead of using an implicit method (see above), you can also use an implicit class:
object ExtraDataFrameOperations {
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = {
      // do something fancy with the df
      // or delegate to some implementation
      //
      // here, just as an illustrating example: do a select
      df.select(df(param))
    }
  }
}
Usage
import ExtraDataFrameOperations._
val df = ...
val otherDF = df.customMethod("hello")
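As a rough end-to-end sketch of the implicit-class variant, the example below builds a small DataFrame in a local SparkSession and chains two extension methods. Several parts are assumptions for illustration and not part of the original answer: the `Demo` object, the sample data, the second method `dropNullsIn`, and the choice to make the wrapper a value class (`extends AnyVal`), which is an optional idiom that usually avoids allocating a wrapper object per call.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExtraDataFrameOperations {
  // Optional variation (assumption): a value class, so the wrapper is usually not allocated.
  implicit class DFWithExtraOperations(private val df: DataFrame) extends AnyVal {
    // Same illustrative method as in the answer: select a single column.
    def customMethod(param: String): DataFrame = df.select(df(param))

    // Hypothetical second method: drop rows where the given column is null.
    def dropNullsIn(column: String): DataFrame = df.na.drop(Seq(column))
  }
}

object Demo {
  import ExtraDataFrameOperations._

  // Builds made-up sample data and chains both extension methods.
  def cleanedColumn(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq[(Int, String)]((1, "a"), (2, null)).toDF("id", "hello")
    df.dropNullsIn("hello").customMethod("hello")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("custom-method-demo")
      .getOrCreate()
    try cleanedColumn(spark).show()
    finally spark.stop()
  }
}
```

Because every extension method returns a DataFrame, additional methods can be added to the same implicit class and chained freely, which is one way to keep a growing set of custom methods manageable.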
Remark
If you want to avoid the additional import, turn the object ExtraDataFrameOperations into a package object and store it in a file called package.scala within your package.
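A minimal sketch of the package-object variant might look as follows. The package name `com.example.etl` and the `Job` object are hypothetical and chosen only for illustration. The first snippet would live in package.scala:

```scala
// File: com/example/etl/package.scala (hypothetical package)
package com.example

import org.apache.spark.sql.DataFrame

package object etl {
  // Members of the package object are visible throughout com.example.etl
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = df.select(df(param))
  }
}
```

Any other file in the same package can then call customMethod without importing anything extra:

```scala
// File: com/example/etl/Job.scala (hypothetical)
package com.example.etl

import org.apache.spark.sql.DataFrame

object Job {
  // No import of the implicit class is needed inside com.example.etl
  def run(df: DataFrame): DataFrame = df.customMethod("hello")
}
```

Code outside that package still needs an import such as `import com.example.etl._` to pick up the implicit class.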
[1] The original blog post "Pimp my library" by M. Odersky is available at http://www.artima.com/weblogs/viewpost.jsp?thread=179766