Check Type: How to check if something is an RDD or a DataFrame?
Question
I'm using Python, and I have an object that is either a Spark RDD or a DataFrame.
I tried isinstance(thing, RDD), but RDD wasn't recognized.
Why I need this:
I'm writing a function that accepts either an RDD or a DataFrame, so if a DataFrame is passed in, I need to call input.rdd to get the underlying RDD.
Recommended answer
isinstance works just fine, as long as both classes are imported first:
from pyspark.sql import DataFrame
from pyspark.rdd import RDD

def foo(x):
    if isinstance(x, RDD):
        return "RDD"
    if isinstance(x, DataFrame):
        return "DataFrame"

foo(sc.parallelize([]))
## 'RDD'
foo(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'
But single dispatch is a much more elegant approach:
from functools import singledispatch

@singledispatch
def bar(x):
    pass

@bar.register(RDD)
def _(arg):
    return "RDD"

@bar.register(DataFrame)
def _(arg):
    return "DataFrame"

bar(sc.parallelize([]))
## 'RDD'
bar(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'
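Applied to the original goal (always ending up with an RDD), the single-dispatch idea might look like the sketch below. So that it runs without a Spark installation, FakeRDD and FakeDataFrame are hypothetical stand-ins for pyspark.rdd.RDD and pyspark.sql.DataFrame, and to_rdd is an illustrative name rather than anything PySpark provides. In real code you would register the actual classes instead.

```python
from functools import singledispatch


class FakeRDD:
    """Hypothetical stand-in for pyspark.rdd.RDD (assumption, not PySpark)."""

    def __init__(self, data):
        self.data = list(data)


class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (assumption, not PySpark)."""

    def __init__(self, rows):
        self._rows = list(rows)

    @property
    def rdd(self):
        # Mirrors DataFrame.rdd, which exposes the underlying RDD.
        return FakeRDD(self._rows)


@singledispatch
def to_rdd(x):
    # Fallback for unregistered types: fail loudly instead of returning None.
    raise TypeError(f"expected an RDD or a DataFrame, got {type(x).__name__}")


@to_rdd.register(FakeRDD)
def _(x):
    # Already an RDD: return it unchanged.
    return x


@to_rdd.register(FakeDataFrame)
def _(x):
    # DataFrame: unwrap to the underlying RDD.
    return x.rdd
```

Unlike bar above, whose default implementation silently returns None, the fallback here raises a TypeError, so passing an unexpected type is caught at the call site.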
If you don't mind additional dependencies, multipledispatch is also an interesting option:
from multipledispatch import dispatch

@dispatch(RDD)
def baz(x):
    return "RDD"

@dispatch(DataFrame)
def baz(x):
    return "DataFrame"

baz(sc.parallelize([]))
## 'RDD'
baz(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'
Finally, the most Pythonic approach is to simply check an interface:
def foobar(x):
    if hasattr(x, "rdd"):
        ## It is a DataFrame
        return x.rdd
    else:
        ## It (probably) is an RDD
        return x
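The "(probably)" in that comment matters: the attribute check only tells you whether the object exposes rdd, not what the object actually is. The sketch below illustrates this without a Spark installation. FakeDataFrame is a hypothetical stand-in for pyspark.sql.DataFrame, and as_rdd is an illustrative name.

```python
class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (assumption, not PySpark)."""

    def __init__(self, rows):
        # A plain list plays the role of the underlying RDD here.
        self.rdd = list(rows)


def as_rdd(x):
    # Duck typing: anything exposing `.rdd` is treated as a DataFrame,
    # and everything else is passed through as if it were an RDD.
    return x.rdd if hasattr(x, "rdd") else x


unwrapped = as_rdd(FakeDataFrame([("foo", 1)]))  # the stand-in's underlying data
passed_through = as_rdd("not a Spark object")    # accepted without complaint
```

The second call shows the trade-off: the check is cheap and needs no imports, but any object without an rdd attribute is passed through unchanged, so a wrong argument only surfaces later, when an RDD method is called on it.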