Check Type: How to check if something is an RDD or a DataFrame?


Problem description

I'm using Python, and this is a Spark RDD / DataFrame.

I tried isinstance(thing, RDD), but RDD wasn't recognized.

The reason I need to do this:

I'm writing a function where either an RDD or a DataFrame could be passed in, so I'll need to do input.rdd to get the underlying RDD if a DataFrame is passed in.

Recommended answer

isinstance will work just fine, as long as RDD and DataFrame are imported first:

from pyspark.sql import DataFrame
from pyspark.rdd import RDD

def foo(x):
    if isinstance(x, RDD):
        return "RDD"
    if isinstance(x, DataFrame):
        return "DataFrame"

foo(sc.parallelize([]))
## 'RDD'
foo(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'

But single dispatch is a much more elegant approach:

from functools import singledispatch

@singledispatch
def bar(x):
    pass 

@bar.register(RDD)
def _(arg):
    return "RDD"

@bar.register(DataFrame)
def _(arg):
    return "DataFrame"

bar(sc.parallelize([]))
## 'RDD'

bar(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'
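
To tie this back to the question, the same mechanism could be used to normalize the input to an RDD. The following is only a minimal sketch, not part of the original answer: to_rdd is a hypothetical helper name, and FakeRDD / FakeDataFrame are stand-in classes used solely so the snippet runs without a Spark session. In real code you would presumably register pyspark.rdd.RDD and pyspark.sql.DataFrame instead.

```python
from functools import singledispatch


class FakeRDD:
    """Stand-in for pyspark.rdd.RDD (assumption, for illustration only)."""


class FakeDataFrame:
    """Stand-in for pyspark.sql.DataFrame; exposes .rdd like the real class."""

    def __init__(self):
        self.rdd = FakeRDD()


@singledispatch
def to_rdd(x):
    # Fallback for unregistered types: fail loudly instead of returning None.
    raise TypeError(f"expected an RDD or a DataFrame, got {type(x).__name__}")


@to_rdd.register(FakeRDD)
def _(x):
    # Already an RDD: return it unchanged.
    return x


@to_rdd.register(FakeDataFrame)
def _(x):
    # DataFrame: unwrap the underlying RDD.
    return x.rdd
```

Raising in the default implementation is a design choice: the bar example above silently returns None for unknown types, which can hide mistakes further down the pipeline.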

If you don't mind additional dependencies, multipledispatch is also an interesting option:

from multipledispatch import dispatch

@dispatch(RDD)
def baz(x):
    return "RDD"

@dispatch(DataFrame)
def baz(x):
    return "DataFrame"

baz(sc.parallelize([]))
## 'RDD'

baz(sc.parallelize([("foo", 1)]).toDF())
## 'DataFrame'

Finally, the most Pythonic approach is to simply check an interface:

def foobar(x):
    if hasattr(x, "rdd"):
        ## It is a DataFrame: return the underlying RDD
        return x.rdd
    else:
        ## It (probably) is an RDD: return it unchanged
        return x