How to use the functions provided by the DataFrameNaFunctions class in Spark on a DataFrame?


Question

I have a DataFrame and I want to use one of the replace() functions of org.apache.spark.sql.DataFrameNaFunctions on it.

Problem: these methods do not show up in the code-completion suggestions on the DataFrame instance, even though I imported the class explicitly.

I can't find anything that demonstrates how to use these functions or how to cast a DataFrame to the DataFrameNaFunctions type.

I tried to cast it using asInstanceOf[], but that throws an exception.

Recommended answer

This can be a bit confusing, but it is actually quite straightforward. Here is a small example:

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("na_test.csv")
// df: org.apache.spark.sql.DataFrame = [name: string, age: int]

scala> df.show()
// +-----+----+
// | name| age|
// +-----+----+
// |alice|  35|
// |  bob|null|
// |     |  24|
// +-----+----+

scala> df.na.fill(10.0,Seq("age"))
// res4: org.apache.spark.sql.DataFrame = [name: string, age: int]

scala> df.na.fill(10.0, Seq("age")).show()
// +-----+---+
// | name|age|
// +-----+---+
// |alice| 35|
// |  bob| 10|
// |     | 24|
// +-----+---+

scala> df.na.replace("age", Map(35 -> 61, 24 -> 12)).show()
// +-----+----+
// | name| age|
// +-----+----+
// |alice|  61|
// |  bob|null|
// |     |  12|
// +-----+----+

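To access org.apache.spark.sql.DataFrameNaFunctions, call .na on the DataFrame. No cast is needed: .na returns a DataFrameNaFunctions instance that wraps the DataFrame.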
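For readers who want to try this outside the REPL, here is a minimal, self-contained sketch of the same idea. It rests on a few assumptions that are not part of the answer above: it uses SparkSession (Spark 2.x or later) instead of sqlContext, runs with a local master, and builds the sample rows in memory rather than reading na_test.csv. The object name NaFunctionsSketch and the helper sampleDf are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, DataFrameNaFunctions, SparkSession}

// Hypothetical object name; a sketch, not the original poster's code.
object NaFunctionsSketch {

  // Builds an in-memory DataFrame with the same shape as na_test.csv above.
  // Option[Int] is used so that bob's age becomes a SQL NULL.
  def sampleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Int])](
      ("alice", Some(35)),
      ("bob", None),
      ("", Some(24))
    ).toDF("name", "age")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ running locally; the answer above uses sqlContext.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("na-functions-sketch")
      .getOrCreate()

    val df = sampleDf(spark)

    // .na is an accessor on the DataFrame, not a cast.
    // It returns a DataFrameNaFunctions wrapper around df.
    val naFns: DataFrameNaFunctions = df.na

    // Fill NULLs in "age" with 10 (the value is cast to the column's type).
    naFns.fill(10.0, Seq("age")).show()

    // Replace specific values in "age"; NULLs are left untouched.
    naFns.replace("age", Map(35 -> 61, 24 -> 12)).show()

    // Drop rows where "age" is NULL.
    naFns.drop(Seq("age")).show()

    spark.stop()
  }
}
```

Each call returns a new DataFrame and leaves df unchanged, so the three operations above are independent of each other. This also explains the exception in the question: a DataFrame is not a subtype of DataFrameNaFunctions, so asInstanceOf cannot convert one into the other.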
