Passing two columns to a udf in Scala?

Question

I have a dataframe containing two columns: one is the data, and the other is the character count of that data field.

Data    Count
Hello   5
How     3
World   5

I want to change the value of the Data column based on the value in the Count column. How can this be achieved? I tried this using a udf:

invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("value"),invalidrecords("a_cnt")))

This seems to fail. Is this the correct way to do it?

Answer

Here is a simple way to do it.

First, create a dataframe:

// sqlContext is predefined in spark-shell on Spark 1.x; on Spark 2+ use spark.implicits._ instead
import sqlContext.implicits._
val invalidrecords = Seq(
  ("Hello", 5),
  ("How", 3),
  ("World", 5)
).toDF("Data", "Count")

You should have:

+-----+-----+
|Data |Count|
+-----+-----+
|Hello|5    |
|How  |3    |
|World|5    |
+-----+-----+

Then define the udf function as:

import org.apache.spark.sql.functions._
def appendDelimiterError = udf((data: String, count: Int) => "value with error" )
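The two Column arguments are passed to the udf positionally: the first column supplies `data` and the second supplies `count`. Because the lambda handed to `udf(...)` is just an ordinary `(String, Int) => String`, one option is to name it separately so it can be checked without a SparkSession. A minimal sketch, keeping the answer's placeholder behaviour; the names `AppendDelimiterErrorPlaceholder` and `appendDelimiterErrorFn` are hypothetical:

```scala
// Hypothetical names; the body mirrors the placeholder udf above.
object AppendDelimiterErrorPlaceholder {
  // Both inputs are ignored and a fixed string is returned for every row,
  // exactly like the placeholder udf in the answer.
  val appendDelimiterErrorFn: (String, Int) => String =
    (data: String, count: Int) => "value with error"

  def main(args: Array[String]): Unit = {
    // The sample rows from the dataframe above, as plain Scala tuples.
    val rows = Seq(("Hello", 5), ("How", 3), ("World", 5))
    rows.foreach { case (d, c) =>
      println(s"$d\t$c\t${appendDelimiterErrorFn(d, c)}")
    }
  }
}
```

In Spark the named function could then be wrapped with `udf(appendDelimiterErrorFn)`, which accepts any two-argument Scala function.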

Then use withColumn as follows:

invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)

You should see the following output:

+-----+-----+----------------+
|Data |Count|value           |
+-----+-----+----------------+
|Hello|5    |value with error|
|How  |3    |value with error|
|World|5    |value with error|
+-----+-----+----------------+

You can put your own logic in the udf body instead of returning a fixed string.
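As an illustration only, here is a hypothetical rule that is not from the original thread: flag a row when the stored character count does not match the actual length of the data. The object and method names (`DelimiterCheckSketch`, `checkLength`) and the `" - delimiter error"` suffix are assumptions chosen for the example:

```scala
// Hypothetical rule (not from the original answer): keep the data when the
// stored count matches its length, otherwise append an error marker.
object DelimiterCheckSketch {
  def checkLength(data: String, count: Int): String =
    if (data != null && data.length == count) data // count agrees: keep as is
    else s"$data - delimiter error"                // mismatch or null: mark it

  def main(args: Array[String]): Unit = {
    // "Hell" has 4 characters but a count of 5, so only that row is flagged.
    val rows = Seq(("Hello", 5), ("Hell", 5), ("How", 3))
    rows.foreach { case (d, c) => println(checkLength(d, c)) }
  }
}
```

To use it in Spark, the method could be wrapped as `udf(DelimiterCheckSketch.checkLength _)` and applied with the Data and Count columns in that order.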

Edit

To meet the requirement in your comment below, change the udf function and the withColumn call as follows:

def appendDelimiterError = udf((data: String, count: Int) => {
  if(count < 5) s"convert value to ${data} - error"
  else data
} )

invalidrecords.withColumn("Data",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)

You should see the following output:

+----------------------------+-----+
|Data                        |Count|
+----------------------------+-----+
|Hello                       |5    |
|convert value to How - error|3    |
|World                       |5    |
+----------------------------+-----+
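Since the edited rule is plain Scala, it can be checked against the sample rows without Spark. A minimal sketch; the wrapper object `EditedRuleCheck` is a hypothetical name, and the method body is copied from the edited udf above:

```scala
// Hypothetical wrapper; the rule itself is the one from the edited answer.
object EditedRuleCheck {
  def appendDelimiterError(data: String, count: Int): String =
    if (count < 5) s"convert value to ${data} - error"
    else data

  def main(args: Array[String]): Unit = {
    // Same rows as the dataframe; this reproduces the Data column shown above.
    val rows = Seq(("Hello", 5), ("How", 3), ("World", 5))
    rows.map { case (d, c) => appendDelimiterError(d, c) }.foreach(println)
  }
}
```

If the rule stays this simple, an alternative worth considering is Spark's built-in functions from `org.apache.spark.sql.functions`, for example `when(col("Count") < 5, concat(lit("convert value to "), col("Data"), lit(" - error"))).otherwise(col("Data"))`. This avoids a udf, whose body is opaque to Spark's optimizer.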
