Passing two columns to a udf in Scala?
Question
I have a DataFrame with two columns: one holds the data, and the other holds the character count of that data field.
Data Count
Hello 5
How 3
World 5
I want to change the value of the Data column based on the value in the Count column. How can I do this? I tried it with a UDF:
invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("value"),invalidrecords("a_cnt")))
This seems to fail, is this the correct way to do it?
Answer
Here is a simple way to do it.
First, create a DataFrame:
import sqlContext.implicits._ // with a Spark 2.x+ SparkSession, `import spark.implicits._` also works
val invalidrecords = Seq(
("Hello", 5),
("How", 3),
("World", 5)
).toDF("Data", "Count")
You should have:
+-----+-----+
|Data |Count|
+-----+-----+
|Hello|5 |
|How |3 |
|World|5 |
+-----+-----+
Then define the UDF:
import org.apache.spark.sql.functions._
// Placeholder UDF: receives both column values and, for now, returns a constant string
def appendDelimiterError = udf((data: String, count: Int) => "value with error")
Then call it from withColumn, passing both columns:
invalidrecords.withColumn("value",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)
You should get the following output:
+-----+-----+----------------+
|Data |Count|value |
+-----+-----+----------------+
|Hello|5 |value with error|
|How |3 |value with error|
|World|5 |value with error|
+-----+-----+----------------+
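Conceptually, the two Column arguments are matched positionally to the lambda's parameters (data: String, count: Int), and the function is evaluated once per row. The plain-Scala sketch below (no Spark involved; the object name RowWiseSketch is hypothetical) mimics that on the sample rows. It is only a mental model of the result, not how Spark executes UDFs internally.

```scala
// Plain-Scala sketch (no Spark): how a two-argument UDF is applied row by row.
object RowWiseSketch {
  // Same parameter list as the lambda passed to udf(...): (data: String, count: Int).
  // Like the first UDF in the answer, it ignores its inputs and returns a constant.
  val appendDelimiterError: (String, Int) => String =
    (_, _) => "value with error"

  // The sample rows from the DataFrame above, as (Data, Count) pairs.
  val rows: Seq[(String, Int)] = Seq(("Hello", 5), ("How", 3), ("World", 5))

  // Emulates withColumn("value", ...): keep each row and append the function's result.
  val withValue: Seq[(String, Int, String)] =
    rows.map { case (data, count) => (data, count, appendDelimiterError(data, count)) }

  def main(args: Array[String]): Unit =
    withValue.foreach(println)
}
```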
Instead of returning a constant string from the UDF, you can put your own logic in the function.
Edit
To meet the requirement in your comment below, change the UDF and the withColumn call as follows:
def appendDelimiterError = udf((data: String, count: Int) => {
  // Rewrite values whose count is below 5; leave the others unchanged
  if (count < 5) s"convert value to ${data} - error"
  else data
})
invalidrecords.withColumn("Data",appendDelimiterError(invalidrecords("Data"),invalidrecords("Count"))).show(false)
You should get the following output:
+----------------------------+-----+
|Data |Count|
+----------------------------+-----+
|Hello |5 |
|convert value to How - error|3 |
|World |5 |
+----------------------------+-----+
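Because the UDF body is an ordinary Scala function, one option is to keep the logic in a named method and wrap it with udf separately, which makes it easy to check without a SparkSession. A minimal sketch, assuming the hypothetical names DelimiterLogic and appendDelimiterErrorLogic; the logic itself is the same as in the edited UDF above:

```scala
// Hypothetical helper: the per-row logic of the edited UDF as a plain function,
// so it can be checked without a SparkSession.
object DelimiterLogic {
  def appendDelimiterErrorLogic(data: String, count: Int): String =
    if (count < 5) s"convert value to ${data} - error"
    else data

  def main(args: Array[String]): Unit = {
    // Same sample rows as the DataFrame in the answer
    val rows = Seq(("Hello", 5), ("How", 3), ("World", 5))
    rows.foreach { case (d, c) => println(appendDelimiterErrorLogic(d, c)) }
  }
}
```

In Spark code the method could then be wrapped as `udf(DelimiterLogic.appendDelimiterErrorLogic _)` (with `org.apache.spark.sql.functions._` imported) and passed the two columns exactly as in the withColumn call above.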