How to use an array type column value in a CASE statement

Views: 32 Published: 2021/11/14 23:27:03 Tags: apache-spark apache-spark-sql pyspark-sql

Question

I have a DataFrame with two columns: listA, stored as Seq[String], and valB, stored as String. I want to create a third column, valC, of type Int, whose value is 1 if valB is present in listA and 0 otherwise.

I tried the following:

val dfWithAdditionalColumn = df.withColumn("valC", when($"listA".contains($"valB"), 1).otherwise(0))

But Spark failed to execute this and gave the following error:

cannot resolve 'contains('listA', 'valB')' due to data type mismatch: argument 1 requires string type, however, 'listA' is of array type.;

How do I use an array type column value in a CASE statement?

Thanks, Dev

Recommended answer

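You can write a simple UDF that checks whether the element is present in the array. The example uses an Int column a and an array-of-Int column b; for the question's columns the parameter types would be String and Seq[String].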

import org.apache.spark.sql.functions.udf
val arrayContains = udf((col1: Int, col2: Seq[Int]) => if (col2.contains(col1)) 1 else 0)

Then call it, passing the columns in the correct order:

df.withColumn("hasAInB", arrayContains($"a", $"b" ) ).show

+---+---------+-------+
|  a|        b|hasAInB|
+---+---------+-------+
|  1|   [1, 2]|      1|
|  2|[2, 3, 4]|      1|
|  3|   [1, 4]|      0|
+---+---------+-------+
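
As a possible alternative to the UDF, the same check can probably be expressed with Spark's built-in array_contains SQL function, which keeps the logic inside Spark SQL. The sketch below is not part of the original answer. It assumes the question's column names (listA, valB, valC), and the object name, helper names, and sample rows are hypothetical. The function is called through expr so that the second argument can be another column rather than a literal.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{expr, when}

// Hypothetical helper object, for illustration only.
object ArrayContainsExample {

  // Adds valC: 1 when valB occurs in listA, otherwise 0.
  // array_contains is evaluated as a SQL expression, so both
  // arguments are resolved as columns of the DataFrame.
  def withValC(df: DataFrame): DataFrame =
    df.withColumn("valC", when(expr("array_contains(listA, valB)"), 1).otherwise(0))

  // The same logic written as a literal SQL CASE expression.
  def withValCCase(df: DataFrame): DataFrame =
    df.withColumn("valC", expr("CASE WHEN array_contains(listA, valB) THEN 1 ELSE 0 END"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-contains-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows: valB is in listA for the first row only.
    val df = Seq(
      (Seq("a", "b"), "a"),
      (Seq("b", "c"), "d")
    ).toDF("listA", "valB")

    withValC(df).show()
    withValCCase(df).show()

    spark.stop()
  }
}
```

With these sample rows, both helpers should give valC = 1 for the first row and valC = 0 for the second. A built-in function is usually preferable to a UDF when one fits, since Spark can optimize it and no data has to pass through user code.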
