How to merge two columns with a condition in PySpark?

Views: 19 Published: 2021/11/14 23:01:57 Tags: apache-spark pyspark apache-spark-sql pyspark-sql


Problem description

I was able to merge and sort the values, but I can't figure out how to skip the merge when the two values are equal.

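from pyspark.sql.functions import col, concat, lit, sort_array, split

# Assumption: spark.sql.caseSensitive is true, so "k" and "K" resolve as different columns
df = sqlContext.createDataFrame([("foo", "bar","too","aaa"), ("bar", "bar","aaa","foo")], ("k", "K" ,"v" ,"V"))
columns = df.columns

k = 0
for i in range(len(columns)):
    for j in range(i + 1, len(columns)):
        # Pair up columns whose names differ only by case, e.g. "k" and "K"
        if columns[i].lower() == columns[j].lower():
            k = k + 1
            merged = columns[i] + str(k)  # name of the new column: "k1", then "v2"
            df = df.withColumn(merged, concat(col(columns[i]), lit(","), col(columns[j])))
            # Split the concatenated string back into an array and sort it
            newdf = df.select(col("k"), split(col(merged), r",\s*").alias("c1"))
            sortDf = newdf.select(newdf.k, sort_array(newdf.c1).alias("sorted_c1"))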

In the table below, columns k and K should be merged only when their values differ: [foo,bar] is merged, but [bar,bar] stays as bar.

Input:

+---+---+---+---+
|  k|  K|  v|  V|
+---+---+---+---+
|foo|bar|too|aaa|
|bar|bar|aaa|foo|
+---+---+---+---+

Output:

+---+---+---------+---------+
|  k|  K| Merged K| Merged V|
+---+---+---------+---------+
|foo|bar|[foo,bar]|[too,aaa]|
|bar|bar|      bar|[aaa,foo]|
+---+---+---------+---------+

Recommended answer

Try:

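from pyspark.sql.functions import udf

def merge(*c):
    # Drop duplicate values, then sort what is left
    merged = sorted(set(c))
    if len(merged) == 1:
        # All columns held the same value: return it unmerged
        return merged[0]
    else:
        return "[{0}]".format(",".join(merged))

# udf() defaults to a string return type, which matches what merge() returns
merge_udf = udf(merge)

df = sqlContext.createDataFrame([("foo", "bar","too","aaa"), ("bar", "bar","aaa","foo")], ("k1", "k2" ,"v1" ,"v2"))

df.select(merge_udf("k1", "k2"), merge_udf("v1", "v2")).show()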
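One detail worth checking: the answer's merge function sorts the values, while the expected output table in the question lists them in column order ([foo,bar], [too,aaa]). The question also mentions sorting, so either behavior may be what is wanted. The sketch below runs the same logic in plain Python (no Spark needed) on the sample rows and compares it with an order-preserving variant. The name merge_keep_order is hypothetical and not part of the original answer.

```python
def merge(*c):
    # Same logic as the answer's UDF body: de-duplicate, then sort.
    merged = sorted(set(c))
    if len(merged) == 1:
        return merged[0]
    return "[{0}]".format(",".join(merged))


def merge_keep_order(*c):
    # Hypothetical variant: de-duplicate but keep the input column order.
    # dict.fromkeys preserves first-seen order (Python 3.7+).
    merged = list(dict.fromkeys(c))
    if len(merged) == 1:
        return merged[0]
    return "[{0}]".format(",".join(merged))


# Sample rows from the question: (k1, k2, v1, v2)
rows = [("foo", "bar", "too", "aaa"), ("bar", "bar", "aaa", "foo")]

sorted_result = [(merge(k1, k2), merge(v1, v2)) for k1, k2, v1, v2 in rows]
ordered_result = [
    (merge_keep_order(k1, k2), merge_keep_order(v1, v2)) for k1, k2, v1, v2 in rows
]

print(sorted_result)   # [('[bar,foo]', '[aaa,too]'), ('bar', '[aaa,foo]')]
print(ordered_result)  # [('[foo,bar]', '[too,aaa]'), ('bar', '[aaa,foo]')]
```

If the column order should be kept, merge_keep_order can be wrapped with udf() in the same way as merge.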
