How to sort values before concatenating text columns in PySpark

Views: 161 | Published: 2020/5/24 1:23:33 | Tags: pandas, pyspark, pyspark-sql, pyspark-dataframes


Question

I need help converting the pandas code below to PySpark or PySpark SQL code.

df["full_name"] = df.apply(lambda x: "_".join(sorted((x["first"], x["last"]))), axis=1)

It adds a new column named full_name, which concatenates the values of the first and last columns in sorted order.
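To make the intended behavior concrete, here is a minimal plain-Python sketch of the same per-row logic. The helper name sorted_full_name and the sample rows are illustrative assumptions, not part of the original code.

```python
def sorted_full_name(first: str, last: str, sep: str = "_") -> str:
    """Join two values in sorted order, mirroring the pandas lambda above."""
    # sorted() compares strings lexicographically (case-sensitive).
    return sep.join(sorted((first, last)))


# Hypothetical sample rows of (first, last) values.
rows = [("a", "b"), ("e", "c"), ("d", "a")]
full_names = [sorted_full_name(first, last) for first, last in rows]
print(full_names)  # ['a_b', 'c_e', 'a_d']
```

Whatever PySpark expression is used should produce the same strings for the same inputs.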

I have written the code below, but I don't know how to sort the column text values before concatenating them.

from pyspark.sql import functions as f
df = df.withColumn('full_name', f.concat(f.col('first'), f.lit('_'), f.col('last')))

Recommended answer

From Spark 2.4 onward:

We can use the array_join and array_sort functions for this case.

Example:

df.show()
#+-----+----+
#|first|last|
#+-----+----+
#|    a|   b|
#|    e|   c|
#|    d|   a|
#+-----+----+

from pyspark.sql.functions import array, array_join, array_sort, col
# Build an array from the first and last columns, sort it, then join its elements with "_"
df.withColumn("full_name", array_join(array_sort(array(col("first"), col("last"))), "_")).show()
#+-----+----+---------+
#|first|last|full_name|
#+-----+----+---------+
#|    a|   b|      a_b|
#|    e|   c|      c_e|
#|    d|   a|      a_d|
#+-----+----+---------+
