How to pull the slice of an array in Spark SQL (Dataframes)?
Question
I have a column full of arrays containing split http requests. I have them filtered down to one of two possibilities:
|[, courses, 27381...|
|[, courses, 27547...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, api, v1, cours...|
|[, courses, 33287...|
|[, courses, 24024...|
In both array-types, from 'courses' onward is the same data and structure.
I want to take the slice of the array using a case statement: if the first element of the array is 'api', then take elements 3 -> end of the array. I've tried using Python slice syntax [3:], and normal PostgreSQL syntax [3, n] where n is the length of the array. If it's not 'api', then just take the given value.
My ideal end-result would be an array where every row shares the same structure, with courses in the first index for easier parsing from that point onwards.
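For reference, the Python-style slice described above does work on ordinary Python lists; it is only DataFrame Column objects that don't support it directly. A quick sanity check of the intended per-row logic, on hypothetical sample rows mirroring the two array shapes shown (element values beyond the prefixes are made up):

```python
# Hypothetical rows: the leading "" comes from splitting a path on "/".
api_row = ["", "api", "v1", "courses", "27381"]
courses_row = ["", "courses", "27381"]

# Desired logic: when 'api' appears near the front, drop the first three
# elements so 'courses' ends up at the same position in every row.
normalized = api_row[3:] if api_row[1] == "api" else api_row
print(normalized)  # ['courses', '27381']
```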
Answer
It's very easy: just define a UDF. You asked a very similar question previously, so I won't post the exact answer, to let you think and learn (for your own good).
from pyspark.sql.functions import udf, col, lit

df = sc.parallelize([(["ab", "bs", "xd"],), (["bc", "cd", ":x"],)]).toDF()

# Drop the first element of the array when the element at position y is "ab".
getUDF = udf(lambda x, y: x[1:] if x[y] == "ab" else x)

df.select(getUDF(col("_1"), lit(0))).show()
+------------------------+
|PythonUDF#<lambda>(_1,0)|
+------------------------+
| [bs, xd]|
| [bc, cd, :x]|
+------------------------+
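Since a Python UDF is just an ordinary function applied row by row, its lambda can be sanity-checked as a plain function before registering it, with no SparkSession needed (the function name here is illustrative):

```python
# Same logic as the UDF's lambda, extracted so it can be tested directly.
def get_slice(x, y):
    """Drop the first element of x when x[y] == "ab", else return x as-is."""
    return x[1:] if x[y] == "ab" else x

print(get_slice(["ab", "bs", "xd"], 0))  # ['bs', 'xd']
print(get_slice(["bc", "cd", ":x"], 0))  # ['bc', 'cd', ':x']
```

Note that udf() without an explicit returnType defaults to StringType; passing ArrayType(StringType()) keeps the result a true array column if you need to keep parsing it afterwards.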