Substring multiple characters from the last index of a pyspark string column using negative indexing

Views: 311 Published: 2020/9/4 4:04:27 Tags: python apache-spark pyspark

Question

Closely related to "Spark Dataframe column with last character of other column", but I want to extract multiple characters counting back from the -1 index.

I have the following pyspark dataframe df:

+----------+----------+
|    number|event_type|
+----------+----------+
|0342224022|        11|
|0112964715|        11|
+----------+----------+

I want to extract the last 3 characters of the number column.

I tried the following:

from pyspark.sql.functions import substring 
df.select(substring(df['number'], -1, 3), 'event_type').show(2)

# which returns:

+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                     2|        11|
|                     5|        11|
+----------------------+----------+

Below is the expected output (I'm not sure what the output above represents):

+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                   022|        11|
|                   715|        11|
+----------------------+----------+

What am I doing wrong?

Note: Spark version 1.6.0

Recommended answer

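This is how substring works: pos is the 1-based start position (a negative pos counts back from the end of the string) and len is the number of characters to take. With pos = -1 the substring starts at the last character, so only one character is left to return no matter how large len is, which explains the output you got. To get the last three characters, the position must be -3 and the length 3.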

pyspark.sql.functions.substring(str, pos, len)

You need to change your substring function call to:

from pyspark.sql.functions import substring
df.select(substring(df['number'], -3, 3), 'event_type').show(2)
#+------------------------+----------+
#|substring(number, -3, 3)|event_type|
#+------------------------+----------+
#|                     022|        11|
#|                     715|        11|
#+------------------------+----------+
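
To see why -1 returns a single character while -3 returns three, the position arithmetic can be modeled in plain Python. This is only a rough sketch: spark_like_substring is a hypothetical helper written for illustration, not part of pyspark, and it assumes the semantics described above (1-based pos, negative pos counting back from the end).

```python
def spark_like_substring(s, pos, length):
    """Simplified model of how substring(str, pos, len) picks its slice.

    Assumption: pos is 1-based, and a negative pos counts back from the
    end of the string. This is an illustration, not Spark's own code.
    """
    n = len(s)
    if pos > 0:
        start = pos - 1   # 1-based position -> 0-based index
    elif pos < 0:
        start = n + pos   # count back from the end of the string
    else:
        start = 0         # pos = 0 is treated like the first character
    end = start + length
    # Clamp to the string boundaries; an empty range yields "".
    return s[max(start, 0):max(end, 0)]


number = "0342224022"
print(spark_like_substring(number, -1, 3))  # prints "2": only one character remains
print(spark_like_substring(number, -3, 3))  # prints "022": the last three characters
```

The slice always runs forward from the start position, so a negative pos only chooses where to begin; it does not make the function read backwards from the end.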
