Substring multiple characters from the last index of a PySpark string column using negative indexing
Question
Closely related to: Spark Dataframe column with last character of other column, but I want to extract multiple characters from the -1 index.
I have the following PySpark DataFrame df:
+----------+----------+
|    number|event_type|
+----------+----------+
|0342224022|        11|
|0112964715|        11|
+----------+----------+
I want to extract 3 characters from the last index of the number column.
I tried the following:
from pyspark.sql.functions import substring
df.select(substring(df['number'], -1, 3), 'event_type').show(2)
# which returns:
+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                     2|        11|
|                     5|        11|
+----------------------+----------+
Below is the expected output (I'm not sure what the output above represents):
+----------------------+----------+
|substring(number,-1,3)|event_type|
+----------------------+----------+
|                   022|        11|
|                   715|        11|
+----------------------+----------+
What am I doing wrong?
Note: Spark version 1.6.0
Recommended answer
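This is the signature of substring:
pyspark.sql.functions.substring(str, pos, len)
Here pos is the 1-based start position and len is the maximum number of characters to return. A negative pos counts back from the end of the string, so -1 starts at the last character. Only that one character remains to be returned, which is the output you saw. To get the last three characters, the position must be -3 and the length 3.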
You need to change your substring function call to:
from pyspark.sql.functions import substring
df.select(substring(df['number'], -3, 3), 'event_type').show(2)
#+------------------------+----------+
#|substring(number, -3, 3)|event_type|
#+------------------------+----------+
#|                     022|        11|
#|                     715|        11|
#+------------------------+----------+
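To make the indexing rule concrete, here is a minimal pure-Python sketch of how pos and len appear to be interpreted. Note that spark_like_substring is a hypothetical helper written only for illustration. It is not part of PySpark, and it is a simplified model that I have checked only against the two outputs shown in this thread.

```python
def spark_like_substring(s, pos, length):
    """Simplified, assumed model of substring(str, pos, len) semantics.

    Hypothetical helper for illustration only; not a PySpark function.
    """
    n = len(s)
    if pos > 0:
        start = pos - 1      # positive positions are 1-based
    elif pos < 0:
        start = n + pos      # negative positions count back from the end
    else:
        start = 0            # assumption: 0 is treated like the first character
    end = start + length
    start = max(start, 0)    # clamp a start that falls before the string
    if end <= start:
        return ""
    return s[start:end]      # slicing stops at the end of the string


number = "0342224022"

# pos=-1 starts at the last character, so only one character is available.
print(spark_like_substring(number, -1, 3))  # 2

# pos=-3 starts three characters from the end, so all three are returned.
print(spark_like_substring(number, -3, 3))  # 022
```

The point of the sketch is that len is an upper bound on how many characters are taken going forward from pos, not a count taken going backward from it.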