Python spark extract characters from dataframe
Question
I have a dataframe in Spark, something like this:
ID | Column
------ | ----
1 | STRINGOFLETTERS
2 | SOMEOTHERCHARACTERS
3 | ANOTHERSTRING
4 | EXAMPLEEXAMPLE
What I would like to do is extract the first 5 characters from the column plus the 8th character, joined by an underscore, and create a new column, something like this:
ID | New Column
------ | ------
1 | STRIN_F
2 | SOMEO_E
3 | ANOTH_S
4 | EXAMP_E
I can't use the following code, because the values in the column differ, and I don't want to split on a specific character but at a fixed position (the 6th character):
from pyspark.sql import functions as F
# Splits on a delimiter (here a space), which does not help with fixed positions.
split_col = F.split(DF['column'], ' ')
newDF = DF.withColumn('new_column', split_col.getItem(0))
Thanks!
Recommended answer
Use something like this:
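from pyspark.sql.functions import concat, lit

# First 5 characters, an underscore, then the 8th character (substr is 1-based).
new_df = df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                            lit('_'),
                                            df.Column.substr(8, 1)))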
This uses the functions concat, lit, and substr, which appear in the code above. Those functions will solve your problem.
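To check the character positions without starting a Spark session, the same extraction can be sketched in plain Python. This is only an illustration: extract_key is a hypothetical helper name, not part of PySpark, and it assumes the values are ordinary non-null strings. The point it shows is the index shift: substr(start, length) counts from 1, while Python slices count from 0.

```python
def extract_key(value: str) -> str:
    """Mirror concat(substr(1, 5), lit('_'), substr(8, 1)) on a plain string.

    Hypothetical helper for illustration only; not a PySpark function.
    """
    # Spark's substr(start, length) is 1-based; Python slices are 0-based.
    first_five = value[0:5]  # corresponds to substr(1, 5)
    eighth = value[7:8]      # corresponds to substr(8, 1); empty if too short
    return first_five + "_" + eighth


rows = ["STRINGOFLETTERS", "SOMEOTHERCHARACTERS", "ANOTHERSTRING", "EXAMPLEEXAMPLE"]
for row in rows:
    print(row, "->", extract_key(row))
```

Run on the sample values from the question, this reproduces the New Column values shown there (STRIN_F, SOMEO_E, ANOTH_S, EXAMP_E).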