Split a string within a pandas DataFrame element and recombine a section of the list
Question
I am trying to figure out how to split a string within a pandas DataFrame element and then recombine part of the resulting list. I have the following code:
import pandas as pd
df = pd.DataFrame({'code': ['PC001-S002_D_CFI4-1_NN','PC001-S002_D_CFI4-1_NN','PC001-S002_D_CFI4-1_NN',
'PC001-S002_D_CFI4-1_ER','PC001-S002_D_CFI4-1_ER','PC001-S002_D_CFI4-1_ER']})
df['domain'] = df['code'].str.split("_")
This code splits each string on the underscore. Now I would like to take the resulting list in each row of the column and rejoin its first three elements, such that:
PC001-S002_D_CFI4-1_NN ==> PC001-S002_D_CFI4-1
I can do this for a single plain string using:
a = 'PC001-S002_D_CFI4-1_NN'
b = a.split("_")[0:3]
c = "_".join(b)
However, my attempts to apply this to the pandas column have not had much success.
Any advice would be gratefully received.
Recommended answer
You can simply remove the last part with a regular-expression replacement:
In [7]: df['domain'] = df['code'].str.replace(r'_\w+?$', '', regex=True)  # regex=True is required on pandas 2.0+, where the default is False
In [8]: df
Out[8]:
                     code               domain
0  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
1  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
2  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
3  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
4  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
5  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
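For comparison, the split-and-join approach from the question can also be carried over to the column with the `.str` accessor. This is only a sketch, not part of the original answer: it uses a shortened two-row frame, and the column name `domain_rsplit` is made up for illustration. The second variant assumes only the final underscore-separated piece needs to be dropped.

```python
import pandas as pd

# Shortened sample frame (two rows instead of six), for illustration only.
df = pd.DataFrame({'code': ['PC001-S002_D_CFI4-1_NN',
                            'PC001-S002_D_CFI4-1_ER']})

# Split on "_", keep the first three pieces of each list, then join them back.
df['domain'] = df['code'].str.split('_').str[:3].str.join('_')

# Alternative: split once from the right and keep the left-hand part.
# 'domain_rsplit' is a hypothetical column name used only in this sketch.
df['domain_rsplit'] = df['code'].str.rsplit('_', n=1).str[0]

print(df)
```

The first variant mirrors the plain-string code `"_".join(a.split("_")[0:3])` step by step. The second does not depend on how many underscores precede the final piece.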