Split a string within a pandas DataFrame element and recombine a section of the list

Problem description

I am trying to figure out how to split a string within a pandas element, then recombine a section of the split string. I have the following code:

import pandas as pd

df = pd.DataFrame({'code': ['PC001-S002_D_CFI4-1_NN','PC001-S002_D_CFI4-1_NN','PC001-S002_D_CFI4-1_NN',
                            'PC001-S002_D_CFI4-1_ER','PC001-S002_D_CFI4-1_ER','PC001-S002_D_CFI4-1_ER']})

df['domain'] = df['code'].str.split("_")

This code works for splitting the string on the underscore. Now I would like to take the resulting split list within the column and recombine the first three elements such that:

PC001-S002_D_CFI4-1_NN ==> PC001-S002_D_CFI4-1

I can do this for a single plain string using:

a = 'PC001-S002_D_CFI4-1_NN'
b = a.split("_")[0:3]
c = "_".join(b)

However, I have tried to apply this to pandas without much success.

Any advice would be gratefully received.

Recommended answer

You can just remove the last part:

In [7]: df['domain'] = df['code'].str.replace(r'_\w+?$', '', regex=True)

In [8]: df
Out[8]:
                     code               domain
0  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
1  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
2  PC001-S002_D_CFI4-1_NN  PC001-S002_D_CFI4-1
3  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
4  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
5  PC001-S002_D_CFI4-1_ER  PC001-S002_D_CFI4-1
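
The split/slice/join approach from the question can also be carried over to the column directly through the `.str` accessor. The following is a minimal sketch, not part of the original answer; the column name `domain_rsplit` is made up for illustration, and the shortened two-row frame is only there to keep the example small.

```python
import pandas as pd

df = pd.DataFrame({'code': ['PC001-S002_D_CFI4-1_NN',
                            'PC001-S002_D_CFI4-1_ER']})

# Split on "_", keep the first three pieces of each list,
# then join those pieces back together with "_".
df['domain'] = df['code'].str.split('_').str[:3].str.join('_')

# Alternative: split once from the right and keep
# everything before the last "_" (hypothetical column name).
df['domain_rsplit'] = df['code'].str.rsplit('_', n=1).str[0]

print(df)
```

The first form mirrors `"_".join(a.split("_")[0:3])` element by element. The second only assumes that the part to drop is the last underscore-delimited field, so it also works when the number of leading fields varies.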
