Pandas DataFrame: remove unwanted parts from strings before and after what I want to keep


Problem description

In my data_cleaner dataset I have the column (feature) 'Project ID'. It identifies the project and has the format 'code/YEAR/code'. I'm only interested in the project's year, so I want to get rid of everything before the first / and everything after the second /.

Project ID  
AGPG/2013/1 
AGPG/2013/10
AGPG/2013/12
AGPG/2013/18
AGPG/2013/19

The closest I got was using

data_cleaner['Project ID'] = data_cleaner['Project ID'].str.strip("AGPG")

(but further down the column there are other letters, so this is not scalable)

Then I did

data_cleaner['Project ID'] = data_cleaner['Project ID'].str.strip('/')

This got rid of the first part, but I can't manage to get rid of what comes after the year.

Project ID  
2013/1  
2013/10
2013/12
2013/18
2013/19
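
For context on why the two strip calls stop here: the strip method treats its argument as a set of characters to remove from both ends, not as a literal prefix, and it never touches anything in the middle of the string. The sketch below uses plain Python strings (pandas' .str.strip applies the same rule to each element); the "GAP" and "XYZ" rows are made-up values added only to illustrate other prefixes.

```python
# Hypothetical sample values; only the first one comes from the question.
samples = ["AGPG/2013/1", "GAP/2014/7", "XYZ/2015/3"]

# strip("AGPG") removes any leading/trailing characters from the set {A, G, P}.
# It does not look for the literal prefix "AGPG".
step1 = [s.strip("AGPG") for s in samples]

# strip("/") then removes slashes at the ends only, so the "/1" style tail
# survives because its last character is a digit, not a slash.
step2 = [s.strip("/") for s in step1]

for before, after in zip(samples, step2):
    print(before, "->", after)
```

The "XYZ" row is left untouched by both calls, which is the scalability problem described above.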

I read this post but it didn't help me: Pandas DataFrame: remove unwanted parts from strings in a column

Recommended answer

I think you need extract with a regex. The pattern /(\d{4})/ captures a 4-digit number located between two slashes:

data_cleaner['Project ID'] = data_cleaner['Project ID'].str.extract(r'/(\d{4})/', expand=False)

print (data_cleaner)
  Project ID
0       2013
1       2013
2       2013
3       2013
4       2013
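
As a self-contained check of the regex approach, and a possible alternative, here is a minimal sketch. It assumes a small made-up DataFrame (the "XYZ/2015/3" row is hypothetical) and compares str.extract with splitting on "/" and taking the second piece. The split variant is not part of the answer above; it is just another way to reach the same result when every value follows the 'code/YEAR/code' format.

```python
import pandas as pd

# Small stand-in for data_cleaner; the last row is a hypothetical extra prefix.
data_cleaner = pd.DataFrame(
    {"Project ID": ["AGPG/2013/1", "AGPG/2013/10", "XYZ/2015/3"]}
)

# Regex approach: capture exactly four digits enclosed by slashes.
by_regex = data_cleaner["Project ID"].str.extract(r"/(\d{4})/", expand=False)

# Split approach: break each string on "/" and keep the element at index 1.
by_split = data_cleaner["Project ID"].str.split("/").str[1]

# Both results are strings; convert if a numeric year is needed.
years = by_regex.astype(int)

print(by_regex.tolist())
print(by_split.tolist())
print(years.tolist())
```

The two approaches differ on malformed rows: extract yields NaN when no four-digit group sits between slashes, while the split version returns whatever lies between the first and second slash without checking that it is a year.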
