Pandas DataFrame: remove unwanted parts from strings before and after what I want to keep
Question
In my data_cleaner dataset I have the column (feature) 'Project ID'. It identifies the project and has the format 'code/YEAR/code'. I'm only interested in the project's year, so I want to get rid of everything before the first / and everything after the second /.
Project ID
AGPG/2013/1
AGPG/2013/10
AGPG/2013/12
AGPG/2013/18
AGPG/2013/19
The closest I got was using
data_cleaner['Project ID'] = data_cleaner['Project ID'].str.strip("AGPG")
(but further down the column there are other letters, so this is not scalable)
Then I did
data_cleaner['Project ID'] = data_cleaner['Project ID'].str.strip('/')
This got rid of the first part, but I can't manage to get rid of what comes after the year.
Project ID
2013/1
2013/10
2013/12
2013/18
2013/19
I read this post, but it didn't help me: Pandas DataFrame: remove unwanted parts from strings in a column
Recommended answer
Use str.extract with the regex /(\d{4})/, which captures a four-digit number between two slashes:
# Raw string so that \d is passed to the regex engine unchanged
data_cleaner['Project ID'] = data_cleaner['Project ID'].str.extract(r'/(\d{4})/', expand=False)
print(data_cleaner)
Project ID
0 2013
1 2013
2 2013
3 2013
4 2013
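As a self-contained sketch, the regex approach can be compared with a plain split on '/', which keeps the second piece of each string. The sample rows below are hypothetical (the 'XYZ/2014/3' row is invented to show a different prefix), and the split variant assumes every value really has the 'code/YEAR/code' shape. The str.strip line is included only to show why the attempt in the question does not generalize: strip removes a set of characters from both ends, not a prefix.

```python
import pandas as pd

# Hypothetical sample data mirroring the 'code/YEAR/code' format from the question.
data_cleaner = pd.DataFrame(
    {"Project ID": ["AGPG/2013/1", "AGPG/2013/10", "XYZ/2014/3"]}
)

# str.strip treats its argument as a set of characters: it removes any of
# 'A', 'G', 'P' from both ends, so other prefixes are left untouched.
stripped = data_cleaner["Project ID"].str.strip("AGPG")

# Split on '/' and keep the second piece (index 1), which is the year.
year_split = data_cleaner["Project ID"].str.split("/").str[1]

# Same result with the regex approach, using a raw string for the pattern.
year_regex = data_cleaner["Project ID"].str.extract(r"/(\d{4})/", expand=False)

print(year_split.tolist())  # ['2013', '2013', '2014']
```

Both variants return the year as a string; if a numeric column is needed, the result can be converted afterwards, for example with astype(int).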