How to extract numbers of a certain length from a string in Python?
Problem description
I have a dataframe which looks like this:
description
1906 RES 330 ML
1906 RES 330ML
RES 335 c/6
RES 332 c/12
I want to extract the run of three consecutive digits from each row and save it in a new column 'volume'. My code looks like this:
df['volume'] = df['description'].str.extract('([([\d]*[\d]){3,3}?])')
The expected result is:
volume
330
330
335
332
However, it gives results like this:
volume
1906
1906
335
332
Can anyone help me fix this code? Thanks so much!
Recommended answer
This might be overkill, but if you want to make sure you don't capture digits that are part of a 4-digit number, you can use this:
df['volume'] = df.description.str.extract(r'(?<!\d)(\d{3})(?!\d)', expand=False)
print(df)
description volume
0 1906 RES 330 ML 330
1 1906 RES 330ML 330
2 RES 335 c/6 335
3 RES 332 c/12 332
Specify expand=False so that the matches are returned as a single pd.Series rather than a DataFrame.
The regular expression:
- (?<!\d) - requires that the character immediately before the three digits is not a digit
- (\d{3}) - matches exactly three digits
- (?!\d) - requires that the character immediately after the three digits is not a digit
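To see what the two lookarounds contribute, here is a minimal sketch using only the standard re module rather than pandas. It runs the answer's pattern against the sample descriptions from the question and compares it with a bare (\d{3}). The names ISOLATED_3_DIGITS, ANY_3_DIGITS and first_group are made up for this illustration and are not part of the original answer.

```python
import re

# The answer's pattern: three digits that are not part of a longer digit run.
ISOLATED_3_DIGITS = re.compile(r'(?<!\d)(\d{3})(?!\d)')
# For comparison (illustrative only): no lookarounds, so the first three
# digits of any digit run are accepted.
ANY_3_DIGITS = re.compile(r'(\d{3})')

# Sample rows copied from the question.
descriptions = [
    '1906 RES 330 ML',
    '1906 RES 330ML',
    'RES 335 c/6',
    'RES 332 c/12',
]

def first_group(pattern, text):
    """Return the first captured group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

isolated = [first_group(ISOLATED_3_DIGITS, d) for d in descriptions]
naive = [first_group(ANY_3_DIGITS, d) for d in descriptions]

print(isolated)  # ['330', '330', '335', '332']
print(naive)     # ['190', '190', '335', '332']
```

Without the lookarounds, the first two rows yield '190', which is the leading part of 1906. The lookbehind and lookahead reject any three digits that touch another digit on either side.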