How to extract numbers of a certain length from a string in Python?

Problem description

I have a dataframe which looks like this:

description     
1906 RES 330 ML
1906 RES 330ML
RES 335 c/6
RES 332 c/12

I want to extract the run of three consecutive digits from each description and save it in a new column 'volume'. My code looks like this:

df['volume'] = df['description'].str.extract('([([\d]*[\d]){3,3}?])')

The expected result is:

volume
330
330
335
332

However, it gives this result instead:

volume
1906
1906
335
332

Can anyone help me fix this code? Thanks so much!

Recommended answer

This might be overkill, but if you want to make sure you don't capture three digits that are part of a longer number (such as the 4-digit 1906), you can use this:

df['volume'] = df.description.str.extract(r'(?<!\d)(\d{3})(?!\d)', expand=False)    
print(df)

       description volume
0  1906 RES 330 ML    330
1   1906 RES 330ML    330
2      RES 335 c/6    335
3     RES 332 c/12    332

Specify expand=False so that the matches are returned as a single pd.Series rather than as a one-column DataFrame.
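
To make the effect of expand concrete, here is a minimal sketch that rebuilds the sample data from the question and compares both settings. The variable names as_series and as_frame are only illustrative, and the frame is reconstructed by hand from the rows shown above.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "description": [
        "1906 RES 330 ML",
        "1906 RES 330ML",
        "RES 335 c/6",
        "RES 332 c/12",
    ]
})

pattern = r"(?<!\d)(\d{3})(?!\d)"

# One capture group + expand=False -> a Series, ready to assign to a column
as_series = df["description"].str.extract(pattern, expand=False)

# expand=True -> a DataFrame with one column per capture group
as_frame = df["description"].str.extract(pattern, expand=True)

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame

df["volume"] = as_series
print(df)
```

Note that the extracted values are strings; if numeric volumes are needed, they would have to be converted afterwards, for example with pd.to_numeric.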

The regular expression:

  • (?<!\d) - negative lookbehind: the character immediately before the 3 digits must not be a digit
  • (\d{3}) - captures exactly 3 digits
  • (?!\d) - negative lookahead: the character immediately after the 3 digits must not be a digit
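
The role of the two lookarounds can also be checked outside pandas with the standard re module. The sketch below is only an illustration: the helper extract_volume and the constant THREE_DIGITS are hypothetical names, not part of the original answer.

```python
import re

# Same pattern as in the answer: exactly three digits with no digit on either side
THREE_DIGITS = re.compile(r"(?<!\d)(\d{3})(?!\d)")


def extract_volume(text):
    """Return the first standalone 3-digit run in text, or None if there is none."""
    match = THREE_DIGITS.search(text)
    return match.group(1) if match else None


samples = [
    "1906 RES 330 ML",
    "1906 RES 330ML",
    "RES 335 c/6",
    "RES 332 c/12",
]

volumes = [extract_volume(s) for s in samples]
print(volumes)  # ['330', '330', '335', '332']

# Without the lookarounds, the first three digits of 1906 are matched instead
naive = re.search(r"\d{3}", "1906 RES 330 ML").group()
print(naive)  # 190
```

In 1906, the candidate 190 is rejected by the lookahead because a 6 follows it, and 906 is rejected by the lookbehind because a 1 precedes it, so the search moves on to 330.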
