Extract salaries from a list of strings


Problem description

I'm trying to extract salaries from a list of strings. I'm using re.findall(), but it returns many empty strings along with the salaries, which causes problems later in my code.


import re

sal = '41 000€ à 63 000€ / an'  # a sample string that gives me wrong results

regex = ' ?([0-9]* ?[0-9]?[0-9]?[0-9]?)'  # this is my regex

re.findall(regex, sal)[0]
# returns '41 000' as expected, but:
re.findall(regex, sal)[1]
# returns: ''
# Desired result: '63 000'

#the whole list of matches is like this:
['41 000',
 '',
 '',
 '',
 '',
 '',
 '',
 '63 000',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '']
# I would prefer ['41 000','63 000']

Can anyone help? Thanks

Solution

When a pattern contains a capturing group, re.findall returns the content of that group rather than the whole match. Almost everything inside your group is optional, so the group can match the empty string at most positions, which is where the empty strings in the result come from.
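To make the group behaviour concrete, here is a small sketch of how the return value of re.findall changes with the number of capturing groups. The shortened sample string and the three patterns are illustrative assumptions, not taken from the question.

```python
import re

text = "41 000€ à 63 000€"

# No capturing group: findall returns the whole match.
whole = re.findall(r"[0-9]+ [0-9]{3}€", text)

# One capturing group: findall returns only the group's content.
grouped = re.findall(r"([0-9]+ [0-9]{3})€", text)

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r"([0-9]+) ([0-9]{3})€", text)

print(whole)    # ['41 000€', '63 000€']
print(grouped)  # ['41 000', '63 000']
print(pairs)    # [('41', '000'), ('63', '000')]
```

This is why wrapping the digits in a group and leaving € outside of it is enough to drop the currency sign from the results.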

In your pattern you use [0-9]*, which matches a digit 0 or more times. If there is no limit on the number of leading digits, you might use [0-9]+ instead, so that at least one digit is required.
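The difference between the two quantifiers can be checked directly. The sketch below uses a simplified variant of the pattern from the question (the leading optional space is dropped, which is an assumption made for brevity) and only swaps * for +.

```python
import re

sal = "41 000€ à 63 000€ / an"

# Every part of this pattern is optional, so it also matches the empty
# string at each position where no digit starts.
optional = re.findall(r"([0-9]* ?[0-9]?[0-9]?[0-9]?)", sal)

# Requiring at least one leading digit rules out the empty matches.
required = re.findall(r"([0-9]+ ?[0-9]?[0-9]?[0-9]?)", sal)

print("" in optional)  # True
print(required)        # ['41 000', '63 000']
```

Note that this variant still accepts any run of digits, whether or not a € follows, so it is looser than the pattern suggested in this answer.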

You might use this pattern with a capturing group:

(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)

Explanation

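  • (?<!\S) Negative lookbehind: assert that the character on the left is not a non-whitespace character, i.e. the match starts at the beginning of the string or after whitespace
  • ( Open the capturing group
    • [0-9]+(?: [0-9]{1,3})? Match 1+ digits, optionally followed by a space and 1-3 digits
  • ) Close the capturing group
  • € Match the euro sign literally
  • (?!\S) Negative lookahead: assert that the character on the right is not a non-whitespace character, i.e. the match ends at the end of the string or before whitespace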
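The effect of the two lookarounds is easiest to see on an input where the amounts are not separated from the surrounding text. The second input string below is a made-up variant of the sample without spaces, used only for illustration.

```python
import re

pattern = r"(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)"

# Amounts delimited by whitespace (or by the string boundaries) match.
spaced = re.findall(pattern, "41 000€ à 63 000€ / an")

# Without surrounding whitespace the lookahead rejects both amounts:
# the first € is followed by "à" and the second € is followed by "/".
glued = re.findall(pattern, "41 000€à63 000€/an")

print(spaced)  # ['41 000', '63 000']
print(glued)   # []
```

If your data can contain amounts glued to other characters, the lookarounds would have to be relaxed or removed.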

Your code might look like:

import re

sal = '41 000€ à 63 000€ / an'  # sample string from the question
# Raw string, so that \S reaches the regex engine unchanged
regex = r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)'
print(re.findall(regex, sal))  # ['41 000', '63 000']
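Since the question is about a list of strings, the same pattern could be applied to each entry in turn. The following is a minimal sketch under assumptions: the helper name extract_salaries, the conversion to int, and the sample list are hypothetical and not part of the original question.

```python
import re

# Compile once when the same pattern is applied to many strings.
SALARY_RE = re.compile(r"(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)")

def extract_salaries(text):
    """Return the salaries found in text as integers (hypothetical helper)."""
    # Drop the thousands separator (a space) before converting to int.
    return [int(m.replace(" ", "")) for m in SALARY_RE.findall(text)]

# Made-up sample data for illustration.
offers = [
    "41 000€ à 63 000€ / an",
    "35 000€ / an",
    "Salaire non précisé",
]

results = [extract_salaries(s) for s in offers]
print(results)  # [[41000, 63000], [35000], []]
```

Strings without a salary yield an empty list, so later code does not have to filter out empty strings.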
