Extract salaries from a list of strings
Question
I'm trying to extract salaries from a list of strings. I'm using the re.findall() function, but it returns many empty strings along with the salaries, and this causes problems later in my code.
import re

sal = '41 000€ à 63 000€ / an'  # this is a sample string for which I get errors
regex = ' ?([0-9]* ?[0-9]?[0-9]?[0-9]?)'  # this is my regex
re.findall(regex, sal)[0]
# returns '41 000' as expected, but:
re.findall(regex, sal)[1]
# returns: ''
# desired result: '63 000'
# the whole list of matches looks like this:
# ['41 000', '', '', '', '', '', '', '63 000', '', '', '', '', '', '', '', '', '']
# I would prefer ['41 000', '63 000']
Can anyone help? Thanks
When your pattern contains a capturing group, re.findall returns what the group captured instead of the whole match. In your group almost everything is optional, so it can match the empty string, and that is where the empty strings in the result come from.
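A small illustration of this behaviour, using a made-up string (`text` is hypothetical, not from the question): the same pattern returns whole matches when it has no group and only the captured text when it has one, and a group that is allowed to match nothing contributes an empty string.

```python
import re

text = "id=7, id=, id=42"  # hypothetical sample string

# No capturing group: findall returns each whole match.
whole = re.findall(r"id=[0-9]*", text)
# One capturing group: findall returns only what the group captured.
# Because [0-9]* may match zero digits, the middle entry is empty.
groups = re.findall(r"id=([0-9]*)", text)

print(whole)   # ['id=7', 'id=', 'id=42']
print(groups)  # ['7', '', '42']
```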
In your pattern you use [0-9]*, which matches a digit zero or more times. If there is no limit on the number of leading digits, you can use [0-9]+ instead, so that at least one digit is required.
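As a quick check on the sample string from the question (assuming Python 3), changing only the first `*` to `+` in the original pattern already removes the empty strings for this particular input, because every match now has to contain at least one digit:

```python
import re

sal = '41 000€ à 63 000€ / an'

# The question's pattern: every part is optional, so it can match ''.
original = re.findall(r' ?([0-9]* ?[0-9]?[0-9]?[0-9]?)', sal)
# Same pattern with [0-9]+: a match now needs at least one digit.
fixed = re.findall(r' ?([0-9]+ ?[0-9]?[0-9]?[0-9]?)', sal)

print('' in original)  # True
print(fixed)           # ['41 000', '63 000']
```

Note that this variant does not look for the € sign or check what surrounds the number, so any number in the string would be picked up.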
You might use this pattern with a capturing group:
(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)
Explanation
- (?<!\S) Assert that what is on the left is not a non-whitespace character, i.e. it is whitespace or the start of the string
- ( Capture group
  - [0-9]+(?: [0-9]{1,3})? Match 1+ digits followed by an optional part that matches a space and 1-3 digits
- ) Close capture group
- € Match literally
- (?!\S) Assert that what is on the right is not a non-whitespace character, i.e. it is whitespace or the end of the string
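To see what the two lookarounds do, here is the pattern applied to the question's string and to a hypothetical variant in which `/` follows the `€` directly. Since `(?!\S)` requires whitespace or the end of the string after `€`, the second string yields no match:

```python
import re

pattern = r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)'

# The question's sample string: each amount is surrounded by whitespace.
spaced = re.findall(pattern, '41 000€ à 63 000€ / an')
# Hypothetical variant: '/' right after '€' is a non-whitespace
# character, so the negative lookahead (?!\S) rejects the match.
tight = re.findall(pattern, '41 000€/an')

print(spaced)  # ['41 000', '63 000']
print(tight)   # []
```

If the real data contains amounts written without a space after the `€`, the lookahead would need to be relaxed.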
Your code might look like:
import re
sal = '41 000€ à 63 000€ / an'  # this is a sample string for which I get errors
regex = r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)'  # raw string, so \S reaches the regex engine unchanged
print(re.findall(regex, sal))  # ['41 000', '63 000']
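The question's title mentions a list of strings, while the code above handles a single string. A minimal sketch for a list, assuming every string looks like the sample (`offers` and `extract_salaries` are hypothetical names, and the data is made up):

```python
import re

# Compile the pattern once and reuse it for every string in the list.
SALARY_RE = re.compile(r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)')

def extract_salaries(strings):
    """Return one list of matched salary strings per input string (hypothetical helper)."""
    return [SALARY_RE.findall(s) for s in strings]

# Hypothetical input data.
offers = ['41 000€ à 63 000€ / an', '35 000€ / an', 'salaire à négocier']
print(extract_salaries(offers))
# [['41 000', '63 000'], ['35 000'], []]
```

Keeping one result list per input string preserves which salaries came from which string; flatten it if you only need all amounts.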