Extract companies' register number in Python by getting the next word
Problem description
I am trying to extract the German Handelsregisternummer (a company's register number), which is usually written directly after the word HRB. However, there are exceptions that I would like to catch with my regex. The goal is to call a function with a keyword (in this case HRB) and have it return the number. Please see the regex demo!
This is what I have so far. It doesn't catch all cases.
import re

def get_company_register_number(string, keyword):
    # Match the keyword, skip any separators, and capture the next word
    reg_1 = fr'\b{keyword}\b[,:|\s]*(\w+)'
    match = re.compile(reg_1)
    # findall returns a list of matched words (empty if nothing matched)
    return match.findall(string)

string = "HRB: 21156"
get_company_register_number(string, 'HRB')
# returns ['21156']
Recommended answer
You could extend the character class and move the word boundary so that it sits directly before the digits being matched.
\bHRB[.,: \w-]*\b(\d+)
See the updated regex demo.
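As a rough sketch, the pattern above could be dropped into the question's function like this. The use of `re.escape` and the sample strings are assumptions added for illustration; they are not part of the original answer.

```python
import re

def get_company_register_number(text, keyword):
    # Keyword, then any run of separators/word characters, then a word
    # boundary right before the digits that get captured.
    # re.escape is an added precaution in case the keyword contains
    # regex metacharacters (assumption, not in the original answer).
    pattern = re.compile(fr'\b{re.escape(keyword)}[.,: \w-]*\b(\d+)')
    return pattern.findall(text)

# Hypothetical inputs, chosen only to illustrate the variants
samples = [
    "HRB: 21156",
    "HRB-Nr. 12345",
    "HRB Nummer: 98765",
    "Amtsgericht Muenchen, HRB 21156",
]
for sample in samples:
    print(sample, '->', get_company_register_number(sample, 'HRB'))
```

Because the character class also accepts word characters and spaces, this version is greedy: for a string such as "HRB 123 Berlin 456" it captures the last number, 456, rather than the first.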
Or, for a more precise match:
\bHRB[,:]?(?:[- ](?:Nr|Nummer)[.:]*)? (\d+)
Explanation:
- \bHRB
  Word boundary, then match HRB
- [,:]?
  Optionally match , or :
- (?:
  Non-capture group
- [- ](?:Nr|Nummer)[.:]*
  Match a space or -, then Nr or Nummer, then 0+ times a . or :
- )?
  Close the group and make it optional
-  (\d+)
  Match a space, then capture 1+ digits in group 1
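A minimal sketch of how the more precise pattern might be used with a configurable keyword follows. The function name `get_register_number_strict`, the `re.escape` call, and the sample strings are hypothetical and only serve to illustrate the pattern.

```python
import re

def get_register_number_strict(text, keyword):
    # Hypothetical helper: keyword, optional , or :, an optional
    # "Nr"/"Nummer" part, then exactly one space before the digits.
    pattern = re.compile(
        fr'\b{re.escape(keyword)}[,:]?(?:[- ](?:Nr|Nummer)[.:]*)? (\d+)'
    )
    return pattern.findall(text)

# Hypothetical inputs
samples = [
    "HRB 21156",
    "HRB: 21156",
    "HRB-Nr. 21156",
    "HRB Nummer: 21156",
    "HRB:21156",  # no space before the digits, so this one is not matched
]
for sample in samples:
    print(sample, '->', get_register_number_strict(sample, 'HRB'))
```

This variant is stricter than the character-class version: it only accepts the spelled-out forms listed in the pattern and requires a single space directly before the number.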