Extract companies' register number in Python by getting the next word


Problem description

I am trying to extract the German Handelsregisternummer (company register number), which is usually written directly after the keyword HRB. However, there are exceptions that I would like to catch with my regex. The goal is to call a function with a keyword (HRB in this case) and have it return the number. Please see the regex demo.

This is what I have so far. It doesn't catch all cases.

import re

def get_company_register_number(string, keyword):
  # Match the keyword, skip separators (comma, colon, pipe, whitespace),
  # then capture the next word. re.escape keeps the keyword literal.
  reg_1 = fr'\b{re.escape(keyword)}\b[,:|\s]*(\w+)'

  # findall returns a list of captured words, or an empty list if nothing matches
  return re.findall(reg_1, string)


string = "HRB: 21156"
get_company_register_number(string, 'HRB')
# returns ['21156']

Recommended answer

You could extend the character class so that it also allows dots, spaces, hyphens and word characters, and move the word boundary to just before the digits:

\bHRB[.,: \w-]*\b(\d+)

See the updated regex demo.
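As a rough sketch of how this broader pattern could be plugged into the function from the question: the sample strings below are made up for illustration, and the use of re.escape on the keyword is an assumption, not part of the original answer.

```python
import re

def get_company_register_number(text, keyword):
    # Broad pattern from the answer; re.escape (an addition here) keeps the
    # keyword literal even if it contains regex metacharacters.
    pattern = re.compile(rf'\b{re.escape(keyword)}[.,: \w-]*\b(\d+)')
    return pattern.findall(text)

# Hypothetical inputs, not taken from real register entries
samples = ["HRB: 21156", "HRB-Nr. 21156", "HRB Nummer: 21156", "HRB 21156"]
for sample in samples:
    print(sample, get_company_register_number(sample, "HRB"))
```

Because the character class contains \w and the quantifier is greedy, this pattern skips over any intervening words and captures the last number in the run, so "HRB 123 456" yields ['456'] rather than ['123'].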

Or, for a more precise match:

\bHRB[,:]?(?:[- ](?:Nr|Nummer)[.:]*)? (\d+)

  • \bHRB Word boundary, then match HRB
  • [,:]? Optionally match , or :
  • (?: Non-capturing group
    • [- ](?:Nr|Nummer)[.:]* Match a space or -, then Nr or Nummer, then . or : zero or more times
  • )? Close the group and make it optional
  • A space followed by (\d+) Match one space, then capture one or more digits in group 1

See the regex demo.
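The stricter pattern can be wrapped in the same way. This is only a sketch: the function name get_register_number_strict and the sample strings are hypothetical, and the keyword is again passed through re.escape as a precaution.

```python
import re

def get_register_number_strict(text, keyword):
    # Stricter pattern from the answer: optional , or : after the keyword,
    # an optional "Nr"/"Nummer" part, then exactly one space before the digits.
    pattern = re.compile(
        rf'\b{re.escape(keyword)}[,:]?(?:[- ](?:Nr|Nummer)[.:]*)? (\d+)'
    )
    return pattern.findall(text)

# Hypothetical inputs for illustration
samples = ["HRB: 21156", "HRB-Nr. 21156", "HRB Nummer: 21156", "HRB foo 21156"]
for sample in samples:
    print(sample, get_register_number_strict(sample, "HRB"))
```

Unlike the broader pattern, this one does not skip arbitrary words, so "HRB foo 21156" produces no match, and a string without a space before the digits, such as "HRB:21156", is not matched either.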
