Get number present after a particular pattern of a matching string in Python

Views: 29 Posted: 2021/8/31 18:43:52 Tags: python regex string-matching


Problem description

I want to extract every number (digits only, for example '0012--22') or alphanumeric code (for example 'RF332') that follows one of the keywords in a given list of strings ("my_list" in the code). The keyword and its number are separated by a space or two. A sample input file is provided for reference.

Here is the input file:

$cat input_file
some text before Expedien: 1-21-212-16-26 some random text
Reference RE9833 of all sentences.
abc
123
456
something blah blah Ref.: 
tramite  1234567
Ref.:
some junk Expedien N° 18-00777 # some new content
some text Expedien N°18-0022995 # some garbled content

The script so far is attached below. It currently identifies only one element, which is {'tramite': '1234567'}

import re
import glob
import os

my_list = ['Ref.:', 'Reference', 'tramite', 'Expediente', 'Expediente No', 'Expedien N°', 'Exp.No', 'Expedien']

#open the file as input
with open('garb.txt','r') as infile:
  res = dict()
  for line in infile:  
    elems = re.split('(?::)?\s+', line)
    #print(elems)
    if len(elems) >= 2 :
      contains = False
      tmp = ''
      for elem in elems:  
        if contains:
          res.update({tmp : elem})
          print(res)
          contains = False
          break
        if elem in my_list:
          contains = True
          tmp = elem
  #print(res)

This is the expected output (sample):

{'Expedien N°': '18-0022995'}
{'Expedien N°': '18-0022995'}
{'Expedien': '1-21-212-16-26'}
{'Reference' : 'RE9833'}

and so on.

Recommended answer

You can use

(?<!\w)(your|escaped|keywords|here)\W*([A-Z]*\d+(?:-+[A-Z]*\d+)*)

See the regex demo.

Pattern details

(?<!\w) - left word boundary (unambiguous: the meaning of \b depends on context, and if the next char is a non-word char, \b requires a word char on the left, which is not what users usually expect)
(your|escaped|keywords|here) - Capturing group 1: your list of keywords, which can be built easily with '|'.join(map(re.escape, my_list)) (note that re.escape is necessary to escape special regex metacharacters like ., +, (, [, etc.)
\W* - zero or more non-word chars (chars other than letters, digits or _)
([A-Z]*\d+(?:-+[A-Z]*\d+)*) - Capturing group 2:
- [A-Z]* - zero or more uppercase ASCII letters
- \d+ - one or more digits
- (?:-+[A-Z]*\d+)* - zero or more repetitions of
  - -+ - one or more hyphens
  - [A-Z]*\d+ - zero or more uppercase ASCII letters, then one or more digits
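The two points above about (?<!\w) and re.escape can be checked with a small sketch. The keyword '#ref' and the sample text are made up for illustration and are not part of the original question; the number pattern is simplified to \d+.

```python
import re

# '#ref' is a hypothetical keyword that starts with a non-word character.
keywords = ['Ref.:', '#ref']
alternation = '|'.join(map(re.escape, keywords))

text = 'see #ref 42 and Ref.: 7'

# \b needs a word char on one side, so it cannot match between ' ' and '#'.
with_b = re.findall(r'\b({})\W*(\d+)'.format(alternation), text)
# (?<!\w) only requires that the previous char is not a word char.
with_lookbehind = re.findall(r'(?<!\w)({})\W*(\d+)'.format(alternation), text)

print(with_b)
print(with_lookbehind)

# Without re.escape, the '.' in 'Exp.No' matches any character.
unescaped_hit = re.fullmatch('Exp.No', 'ExpXNo') is not None
escaped_hit = re.fullmatch(re.escape('Exp.No'), 'ExpXNo') is not None
print(unescaped_hit, escaped_hit)
```

With \b the '#ref' keyword is silently skipped, while the lookbehind version finds both keywords.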
See the Python demo:
```
import re
s="""your_text_here"""
my_list = ['Ref.:', 'Reference', 'tramite', 'Expediente', 'Expediente No', 'Expedien N°', 'Exp.No', 'Expedien']
rx = r'(?<!\w)({})\W*([A-Z]*\d+(?:-+[A-Z]*\d+)*)'.format('|'.join(map(re.escape,my_list)))
print(re.findall(rx, s))
```
Output:
```
[('Expedien', '1-21-212-16-26'), ('Reference', 'RE9833'), ('tramite', '1234567'), ('Expedien N°', '18-00777'), ('Expedien N°', '18-0022995')]
```
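The question asks for one {keyword: number} dict per match rather than a list of tuples. A minimal sketch of that conversion follows, assuming the input is processed line by line. The helper name extract_pairs and the inline sample list are illustrative and not part of the original answer.

```python
import re

my_list = ['Ref.:', 'Reference', 'tramite', 'Expediente', 'Expediente No',
           'Expedien N°', 'Exp.No', 'Expedien']
rx = re.compile(r'(?<!\w)({})\W*([A-Z]*\d+(?:-+[A-Z]*\d+)*)'.format(
    '|'.join(map(re.escape, my_list))))

def extract_pairs(lines):
    """Hypothetical helper: return one {keyword: number} dict per match."""
    return [{m.group(1): m.group(2)} for line in lines for m in rx.finditer(line)]

# A few lines taken from the sample input file in the question.
sample = [
    "some text before Expedien: 1-21-212-16-26 some random text",
    "Reference RE9833 of all sentences.",
    "tramite  1234567",
    "some junk Expedien N° 18-00777 # some new content",
]

pairs = extract_pairs(sample)
for pair in pairs:
    print(pair)
```

To run it on a file, an open file object can be passed instead of the list, for example `with open('input_file') as infile: extract_pairs(infile)`. Matching line by line means \W* can no longer span a line break between a keyword and its number, unlike re.findall on the whole text, so pick whichever behavior fits the data.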