Python: How to get string between matches?


Problem description

I have

import re

with open("file.txt", "r") as FILE:  # long text file
    TEXT = FILE.read()

# long identification code with dots (.) and hyphens (-)
regex = r"process \d\d\d\d\d\d\d-\d\d\.\d\d\d\d\.\d+\.\d\d\.\d\d\d\d"
SRC = re.findall(regex, TEXT, flags=re.IGNORECASE | re.MULTILINE)

How can I get the text between the first character of one occurrence, SRC[i], and the first character of the next occurrence, SRC[i+1], and so on? I couldn't find any straightforward, satisfactory answer.

MORE INFO EDIT:

pattern = r'process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}'

sample_input = "Process 1234567-89.1234.12431242.12.1234 -  text title and long text description with no assured pattern Process 2234567-89.1234.12431242.12.1234 : chars and more text Process 3234567-89.1234.12431242.12.1234 - more text process 3234567-89.1234.12431242.12.1234 (...)"

sample_output[0] = "Process 1234567-89.1234.12431242.12.1234 -  text title and long text description with no assured pattern "
sample_output[1] = "Process 2234567-89.1234.12431242.12.1234 : chars and more text "
sample_output[2] = "Process 3234567-89.1234.12431242.12.1234 - more text "
sample_output[3] = "process 3234567-89.1234.12431242.12.1234    "

Solution

You can use this regex:

(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*?)(?=Process)|(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*)
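The two alternatives in the regex above differ only in where a chunk ends: just before the next "Process", or at the end of the input. A minimal sketch of the same idea with a single lookahead follows. It assumes case-insensitive matching is wanted (the question's sample ends with a lowercase "process") and that a chunk should end only at a full identifier rather than at any occurrence of the word "Process". The names `ID`, `CHUNK` and `split_chunks` are illustrative, not from the original answer.

```python
import re

# Identifier pattern taken from the question; ID and CHUNK are illustrative names.
ID = r"process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}"

# One chunk = an identifier plus everything up to (not including) the next
# identifier, or up to the end of the input. DOTALL lets "." cross newlines.
CHUNK = re.compile(ID + r".*?(?=" + ID + r"|\Z)", re.IGNORECASE | re.DOTALL)

def split_chunks(text):
    """Return each identifier together with the text that follows it."""
    return [m.group(0) for m in CHUNK.finditer(text)]

sample = ("Process 1234567-89.1234.12431242.12.1234 - title "
          "Process 2234567-89.1234.12431242.12.1234 : chars "
          "process 3234567-89.1234.12431242.12.1234 (...)")

for chunk in split_chunks(sample):
    print(repr(chunk))
```

Because the lookahead consumes nothing, the chunks are contiguous: joining them gives back the input from the first identifier onward.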

Match information from the working demo:

MATCH 1
1.  [0-105] `Process 1234567-89.1234.12431242.12.1234 -  text title and long text description with no assured pattern `
MATCH 2
1.  [105-168]   `Process 2234567-89.1234.12431242.12.1234 : chars and more text `
MATCH 3
1.  [168-221]   `Process 3234567-89.1234.12431242.12.1234 - more text `
MATCH 4
2.  [221-267]   `Process 3234567-89.1234.12431242.12.1234 (...)`

You can use this code:

sample_input = "Process 1234567-89.1234.12431242.12.1234 -  text title and long text description with no assured pattern Process 2234567-89.1234.12431242.12.1234 : chars and more text Process 3234567-89.1234.12431242.12.1234 - more text process 3234567-89.1234.12431242.12.1234 (...)"
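import re
pattern = r"(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*?)(?=Process)|(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*)"
# re.match only tries the start of the string and returns a single match, so iterate with re.finditer.
# re.IGNORECASE is needed because the last identifier in sample_input starts with a lowercase "process".
for m in re.finditer(pattern, sample_input, flags=re.IGNORECASE):
    print(m.group(0))   # The whole chunk: group 1 if another "Process" follows, otherwise group 2.
    print(m.groups())   # Tuple of all subgroups, from 1 up to however many groups are in the pattern; the unused group is None.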
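The question asks for the text from the first character of one match to the first character of the next, which can also be done without a lookahead by recording where each match starts and slicing the string. This is only a sketch of that alternative, not the answer's method; `PATTERN` and `text_between_matches` are hypothetical names, and case-insensitive matching is assumed because the question uses `re.IGNORECASE`.

```python
import re

# Identifier pattern from the question, written as a raw string.
PATTERN = re.compile(r"process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}", re.IGNORECASE)

def text_between_matches(text):
    """Slice text from the start of each match to the start of the next one."""
    starts = [m.start() for m in PATTERN.finditer(text)]
    ends = starts[1:] + [len(text)]  # the last chunk runs to the end of the text
    return [text[s:e] for s, e in zip(starts, ends)]

sample_input = (
    "Process 1234567-89.1234.12431242.12.1234 -  text title and long text "
    "description with no assured pattern "
    "Process 2234567-89.1234.12431242.12.1234 : chars and more text "
    "Process 3234567-89.1234.12431242.12.1234 - more text "
    "process 3234567-89.1234.12431242.12.1234 (...)"
)

for chunk in text_between_matches(sample_input):
    print(repr(chunk))
```

Slicing by offsets keeps the regex itself simple and avoids rescanning the free text with a lazy quantifier, which may matter for a long file.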
