Python: How to get string between matches?
Problem description
I have
import re

# Long text file
with open("file.txt", "r") as f:
    TEXT = f.read()

# Long identification code with dots (.) and hyphens (-)
regex = r"process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}"
SRC = re.findall(regex, TEXT, flags=re.IGNORECASE | re.MULTILINE)
How can I get the text between the first character of one occurrence, SRC[i],
and the first character of the next occurrence, SRC[i+1],
and so on? I couldn't find any straightforward, satisfactory answer...
MORE INFO EDIT:
pattern = r'process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}'
sample_input = "Process 1234567-89.1234.12431242.12.1234 - text title and long text description with no assured pattern Process 2234567-89.1234.12431242.12.1234 : chars and more text Process 3234567-89.1234.12431242.12.1234 - more text process 3234567-89.1234.12431242.12.1234 (...)"
sample_output[0] = "Process 1234567-89.1234.12431242.12.1234 - text title and long text description with no assured pattern "
sample_output[1] = "Process 2234567-89.1234.12431242.12.1234 : chars and more text "
sample_output[2] = "Process 3234567-89.1234.12431242.12.1234 - more text "
sample_output[3] = "process 3234567-89.1234.12431242.12.1234 "
Solution
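You can use this regex (it is case-sensitive as written, so add the re.IGNORECASE flag if some occurrences are lowercase, as the last one in the sample input is):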
(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*?)(?=Process)|(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*)
Match information
MATCH 1
1. [0-105] `Process 1234567-89.1234.12431242.12.1234 - text title and long text description with no assured pattern `
MATCH 2
1. [105-168] `Process 2234567-89.1234.12431242.12.1234 : chars and more text `
MATCH 3
1. [168-221] `Process 3234567-89.1234.12431242.12.1234 - more text `
MATCH 4
2. [221-267] `Process 3234567-89.1234.12431242.12.1234 (...)`
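As a minimal sketch of applying the pattern above to every occurrence rather than just the first, re.finditer can collect each match in turn. The sample string is the one from the question; the re.IGNORECASE flag and the names ID and sections are assumptions added here so that the lowercase "process" in the sample also starts a new section.

```python
import re

# Sample string from the question; note that the last occurrence is lowercase "process".
sample_input = (
    "Process 1234567-89.1234.12431242.12.1234 - text title and long text "
    "description with no assured pattern "
    "Process 2234567-89.1234.12431242.12.1234 : chars and more text "
    "Process 3234567-89.1234.12431242.12.1234 - more text "
    "process 3234567-89.1234.12431242.12.1234 (...)"
)

# Identifier part of the pattern, shared by both alternatives.
ID = r"Process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}"

# First alternative: lazily match up to (but not including) the next "Process".
# Second alternative: the last section, which has no following "Process".
pattern = rf"({ID}.*?)(?=Process)|({ID}.*)"

# group(0) is the whole match, whichever alternative produced it.
sections = [m.group(0) for m in re.finditer(pattern, sample_input, flags=re.IGNORECASE)]

for section in sections:
    print(repr(section))
```

Because `.` does not match newlines by default, sections that span several lines would also need the re.DOTALL flag with this pattern.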
You can use this code. Note that re.match only tries to match at the start of the string, so it returns just the first section; re.finditer or re.findall with the same pattern returns all of them:
import re

sample_input = "Process 1234567-89.1234.12431242.12.1234 - text title and long text description with no assured pattern Process 2234567-89.1234.12431242.12.1234 : chars and more text Process 3234567-89.1234.12431242.12.1234 - more text process 3234567-89.1234.12431242.12.1234 (...)"
m = re.match(r"(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*?)(?=Process)|(Process \d{7}\-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}.*)", sample_input)
m.group(1)  # The first parenthesized subgroup: the first section, up to the next "Process"
m.groups()  # A tuple of all subgroups of the match; the second is None here because the first alternative matched
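Another way to read the question literally (the text from the first character of SRC[i] up to the first character of SRC[i+1]) is to take the start offset of each identifier match and slice the text between consecutive offsets. The following is only a sketch: split_sections and ID_PATTERN are hypothetical names, not part of the original answer, and the shortened sample text is made up for illustration.

```python
import re

# The identifier pattern from the question, compiled case-insensitively.
ID_PATTERN = re.compile(r"process \d{7}-\d{2}\.\d{4}\.\d+\.\d{2}\.\d{4}", re.IGNORECASE)


def split_sections(text):
    """Return the text from each identifier match up to the start of the next one."""
    starts = [m.start() for m in ID_PATTERN.finditer(text)]
    # Pair each start with the following start; the last section runs to the end of the text.
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]


# Illustrative input, shorter than the sample in the question.
text = (
    "Process 1234567-89.1234.12431242.12.1234 - first description "
    "Process 2234567-89.1234.12431242.12.1234 : second description "
    "process 3234567-89.1234.12431242.12.1234 (...)"
)
sections = split_sections(text)

for section in sections:
    print(repr(section))
```

Since the slicing does not depend on `.` matching the text in between, this variant also handles sections that contain newlines, and a bare word "process" without an identifier after it does not start a new section.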