Getting Everything Between Two Characters Across New Lines


Question

This is a sample of the text I am working with.

6) Jake's Taxi Service is a new entrant to the taxi industry. It has achieved success by staking out a unique position in the industry. How did Jake's Taxi Service mostly likely achieve this position?

A) providing long-distance cab fares at a higher rate than competitors; servicing a larger area than competitors

B) providing long-distance cab fares at a lower rate than competitors; servicing a smaller area than competitors

C) providing long-distance cab fares at a higher rate than competitors; servicing the same area as competitors

D) providing long-distance cab fares at a lower rate than competitors; servicing the same area as competitors

Answer: D

I am trying to match the whole question, including the answer options: everything from the question number to the word Answer.

This is my current regex:

((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL)

searchCounter is just a variable that corresponds to the current question number, in this case 6. I think the issue has something to do with searching across the new lines.

Full source code

searchCounter = 1

bookDict = {}

with open ('StratMasterKey.txt', 'rt') as myfile:

    for line in myfile:
        question_pattern = re.compile((rf'(?<={searchCounter}\) ).*?(?=Answer).*'), re.DOTALL) 

        result = question_pattern.search(line)
        if result != None: 
            bookDict[searchCounter] = result[0] 
            searchCounter +=1

Recommended answer

Your regex fails because you read the file line by line with for line in myfile:, while your pattern needs to find a match that spans several lines of a single string. No individual line contains both the question number and the word Answer.

Replace for line in myfile: with contents = myfile.read(), and then use result = question_pattern.search(contents) to get the first match, or result = question_pattern.findall(contents) to get multiple matches.
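To illustrate the difference, here is a minimal sketch that runs the question's original pattern both ways. The sample string is a shortened, made-up stand-in for the contents of StratMasterKey.txt, and the names sample, line_hits and whole_hit are only for this illustration.

```python
import re

# Hypothetical, shortened stand-in for the contents of StratMasterKey.txt.
sample = (
    "6) Jake's Taxi Service is a new entrant to the taxi industry.\n"
    "A) higher rate; larger area\n"
    "B) lower rate; smaller area\n"
    "Answer: D\n"
)

search_counter = 6
pattern = re.compile(rf'(?<={search_counter}\) ).*?(?=Answer).*', re.DOTALL)

# Line by line: no single line holds both "6) " and "Answer", so nothing matches.
line_hits = [
    m.group()
    for m in map(pattern.search, sample.splitlines(keepends=True))
    if m
]

# Whole text at once: the match is free to span the line breaks.
whole_hit = pattern.search(sample)

print(line_hits)
print(whole_hit.group())
```

The line-by-line list stays empty, while the search over the whole string returns the question text starting right after "6) ".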

A note on the regex: I am not fixing the whole pattern since, as you mentioned, it is out of scope of this question. However, now that the input is a single multiline string, you need to remove re.DOTALL; otherwise the trailing .* runs to the end of the file instead of stopping at the end of the Answer line. Use [\s\S] where the pattern must match any character, line breaks included, and . where it must match any character except a line break. Also, the lookahead construct is redundant: you may safely replace (?=Answer) with Answer. Finally, to check whether there is a match, you may simply use if result: and then grab the whole match value by accessing result.group().
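The effect of re.DOTALL on the trailing .* can be seen in a small sketch. The two-question text below is invented for the example, and the question number is hard-coded as 6 to keep the patterns short.

```python
import re

# Made-up text with two consecutive questions.
text = (
    "6) First question?\n"
    "A) one\n"
    "B) two\n"
    "Answer: B\n"
    "7) Second question?\n"
    "A) three\n"
    "Answer: A\n"
)

# With re.DOTALL the final .* also consumes line breaks, so it runs to the end of the text.
with_dotall = re.search(r'(?<=6\) ).*?Answer.*', text, re.DOTALL).group()

# Without the flag, [\s\S]*? still crosses lines, but the final .* stops at the line end.
without_dotall = re.search(r'(?<=6\) )[\s\S]*?Answer.*', text).group()

print(repr(with_dotall))
print(repr(without_dotall))
```

The first result swallows question 7 as well, whereas the second ends at "Answer: B".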

Full code snippet:

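import re

searchCounter = 6  # question number to extract, as in the example
with open('StratMasterKey.txt', 'rt') as myfile:
    contents = myfile.read()  # whole file as one string
# [\s\S]*? crosses line breaks; the final .* stops at the end of the Answer line
question_pattern = re.compile(rf'(?<={searchCounter}\) )[\s\S]*?Answer.*')
result = question_pattern.search(contents)
if result:
    print(result.group())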
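The question's original loop filled a dictionary with one entry per question. One possible way to do that on the whole-file string is sketched below; it is not part of the answer above. The function name collect_questions and the sample text are hypothetical, and the sketch assumes questions are numbered consecutively from 1 and that each number starts a line. It anchors the number with ^ and re.MULTILINE instead of a lookbehind, so that "6) " cannot match inside "16) ".

```python
import re

def collect_questions(contents):
    """Map each question number to its text, up to and including the Answer line."""
    book = {}
    counter = 1
    while True:
        # ^ with re.MULTILINE ties the number to the start of a line.
        pattern = re.compile(rf'^{counter}\) ([\s\S]*?Answer.*)', re.MULTILINE)
        match = pattern.search(contents)
        if not match:
            break  # assumption: numbering is consecutive, so stop at the first gap
        book[counter] = match.group(1)
        counter += 1
    return book

# Made-up sample standing in for the file contents.
sample = (
    "1) Q one?\n"
    "A) x\n"
    "Answer: A\n"
    "2) Q two?\n"
    "A) y\n"
    "Answer: B\n"
)

book_dict = collect_questions(sample)
print(book_dict)
```

With a real file, the string returned by myfile.read() would be passed in place of sample.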
