Getting Everything Between Two Characters Across New Lines
Question
This is a sample of the text I am working with.
6) Jake's Taxi Service is a new entrant to the taxi industry. It has achieved success by staking out a unique position in the industry. How did Jake's Taxi Service mostly likely achieve this position?
A) providing long-distance cab fares at a higher rate than competitors; servicing a larger area than competitors
B) providing long-distance cab fares at a lower rate than competitors; servicing a smaller area than competitors
C) providing long-distance cab fares at a higher rate than competitors; servicing the same area as competitors
D) providing long-distance cab fares at a lower rate than competitors; servicing the same area as competitors
Answer: D
I am trying to match the entire question, including the answer choices: everything from the question number to the word Answer.
This is my current regular expression:
re.compile(rf'(?<={searchCounter}\) ).*?(?=Answer).*', re.DOTALL)
searchCounter is just a variable that corresponds to the current question number, in this case 6. I think the issue has something to do with searching across the new lines.
Full source code:
import re

searchCounter = 1
bookDict = {}
with open('StratMasterKey.txt', 'rt') as myfile:
    for line in myfile:
        question_pattern = re.compile(rf'(?<={searchCounter}\) ).*?(?=Answer).*', re.DOTALL)
        result = question_pattern.search(line)
        if result != None:
            bookDict[searchCounter] = result[0]
            searchCounter += 1
Recommended answer
The reason your regex fails is that you read the file line by line (for line in myfile:), while your pattern searches for a match in a single multiline string.
Replace for line in myfile: with contents = myfile.read(), then use result = question_pattern.search(contents) to get the first match, or result = question_pattern.findall(contents) to get multiple matches.
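To make the difference concrete, here is a minimal sketch that runs the pattern from the question both ways. The sample string is a shortened, hypothetical stand-in for the contents of StratMasterKey.txt, and io.StringIO is used only to imitate iterating over an open file.

```python
import io
import re

# Hypothetical, shortened stand-in for the contents of StratMasterKey.txt.
sample = (
    "6) How did Jake's Taxi Service achieve this position?\n"
    "A) higher rate; larger area\n"
    "B) lower rate; smaller area\n"
    "Answer: D\n"
)

search_counter = 6
pattern = re.compile(rf'(?<={search_counter}\) ).*?(?=Answer).*', re.DOTALL)

# Line by line: no single line contains both "6) " and "Answer", so nothing matches.
line_hits = []
for line in io.StringIO(sample):
    match = pattern.search(line)
    if match:
        line_hits.append(match.group())

# Whole text at once: the match is free to span the line breaks.
whole_hit = pattern.search(sample)

print(line_hits)
print(whole_hit.group())
```

The first print shows an empty list, while the second shows the question text together with its answer choices and the Answer line.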
A note on the regex: I am not fixing the whole pattern since, as you mentioned, that is out of scope for this question. However, because the input is now a single multiline string, you need to remove re.DOTALL, use [\s\S] where the pattern must match any character including line breaks, and use . where it must match any character except a line break. Otherwise the trailing .* would run to the end of the file. Also, the lookahead is redundant: you may safely replace (?=Answer) with Answer. Finally, to check whether there is a match, you may simply use if result: and then get the whole match value with result.group().
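The effect of dropping re.DOTALL can be seen in a small comparison. This is only a sketch: the two-question text below is hypothetical and merely follows the layout of the sample in the question.

```python
import re

# Hypothetical two-question text in the same layout as the sample in the question.
text = (
    "6) First question?\n"
    "A) one\n"
    "Answer: D\n"
    "7) Second question?\n"
    "A) two\n"
    "Answer: A\n"
)

# With re.DOTALL the trailing .* also crosses line breaks and runs to the end of the text.
greedy = re.search(r"(?<=6\) ).*?(?=Answer).*", text, re.DOTALL)

# Without the flag, [\s\S]*? spans lines while the trailing .* stops at the end of the Answer line.
bounded = re.search(r"(?<=6\) )[\s\S]*?Answer.*", text)

print(repr(greedy.group()))
print(repr(bounded.group()))
```

The first match swallows question 7 as well, whereas the second ends right after "Answer: D".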
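Full code snippet:
import re

searchCounter = 6  # number of the question to extract (6 in the example above)
with open('StratMasterKey.txt', 'rt') as myfile:
    contents = myfile.read()  # read the whole file into one string
question_pattern = re.compile(rf'(?<={searchCounter}\) )[\s\S]*?Answer.*')
result = question_pattern.search(contents)
if result:
    print(result.group())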
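If the goal is still the dictionary from the question (bookDict, keyed by question number), one possible variation is to capture the number instead of interpolating searchCounter, and to collect every match in a single pass. This is a sketch rather than part of the answer above: the function name parse_questions and the sample text are hypothetical, and it assumes every question starts at the beginning of a line with its number followed by ") ".

```python
import re

# Assumption: each question begins a line as "<number>) " and ends on its "Answer" line.
QUESTION_RE = re.compile(r"^(\d+)\) ([\s\S]*?Answer.*)", re.MULTILINE)

def parse_questions(contents):
    """Return a dict mapping question number to the question text through its Answer line."""
    return {int(m.group(1)): m.group(2) for m in QUESTION_RE.finditer(contents)}

# Hypothetical two-question text in the same layout as the sample in the question.
text = (
    "6) First question?\n"
    "A) one\n"
    "Answer: D\n"
    "7) Second question?\n"
    "A) two\n"
    "Answer: A\n"
)

book_dict = parse_questions(text)
for number, body in book_dict.items():
    print(number, repr(body))
```

With a real file, the text would come from myfile.read() as in the snippet above. Lines such as "A) one" are not mistaken for question starts because the pattern requires digits before the closing parenthesis.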