Extract all substrings between two markers
Problem description
I have a string:
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
What I want is a list of the substrings between the markers start="&marker1" and end="/\n". Thus, the expected result is:
whatIwant = ["The String that I want", "Another string that I want"]
I read the answers here:
- Find string between two substrings [duplicate]
- How to extract the substring between two markers?
and tried this, but without success:
>>> import re
>>> mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
>>> whatIwant = re.search("&marker1(.*)/\n", mystr)
>>> whatIwant.group(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
What could I do to resolve this? Also, my actual string is very long:
>>> len(myactualstring)
7792818
Recommended answer
I would do:
import re
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
found = re.findall(r"\&marker1\n(.*?)/\n", mystr)
print(found)
Output:
['The String that I want ', 'Another string that I want ']
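Each result above keeps the space that precedes the "/" in the input, whereas the expected list in the question has no trailing space. A minimal sketch of one way to close that gap, assuming it is acceptable to strip surrounding whitespace from every match:

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# Same kind of pattern as in the answer; str.strip() then removes the
# space left in front of the "/" end marker.
found = [s.strip() for s in re.findall(r"&marker1\n(.*?)/\n", mystr)]
print(found)
```

If the whitespace inside a match matters, stripping only the right-hand side with rstrip() would be the more conservative choice.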
Note that:
- "&" has no special meaning at this position in a Python re pattern, so escaping it is optional; "\&" still matches a literal "&".
- "." matches any character except a newline. That is why the search in the question returned None: the newline right after &marker1 cannot be matched by ".*", so the pattern has to spell out that "\n" explicitly.
- findall is better suited than search if you just want a list of the matched substrings.
- "*?" is non-greedy. In this case ".*" would work too, because "." does not match a newline, but in other cases you might end up matching more than you wish.
- I used a so-called raw string (r prefix) to make escaping easier.
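The notes above can be checked directly. The sketch below reruns the pattern from the question, then compares greedy and non-greedy matching once "." is allowed to cross newlines; the use of re.DOTALL is an addition for illustration and is not part of the original answer.

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# The question's pattern: "." cannot consume the "\n" that follows
# "&marker1", so no match is found and search returns None.
failed = re.search("&marker1(.*)/\n", mystr)
print(failed)

# With re.DOTALL, "." also matches newlines, so greediness starts to matter.
greedy = re.findall(r"&marker1\n(.*)/\n", mystr, flags=re.DOTALL)
lazy = re.findall(r"&marker1\n(.*?)/\n", mystr, flags=re.DOTALL)

print(len(greedy))  # the greedy version swallows both records in one match
print(len(lazy))    # the non-greedy version stops at the first "/\n" each time
```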
Read the re module documentation for a discussion of raw-string usage and the list of characters with special meaning.
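When the markers are not known in advance, re.escape can take care of any special characters in them, and finditer yields matches one at a time, which may help with a string of several million characters like the one in the question. The helper below is a hypothetical sketch rather than part of the original answer: the name iter_between, the use of re.DOTALL, and the whitespace stripping are all assumptions.

```python
import re


def iter_between(text, start, end):
    """Yield the stripped text between each start/end marker pair.

    Hypothetical helper for illustration; not from the original answer.
    """
    # re.escape turns both markers into literal patterns.
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    # finditer produces matches lazily instead of building the whole list.
    for match in pattern.finditer(text):
        yield match.group(1).strip()


mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
result = list(iter_between(mystr, "&marker1\n", "/\n"))
print(result)
```

Because re.DOTALL lets a match span several lines, the non-greedy "(.*?)" is what keeps each match from running on to a later end marker.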