Bottleneck? More efficient regular expression?

Views: 82 | Posted: 2019/6/5 11:02:26 | Tag: python


Problem description

Hello,

I've been struggling with a regular expression for parsing XML files, which keeps giving the run time error "maximum
recursion limit exceeded". Here is the pattern string:

r'<code>(?P<c>.*?)</code>.*?<targetSeq name="(?P<tn>.*?)">.*?<target>(?P<t>.*?)</target>.*?<align>(?P<a>.*?)</align>.*?<template>(?P<temp>.*?)</template>.*?<anotherTag>(?P<at>.*?)</anotherTag>.*?<yetAnotherTag>(?P<yat>.*?)</yetAnotherTag>'

The file format is straightforward. Here is a sample:

<code>1cg2</code>
<chain>a</chain>
<settings>abcde</settings>
<scoreInfo>12345</scoreInfo>
<targetSeq name="1onc">blah
</targetSeq>
<alignment size="335">
<target>WLTFQKKHITNTRDVDCDNIMS</target>
<align> :| ..| : . | . |. . :</align>
<template>QKRDNVLFQAATDEQPAVIKTLEKL</template>
<anotherTag>foobarfoobar</anotherTag>
<yetAnotherTag>barfoobarfoo</yetAnotherTag>

# this group of tags then repeat in the file multiple times

If I search for the pattern up to "</template>" (i.e. no <anotherTag> onwards), it works fine. As soon as I added the
later bits into the pattern it gives the error.

I heard that non-greedy (*?) is inefficient, so I tried replacing all .*? with (?!<target>) etc. which means "if the
next piece of text doesn't match the <target> tag keep going". But it gives the same error.

So my question is: what is the bottleneck in this pattern? Could someone more experienced in REs give some hints here?

Your help is greatly appreciated!

Tina

-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 100,000 Newsgroups - 19 Different Servers! =-----

Recommended answer

Tina Li wrote:

Hello,

[skipped]

Personally I'd rather split it into several separate re's. Something like
match_tag(tag_name) that matches a tag group at the start of the string.

hth,
anton.
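
One way to read the suggestion above, as a rough sketch only: the function name match_tag comes from the post, but its signature, the extra text and pos arguments, and the return value are my own assumptions. Each call applies one small pattern anchored at the current position, so no single pattern has to span a whole record.

```python
import re


def match_tag(tag_name, text, pos=0):
    """Match <tag_name ...>content</tag_name> starting at pos.

    Leading whitespace is skipped. Returns (content, end_position),
    or None if that tag does not start at pos.
    """
    name = re.escape(tag_name)
    pattern = re.compile(r"\s*<%s\b[^>]*>(.*?)</%s>" % (name, name), re.DOTALL)
    m = pattern.match(text, pos)  # match() anchors at pos, it does not search
    if m is None:
        return None
    return m.group(1), m.end()


# Walk through the first few tags of the sample, one small match at a time.
sample = "<code>1cg2</code>\n<chain>a</chain>\n<settings>abcde</settings>\n"
fields = {}
pos = 0
for tag in ("code", "chain", "settings"):
    result = match_tag(tag, sample, pos)
    if result is None:
        break
    fields[tag], pos = result
```

Tags you do not care about can be matched the same way and simply ignored, which keeps every individual match short.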

In article <3f********@corp.newsgroups.com>,
"Tina Li" <tina_li23 AT hotmail DOT com> wrote:

I've been struggling with a regular expression for parsing XML files,
which keeps giving the run time error "maximum recursion limit
exceeded".

Why not use a real XML parser? xml.parsers.expat is easy enough to use,
doesn't have problems with recursion limits, and will continue working
when someone generates a valid XML file in a slightly different version
than the one you expect.

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
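
For anyone following along, here is a minimal sketch of what the expat route might look like. It assumes the real file is well-formed XML, so the sample is wrapped in a root element and <alignment> is closed; the <results> root, the parse_records helper and the "targetSeq_name" key are hypothetical names, not something from the original file.

```python
import xml.parsers.expat

# Elements whose text content we want to keep (taken from the posted pattern).
TEXT_TAGS = {"code", "target", "align", "template", "anotherTag", "yetAnotherTag"}


def parse_records(xml_text):
    """Return one dict per record; a record starts at each <code> element."""
    records = []
    chunks = []          # pieces of character data for the element being read
    current_tag = None   # name of the element whose text is being collected

    def start_element(name, attrs):
        nonlocal current_tag
        if name == "code":
            records.append({})
        if not records:
            return       # ignore anything before the first <code>
        if name == "targetSeq":
            records[-1]["targetSeq_name"] = attrs.get("name")
        if name in TEXT_TAGS:
            current_tag = name
            chunks.clear()

    def char_data(data):
        # expat may deliver one element's text in several calls.
        if current_tag is not None:
            chunks.append(data)

    def end_element(name):
        nonlocal current_tag
        if name == current_tag:
            records[-1][name] = "".join(chunks)
            current_tag = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml_text, True)
    return records


sample = """<results>
<code>1cg2</code>
<chain>a</chain>
<targetSeq name="1onc">blah
</targetSeq>
<alignment size="335">
<target>WLTFQKKHITNTRDVDCDNIMS</target>
<align> :| ..| : . | . |. . :</align>
<template>QKRDNVLFQAATDEQPAVIKTLEKL</template>
<anotherTag>foobarfoobar</anotherTag>
<yetAnotherTag>barfoobarfoo</yetAnotherTag>
</alignment>
</results>"""

records = parse_records(sample)
```

The handlers only react to element names, so extra tags or a different tag order inside a record should not break the extraction.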

Hello,

Thanks for the suggestion. It does solve my problem -- but just out of curiosity, I'd like to know what caused the
over-limit as well.

Thanks again!

Tina

"David Eppstein" <ep******@ics.uci.edu> wrote in message news:ep****************************@news.service.uci.edu...
| In article <3f********@corp.newsgroups.com>,
| "Tina Li" <tina_li23 AT hotmail DOT com> wrote:
|
| > I've been struggling with a regular expression for parsing XML files,
| > which keeps giving the run time error "maximum recursion limit
| > exceeded".
|
| Why not use a real XML parser? xml.parsers.expat is easy enough to use,
| doesn't have problems with recursion limits, and will continue working
| when someone generates a valid XML file in a slightly different version
| than the one you expect.
|
| --
| David Eppstein http://www.ics.uci.edu/~eppstein/
| Univ. of California, Irvine, School of Information & Computer Science
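
On the follow-up question, a hedged guess rather than a confirmed diagnosis: the "What's New in Python 2.4" notes describe the older re engine as recursive, with some patterns using one stack frame per character matched. If the lazy .*? gaps behaved that way here, each extra tag added to the pattern would lengthen the text that one match has to cross, which would fit the error appearing only after <anotherTag> was added. A sketch that avoids long wildcards altogether is below. It matches one leaf tag at a time with the negated class [^<]* and groups the results into records; LEAF_TAG, NAME_ATTR, iter_records and the second record in the sample are made up for illustration.

```python
import re

# One leaf element: <tag attrs>text</tag>, where text contains no '<'.
# [^<]* cannot run past the next tag, so each match stays short.
LEAF_TAG = re.compile(r"<(\w+)\b([^>]*)>([^<]*)</\1>")
NAME_ATTR = re.compile(r'name="([^"]*)"')


def iter_records(text):
    """Yield one dict per record; a record starts at each <code> tag."""
    record = None
    for m in LEAF_TAG.finditer(text):
        tag, attrs, body = m.group(1), m.group(2), m.group(3)
        if tag == "code":
            if record is not None:
                yield record
            record = {}
        if record is None:
            continue  # leaf tag seen before the first <code>
        record[tag] = body
        if tag == "targetSeq":
            name = NAME_ATTR.search(attrs)
            if name:
                record["targetSeq_name"] = name.group(1)
    if record is not None:
        yield record


# Two shortened records; the second one is invented for the example.
sample = """<code>1cg2</code>
<chain>a</chain>
<targetSeq name="1onc">blah
</targetSeq>
<alignment size="335">
<target>WLTFQKKHITNTRDVDCDNIMS</target>
<yetAnotherTag>barfoobarfoo</yetAnotherTag>
<code>2abc</code>
<targetSeq name="9xyz">blah
</targetSeq>
<target>AAAA</target>
"""

records = list(iter_records(sample))
```

Container tags such as <alignment> never match LEAF_TAG because their content contains other tags, so they are skipped without any special handling. This only works while leaf text contains no "<", which is another reason the real-parser advice above is the safer choice.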

