Bottleneck? More efficient regular expression?


Problem description

Hello,

I've been struggling with a regular expression for parsing XML files, which keeps giving the run-time error "maximum
recursion limit exceeded". Here is the pattern string:

r'<code>(?P<c>.*?)</code>.*?<targetSeq name="(?P<tn>.*?)">.*?<target>(?P<t>.*?)</target>.*?<align>(?P<a>.*?)</align>.*?<template>(?P<temp>.*?)</template>.*?<anotherTag>(?P<at>.*?)</anotherTag>.*?<yetAnotherTag>(?P<yat>.*?)</yetAnotherTag>'

The file format is straightforward. Here is a sample:

<code>1cg2</code>
<chain>a</chain>
<settings>abcde</settings>
<scoreInfo>12345</scoreInfo>
<targetSeq name="1onc">blah
</targetSeq>
<alignment size="335">
<target>WLTFQKKHITNTRDVDCDNIMS</target>
<align> :| ..| : . | . |. . :</align>
<template>QKRDNVLFQAATDEQPAVIKTLEKL</template>
<anotherTag>foobarfoobar</anotherTag>
<yetAnotherTag>barfoobarfoo</yetAnotherTag>

# this group of tags then repeats in the file multiple times

If I search for the pattern up to "</template>" (i.e. no <anotherTag> onwards), it works fine. As soon as I added the
later bits into the pattern it gives the error.

I heard that non-greedy (*?) is inefficient, so I tried replacing all .*? with (?!<target>) etc., which means "if the
next piece of text doesn't match the <target> tag, keep going". But it gives the same error.

So my question is: what is the bottleneck in this pattern? Could someone more experienced in REs give some hints here?

Your help is greatly appreciated!

Tina


-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 100,000 Newsgroups - 19 Different Servers! =-----

Recommended answers

Tina Li wrote:
Hello,

[skipped]




Personally I'd rather split it into several separate re's. Something like
match_tag(tag_name) that matches a tag group at the start of the string.

hth,
anton.
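
A minimal sketch of the "several separate re's" idea above, not anton's actual code. It reads match_tag loosely: the helper searches for the next occurrence of one tag from a given position rather than anchoring at the start of the string. TAGS, SAMPLE and parse_records are hypothetical names introduced for illustration, and for simplicity the sketch captures the text inside <targetSeq> rather than its name attribute.

```python
import re

# Tag names taken from the pattern in the original post.
TAGS = ["code", "targetSeq", "target", "align", "template",
        "anotherTag", "yetAnotherTag"]

# The sample record from the original post.
SAMPLE = """<code>1cg2</code>
<chain>a</chain>
<settings>abcde</settings>
<scoreInfo>12345</scoreInfo>
<targetSeq name="1onc">blah
</targetSeq>
<alignment size="335">
<target>WLTFQKKHITNTRDVDCDNIMS</target>
<align> :| ..| : . | . |. . :</align>
<template>QKRDNVLFQAATDEQPAVIKTLEKL</template>
<anotherTag>foobarfoobar</anotherTag>
<yetAnotherTag>barfoobarfoo</yetAnotherTag>
"""

def match_tag(tag_name, text, pos=0):
    """Find the next <tag_name ...>content</tag_name> at or after pos.

    Returns (content, end_position), or None if the tag is not found.
    """
    name = re.escape(tag_name)
    # \b keeps "target" from matching "<targetSeq" and "align" from
    # matching "<alignment"; [^>]* skips any attributes.
    pattern = re.compile(r"<%s\b[^>]*>(.*?)</%s>" % (name, name), re.DOTALL)
    m = pattern.search(text, pos)
    if m is None:
        return None
    return m.group(1), m.end()

def parse_records(text):
    """Collect one dict per group of tags, in the order given by TAGS."""
    records = []
    pos = 0
    while True:
        record = {}
        for tag in TAGS:
            found = match_tag(tag, text, pos)
            if found is None:
                return records  # no further complete group
            record[tag], pos = found
        records.append(record)

if __name__ == "__main__":
    for rec in parse_records(SAMPLE * 2):
        print(rec["code"], rec["target"], rec["yetAnotherTag"])
```

Each small pattern only has to span a single element, so no single match needs to cover a whole record.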


In article <3f********@corp.newsgroups.com>,
"Tina Li" <tina_li23 AT hotmail DOT com> wrote:
I've been struggling with a regular expression for parsing XML files,
which keeps giving the run time error "maximum recursion limit
exceeded".





Why not use a real XML parser? xml.parsers.expat is easy enough to use,
doesn't have problems with recursion limits, and will continue working
when someone generates a valid XML file in a slightly different version
than the one you expect.

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science
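
A rough sketch of what the xml.parsers.expat approach might look like for the sample in the original post, not code from the thread. It assumes the real file is well-formed XML: the <records> root element and the closing </alignment> below are assumptions, since the posted sample shows neither. RecordCollector, parse_records, SAMPLE_XML and the "targetSeq_name" key are hypothetical names introduced for illustration.

```python
import xml.parsers.expat

# Elements whose text content we want to keep (names from the original post).
TEXT_TAGS = ("code", "target", "align", "template",
             "anotherTag", "yetAnotherTag")

# Assumption: a single root element and a closed <alignment> element.
SAMPLE_XML = """<records>
<code>1cg2</code>
<chain>a</chain>
<settings>abcde</settings>
<scoreInfo>12345</scoreInfo>
<targetSeq name="1onc">blah
</targetSeq>
<alignment size="335">
<target>WLTFQKKHITNTRDVDCDNIMS</target>
<align> :| ..| : . | . |. . :</align>
<template>QKRDNVLFQAATDEQPAVIKTLEKL</template>
<anotherTag>foobarfoobar</anotherTag>
<yetAnotherTag>barfoobarfoo</yetAnotherTag>
</alignment>
</records>
"""

class RecordCollector:
    """Builds one dict per group of tags from expat callbacks."""

    def __init__(self):
        self.records = []   # finished records
        self.record = {}    # record currently being filled
        self.chunks = None  # a list while inside a wanted element, else None

    def start(self, name, attrs):
        if name == "targetSeq":
            self.record["targetSeq_name"] = attrs.get("name")
        elif name in TEXT_TAGS:
            self.chunks = []

    def chars(self, data):
        # expat may deliver one element's text in several pieces.
        if self.chunks is not None:
            self.chunks.append(data)

    def end(self, name):
        if name in TEXT_TAGS and self.chunks is not None:
            self.record[name] = "".join(self.chunks)
            self.chunks = None
            if name == "yetAnotherTag":  # last tag of a group
                self.records.append(self.record)
                self.record = {}

def parse_records(xml_text):
    collector = RecordCollector()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = collector.start
    parser.CharacterDataHandler = collector.chars
    parser.EndElementHandler = collector.end
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return collector.records

if __name__ == "__main__":
    for rec in parse_records(SAMPLE_XML):
        print(rec["code"], rec["targetSeq_name"], rec["template"])
```

If the real file has no single root element, expat will reject it as not well-formed, so the input would need wrapping or splitting first.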


Hello,

Thanks for the suggestion. It does solve my problem -- but just out of curiosity, I'd like to know what caused the
over-limit as well.

Thanks again!

Tina

"David Eppstein" <ep******@ics.uci.edu> wrote in message news:ep****************************@news.service.uci.edu...
| In article <3f********@corp.newsgroups.com>,
| "Tina Li" <tina_li23 AT hotmail DOT com> wrote:
|
| > I've been struggling with a regular expression for parsing XML files,
| > which keeps giving the run time error "maximum recursion limit
| > exceeded".
|
| Why not use a real XML parser? xml.parsers.expat is easy enough to use,
| doesn't have problems with recursion limits, and will continue working
| when someone generates a valid XML file in a slightly different version
| than the one you expect.
|
| --
| David Eppstein http://www.ics.uci.edu/~eppstein/
| Univ. of California, Irvine, School of Information & Computer Science


