preg_replace vs DOMDocument replaceChild


Question

I was wondering which of the two methods mentioned in the title is more efficient for replacing content in an HTML page.

I have this custom tag in my page: <includes module='footer'/>, which will be replaced with some content.

There are some downsides to using DOMDocument->getElementsByTagName('includes')->item(0)->parentNode->replaceChild. For instance, when I forget to add the slash in the tag, like so: <includes module='footer'>, the whole site crashes.
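
For reference, the DOM approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the wrapping <page> element, the sample markup, and the replacement <footer> content are made up, and the fragment is loaded as XML with loadXML.

```php
<?php
// Hypothetical page fragment; the <page> wrapper is an assumption.
$xml = "<page><main>Hello</main><includes module='footer'/></page>";

$doc = new DOMDocument();
$doc->loadXML($xml);

// Find the first <includes> placeholder.
$node = $doc->getElementsByTagName('includes')->item(0);

// Build the replacement element; its content is made up for illustration.
$footer = $doc->createElement('footer', 'Footer content');

// replaceChild takes the new node first, then the node being replaced.
$node->parentNode->replaceChild($footer, $node);

// Serialize only the root element, without the XML declaration.
$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

If loadXML fails because the tag is not closed, item(0) returns null and the chained ->parentNode call is where things break.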

Regex tolerates variations like these, as long as the input matches the pattern. It would even allow me to replace an arbitrary string, like {includes:footer}.
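
A minimal sketch of the regex approach, assuming module names consist only of word characters and hyphens. The renderIncludes function and the $modules registry are hypothetical names introduced for this example, not part of the original code.

```php
<?php
// Hypothetical module registry; names and markup are made up for illustration.
$modules = ['footer' => '<footer>Footer content</footer>'];

function renderIncludes(string $html, array $modules): string
{
    // Accepts the tag with or without the trailing slash, plus {includes:x}.
    $pattern = '~<includes\s+module=([\'"])([\w-]+)\1\s*/?>|\{includes:([\w-]+)\}~';

    return preg_replace_callback($pattern, function (array $m) use ($modules) {
        // Group 2 holds the module name for the tag form, group 3 for the brace form.
        $name = $m[2] !== '' ? $m[2] : $m[3];
        // Unknown modules are left untouched so mistakes stay visible.
        return $modules[$name] ?? $m[0];
    }, $html);
}

$input = "<p>a</p><includes module='footer'>{includes:footer}{includes:nope}";
$output = renderIncludes($input, $modules);
echo $output, "\n";
```

The pattern only recognizes the exact shapes it was written for; a tag with an extra attribute before module, for example, would pass through unreplaced.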

Now back to my actual question. Are there any downsides to using regex for this purpose, such as performance issues?

More here: Adding a child/element to the head using XML manipulation

Cheers

Recommended answer

I wouldn't be too worried about performance here; I would consider the two approaches "comparable". Benchmarks would need to be run to truly determine this, as it depends on the size of the document and how the regular expression is written.

Instead, I would be concerned about accuracy. In general, DOMDocument will be much better at parsing XML, since it was built to read and understand the language. However, it does fail on <includes module='footer'> because that is an unclosed tag (the parser expects a matching </includes>).
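
One way to detect that failure instead of letting it take the page down is to check the return value of loadXML and collect libxml's errors internally. The sketch below assumes the markup is loaded as XML; the helper name loadXmlOrNull is hypothetical.

```php
<?php
// Returns a DOMDocument, or null when the markup is not well-formed XML.
function loadXmlOrNull(string $xml): ?DOMDocument
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

// The <page> wrapper is made up for illustration.
$good = loadXmlOrNull("<page><includes module='footer'/></page>");
$bad  = loadXmlOrNull("<page><includes module='footer'></page>");

var_dump($good !== null); // bool(true)
var_dump($bad !== null);  // bool(false)
```

With a null result in hand, the caller can fall back to the original markup or log the problem rather than calling methods on a missing node.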

Most common HTML/XML formatting issues can be fixed with PHP's Tidy class. I would check it out, since you should get much more "expected results" than if you used regex to parse. With a regular expression, there could technically be attributes before or after module, elements within the includes element, unexpected characters as in <includes module='foo>bar'>, etc.
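
To make the last case concrete, here is a small sketch of how a simple pattern trips over a > inside an attribute value. Both patterns are illustrative assumptions, not patterns from the question.

```php
<?php
$input = "<includes module='foo>bar'>";

// Naive pattern: stops at the first ">" it sees.
$naive = '~<includes[^>]*>~';
preg_match($naive, $input, $m);
$naiveMatch = $m[0];
echo $naiveMatch, "\n"; // <includes module='foo>

// Quote-aware pattern: treats quoted attribute values as single units.
$quoteAware = '~<includes(?:[^>\'"]|\'[^\']*\'|"[^"]*")*>~';
preg_match($quoteAware, $input, $m);
$awareMatch = $m[0];
echo $awareMatch, "\n"; // <includes module='foo>bar'>
```

The second pattern handles this one case, but each further edge case (nested elements, extra attributes, entity references) needs another rule, which is the work a parser already does.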

In the end, if your XML is in a "controlled" environment (i.e. you know what can and can't happen, you know which characters module may contain, you know that it will always be a self-closing element containing no children, etc.), then by all means use a regular expression. Just know that it is looking for a very specific set of rules. However, if you expect this to work with "anything you throw at it", please use a DOM parser (after Tidy'ing to avoid the exceptions), regardless of performance (although I bet it will be very comparable in many instances).

Also, as a final note: if you plan to find/replace/manipulate many nodes in a document, you will see a large performance increase by going with a DOM parser. A DOM parser takes a document and parses it once. Then you just traverse the data it has already loaded into its object. Compare this with regular expressions, where each individual expression is run across the whole document looking for a set of matches.
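
A rough sketch of that parse-once, replace-many pattern. The $modules registry, the <page> wrapper, and the choice to name each replacement element after its module are assumptions made for the example.

```php
<?php
// Hypothetical module registry; names and content are made up for illustration.
$modules = ['header' => 'Header content', 'footer' => 'Footer content'];

$xml = "<page><includes module='header'/><main>Hi</main><includes module='footer'/></page>";

$doc = new DOMDocument();
$doc->loadXML($xml); // the document is parsed once, here

// getElementsByTagName returns a live list, so copy it before changing the tree.
$nodes = iterator_to_array($doc->getElementsByTagName('includes'));

foreach ($nodes as $node) {
    $name = $node->getAttribute('module');
    if (!isset($modules[$name])) {
        continue; // unknown module: leave the placeholder in place
    }
    $replacement = $doc->createElement($name, $modules[$name]);
    $node->parentNode->replaceChild($replacement, $node);
}

$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

Copying the node list first matters: replacing nodes while iterating the live list directly can cause some placeholders to be skipped.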

If you want me to get more specific in any area (e.g. give a Tidy example, or work on a benchmark), let me know.
