PHP: Regex replace while ignoring content between html tags


Question

I'm looking for a regular expression that can find a word or regex pattern that is NOT between html tags.

Say I want to replace (alpha|beta) in: the first two letters in the greek alphabet are alpha and <b>beta</b>

I only want it to replace alpha, because beta is between <> tags. So it should ignore anything matching (<(.*?)>(.*?)<\/(.*?)>)

:)

Recommended answer

I didn't test the logic used on this page (http://www.phpro.org/examples/Get-Text-Between-Tags.html), but I can confirm the point made at the top of that page in big bold letters: you shouldn't do what you're trying to do with regex.

HTML is not uniform, and edge cases will always bite you in the rear if you use regular expressions to handle the content of those tags in any real-world situation. So unless your markup is extremely simplistic, uniform, 100% accurate, and contains only HTML (no CSS, JavaScript or garbage), your best bet is a DOM parser library.
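For what it's worth, here is a rough sketch of what the DOM route could look like for the example in the question, using PHP's bundled DOMDocument. The function name `replaceOutsideTags` and the wrapper `<div>` are my own inventions for illustration, and I'm assuming the input is a small HTML fragment rather than a full page. Treat it as a starting point, not a hardened solution.

```php
<?php
// Hypothetical helper (not from the linked page): applies a regex replacement
// only to text that is NOT wrapped in any element of the fragment.
function replaceOutsideTags(string $html, string $pattern, string $replacement): string
{
    $doc = new DOMDocument();

    // Silence parser warnings about imperfect markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // Assumption: input is UTF-8. The XML encoding prefix is a common trick to
    // make loadHTML() read it as such. The <div> gives top-level text a known parent.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is the wrapper added above.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    $out = '';
    foreach (iterator_to_array($wrapper->childNodes) as $node) {
        // Only direct text children of the wrapper are outside every tag pair.
        if ($node->nodeType === XML_TEXT_NODE) {
            $replaced = preg_replace($pattern, $replacement, $node->nodeValue);
            $node->nodeValue = $replaced ?? $node->nodeValue; // keep text if the regex fails
        }
        // Serialize each child; elements such as <b>beta</b> pass through unmodified.
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

// Usage with the sentence from the question: only the bare "alpha" is a candidate.
echo replaceOutsideTags(
    'the first two letters in the greek alphabet are alpha and <b>beta</b>',
    '/(alpha|beta)/',
    'X'
);
```

Because only the wrapper's direct text children are touched, anything nested inside an element at any depth is left alone. The parser may still normalize sloppy markup on the way out, so the output is not guaranteed to be byte-identical to the input outside the replaced words.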

And really, many DOM parser libraries have problems too, but you'll be miles ahead of the regex counterparts. The best way to get the text content of tags is to render the HTML in a browser and access the innerText property of the given DOM node (or have a human copy and paste the contents out manually), but that isn't always an option :D
