How can I remove an entire HTML tag (and its contents) by its class using a regex?


Question

I am not very good with Regex but I am learning.

I would like to remove some HTML tags by class name. This is what I have so far:

<div class="footer".*?>(.*?)</div>

The first .*? is there because the tag might contain other attributes, and the second because the element might contain other HTML.

What am I doing wrong? I have tried a lot of variations without success.

Update

The div can span multiple lines, and I am using Perl regexes.

Solution

You will also want to allow for other attributes before class in the div tag:

<div[^>]*class="footer"[^>]*>(.*?)</div>

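Also, make the match case-insensitive, and since the div can span multiple lines, let . match newlines too (the /i and /s modifiers in Perl). Depending on the delimiter and quoting you use, you may need to escape the quotes or the slash in the closing tag. What context are you doing this in?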
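
As a minimal sketch of the pattern above in action, here it is in Python rather than Perl, purely for illustration: re.IGNORECASE and re.DOTALL play the role of Perl's /i and /s modifiers. The sample markup and the names FOOTER_DIV, html and cleaned are made up for this example, and it assumes the footer div contains no nested div.

```python
import re

# The answer's pattern. IGNORECASE ~ Perl /i, DOTALL ~ Perl /s
# (DOTALL lets "." match newlines, so multi-line divs are covered).
FOOTER_DIV = re.compile(
    r'<div[^>]*class="footer"[^>]*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical input: upper-case tag, extra attributes on both sides
# of class, and content spread over several lines.
html = '''<p>Body</p>
<DIV id="f" class="footer" style="color:red">
  Copyright
  notice
</DIV>
<p>End</p>'''

# Replace the whole match (tag and contents) with nothing.
cleaned = FOOTER_DIV.sub("", html)
print(cleaned)
```

In Perl the equivalent substitution would use the same pattern with the i, s and g modifiers.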

Also note that HTML parsing with regular expressions can be very nasty, depending on the input. A good point is brought up in an answer below - suppose you have a structure like:

<div>
    <div class="footer">
        <div>Hi!</div>
    </div>
</div>

Trying to build a regex for that is a recipe for disaster. Your best bet is to load the document into a DOM, and perform manipulations on that.
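
To make the failure concrete, here is a small sketch (again in Python for illustration, with made-up variable names) that runs the pattern from above on the nested structure. The non-greedy (.*?) stops at the first closing tag it finds, which belongs to the inner div, so the footer's own closing tag is left behind.

```python
import re

FOOTER_DIV = re.compile(
    r'<div[^>]*class="footer"[^>]*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL,
)

# The nested structure from above, written on one line.
nested = '<div><div class="footer"><div>Hi!</div></div></div>'

match = FOOTER_DIV.search(nested)
# The match ends at the inner div's closing tag, not the footer's.
print(match.group(0))   # <div class="footer"><div>Hi!</div>

leftover = FOOTER_DIV.sub("", nested)
# One opening tag remains but two closing tags: the markup is now unbalanced.
print(leftover)         # <div></div></div>
```

Making the quantifier greedy does not help either: it would then run to the last closing div in the document and delete far too much.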

Pseudocode that should map closely to XML::DOM:

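document = // load document
// note: in many DOM implementations this list is live, so iterate over a copy
divs = document.getElementsByTagName("div");
for(div in divs) {
    if(div.getAttribute("class") == "footer") {
        parent = div.getParentNode();
        // optional: move the children up in front of the div to keep them;
        // skip this loop to delete the contents along with the tag
        for(child in div.getChildNodes()) {
            // filter attribute types?
            parent.insertBefore(child, div);   // DOM order: (newNode, referenceNode)
        }
        parent.removeChild(div);
    }
}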
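
For a runnable version of the pseudocode above, here is a rough sketch using Python's standard xml.dom.minidom, which exposes the same W3C-style DOM calls. It assumes the input is well-formed XML (real-world HTML often is not, and would need an HTML-aware parser). The function name remove_by_class and its keep_children parameter are hypothetical, introduced only for this example.

```python
from xml.dom import minidom


def remove_by_class(xml_text, tag="div", class_name="footer", keep_children=False):
    """Remove every <tag class="class_name"> element from well-formed XML.

    With keep_children=True the element's children are moved up into the
    parent first (as in the pseudocode); otherwise they are dropped too.
    Assumes the root element itself is not being removed.
    """
    doc = minidom.parseString(xml_text)
    # Iterate over a copy so removals cannot disturb the loop.
    for node in list(doc.getElementsByTagName(tag)):
        if node.getAttribute("class") != class_name:
            continue
        parent = node.parentNode
        if keep_children:
            for child in list(node.childNodes):
                # insertBefore(new_node, reference_node) re-parents the child.
                parent.insertBefore(child, node)
        parent.removeChild(node)
    return doc.documentElement.toxml()


sample = '<div><div class="footer"><div>Hi!</div></div></div>'
print(remove_by_class(sample))                      # <div/>
print(remove_by_class(sample, keep_children=True))  # <div><div>Hi!</div></div>
```

Because the parser tracks nesting itself, the inner div is handled correctly without any pattern matching. The comparison against the class attribute is an exact string match, so an element with several classes would need a split on whitespace instead.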


Perl libraries for this include HTML::DOM and XML::DOM. .NET has built-in libraries for DOM parsing.
