Match any character (including newlines) in sed

Question

I have a sed command that I want to run on a huge, terrible, ugly HTML file that was created from a Microsoft Word document. All it should do is remove any instance of the string

style='text-align:center; color:blue;
exampleStyle:exampleValue'

The sed command that I am trying to modify is

sed "s/ style='[^']*'//" fileA > fileB

It works great, except that whenever there is a new line inside of the matching text, it doesn't match. Is there a modifier for sed, or something I can do to force matching of any character, including newlines?
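
A minimal reproduction of the behavior described above might look like the following sketch. The sample HTML and the temporary directory are assumptions made up for illustration; only the sed command itself comes from the question.

```shell
#!/bin/sh
# Hypothetical sample input: one style attribute on a single line,
# and one that wraps onto a second line.
tmpdir=$(mktemp -d)
cat > "$tmpdir/fileA" <<'EOF'
<p style='color:red'>one</p>
<p style='text-align:center; color:blue;
exampleStyle:exampleValue'>two</p>
EOF

# The command from the question. sed applies s/// to each input line
# separately, so the closing quote on the next line is never seen.
sed "s/ style='[^']*'//" "$tmpdir/fileA" > "$tmpdir/fileB"

cat "$tmpdir/fileB"
```

The single-line attribute is removed, while the attribute that spans two lines is left untouched.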

I understand that regexps are terrible at XML and HTML, blah blah blah, but in this case, the string patterns are well-formed in that the style attributes always start with a single quote and end with a single quote. So if I could just solve the newline problem, I could cut down the size of the HTML by over 50% with just that one command.

In the end, it turned out that Sinan Ünür's perl script worked best. It was almost instantaneous, and it reduced the file size from 2.3 MB to 850k. Good ol' Perl...

Recommended answer

sed goes over the input file line by line, which means that, as I understand it, a plain s/// command like yours cannot match across a newline.

You could use the following Perl script (untested), though:

#!/usr/bin/perl

use strict;
use warnings;

{
    local $/; # slurp mode
    my $html = <>;
    $html =~ s/ style='[^']*'//g;
    print $html;
}

__END__

A one-liner would be:

$ perl -e 'local $/; $_ = <>; s/ style=\047[^\047]*\047//g; print' fileA > fileB
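
For readers who want to stay in sed: although sed reads one line at a time, its N command appends the next input line to the pattern space, so a loop can collect the whole file before substituting. The sketch below is not part of the original answer. It assumes GNU sed (where a negated bracket expression such as `[^']` also matches an embedded newline) and uses a made-up sample file.

```shell
#!/bin/sh
# Hypothetical sample input with a style attribute that spans two lines.
tmpdir=$(mktemp -d)
cat > "$tmpdir/fileA" <<'EOF'
<p style='color:red'>one</p>
<p style='text-align:center; color:blue;
exampleStyle:exampleValue'>two</p>
EOF

# :a   defines a label
# N    appends the next input line (and a newline) to the pattern space
# $!ba branches back to the label unless the last line has been read
# The substitution then runs once over the whole file.
sed -e ':a' -e 'N' -e '$!ba' -e "s/ style='[^']*'//g" \
    "$tmpdir/fileA" > "$tmpdir/fileB"

cat "$tmpdir/fileB"
```

With GNU sed and the sample input above, this should print `<p>one</p>` and `<p>two</p>`. Like the Perl slurp approach, it holds the entire file in memory, which is unlikely to matter for a file of a few megabytes.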
