Match any character (including newlines) in sed

Question

I have a sed command that I want to run on a huge, terrible, ugly HTML file that was created from a Microsoft Word document. All it should do is remove every instance of the string:

style='text-align:center; color:blue;
exampleStyle:exampleValue'

The sed command that I am trying to modify is:

sed "s/ style='[^']*'//" fileA > fileB

It works great, except that whenever there is a newline inside the matching text, it doesn't match. Is there a modifier for sed, or something I can do to force matching of any character, including newlines?
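To make the failure concrete, here is a small reproduction. The file names sample.html and out.html are hypothetical, and the sample just mimics the style attribute above split across two lines:

```shell
# Hypothetical sample: one style attribute whose value spans two lines.
printf '%s\n' "<p style='text-align:center; color:blue;" \
              "exampleStyle:exampleValue'>Hello</p>" > sample.html

# The command from the question. sed matches one line at a time, so the
# first line has an opening quote but no closing one, and the second line
# has a closing quote but no " style='" prefix: nothing is removed.
sed "s/ style='[^']*'//" sample.html > out.html

cat out.html   # identical to sample.html
```

The regex itself is fine; the problem is only that the text it needs to see is never in sed's buffer all at once.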

I understand that regexps are terrible at XML and HTML, blah blah blah, but in this case, the string patterns are well-formed in that the style attributes always start with a single quote and end with a single quote. So if I could just solve the newline problem, I could cut down the size of the HTML by over 50% with just that one command.

In the end, it turned out that Sinan Ünür's perl script worked best. It was almost instantaneous, and it reduced the file size from 2.3 MB to 850k. Good ol' Perl...

Recommended answer

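sed reads the input file one line at a time and strips the trailing newline before applying your commands, so by default no single match can span two lines. As I understand it, that means what you want is not possible with a plain sed substitution.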
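That said, sed can be made to hold more than one line in its pattern space. The sketch below assumes GNU sed: its -z (--null-data) option is a GNU extension that splits input on NUL bytes instead of newlines, so an ordinary text file becomes a single "line". The second command uses only the standard sed commands :, N and b to append every line before substituting. The file names are hypothetical, and non-GNU seds may differ in details such as whether [^'] matches an embedded newline:

```shell
# Hypothetical sample: one style attribute whose value spans two lines.
printf '%s\n' "<p style='text-align:center; color:blue;" \
              "exampleStyle:exampleValue'>Hello</p>" > sample.html

# GNU sed only: with -z the whole file (it contains no NUL bytes) is one
# record, and [^'] is free to match the newline inside the attribute.
sed -z "s/ style='[^']*'//g" sample.html > out.html

# Loop version: label a, append the next line unless this is the last one
# ($!N), jump back to a until the last line, then substitute once.
sed -e ':a' -e '$!N' -e '$!ba' -e "s/ style='[^']*'//g" sample.html > out2.html

cat out.html out2.html   # both files contain <p>Hello</p>
```

Using $!N rather than a bare N avoids a GNU sed quirk where N on the last line prints the pattern space and exits before the substitution runs, which would leave a one-line file untouched.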

You could use the following Perl script (untested), though:

#!/usr/bin/perl

use strict;
use warnings;

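{
    local $/;                        # slurp mode: undefine the record separator
    my $html = <>;                   # read the whole input, newlines included, as one string
    $html =~ s/ style='[^']*'//g;    # [^'] also matches "\n", so multi-line attributes go too
    print $html;
}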

__END__

The one-liner would be:

$ perl -e 'local $/; $_ = <>; s/ style=\047[^\047]*\047//g; print' fileA > fileB
