How can I validate large numbers of files with search and replace?
I am currently validating a client's HTML source and I am getting a lot of validation errors for img and input tags that do not have the Omittag. I would fix them manually, but this client literally has thousands of files, with many instances where the tag lacks it.
This client has validated some img tags (for whatever reason).
Just wondering if there is a Unix command I could run to check whether a tag lacks the Omittag and, if so, add it.
I have done simple search and replaces with the following command:
find . \! -path '*.svn*' -type f -exec sed -i -n '1h;1!H;${;g;s/<b>/<strong>/g;p}' {} \;
But never something this large. Any help would be appreciated.
See questions I asked in comment at top.
Assuming you're using GNU sed, and that you're trying to add the trailing / to your tags to make XML-compliant <img /> and <input />, then replace the sed expression in your command with this one, and it should do the trick:
'1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}'
Here it is on a simple test file:
$ cat test.html
This is an <img tag> without closing slash.
Here is an <img tag /> with closing slash.
This is an <input tag > without closing slash.
And here one <input attrib="1"
> that spans multiple lines.
Finally one <input
attrib="1" /> with closing slash.
$ sed -n '1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}' test.html
This is an <img tag/> without closing slash.
Here is an <img tag /> with closing slash.
This is an <input tag /> without closing slash.
And here one <input attrib="1"
/> that spans multiple lines.
Finally one <input
attrib="1" /> with closing slash.
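To run the same expression over a whole tree, it can be dropped into the find command from the question. The following is only a sketch under a few assumptions that are not in the original: the helper name fix_tree is made up, the -name '*.html' filter is added so that only HTML files are touched, and -i.bak is used instead of -i so that GNU sed leaves a backup copy next to every file it rewrites.

```shell
#!/bin/sh
# Sketch only: fix_tree is a hypothetical helper; GNU find and GNU sed are assumed.
fix_tree() {
    root=${1:-.}
    # Skip Subversion metadata, restrict the run to *.html files (an assumption),
    # and keep a .bak copy of every file that sed rewrites in place.
    find "$root" \! -path '*.svn*' -type f -name '*.html' -exec \
        sed -i.bak -n '1h;1!H;${;g;s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g;p;}' {} \;
}
```

Calling fix_tree with no argument works on the current directory. Once the results have been checked (for example by comparing a few files against their .bak copies), the backups can be deleted.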
The GNU sed manual describes the regex syntax and how the hold-space buffering works to do multiline search/replace.
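For readers who find the one-liner hard to follow, the same script can be spread over several lines with comments. This is a sketch of the expression used above, not a different technique; the function name close_void_tags is made up for illustration, and GNU sed is assumed (the \| alternation is a GNU extension).

```shell
#!/bin/sh
# Sketch only: close_void_tags is a hypothetical helper name; GNU sed is assumed.
# Reads the named files (or standard input) and prints the result to standard output.
close_void_tags() {
    sed -n '
        # First line: copy it into the hold space.
        1h
        # Every later line: append it to the hold space after a newline.
        1!H
        # Last line: the hold space now contains the whole file.
        ${
            # Copy the hold space back into the pattern space.
            g
            # Add a slash to img/input tags whose last character is not a slash.
            s/\(img\|input\)\( [^>]*[^/]\)>/\1\2\/>/g
            # Print the edited file once (-n suppressed all other output).
            p
        }
    ' "$@"
}
```

Because the whole file is edited as a single string, [^>]* can cross line breaks, which is why tags that span several lines are handled too. Note that the pattern requires a space after the tag name, so a bare <img> with no attributes is left unchanged.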
Alternatively, you could use something like Tidy, which is designed for sanitizing bad HTML; that's what I'd do if I were doing anything more complicated than a couple of simple search/replaces. Tidy's options get complicated fast, so it's usually better to write a script in your scripting language of choice (Python, Perl) that calls libtidy and sets whatever options you need.
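As a rough illustration of the Tidy route, here is a sketch that uses the tidy command-line tool from a shell function rather than libtidy from Python or Perl. The helper name tidy_tree and the -name '*.html' filter are assumptions, and the tool is assumed to be installed as tidy. Be aware that Tidy rewrites far more than the closing slashes, so it is worth trying on a copy of the tree first.

```shell
#!/bin/sh
# Sketch only: tidy_tree is a hypothetical helper, and the HTML Tidy
# command-line tool is assumed to be installed as "tidy".
tidy_tree() {
    root=${1:-.}
    # Reading names line by line assumes no file name contains a newline.
    find "$root" \! -path '*.svn*' -type f -name '*.html' -print |
    while IFS= read -r file; do
        # -q: quiet, -asxhtml: write XHTML output, -m: modify the file in place.
        # Tidy exits with 1 for warnings and 2 for errors; report only errors.
        tidy -q -asxhtml -m "$file" || [ $? -eq 1 ] || echo "tidy failed: $file" >&2
    done
}
```

Further options can be added to the tidy call or collected in a Tidy configuration file as the cleanup rules grow.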