Sed remove tags from html file

Views: 865 Published: 2016/8/2 14:11:46 Tags: html regex linux bash


Problem description

I need to remove all tags from an HTML file with a bash script using the sed command. I tried this:

sed -r 's/[\<][\/]?[a-zA-Z0-9\=\"\-\#\.\& ]+[\/]?[\>]//g' $1

and also this:

sed -r 's/[\<][\/]?[.]*[\/]?[\\]?[\>]//g' $1

but I am still missing something. Any suggestions?

Recommended answer

You can either use one of the many HTML-to-text converters, use a Perl regex if possible (<.+?>), or, if it must be sed, use <[^>]*>

sed -e 's/<[^>]*>//g' file.html
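
As a minimal sketch, the command above can be wrapped in a small shell function so that a script can reuse it. The function name strip_tags and the sample input are made up for illustration; only the sed expression comes from the answer.

```shell
#!/usr/bin/env bash
# strip_tags is a hypothetical helper name, not part of sed.
# It reads HTML from the files given as arguments (or from stdin when
# there are none) and deletes everything from a '<' up to the next '>'
# on the same line.
strip_tags() {
    sed -e 's/<[^>]*>//g' "$@"
}

# Usage with made-up sample input:
printf '%s\n' '<p>Hello <b>world</b>!</p>' | strip_tags
# prints: Hello world!
```

Inside a script, strip_tags "$1" would process the file passed as the first argument; quoting "$1" keeps file names with spaces intact.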

If there's no room for errors, use an HTML parser instead.
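
One case where the sed approach can go wrong, shown here as a small sketch with made-up input: sed works on one line at a time, so a tag that is split across two lines is not matched.

```shell
# Made-up input: the <a ...> tag is split across two lines.
html='<a
href="x.html">link</a> text'

# Line 1 has a '<' but no '>', and on line 2 the first '>' has no '<'
# before it, so neither half of the split tag is removed.
# Only </a> is matched and deleted.
printf '%s\n' "$html" | sed -e 's/<[^>]*>//g'
# prints:
# <a
# href="x.html">link text
```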

This regular expression consists of three parts: <, [^>]*, and >

search for the opening <
followed by zero or more characters (*) that are not the closing >
[...] is a character class; when it starts with ^, it matches characters not in the class
and finally look for the closing >
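
To see what each match covers, here is a small sketch that wraps every match in square brackets instead of deleting it. The sample inputs are made up. The second command is added for contrast: inside a character class a dot is literal, so a pattern such as <[.]*> matches only tags that consist of dots.

```shell
# Wrap each match in [ ] instead of deleting it.
# In the replacement, '&' stands for the text that was matched.
printf '%s\n' '<name>Olaf</name> answers questions.' | sed -e 's/<[^>]*>/[&]/g'
# prints: [<name>]Olaf[</name>] answers questions.

# For contrast: [.]* means "zero or more literal dots",
# so <b> is not matched here.
printf '%s\n' '<...> <b>' | sed -e 's/<[.]*>/[&]/g'
# prints: [<...>] <b>
```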

The simpler regular expression <.*> will not work, because it searches for the longest possible match, i.e. up to the last closing > in an input line. E.g., when you have more than one tag in an input line

<name>Olaf</name> answers questions.

will result in

 answers questions.

instead of

Olaf answers questions.
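
The difference can be reproduced with a short sketch that runs both patterns on the same line. The variable names are chosen for illustration only.

```shell
line='<name>Olaf</name> answers questions.'

# Greedy: .* runs to the last '>' on the line, so "Olaf" is swallowed too.
greedy=$(printf '%s\n' "$line" | sed -e 's/<.*>//g')

# Negated class: each match stops at the first '>' after its '<'.
per_tag=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')

# The square brackets in the output make the leading space visible.
printf 'greedy:  [%s]\n' "$greedy"    # greedy:  [ answers questions.]
printf 'per tag: [%s]\n' "$per_tag"   # per tag: [Olaf answers questions.]
```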

See also Repetition with Star and Plus (www.regular-expressions.info/repeat.html#greedy), especially the section Watch Out for The Greediness! and following, for a detailed explanation.
