Removing all HTML tags from a webpage

Views: 195  Published: 2016/8/3 10:27:53  Tags: regex, bash, sed

Question

I am doing some BASH shell scripting with curl. If my curl command returns any text, I know I have an error. This text returned by curl is usually in HTML. I figured that if I can strip out all of the HTML tags, I could display the resulting text as an error message.

I was thinking of something like this:

sed -E 's/<.*?>//g' <<<$output_text

However, I get: sed: 1: "s/<.*?>//": RE error: repetition-operator operand invalid

If I replace *? with *, I don't get the error (and I don't get any text either). If I remove the global (g) flag, I get the same error.

This is on Mac OS X.
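
The symptoms described above are consistent with sed using POSIX regular expressions, which have no non-greedy `*?` quantifier: `*?` is rejected as an invalid repetition, while a plain greedy `.*` runs from the first `<` on a line to the last `>` and can swallow the whole line. A minimal sketch of one common workaround follows, using a negated character class so that each match stops at the nearest `>`. The function name `strip_tags` and the sample HTML are assumptions made for illustration; they are not from the original post.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not from the original post).
# Reads HTML on stdin and writes the text with tags removed to stdout.
strip_tags() {
  # <[^>]*> matches '<', then any run of characters that are not '>',
  # then the closing '>'. Each tag is matched on its own, so the text
  # between tags survives.
  sed -E 's/<[^>]*>//g'
}

# Sample input standing in for what curl might return (assumed content).
output_text='<html><body><h1>404 Not Found</h1><p>No such page</p></body></html>'

printf '%s\n' "$output_text" | strip_tags
```

Because sed processes its input one line at a time, this sketch does not remove a tag that is split across several lines, and it leaves character entities such as `&amp;` untouched. For short error pages that may be acceptable; for arbitrary HTML a real parser is the safer choice.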
