Easiest method for removing html/xml <tags> from single-line output

Views: 178 | Published: 2018/6/22 20:17:31 | Tags: html, xml, sed

Problem description

I have output from grep I'm trying to clean up that looks like:
<words>Http://www.path.com/words</words>
I've tried using...
sed 's/<.*>//'
...to remove the tags, but that just destroys the entire line. I'm not sure why that's happening, since every '<' is closed with a '>' before it gets to the content.

What is the easiest way to do this?

Thanks!
Solution
Try this for your sed expression:
sed 's/<.*>\(.*\)<\/.*>/\1/'
Quick breakdown of the expression:
<.*> - Match the first tag
\(.*\) - Match and save the text between the tags
<\/.*> - Match the end tag, making sure to escape the / character
\1 - Output the result of the first saved match (the text that is matched between \( and \))
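As a quick check, the sketch below runs the question's original attempt, the expression above, and one common alternative against the sample line. The variable names (`line`, `greedy`, `captured`, `stripped`) are made up for illustration, and the `[^>]*` variant is not part of the original answer; it is shown only as another way to keep a match from running past the first closing `>`.

```shell
#!/bin/sh
# Sample line from the question.
line='<words>Http://www.path.com/words</words>'

# Original attempt: .* is greedy, so <.*> runs from the first '<'
# to the LAST '>' on the line and the whole line is deleted.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//')

# Expression from the answer: keep only the text captured between the tags.
captured=$(printf '%s\n' "$line" | sed 's/<.*>\(.*\)<\/.*>/\1/')

# Alternative (an assumption, not from the answer): [^>]* cannot cross a '>',
# so each tag is matched on its own and removed with the g flag.
stripped=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

printf 'greedy:   [%s]\n' "$greedy"
printf 'captured: [%s]\n' "$captured"
printf 'stripped: [%s]\n' "$stripped"
```

The first result comes back empty, which is the "destroys the entire line" behaviour described in the question; the other two should leave just the URL.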

More about back-references

A question came up in the comments that should probably be addressed for completeness.

The \( and \) are sed's back-reference markers. They save a portion of the matched expression for use later as \1, \2, and so on.

For example, if we have an input string:

This has (parens) in it. In addition we can use parenslike thisparens using back-references.

We develop an expression:
sed 's/.*(\(.*\)).*\1\(.*\)\1.*/\1 \2/'
Which gives us:
parens like this
How the heck did that work? Let's break down the expression to find out.

Expression breakdown:
s/ - Start of the sed substitute command; the match expression follows.
.* - Match any characters to start (including none).
( - Match a literal left parenthesis character.
\(.*\) - Match any characters and save them as a back-reference. In our sample it matches the text between the literal parentheses.
) - Match a literal right parenthesis character.
.* - Same as above.
\1 - Match the first saved back-reference. In the case of our sample this is filled in with `parens`.
\(.*\) - Same as above; this one is saved as the second back-reference.
\1 - Same as above.
.* - Same as above; consumes the rest of the line.
/ - End of the match expression. Signals transition to the output expression.
\1 \2 - Print our two back-references.
/ - End of output expression.
As we can see, the back-reference taken from between the parentheses (( and )) was substituted back into the matching expression to be able to match the string parens.
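To try the walkthrough end to end, here is a small sketch that feeds the sample sentence through the expression. The variable names (`input`, `result`, `swapped`) and the second "swap two words" command are illustrative additions, not part of the original answer.

```shell
#!/bin/sh
# Sample sentence from the walkthrough above.
input='This has (parens) in it. In addition we can use parenslike thisparens using back-references.'

# \1 captures the text inside the literal parentheses ("parens");
# \2 captures whatever sits between the next two repeats of \1.
result=$(printf '%s\n' "$input" | sed 's/.*(\(.*\)).*\1\(.*\)\1.*/\1 \2/')
printf '%s\n' "$result"

# Smaller illustration of the same mechanism: swap two captured words.
swapped=$(printf '%s\n' 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')
printf '%s\n' "$swapped"
```

The single quotes around the sed script matter here: without them the shell would try to interpret the parentheses and the space itself.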
