Remove/replace html tags in bash
Question
I have a file with lines that contain:
<li><b> Some Text:</b> More Text </li>
I want to remove the html tags and replace the </b> tag with a dash so it becomes like this:
Some Text:- More Text
I'm trying to use sed, but I can't find the right regex combination.
Recommended answer
If you want to strip all HTML tags but replace only the </b> tag with a -, you can chain two simple sed commands with a pipe:
cat your_file | sed 's|</b>|-|g' | sed 's|<[^>]*>||g' > stripped_file
This passes the file's contents to the first sed command, which replaces each </b> with a -. That output is then piped to a second sed, which replaces all remaining HTML tags with empty strings. The final output is saved to the new file stripped_file.
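To see what each stage of the pipe does, here is a minimal sketch that runs the two sed commands one at a time on the sample line from the question. The variable names (line, step1, step2) are only illustrative and are not part of the original answer.

```shell
# Sample line from the question (variable names are hypothetical).
line='<li><b> Some Text:</b> More Text </li>'

# Stage 1: turn each closing </b> tag into a dash.
step1=$(printf '%s\n' "$line" | sed 's|</b>|-|g')

# Stage 2: delete every remaining tag, i.e. "<", any run of non-">" characters, then ">".
step2=$(printf '%s\n' "$step1" | sed 's|<[^>]*>||g')

# Brackets make leading and trailing spaces visible.
printf 'after stage 1: [%s]\n' "$step1"
printf 'after stage 2: [%s]\n' "$step2"
```

The order matters: if the generic tag-stripping expression ran first, it would also remove </b>, leaving nothing for the dash substitution to match. Note as well that the spaces that were inside the tags are kept, so the result starts and ends with a space.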
Using a method similar to the other answer from @Steve, you could also use sed's -e option to chain the expressions into a single (non-piped) command; by adding -i, you can also read in and replace the contents of your original file without the need for cat or a new file:
sed -i -e 's|</b>|-|g' -e 's|<[^>]*>||g' your_file
This performs the same replacements as the piped commands above, but this time it directly replaces the contents of the input file. To save to a new file instead, remove the -i and add > stripped_file to the end (or whatever file name you choose).
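As a rough sketch of the new-file variant just described, the following runs the single sed command without -i inside a temporary directory, so nothing real is overwritten. The temporary directory, the trimmed_file name, and the two extra trimming expressions are assumptions added for illustration; they are not part of the original answer.

```shell
# Throwaway working directory for the demo (assumption, not in the original answer).
tmpdir=$(mktemp -d)
printf '%s\n' '<li><b> Some Text:</b> More Text </li>' > "$tmpdir/your_file"

# Same two expressions as above, without -i: the input file is left untouched.
sed -e 's|</b>|-|g' -e 's|<[^>]*>||g' "$tmpdir/your_file" > "$tmpdir/stripped_file"

# Optional extra: two more expressions trim leading and trailing spaces,
# which gives exactly the line the question asks for.
sed -e 's|</b>|-|g' -e 's|<[^>]*>||g' -e 's|^ *||' -e 's| *$||' \
    "$tmpdir/your_file" > "$tmpdir/trimmed_file"

original=$(cat "$tmpdir/your_file")
stripped=$(cat "$tmpdir/stripped_file")
trimmed=$(cat "$tmpdir/trimmed_file")
printf 'original: [%s]\n' "$original"
printf 'stripped: [%s]\n' "$stripped"
printf 'trimmed:  [%s]\n' "$trimmed"

rm -rf "$tmpdir"
```

One portability caveat for the in-place form: BSD/macOS sed expects a backup suffix after -i, so there the command is usually written as sed -i '' -e ... rather than sed -i -e ....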