Linux Text File Manipulation

Views: 92 Published: 2016/7/28 16:54:23 Tags: linux text awk sed

Question

I have a file in this format:

<a href="http://www.wowhead.com/?search=Superior Mana Oil">  
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">  
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">  
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">

I need to select the text after the = but before the " and print it at the end of the line, adding a closing </a> so that it becomes, for example:

<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a>  
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a>  
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a>  
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>

I'm not sure of the best way to do this from the Linux command line (I'd guess sed or awk, but I'm not good with either). Ideally I'd like a script I can just feed the filename to, e.g. ./fixlink.sh brokenlinks.txt

Recommended answer

Assuming you can have one or more spaces after <a, and zero or more spaces around the = signs, the following should work:

$ cat in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">
#
# The command to do the substitution
#
$ sed -e 's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#' in.txt
<a href="http://www.wowhead.com/?search=Superior Mana Oil">Superior Mana Oil</a>
<a href="http://www.wowhead.com/?search=Tabard of Brute Force">Tabard of Brute Force</a>
<a href="http://www.wowhead.com/?search=Tabard of the Wyrmrest Accord">Tabard of the Wyrmrest Accord</a>
<a href="http://www.wowhead.com/?search=Tattered Hexcloth Sack">Tattered Hexcloth Sack</a>

If you're sure you don't have the extra spaces, the pattern simplifies to:

s#<a href=".*search=\([^"]*\)">#&\1</a>#

In sed, s followed by any character (# in this case) starts a substitution. The pattern to be replaced runs up to the second appearance of that same character. So, in our second example, the pattern is: <a href=".*search=\([^"]*\)">. I used \([^"]*\) to mean any sequence of non-" characters, saved in backreference \1 (the \( \) pair marks the group that a backreference refers to). Finally, the next token delimited by # is the replacement. & in sed stands for "whatever matched", which in this case is the whole line, and \1 is just the link text.
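To make the roles of & and \1 concrete, here is a minimal sketch on a toy string. The function name demo_backrefs and the input key=value are made up for illustration and are not part of the original answer.

```shell
# Toy demonstration: & expands to the whole match, \1 to the first group.
# demo_backrefs and the sample input are hypothetical, for illustration only.
demo_backrefs() {
    printf '%s\n' 'key=value' | sed -e 's#key=\(.*\)#[&] [\1]#'
}

demo_backrefs
# prints: [key=value] [value]
```

The square brackets in the replacement are literal; they are only there to show where each piece ends up.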

Here's the pattern again:

's#<a[ \t][ \t]*href[ \t]*=[ \t]*".*search[ \t]*=[ \t]*\([^"]*\)">#&\1</a>#'

And its explanation:

'                       quote so as to avoid shell interpreting the characters
s                       substitute
#                       delimiter
<a[ \t][ \t]*           <a followed by one or more whitespace characters
href[ \t]*=[ \t]*       href followed by optional space, = followed by optional space
".*search[ \t]*=[ \t]*  " followed by as many characters as needed, followed by
                        search, optional space, =, followed by optional space
\([^"]*\)               a sequence of non-" characters, saved in \1
">                      followed by ">
#                       delimiter, replacement pattern starts
&\1                     the matched pattern, followed by backreference \1.
</a>                    the closing </a> tag
#                       end delimiter
'                       end quote
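One caveat worth sketching: as far as I know, reading \t inside a bracket expression as a tab is a GNU sed extension, while POSIX bracket expressions treat the backslash literally. A variant using the POSIX character class [[:space:]] should behave the same way across sed implementations. The function name fix_links_portable is hypothetical, and this is a sketch under that assumption rather than a correction of the answer.

```shell
# Same substitution, but with [[:space:]] in place of [ \t] so that it does
# not rely on sed interpreting \t inside brackets (assumed GNU-specific).
# fix_links_portable is a hypothetical name; arguments are input file names,
# and with no arguments sed reads standard input.
fix_links_portable() {
    sed -e 's#<a[[:space:]][[:space:]]*href[[:space:]]*=[[:space:]]*".*search[[:space:]]*=[[:space:]]*\([^"]*\)">#&\1</a>#' "$@"
}
```

Note that [[:space:]] matches a few more characters than space and tab (for example form feed), which should not matter for input like the sample file.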

If you're really sure that there will always be search= followed by the text you want, you can do:

$ sed -e 's#.*search=\(.*\)">#&\1</a>#'
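Since the question asks for something that can be run as ./fixlink.sh brokenlinks.txt, the simplified pattern from earlier could be wrapped in a small script along these lines. This is only a sketch: the function name fixlink and the argument guard are my own choices, and it assumes the input has no extra spaces inside the tag.

```shell
#!/bin/sh
# fixlink.sh: append the search text and a closing </a> to each link line.
# Assumes lines look like <a href="...search=TEXT"> with no extra spaces.

fixlink() {
    # [^"]* keeps the capture from running past the closing quote.
    sed -e 's#<a href=".*search=\([^"]*\)">#&\1</a>#' "$@"
}

# Only run when at least one file name is given on the command line.
if [ "$#" -gt 0 ]; then
    fixlink "$@"
fi
```

Saved as fixlink.sh and made executable with chmod +x fixlink.sh, it would be run as ./fixlink.sh brokenlinks.txt. The result goes to standard output, so redirect it to a new file to keep it.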

Hope that helps.
