Extracting a Substring from a String with Multiple Special Characters Using Sed


Problem description

I have a text file with a line that reads:

<div id="page_footer"><div><? print('Any phrase's characters can go here!'); ?></div></div>

I want to use sed or awk to extract the substring between the single quotes above, so that it prints just:

Any phrase's characters can go here!

I want the phrase to be delimited as shown above: starting after the first single quote and ending at the single quote that is immediately followed by a parenthesis and then a semicolon. The following sed command with a capture group doesn't seem to work for me. Any suggestions?

sed '/^<div id="page_footer"><div><? print(\'\(.\+\)\');/ s//\1/p' /home/foobar/testfile.txt

Solution

Using cut like this would be incorrect:

 grep "page_footer" /home/foobar/testfile.txt | cut -d "'" -f2

It goes wrong when the string itself contains single quotes. Counting the single quotes first would turn a simple solution into an over-complicated one.
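To see the failure concretely, the sketch below runs cut on the sample line from the question. The variable names `line` and `cut_result` are only for illustration, and the line is read from a here-document rather than from /home/foobar/testfile.txt so that the example is self-contained.

```shell
# Sample line from the question, read from a quoted here-document so the
# shell leaves the quotes and other special characters untouched.
line=$(cat <<'EOF'
<div id="page_footer"><div><? print('Any phrase's characters can go here!'); ?></div></div>
EOF
)

# cut splits on every single quote, so field 2 stops at the apostrophe
# in "phrase's" instead of at the closing quote.
cut_result=$(printf '%s\n' "$line" | cut -d "'" -f2)
printf '%s\n' "$cut_result"   # prints: Any phrase
```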

A solution with sed is better: remove everything up to the first single quote and everything after the last one. Putting a single quote inside a single-quoted sed script gets messy: you have to close the sed parameter with a single quote, escape a single quote, and then open a new quoted string:

grep page_footer /home/foobar/testfile.txt | sed -e 's/[^'\'']*//' -e 's/[^'\'']*$//'
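The '\'' sequence in that command is three separate pieces: a quote that closes the current single-quoted string, an escaped quote (\'), and a quote that opens the next string. A small sketch of the idiom on its own, using the hypothetical variable names `quoted` and `stripped` and a made-up input string:

```shell
# The shell concatenates 'it', \' and 's' into the single word: it's
quoted='it'\''s'
printf '%s\n' "$quoted"       # prints: it's

# The same idiom inside a sed bracket expression: sed receives [^'],
# meaning "any character except a single quote".
stripped=$(printf '%s\n' "abc'def" | sed -e 's/[^'\'']*//')
printf '%s\n' "$stripped"     # prints: 'def
```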

This is not yet the full solution, because you also want to remove the first and last quotes:

grep page_footer /home/foobar/testfile.txt | sed -e 's/[^'\'']*'\''//' -e 's/'\''[^'\'']*$//'

Writing the sed parameters as double-quoted strings and using the . wildcard to match the single quote makes the line shorter:

grep page_footer /home/foobar/testfile.txt | sed -e "s/^[^']*.//" -e "s/.[^']*$//"
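Since the question asks specifically for a capture group that ends at the quote followed by );, one possible variant along those lines is sketched below. It is not the method from the answer above, only an alternative, and it assumes the line contains a single print('...'); call. The variable names `line` and `phrase` are for illustration. Putting the sed script in double quotes avoids escaping the single quotes, and matching the whole line (with .* on both sides) means only the captured group remains after the substitution.

```shell
# Sample line from the question, read from a quoted here-document.
line=$(cat <<'EOF'
<div id="page_footer"><div><? print('Any phrase's characters can go here!'); ?></div></div>
EOF
)

# -n suppresses the default output; the p flag prints only lines on which
# the substitution succeeded. \(.*\) is greedy, so it captures everything
# between print(' and the last '); on the line.
phrase=$(printf '%s\n' "$line" | sed -n "s/.*print('\(.*\)');.*/\1/p")
printf '%s\n' "$phrase"   # prints: Any phrase's characters can go here!
```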
