How to extract characters between the delimiters using sed?
Problem description
I have just started learning sed. I want to extract and print the characters between the > and < delimiters. Here is the text in my data file:
<span id="ctl00_ContentPlaceHolder1_lblRollNo">12029</span>
<br /><b>Engineering & IT/Computer Science</b><br />
<div id="ctl00_ContentPlaceHolder1_divEngITMerit">
<span id="ctl00_ContentPlaceHolder1_lblEngITSelListNo">3rd Provisional Selection List</span>
<tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>
Name:
<span id="ctl00_ContentPlaceHolder1_lblName">SIDRA SHAHID</span>
Father Name:
<span id="ctl00_ContentPlaceHolder1_lblFatherName">SHAHID RAFEEQ AHMAD</span>
I wrote this command:
sed -n -e '/^[^>]*>\([^<]*\)<.*/s//\1/p' myfile.txt
The problem is that it returns the text between only some of the > and < pairs. For example, it prints 12029, but not Selected in MS COMPUTER SCIENCE. What am I doing wrong?
Recommended answer
If you only need the strings between the tags, that amounts to deleting the tags and leaving the text between them untouched. Right?
sed 's/<[^>]*>//g'
It replaces every occurrence of a tag (a "<" followed by everything up to the next ">") with the empty string. The text between the tags remains.
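To see the difference on the question's own data, here is a small sketch, not a definitive solution. It runs three commands on two of the sample lines. The first is the question's command, written with the escaped group \( \) and back-reference \1 that basic regular expressions require. Its pattern is anchored at the start of the line, so it captures only the text after the first ">". On the table row that first ">" closes <tr> and is followed immediately by "<", so the captured text is empty. The second is the answer's tag-stripping substitution. The third is an optional variant that puts each piece of text on its own line. The function names first_field_only, strip_tags and text_per_line are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Two sample lines copied from the question's data file.
line1='<span id="ctl00_ContentPlaceHolder1_lblRollNo">12029</span>'
line2="<tr><td style='width: 200px' class='TblTRData'>IT/Computer Science/Software</td><td style='width: 150px'class='TblTRData'>7 (out of 471)</td><td style='width: 325px'class='TblTRData'>Selected in MS COMPUTER SCIENCE</td></tr>"

# The question's approach: keep only the text after the FIRST '>' on a line.
# (Hypothetical helper name; the sed expression is the one from the question.)
first_field_only() {
    sed -n -e '/^[^>]*>\([^<]*\)<.*/s//\1/p'
}

# The answer's approach: delete every tag, leave everything else untouched.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# Optional variant (an assumption, not part of the answer): turn each tag
# into a line break, then drop the blank lines, so that every piece of
# text ends up on its own line.
text_per_line() {
    sed 's/<[^>]*>/\
/g' | sed '/^[[:space:]]*$/d'
}

echo "question's command:"
printf '%s\n' "$line1" "$line2" | first_field_only

echo "tags stripped:"
printf '%s\n' "$line1" "$line2" | strip_tags

echo "one text per line:"
printf '%s\n' "$line1" "$line2" | text_per_line
```

With strip_tags the three table cells run together on one line because nothing separates them once the tags are gone; text_per_line is one way to keep them apart. For anything beyond simple, single-line tags like these, a real HTML parser is usually the safer tool.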