How to find information inside an XML tag using grep?

Views: 27 | Published: 2022/1/6 13:51:51 | Tags: xml, regex, shell, grep

Problem description

I am working on a bash script to extract some information from an XML file. I'm using grep for this.

To find the information I need, I run:

grep -oP "<title>(.*)</title>" temp.xml

I get a list of matches, but each match includes the <title> tags.

How can I get a list containing only the text inside the title tag, without the title tag itself, using grep?

Recommended answer

I can't see why you'd want to use grep for this when it can be solved with a trivial XPath expression:

//title/text()

There are many command-line tools for XPath, and they're often bundled with the OS.

The answers to this question on Stack Overflow list a number of such tools: https://stackoverflow.com/q/15461737/1407656

The problem with grep here is that it's a generic text-processing tool with no awareness of XML structure. For a very simple scenario, you can get it working (https://stackoverflow.com/a/10783395/1407656). If the document is complex, or if the script will be in use for months or years rather than for a one-off job, you may come to regret the results.

XPath makes it easy to tell apart similarly named tags that appear in different contexts in a document.

<article>
  <author>
    <name>Jon Doe</name>
    <title>Chief Editor</title>
  </author>
  <title>On the Benefits of grep</title>
  <publicationDate>2018-02-12</publicationDate>
  <text>blah blah blah</text>
</article>

Extracting the title of the article represented by this document with grep would fail if you used any of the other answers posted here, because they would also match the author's title. You could technically write a regular expression that gets what you need, but it's a lot easier with XPath:

/article/title/text()

If you know you're dealing with a trivial document whose format doesn't change, or it's a one-time job where you can quickly validate the results, you can go with grep as explained by others.
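The difference between a tag-unaware pattern match and a path-aware lookup can be sketched in Python. This is only an illustration under a few assumptions: the function names `titles_by_regex` and `article_title` are hypothetical, the regular expression stands in for a grep-style match, and the standard library's `xml.etree.ElementTree` is used in place of a full XPath engine. ElementTree supports only a subset of XPath, so the lookup relative to the root element is an approximation of `/article/title/text()`, not that exact expression.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

# The sample document from the answer.
XML_DOC = """\
<article>
  <author>
    <name>Jon Doe</name>
    <title>Chief Editor</title>
  </author>
  <title>On the Benefits of grep</title>
  <publicationDate>2018-02-12</publicationDate>
  <text>blah blah blah</text>
</article>
"""


def titles_by_regex(xml_text: str) -> List[str]:
    """Tag-unaware match: returns the text of every <title>, wherever it appears."""
    return re.findall(r"<title>(.*?)</title>", xml_text)


def article_title(xml_text: str) -> Optional[str]:
    """Path-aware lookup: only a <title> that is a direct child of <article>."""
    root = ET.fromstring(xml_text)
    if root.tag != "article":
        return None
    # A bare tag name in ElementTree matches direct children only,
    # so the <title> nested inside <author> is not considered.
    return root.findtext("title")


if __name__ == "__main__":
    print(titles_by_regex(XML_DOC))  # ['Chief Editor', 'On the Benefits of grep']
    print(article_title(XML_DOC))    # On the Benefits of grep
```

The pattern match returns both titles because it has no notion of nesting, while the path-aware lookup returns only the article's own title.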