How to find information inside an XML tag using grep?
Question
I am working on a bash script to extract some information from an XML file, and I'm using grep for this.
To find the information I need, I run:
grep -oP "<title>(.*)</title>" temp.xml
This gives me a list of matches, but each match includes the <title> tags.
How can I use grep to get a list containing only the text inside the title tags, without the tags themselves?
Answer
I can't see why you'd want to use grep for this when it can be solved with a trivial XPath expression:
//title/text()
There are many command-line tools for XPath, and they're often bundled with the OS.
Answers to this question on Stack Overflow list a number of such tools.
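As one example, here is a minimal sketch of evaluating the expression above from a bash script with xmllint. It assumes libxml2's xmllint is installed (on Debian/Ubuntu it comes from the libxml2-utils package); the helper name list_titles is hypothetical, and temp.xml is the file from the question.

```shell
#!/usr/bin/env bash
# Minimal sketch: evaluate the XPath expression //title/text() with xmllint.
# Assumes libxml2's xmllint is installed; list_titles is a hypothetical name.
list_titles() {
    local file=$1
    # --xpath prints the nodes selected by the expression.
    # How several text nodes are separated in the output has differed
    # between libxml2 versions, so avoid depending on a particular separator.
    xmllint --xpath '//title/text()' "$file"
}

# Usage (temp.xml is the file from the question):
# list_titles temp.xml
```

Because xmllint parses the document, the result does not depend on how the XML happens to be laid out across lines.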
The problem with grep here is that it's a generic text-processing tool with no awareness of XML structure. In a very simple scenario you can get it working, but if the document is complex, or if the script has to survive for months or years rather than being a one-off job, you may end up regretting the results.
XPath makes it easy to tell apart similarly named tags that appear in different contexts in a document. For example:
<article>
  <author>
    <name>Jon Doe</name>
    <title>Chief Editor</title>
  </author>
  <title>On the Benefits of grep</title>
  <publicationDate>2018-02-12</publicationDate>
  <text>blah blah blah</text>
</article>
If you used any of the other answers posted here, extracting the article's title from this document with grep would fail. You could technically write a regular expression that gets what you need, but it's a lot easier with XPath:
/article/title/text()
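To make the difference concrete, here is a sketch that saves the sample document to a temporary file and queries it both ways. It assumes GNU grep built with -P (PCRE) support and libxml2's xmllint; the grep pattern is just one typical regex-based approach, not necessarily the one used in the other answers.

```shell
#!/usr/bin/env bash
# Sketch: the <article> sample queried with a regex and with XPath.
# Assumes GNU grep with PCRE (-P) support and libxml2's xmllint.
doc=$(mktemp)
trap 'rm -f "$doc"' EXIT

cat > "$doc" <<'EOF'
<article>
  <author>
    <name>Jon Doe</name>
    <title>Chief Editor</title>
  </author>
  <title>On the Benefits of grep</title>
  <publicationDate>2018-02-12</publicationDate>
  <text>blah blah blah</text>
</article>
EOF

# grep only sees lines of text, so it also matches the author's <title>.
grep -oP '<title>\K[^<]*(?=</title>)' "$doc"
# prints:
# Chief Editor
# On the Benefits of grep

# The XPath path is anchored at /article, so only the article's own title matches.
xmllint --xpath '/article/title/text()' "$doc"
# prints: On the Benefits of grep
```

Fixing the grep version would mean encoding the nesting into the regex by hand, whereas the XPath version states it directly as a path.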
If you know you're dealing with a trivial document whose format won't change, or it's a one-time job where you can quickly validate the results, you can go with grep as explained in the other answers.
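For such trivial cases, one common grep-only approach relies on GNU grep's PCRE mode. In the question's command, -o prints the whole match, so the tags are included; this sketch keeps them out of the reported match instead. It assumes GNU grep with -P support and titles that sit on one line with no attributes or nested tags; the helper name titles_via_grep is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print only the text between <title> and </title> with grep.
# Assumes GNU grep with PCRE (-P) support, and that every title is on one
# line, has no attributes, and contains no nested tags.
# titles_via_grep is a hypothetical helper name.
titles_via_grep() {
    local file=$1
    # -o prints only the matched text; a capture group such as (.*) is not
    # printed on its own, which is why the tags show up in the question.
    # \K drops "<title>" from the reported match, (?=</title>) requires the
    # closing tag without including it, and [^<]* stops at the first "<",
    # so two titles on the same line are not merged into one match.
    grep -oP '<title>\K[^<]*(?=</title>)' "$file"
}

# Usage (temp.xml is the file from the question):
# titles_via_grep temp.xml
```

The greedy (.*) in the original pattern would also swallow everything between the first <title> and the last </title> on a line, which the [^<]* class avoids.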