How to find information inside an XML tag using grep?


Question

I am working on a Linux shell script that uses grep to find information in an XML file. I am on a Mac, which I hope doesn't matter too much.

To find the information I need, I run:

grep -oP "<title>(.*)</title>" temp.xml

This returns a list of matches, but each match includes the <title> tags.

How can I use grep to get a list containing only the text inside the title tags, without the tags themselves?

Solution

I can't see why you'd want to use grep for this when it can be solved with a trivial XPath expression:

//title/text()

There are many command-line tools for XPath, and they're usually bundled with the OS.

Answers to this question on Stack Overflow list a number of such tools.

The problem with grep here is that it's a generic text-processing tool and isn't aware of any XML structure. For a very simple scenario, you can get it working. If the document is complex, or if you're using this in a script that will live for months or years rather than in a one-off job, you may end up regretting it.

XPath makes it easy to tell the difference between similarly named tags that appear in different contexts in a document.

<article>
    <author>
        <name>Jon Doe</name>
        <title>Chief Editor</title>
    </author>
    <title>On the Benefits of grep</title>
    <publicationDate>2018-02-12</publicationDate>
    <text>blah blah blah</text>
</article>

Extracting the title of the article represented by this document with grep would fail if you used any of the other answers posted here. You could technically write the regular expression to get what you need but it's a lot easier with XPath.

/article/title/text()
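
As a rough illustration of the difference between the two expressions, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` instead of a dedicated command-line XPath tool. This is an assumption made for the sake of a self-contained example, not something the answer prescribes. ElementTree supports only a subset of XPath and has no `text()` step, so the sketch selects the elements and reads their `.text` attribute. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer, inlined so the sketch is self-contained.
ARTICLE_XML = """\
<article>
    <author>
        <name>Jon Doe</name>
        <title>Chief Editor</title>
    </author>
    <title>On the Benefits of grep</title>
    <publicationDate>2018-02-12</publicationDate>
    <text>blah blah blah</text>
</article>
"""

root = ET.fromstring(ARTICLE_XML)  # root is the <article> element

# Rough counterpart of //title/text(): <title> elements at any depth.
all_titles = [el.text for el in root.findall(".//title")]

# Rough counterpart of /article/title/text(): only <title> elements
# that are direct children of <article>.
article_titles = [el.text for el in root.findall("./title")]

print(all_titles)      # ['Chief Editor', 'On the Benefits of grep']
print(article_titles)  # ['On the Benefits of grep']
```

The context-free path picks up the author's job title as well, while the anchored path returns only the article title.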

If you know you're dealing with a trivial document whose format won't change, or if it's a one-off job where you can quickly validate the results, you can go for grep as explained by others.
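
The other answers are not reproduced here. As a hedged sketch of what a regex-only approach amounts to, the snippet below uses Python's `re` module as a stand-in for grep, which is an assumption made so the example runs anywhere. The pattern and the helper name `titles_by_regex` are illustrative, not taken from those answers. A capture group returns only the text between the tags, which works for a trivial document but cannot tell the two kinds of `<title>` apart.

```python
import re

# Non-greedy capture of whatever sits between <title> and </title>.
# "." does not match newlines, so this is line-oriented, much like grep.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def titles_by_regex(xml_text):
    """Return the text of every <title>...</title> pair found on one line."""
    return TITLE_RE.findall(xml_text)


# Hypothetical trivial document: one title, nothing ambiguous.
trivial = "<doc><title>Only Title</title></doc>"

# Condensed version of the sample article with two kinds of <title>.
nested = (
    "<article><author><title>Chief Editor</title></author>"
    "<title>On the Benefits of grep</title></article>"
)

print(titles_by_regex(trivial))  # ['Only Title']
print(titles_by_regex(nested))   # ['Chief Editor', 'On the Benefits of grep']
```

For the second document the regex has no notion of nesting, so separating the article title from the author's title would need extra, format-specific logic.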
