Parsing JSON with Unix tools

Question

I'm trying to parse JSON returned from a curl request, like so:

curl 'http://twitter.com/users/username.json' |
    sed -e 's/[{}]/''/g' | 
    awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'

The above splits the JSON into fields, for example:

% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...

How do I print a specific field (denoted by -v k=text)?

Recommended answer

A number of tools are designed specifically for manipulating JSON from the command line, and they will be a lot easier and more reliable than doing it with Awk. One such tool is jq:

curl -s 'https://api.github.com/users/lambda' | jq -r '.name'

You can also do this with tools that are likely already installed on your system, such as Python with its json module. This avoids any extra dependencies while still giving you a proper JSON parser. The following examples assume you want to use UTF-8, which the original JSON should be encoded in and which most modern terminals use as well:

Python 3:

curl -s 'https://api.github.com/users/lambda' | 
    python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"

Python 2:

export PYTHONIOENCODING=utf8
curl -s 'https://api.github.com/users/lambda' | 
    python2 -c "import sys, json; print json.load(sys.stdin)['name']"
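
For anything beyond a single top-level key, the one-liners above can be grown into a small script. The following is a minimal sketch, not part of the original answer: the helper name get_field and the sample document are made up for illustration, and it parses a string rather than curl output so that it can be run offline.

```python
import json


def get_field(json_text, *path):
    """Parse json_text and walk the given keys/indexes (hypothetical helper)."""
    value = json.loads(json_text)
    for step in path:
        value = value[step]  # raises KeyError/IndexError if the path is missing
    return value


# Made-up sample data, shaped loosely like the output shown in the question.
sample = '{"friends_count": 245, "status": {"text": "My status", "truncated": false}}'

if __name__ == "__main__":
    print(get_field(sample, "friends_count"))
    print(get_field(sample, "status", "text"))
```

In a pipeline, the same function could be fed sys.stdin.read() in place of the sample string.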

Frequently Asked Questions

Why not a pure shell solution?

The standard POSIX/Single Unix Specification shell is a very limited language that has no facilities for representing sequences (lists or arrays) or associative arrays (also known as hash tables, maps, dicts, or objects in some other languages). This makes representing the result of parsing JSON somewhat tricky in portable shell scripts. There are somewhat hacky ways to do it, but many of them can break if keys or values contain certain special characters.

Bash 4 and later, zsh, and ksh support arrays and associative arrays, but these shells are not universally available (macOS stopped updating Bash at Bash 3 because of the license change from GPLv2 to GPLv3, while many Linux systems don't have zsh installed out of the box). You could possibly write a script that works in either Bash 4 or zsh, one of which is available on most macOS, Linux, and BSD systems these days, but it would be tough to write a shebang line that works for such a polyglot script.

Finally, a full-fledged JSON parser written in shell would be a significant enough dependency that you might as well use an existing dependency such as jq or Python instead. A good implementation is not going to be a one-liner, or even a small five-line snippet.

It is possible to use tools such as awk, sed, and grep to do some quick extraction from JSON with a known shape and formatted in a known way, such as one key per line. There are several examples of suggestions for this in other answers.

However, these tools are designed for line-based or record-based formats; they are not designed for recursive parsing of matched delimiters with possible escape characters.

So these quick and dirty solutions using awk/sed/grep are likely to be fragile, and to break if some aspect of the input format changes, such as collapsed whitespace, additional levels of nesting in the JSON objects, or an escaped quote within a string. A solution robust enough to handle all JSON input without breaking will also be fairly large and complex, and so not much different from adding another dependency on jq or Python.
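
To make the fragility concrete, here is a small sketch (an illustration added here, not code from the original answer) that mimics the brace-stripping and comma-splitting approach from the question in Python and compares it with json.loads. The function name naive_fields and the sample input are assumptions chosen for the demonstration.

```python
import json
import re


def naive_fields(json_text):
    """Mimic the question's sed/awk pipeline: drop braces, split on commas."""
    return re.sub(r"[{}]", "", json_text).split(",")


# Hypothetical input: a comma and escaped quotes inside a string value.
sample = r'{"text": "Hello, \"world\"", "favorited": false}'

if __name__ == "__main__":
    # The naive split cuts the string value in two at the embedded comma.
    for field in naive_fields(sample):
        print(field)
    # A real parser handles the comma and the escaped quotes correctly.
    print(json.loads(sample)["text"])
```

The naive version yields three fragments for a two-key object, while the parser returns the full string value with its comma and quotes intact.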

I have had to deal with large amounts of customer data being deleted because of poor input parsing in a shell script, so I never recommend quick and dirty methods that may be fragile in this way. If you're doing some one-off processing, see the other answers for suggestions, but I still highly recommend just using an existing, tested JSON parser.

This answer originally recommended jsawk, which should still work, but it is a little more cumbersome to use than jq and depends on a standalone JavaScript interpreter being installed, which is less common than a Python interpreter. The answers above are therefore probably preferable:

curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'

This answer also originally used the Twitter API from the question, but that API no longer works, which makes it hard to copy the examples and test them, and the new Twitter API requires API keys. I've therefore switched to the GitHub API, which can be used easily without API keys. The first answer for the original question would be:

curl 'http://twitter.com/users/username.json' | jq -r '.text'
