In Bash, how can I parse multiple newline delimited JSON objects from a log file?


Question

I am parsing through a log file and get result lines (using grep) like the following:

2017-01-26 17:19:40 +0000 docker: {"source":"stdout","log":"I, [2017-01-26T17:19:40.703988 #24]  INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}","container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300"}
2017-01-26 17:19:40 +0000 docker: {"container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300","source":"stdout","log":"I, [2017-01-26T17:19:40.704364 #24]  INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"}

I then extract the JSON objects with the following command:

... | grep -o -E "\{.*$"

I know I can parse a single line with python -mjson.tool like so:

... | grep -o -E "\{.*$" | tail -n1 | python -mjson.tool

But I want to parse both lines (or n lines). How can I do this in bash? (I think xargs is supposed to let me do this, but I am new to the tool and can't figure it out)

Solution

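jq can be told to accept plain text as input and to parse an extracted substring as JSON. With -R, each input line is read as a raw string; capture then extracts the text after "docker: " into a field named json, and fromjson parses that string. Consider the following example, tested with jq 1.5: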

jq -R 'capture("docker: (?<json>[{].*[}])$") | .json? | select(.) | fromjson' <<'EOF'
2017-01-26 17:19:40 +0000 docker: {"source":"stdout","log":"I, [2017-01-26T17:19:40.703988 #24]  INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}","container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300"}
2017-01-26 17:19:40 +0000 docker: {"container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300","source":"stdout","log":"I, [2017-01-26T17:19:40.704364 #24]  INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"}
EOF

...properly yields:

{
  "source": "stdout",
  "log": "I, [2017-01-26T17:19:40.703988 #24]  INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}",
  "container_id": "6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e",
  "container_name": "/test-container-b49c8188c3ebe4b93300"
}
{
  "container_id": "6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e",
  "container_name": "/test-container-b49c8188c3ebe4b93300",
  "source": "stdout",
  "log": "I, [2017-01-26T17:19:40.704364 #24]  INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"
}
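
If jq is not installed, the grep pipeline from the question can also be extended to n lines by feeding each extracted object to the JSON pretty-printer one at a time. The following is only a minimal sketch under some assumptions: the function name pretty_print_json_lines is hypothetical, python3 is assumed to be on the PATH (the question uses python), and each JSON object is assumed to start at the first "{" on its line and run to the end of that line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: pretty-print the JSON object found on each log line.
# Reads from the files given as arguments, or from stdin when none are given.
pretty_print_json_lines() {
  # grep -o prints only the matched part: from the first "{" to end of line.
  # Lines without a "{" produce no output and are skipped.
  grep -o -E '\{.*$' "$@" | while IFS= read -r line; do
    # json.tool parses exactly one JSON document per invocation,
    # so it is called once per extracted line.
    printf '%s\n' "$line" | python3 -m json.tool
  done
}
```

It could be used as `... | pretty_print_json_lines`, or with a file name as its argument. Starting one Python process per line is slow on large logs, so the jq filter above remains the better fit when jq is available.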
