在Bash中,如何解析日志文件中的多个换行符分隔的JSON对象? [英] In Bash, how can I parse multiple newline delimited JSON objects from a log file?
问题描述
2017-01- 26 17:19:40 +0000 docker:{source:stdout,log:我,[2017-01-26T17:19:40.703988#24] INFO - :{\tags\\ :\ structured_log\,\ payload\:{\ results\:[{\ baserate\:\ -1\}]},\\ \\ commit_stamp\:1485451180,\ resource\:\ google_price_result_metric\,\ object_id\:\ 20170126171940700\}, CONTAINER_ID: 6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e ,container_name:/ test-container-b49c8188c3ebe4b93300}
2017-01-26 17:19:40 +0000 docker:{container_id:6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e,container_name:/ test-container信息 - :method = POST路径= / prices.xml格式= xml控制器=我的[2017-01-26T17:19:40.704364#24] TestController action = prices status = 200 du配给= 1686.51查看= 0.08 db = 0.62}
然后使用以下命令提取JSON对象:
... | grep -o -E\ {。* $
我知道我可以用 python -mjson.tool
就像这样:
... | grep -o -E\ {。* $| tail -n1 | python -mjson.tool
但我想解析两行(或n行)。我怎么能在bash中做到这一点?
(我认为xargs应该让我这样做,但我对这个工具很陌生并且无法弄清楚)
jq
可以被告知接受纯文本作为输入,并尝试将提取的子集解析为JSON。请考虑以下示例,并使用jq 1.5进行测试:
jq -R'capture(docker:(?< json> {。* [}])$)|以.json? |选择(。)| fromjson'<<'EOF'
2017-01-26 17:19:40 +0000 docker:{source:stdout,log:我,[2017-01-26T17: 19:40.703988#24] INFO - :{\tags \:\structured_log \,\payload \:{\results \:[{\baserate \ :\ -1\ }]},\ commit_stamp\ :1485451180,\ resource\ :\ google_price_result_metric\,\ object_id\: \20170126171940700\},container_id:6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e,container_name:/ test-container-b49c8188c3ebe4b93300}
2017-01-26 17:19:40 +0000 docker:{ container_id:6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e,container_name:/ test-container-b49c8188c3ebe4b93300,source:stdout,log:我,[2017-01-26T17:19:40.704364#24] INFO - :method = POST path = / prices.xml format = xml controller = TestController action = prices status = 200 duration = 1686.51 view = 0.08 db = 0.62}
EOF
...正确地得出:
{
source:stdout,
log:I ,[2017-01-26T17:19:40.703988#24] INFO - :{\tags \:\structured_log \,\payload \:{\results \ :[{\ baserate\ :\ -1\ }]},\ commit_stamp\ :1485451180,\ resource\ :\ google_price_result_metric\, \object_id \:\20170126171940700 \},
container_id:6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e,
container_name:/ test-container-b49c8188c3ebe4b93300
}
{
container_id:6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e,
container_name:/ test-container-b49c8188c3ebe4b93300,
source:stdout,
log:我,[2017-01-26T17:19:40.704364#24] INFO - :method = POST path = / prices.xml format = xml controller = TestController action = prices status = 200 duration = 1686.51 view = 0.08 db = 0.62
}
I am parsing through a log file and get result lines (using grep) like the following:
2017-01-26 17:19:40 +0000 docker: {"source":"stdout","log":"I, [2017-01-26T17:19:40.703988 #24] INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}","container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300"}
2017-01-26 17:19:40 +0000 docker: {"container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300","source":"stdout","log":"I, [2017-01-26T17:19:40.704364 #24] INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"}
I then extract the JSON objects with the following command:
... | grep -o -E "\{.*$"
I know I can parse a single line with python -mjson.tool
like so:
... | grep -o -E "\{.*$" | tail -n1 | python -mjson.tool
But I want to parse both lines (or n lines). How can I do this in bash? (I think xargs is supposed to let me do this, but I am new to the tool and can't figure it out)
jq
can be told to accept plain text as input, and attempt to parse an extracted subset as JSON. Consider the following example, tested with jq 1.5:
jq -R 'capture("docker: (?<json>[{].*[}])$") | .json? | select(.) | fromjson' <<'EOF'
2017-01-26 17:19:40 +0000 docker: {"source":"stdout","log":"I, [2017-01-26T17:19:40.703988 #24] INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}","container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300"}
2017-01-26 17:19:40 +0000 docker: {"container_id":"6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e","container_name":"/test-container-b49c8188c3ebe4b93300","source":"stdout","log":"I, [2017-01-26T17:19:40.704364 #24] INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"}
EOF
...properly yields:
{
"source": "stdout",
"log": "I, [2017-01-26T17:19:40.703988 #24] INFO -- : {\"tags\":\"structured_log\",\"payload\":{\"results\":[{\"baserate\":\"-1\"}]},\"commit_stamp\":1485451180,\"resource\":\"google_price_result_metric\",\"object_id\":\"20170126171940700\"}",
"container_id": "6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e",
"container_name": "/test-container-b49c8188c3ebe4b93300"
}
{
"container_id": "6ecbf7f64e4c9557e9dd1efbc6666a3c6c53f9cd5c18414ed5633cad8c302e",
"container_name": "/test-container-b49c8188c3ebe4b93300",
"source": "stdout",
"log": "I, [2017-01-26T17:19:40.704364 #24] INFO -- : method=POST path=/prices.xml format=xml controller=TestController action=prices status=200 duration=1686.51 view=0.08 db=0.62"
}
这篇关于在Bash中,如何解析日志文件中的多个换行符分隔的JSON对象?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!