Regex for a JSON file with multiple pipes


Question

I have the following command to fetch a JSON document on Unix:

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json

This produces output in the following format (the results obviously differ each time):

{
 "kind": "...",
 "data": {
 "modhash": "",
 "whitelist_status": "...",
 "children": [
 e1,
 e2,
 e3,
 ...
 ],
 "after": "...",
 "before": "..."
 }
}

where each element of the children array is an object structured as follows:

{
 "kind": "...",
 "data": {
 ...
 }
}

Here is an example of a complete .json response (the body is too long to post directly): https://pastebin.com/20p4kk3u

I need to print the complete data object found inside each element of the children array. I know I need to pipe at least twice, first to get children [...] and then to get data {...} from it. This is what I have so far:

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'

I'm new to regular expressions, so I'm not sure how to handle square brackets or curly braces inside the elements I'm grepping. The line above prints nothing to the shell and I'm not sure why. Any help is appreciated.

Recommended answer

Code

wget -q -O- https://www.reddit.com/r/NetflixBestOf/.json | tr -d '\r\n' | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])' | grep -oP '"data"\s*:\s*\K({.+?})(?=\s*},)'

About the regular expressions

* == zero or more times
+ == one or more times
? == zero or one time
\s == a whitespace character: space, tab, carriage return, newline, vertical tab, or form feed
\w == a word character: a letter from A to Z (upper or lower case), a digit from 0 to 9, or an underscore (_)
\d == a digit from 0 to 9
\r == carriage return
\n == newline character (line feed)
\ == escapes a special character so it is read as a literal character
[...] == a character class. Example: [abc] matches a or b or c
(?=) == a positive lookahead, a type of zero-width assertion. The match must be followed by whatever is inside the parentheses, but that part is not included in the match.
\K == resets the start of the match to this position; everything matched before it is left out of the output.
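
As a small illustration of \K and the positive lookahead from the list above, here is a sketch that runs grep -oP on a made-up one-line JSON string. The sample string and its key names (name, id) are assumptions for demonstration only and do not come from the Reddit output; GNU grep built with -P support is also assumed.

```shell
# Hypothetical sample input; the keys "name" and "id" are illustrative only.
sample='{"name": "alpha", "id": 42}'

# \K drops the already-matched '"name": "' prefix from the reported match;
# (?=") requires a closing quote after the value without including it.
name=$(printf '%s\n' "$sample" | grep -oP '"name"\s*:\s*"\K\w+(?=")')

# \d+ matches one or more digits after the "id" key.
id=$(printf '%s\n' "$sample" | grep -oP '"id"\s*:\s*\K\d+')

echo "$name $id"   # prints: alpha 42
```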

In any case, you can read more about regular expressions here: Regex tutorial

Now I can try to explain the code

wget downloads the source.
tr removes all line feeds and carriage returns, so the whole output is on one line and can be handled by grep.
grep's -o option prints only the matching part.
grep's -P option enables Perl-compatible regular expressions.
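
The effect of the tr and grep -o steps can be seen on a tiny input. This is only a sketch: the two-line JSON text below is a made-up stand-in for the downloaded document, and GNU grep with -P support is assumed.

```shell
# Hypothetical two-line input with CRLF line endings, standing in for the download.
flat=$(printf '{"a": 1,\r\n "b": 2}\r\n' | tr -d '\r\n')
echo "$flat"   # the whole document is now on a single line

# -o prints each match on its own line instead of the whole input line;
# the lookahead keeps the colon out of the printed match.
keys=$(printf '%s\n' "$flat" | grep -oP '"\w+"(?=\s*:)')
echo "$keys"
```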

So here
grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])'
we have said:
match the line starting from "children"
zero or more whitespace characters
:
zero or more whitespace characters
\[ escaped, so it is a literal character and not a special one
zero or more whitespace characters
\K forces the reported match to start from here
( start of the submatch
{.+?} everything inside braces (the braces are included because they come after the start of the submatch; see greedy vs. non-greedy in the regex tutorial to understand how .+? works)
) end of the submatch
(?=\s*\]) stop the submatch where zero or more whitespace characters followed by a literal ] are found, without including them in the submatch.
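
To see how the non-greedy .+? differs from a greedy .+ in this pattern, the following sketch applies both to a small made-up string. The sample (including its tags and after keys) is an assumption for illustration and is not the real Reddit output; GNU grep with -P support is assumed.

```shell
# Hypothetical sample: the first child contains a nested array of objects.
sample='{"children": [{"data": {"tags": [{"t": "x"}], "n": 1}}, {"data": {"n": 2}}], "after": null}'

# Non-greedy: stops at the first "}" that is followed by "]",
# which here is inside the nested "tags" array.
lazy=$(printf '%s\n' "$sample" | grep -oP '"children"\s*:\s*\[\s*\K({.+?})(?=\s*\])')

# Greedy: extends to the last "}" on the line that is followed by "]".
greedy=$(printf '%s\n' "$sample" | grep -oP '"children"\s*:\s*\[\s*\K({.+})(?=\s*\])')

echo "$lazy"     # {"data": {"tags": [{"t": "x"}
echo "$greedy"   # {"data": {"tags": [{"t": "x"}], "n": 1}}, {"data": {"n": 2}}
```

In this sample the greedy form happens to end at the close of the children array, but neither form keeps track of nesting, so on real data either one may cut the match short or run past the intended bracket.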
