SQL-style GROUP BY aggregate functions in jq (COUNT, SUM, etc.)

Views: 135 Published: 2018/5/30 14:17:24 Tags: sql json group-by aggregate-functions jq


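Problem description

Similar questions have been asked here before:

Count items for a single key: jq count the number of items in json by a specific key

Calculate the sum of object values: How do I sum the values in an array of maps in jq?

Question

How can the COUNT aggregate function be emulated in jq so that it behaves like its SQL original? Let's extend the question to include other regular SQL functions as well:

COUNT

SUM / MAX / MIN / AVG

ARRAY_AGG

The last one is not a standard SQL function (it comes from PostgreSQL), but it is quite useful.

The input is a stream of valid JSON objects. For demonstration, let's pick a simple story of owners and their pets.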

Model and data

Base relation: Owner

    id  name   age
     1  Adams   25
     2  Baker   55
     3  Clark   40
     4  Davis   31

Base relation: Pet

    id  name   litter  owner_id
    10  Bella       4         1
    20  Lucy        2         1
    30  Daisy       3         2
    40  Molly       4         3
    50  Lola        2         4
    60  Sadie       4         4
    70  Luna        3         4

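Source

From the above we get a derived relation Owner_Pet (the result of an SQL JOIN of the two relations), presented in JSON format as the source data for our jq queries:

    { "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
    { "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy", "litter": 2 }
    { "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
    { "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
    { "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola", "litter": 2 }
    { "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
    { "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna", "litter": 3 }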

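Requests

Here are sample requests and their expected output:

COUNT the number of pets per owner:

    { "owner_id": 1, "owner": "Adams", "age": 25, "pets_count": 2 }
    { "owner_id": 2, "owner": "Baker", "age": 55, "pets_count": 1 }
    { "owner_id": 3, "owner": "Clark", "age": 40, "pets_count": 1 }
    { "owner_id": 4, "owner": "Davis", "age": 31, "pets_count": 3 }

SUM up the number of whelps per owner and get their MAX (MIN/AVG):

    { "owner_id": 1, "owner": "Adams", "age": 25, "litter_total": 6, "litter_max": 4 }
    { "owner_id": 2, "owner": "Baker", "age": 55, "litter_total": 3, "litter_max": 3 }
    { "owner_id": 3, "owner": "Clark", "age": 40, "litter_total": 4, "litter_max": 4 }
    { "owner_id": 4, "owner": "Davis", "age": 31, "litter_total": 9, "litter_max": 4 }

ARRAY_AGG pets per owner:

    { "owner_id": 1, "owner": "Adams", "age": 25, "pets": [ "Bella", "Lucy" ] }
    { "owner_id": 2, "owner": "Baker", "age": 55, "pets": [ "Daisy" ] }
    { "owner_id": 3, "owner": "Clark", "age": 40, "pets": [ "Molly" ] }
    { "owner_id": 4, "owner": "Davis", "age": 31, "pets": [ "Lola", "Sadie", "Luna" ] }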

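Solution

This is a nice exercise, but SO is not a programming service, so I will focus here on some key concepts for generic solutions in jq that are efficient, even for very large collections.

GROUPS_BY

The key to efficiency here is avoiding the built-in group_by, as it requires sorting. Since jq is fundamentally stream-oriented, the following definition of GROUPS_BY is likewise stream-oriented. It takes advantage of the efficiency of key-based lookups, while avoiding calling tojson on strings:

    # Emit a stream of the groups defined by f
    def GROUPS_BY(stream; f):
      # Flatten the two-level {type: {key: group}} object into a stream of groups
      def unwind: to_entries[] | .value | to_entries[] | .value;
      reduce stream as $x ({};
        ($x|f) as $s
        | ($s|type) as $t
        | (if $t == "string" then $s else ($s|tojson) end) as $y
        | .[$t][$y] += [$x] )
      | unwind ;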
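As a rough illustration of how GROUPS_BY behaves, the following sketch runs it over a cut-down version of the sample rows and reports the size of each group. It assumes jq is installed and on the PATH; the shell variable names (rows, program, result) and the output field name size are only illustrative and do not come from the original answer.

```shell
#!/bin/sh
# Sketch: run GROUPS_BY over the sample rows and report each group's size.
# Only owner_id and pet are kept per row to keep the example short.
rows='{"owner_id":1,"pet":"Bella"}
{"owner_id":1,"pet":"Lucy"}
{"owner_id":2,"pet":"Daisy"}
{"owner_id":3,"pet":"Molly"}
{"owner_id":4,"pet":"Lola"}
{"owner_id":4,"pet":"Sadie"}
{"owner_id":4,"pet":"Luna"}'

program='
# GROUPS_BY as defined in the answer
def GROUPS_BY(stream; f):
  def unwind: to_entries[] | .value | to_entries[] | .value;
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    | .[$t][$y] += [$x])
  | unwind;

# Each group is an array of the rows that share one owner_id
GROUPS_BY(inputs; .owner_id)
| {owner_id: .[0].owner_id, size: length}
'

# -n makes jq read the rows through `inputs`; -c prints one object per line
result=$(printf '%s\n' "$rows" | jq -nc "$program")
printf '%s\n' "$result"
```

Each group arrives as an array of the original objects, so any per-group aggregate can be computed from it with ordinary array filters such as length, map and add.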

distinct and count_distinct

    # Emit an array of the distinct entities in `stream`, without sorting
    def distinct(stream):
      reduce stream as $x ({};
        ($x|type) as $t
        | (if $t == "string" then $x else ($x|tojson) end) as $y
        | if (.[$t] | has($y)) then . else .[$t][$y] += [$x] end )
      | [.[][]] | add ;

    # Emit the number of distinct items in the given stream
    def count_distinct(stream):
      def sum(s): reduce s as $x (0; . + $x);
      reduce stream as $x ({};
        ($x|type) as $t
        | (if $t == "string" then $x else ($x|tojson) end) as $y
        | .[$t][$y] = 1 )
      | sum( .[][] ) ;
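The two-level lookup by type and then by key is what keeps the number 1 and the string "1" apart. The sketch below, which assumes jq is available on the PATH, feeds count_distinct a small made-up stream and compares the result with the sort-based built-in unique; the sample values and the output field names are illustrative only.

```shell
#!/bin/sh
# Sketch: count_distinct on a stream that mixes the number 1 and the string "1".
program='
# count_distinct as defined in the answer
def count_distinct(stream):
  def sum(s): reduce s as $x (0; . + $x);
  reduce stream as $x ({};
    ($x|type) as $t
    | (if $t == "string" then $x else ($x|tojson) end) as $y
    | .[$t][$y] = 1)
  | sum(.[][]);

# Hypothetical sample values: 1 and "a" are repeated, 1 and "1" differ only in type
[1, "1", 2, 1, "a", "a"] as $items
| { count_distinct: (count_distinct($items[])),
    unique_length: ($items | unique | length) }
'

result=$(jq -nc "$program")
printf '%s\n' "$result"
```

Both counts agree here. The difference is in how they get there: unique sorts the whole array, whereas count_distinct only performs key lookups.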

Convenience function

    # Project the owner columns out of an Owner_Pet row
    def owner: {owner_id, owner, age};

Example: "COUNT the number of pets per owner"

    GROUPS_BY(inputs; .owner_id)
    | (.[0] | owner) + {pets_count: count_distinct(.[] | .pet_id)}

Invocation: jq -nc -f program1.jq input.json

Here program1.jq must also contain the definitions given above.

Output:

    {"owner_id":1,"owner":"Adams","age":25,"pets_count":2}
    {"owner_id":2,"owner":"Baker","age":55,"pets_count":1}
    {"owner_id":3,"owner":"Clark","age":40,"pets_count":1}
    {"owner_id":4,"owner":"Davis","age":31,"pets_count":3}
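Note that count_distinct corresponds more closely to SQL's COUNT(DISTINCT pet_id) than to COUNT(*). If the joined input could contain repeated rows, a plain row count per group is simply the length of the group. The following sketch shows both side by side; it assumes jq is on the PATH, and the duplicated pet_id 20 row and the field name rows_count are hypothetical additions for illustration.

```shell
#!/bin/sh
# Sketch: COUNT(*) versus COUNT(DISTINCT pet_id) on a group with a repeated row.
# The second pet_id 20 row is hypothetical and is not part of the original data.
rows='{"owner_id":1,"pet_id":10}
{"owner_id":1,"pet_id":20}
{"owner_id":1,"pet_id":20}
{"owner_id":2,"pet_id":30}'

program='
# Definitions copied from the answer
def GROUPS_BY(stream; f):
  def unwind: to_entries[] | .value | to_entries[] | .value;
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    | .[$t][$y] += [$x])
  | unwind;

def count_distinct(stream):
  def sum(s): reduce s as $x (0; . + $x);
  reduce stream as $x ({};
    ($x|type) as $t
    | (if $t == "string" then $x else ($x|tojson) end) as $y
    | .[$t][$y] = 1)
  | sum(.[][]);

GROUPS_BY(inputs; .owner_id)
| { owner_id: .[0].owner_id,
    rows_count: length,                             # like COUNT(*)
    pets_count: (count_distinct(.[] | .pet_id)) }   # like COUNT(DISTINCT pet_id)
'

result=$(printf '%s\n' "$rows" | jq -nc "$program")
printf '%s\n' "$result"
```

On the original sample data the two counts coincide, because every pet_id appears exactly once.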

Example: "SUM up the number of whelps per owner and get their MAX"

    GROUPS_BY(inputs; .owner_id)
    | (.[0] | owner)
      + {litter_total: (map(.litter) | add)}
      + {litter_max: (map(.litter) | max)}

Invocation: jq -nc -f program2.jq input.json

Output: as given in the question.
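The question also mentions MIN and AVG, which the example above does not cover. One possible extension, sketched here under the assumption that every group is non-empty and every litter value is a number, uses the built-in min and computes the average as add / length. The field names litter_min and litter_avg are made up for this sketch, and the rows are cut down to the two fields that matter.

```shell
#!/bin/sh
# Sketch: SUM, MAX, MIN and AVG of litter per owner.
rows='{"owner_id":1,"litter":4}
{"owner_id":1,"litter":2}
{"owner_id":2,"litter":3}
{"owner_id":3,"litter":4}
{"owner_id":4,"litter":2}
{"owner_id":4,"litter":4}
{"owner_id":4,"litter":3}'

program='
# GROUPS_BY as defined in the answer
def GROUPS_BY(stream; f):
  def unwind: to_entries[] | .value | to_entries[] | .value;
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    | .[$t][$y] += [$x])
  | unwind;

GROUPS_BY(inputs; .owner_id)
| map(.litter) as $litters          # collect the litter values of the group once
| { owner_id: .[0].owner_id,
    litter_total: ($litters | add),
    litter_max: ($litters | max),
    litter_min: ($litters | min),
    litter_avg: ($litters | add / length) }
'

result=$(printf '%s\n' "$rows" | jq -nc "$program")
printf '%s\n' "$result"
```

Because GROUPS_BY only emits groups that contain at least one row, length is never zero here, so the division is safe.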
Example: "ARRAY_AGG pets per owner"

    GROUPS_BY(inputs; .owner_id)
    | (.[0] | owner) + {pets: distinct(.[] | .pet)}

Invocation: jq -nc -f program3.jq input.json

Output:

    {"owner_id":1,"owner":"Adams","age":25,"pets":["Bella","Lucy"]}
    {"owner_id":2,"owner":"Baker","age":55,"pets":["Daisy"]}
    {"owner_id":3,"owner":"Clark","age":40,"pets":["Molly"]}
    {"owner_id":4,"owner":"Davis","age":31,"pets":["Lola","Sadie","Luna"]}
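The answer's ARRAY_AGG example uses distinct, so repeated names within a group are collapsed. PostgreSQL's plain ARRAY_AGG keeps duplicates, and if that behaviour is wanted, map over the group is enough. The sketch below assumes jq is on the PATH; the second "Lucy" row is hypothetical and was added only to make the difference visible.

```shell
#!/bin/sh
# Sketch: ARRAY_AGG that keeps duplicates, using map(.pet) on each group.
# The repeated "Lucy" row is hypothetical and is not part of the original data.
rows='{"owner_id":1,"pet":"Bella"}
{"owner_id":1,"pet":"Lucy"}
{"owner_id":1,"pet":"Lucy"}
{"owner_id":2,"pet":"Daisy"}'

program='
# GROUPS_BY as defined in the answer
def GROUPS_BY(stream; f):
  def unwind: to_entries[] | .value | to_entries[] | .value;
  reduce stream as $x ({};
    ($x|f) as $s
    | ($s|type) as $t
    | (if $t == "string" then $s else ($s|tojson) end) as $y
    | .[$t][$y] += [$x])
  | unwind;

GROUPS_BY(inputs; .owner_id)
| {owner_id: .[0].owner_id, pets: map(.pet)}
'

result=$(printf '%s\n' "$rows" | jq -nc "$program")
printf '%s\n' "$result"
```

Rows within a group stay in input order, so the resulting array follows the order in which the pets were read.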
