CouchDB视图-在键数组上进行筛选和分组 [英] CouchDB View - Filter and Group By on Key Array

查看:92
本文介绍了CouchDB视图-在键数组上进行筛选和分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在CouchDB视图中有一组键, [doc.time,doc.address] 。两者都不是唯一的。 doc.time 是UNIX时间戳,而 doc.address 是字符串。 reduce 函数设置为 _sum ,因为每组键的唯一值是数字。

I have an array of keys in a CouchDB view, [doc.time, doc.address]. Neither is unique. doc.time is a UNIX timestamp and doc.address is a string. The reduce function is set to _sum as the only value for each set of keys is a number.

我想要的是按 doc.time 过滤,然后将其余记录按 doc.address分组。如果我将 doc.time 作为第一个键,则无论我指定为 group_level 。如果我先放置 doc.address ,我似乎无法按时间过滤查询。

What I want is to filter by doc.time, then group the remaining records by doc.address. If I put doc.time as the first key, I cannot seem to group by unique addresses no matter what I specify as a group_level. If I put doc.address first, I cannot seem to filter the query by time.

查询:?group_level = 1& startkey = [0,1230000000]& endkey = [{},1340000000]

第一个关键字: doc.address doc.time

问题:不能按时间过滤

代码:

rows: [
  {
    key: [ "1126GDuGLQTX3LFHHmjCctdn8WKDjn7QNA" ],
    value: 50
  },
  {
    key: [ "112AobLhjLJQ3LGqXFrsdnWMPqWCQqoiS6" ],
    value: 50
  }
]






查询:?group_level = 1& startkey = [1230000000]& endkey = [1340000000,{}]

第一个键: doc.time doc.address之前

问题:看不到,我没有按 doc.address分组

Problem: Cannot see and I am not grouped by doc.address

代码:

rows: [
  {
    key: [ 1231469665 ],
    value: 50
  },
  {
    key: [ 1231469744 ],
    value: 50
  }
]


推荐答案

您提到:


...如果我把 doc.time 作为第一个键,我似乎无法按唯一地址分组无论我指定为group_level ...

... If I put doc.time as the first key, I cannot seem to group by unique addresses no matter what I specify as a group_level ...

查询参数 group_level = N N th 逗号上分割字符串,并通过字符串匹配将左元素分组在一起。因此,当您的 array键如下所示: [doc.time,doc.address] 时,您将无法按地址,该地址不在逗号的左侧。

The query parameter group_level=N splits the string on the Nth comma and groups the left elements together by string match. Therefore, When your array key is like this: [doc.time, doc.address], you won't be able to group by address, which is not on the left side of the comma.


...如果我先放置 doc.address ,我似乎无法按时间过滤查询...

... If I put doc.address first, I cannot seem to filter the query by time ...

当您的数组键类似于: [doc.address,doc.time] 时,请注意您正在发出<地图函数中的em>数组键。您需要考虑CouchDB中关于数组键复合键后续点

When your array key is like: [doc.address, doc.time], notice that you are emitting an array key inside your Map function. You need to consider the following points regarding array key or compound key in CouchDB:

此引用


...首先要注意的事项和非常非常重要...来自javascript Map函数的数组输出...这些索引键中的每个都是字符串,并按字符串作为字符串(包括方括号和逗号)按字符排序...

... First thing of note and very important ... an array output ... from the javascript Map function ... each of those Index Keys are strings, and are ordered character by character as strings, including the brackets and commas ...

上述引用对CA中CouchDB索引的工作方式有重大影响复合键数组键的本身。

The above statement and explanations on the reference have a significant impact on how CouchDB indexing works in the case of compound key or array key.

为澄清起见,让我们在<$ c上创建如下所示的文档$ c>样本数据库:

To clarify, lets create documents like below on a sample database:

{"time":"2011","address":"CT"}
{"time":"2012","address":"CT"}
...
{"time":"2011","address":"TX"}
...
{"time":"2015","address":"TX"}
...
{"time":"2014","address":"NY"}
...
{"time":"2014","address":"CA"}
{"time":"2015","address":"CA"}
{"time":"2016","address":"CA"}

我实现了这样的视图地图功能:

I implemented a view map function like this:

function (doc) {
  if(doc.time && doc.address){
    emit([doc.address, doc.time], null);
  }
}

目前,我没有使用任何 Reduce 函数,因为可以忽略任何分组 reducing ,而将注意力放在简单的简单索引上。上面的视图正在生成以下用于索引的键/值对:

For now, I'm not using any Reduce function, because, lets ignore any grouping or reducing and focus on plain simple indexing. The above view is generating the following key/value pairs for indexing:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time'
{"total_rows":25,"offset":0,"rows":[
{"id":"doc_0022","key":["CA","2014"],"value":null},
{"id":"doc_0023","key":["CA","2015"],"value":null},
{"id":"doc_0024","key":["CA","2016"],"value":null},
{"id":"doc_0000","key":["CT","2011"],"value":null},
{"id":"doc_0001","key":["CT","2012"],"value":null},
{"id":"doc_0002","key":["CT","2013"],"value":null},
{"id":"doc_0003","key":["CT","2014"],"value":null},
{"id":"doc_0004","key":["CT","2015"],"value":null},
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null},
{"id":"doc_0008","key":["TX","2012"],"value":null},
{"id":"doc_0007","key":["TX","2013"],"value":null},
{"id":"doc_0009","key":["TX","2014"],"value":null},
{"id":"doc_0010","key":["TX","2015"],"value":null},
{"id":"doc_0011","key":["TX","2016"],"value":null},
{"id":"doc_0012","key":["TX","2017"],"value":null},
{"id":"doc_0013","key":["TX","2018"],"value":null}
]}

现在,我要执行一个查询,以按<$ c过滤视图$ c> doc.time 。我的查询参数是:

Now, I'm going to do a query to filter the view by doc.time. My query parameters are:

?startkey=["AA","2017"]&endkey=["ZZ","2018"]

我希望上述查询仅返回时间字段,位于 2017 2018 之间,地址<这些文档的/ code>字段可以具有 any 值,因为我是从 AA ZZ ,其中包括我数据库中的所有地址。我正在使用 curl 这样的查询:

I expect the above query to return only the docs with the time field between 2017 and 2018, the address field of those docs can have any value since I specified from AA to ZZ which includes all addresses on my database. I'm doing the query with curl like this:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time?startkey=\["AA","2017"\]&endkey=\["ZZ","2018"\]'
{"total_rows":25,"offset":0,"rows":[
{"id":"doc_0022","key":["CA","2014"],"value":null},
{"id":"doc_0023","key":["CA","2015"],"value":null},
{"id":"doc_0024","key":["CA","2016"],"value":null},
{"id":"doc_0000","key":["CT","2011"],"value":null},
{"id":"doc_0001","key":["CT","2012"],"value":null},
{"id":"doc_0002","key":["CT","2013"],"value":null},
{"id":"doc_0003","key":["CT","2014"],"value":null},
{"id":"doc_0004","key":["CT","2015"],"value":null},
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null},
{"id":"doc_0008","key":["TX","2012"],"value":null},
{"id":"doc_0007","key":["TX","2013"],"value":null},
{"id":"doc_0009","key":["TX","2014"],"value":null},
{"id":"doc_0010","key":["TX","2015"],"value":null},
{"id":"doc_0011","key":["TX","2016"],"value":null},
{"id":"doc_0012","key":["TX","2017"],"value":null},
{"id":"doc_0013","key":["TX","2018"],"value":null}
]}

上述查询返回的响应似乎令人震惊。因为看起来它不只返回在 2017 2018之间提交的时间的文档。这就是数组键的CouchDB索引的工作方式。 CouchDB对数组键进行索引,就好像整个数组是一个包含该数组的括号和逗号的字符串一样!如果您阅读参考,它将开始使

The response returned by the above query seems shocking. Because it looks like it did NOT return only the docs with time filed between 2017 and 2018. That's just how the CouchDB indexing for array keys work. CouchDB does the indexing of array keys as if the whole array is a string including the brackets and commas of the array! If you read the reference, it would start to make sense.

现在让我们更改查询:

?startkey=["CT","2016"]&endkey=["TX","2011"]

根据我们的解释,上述查询的结果如下所示:

The result of the above query is shown below, based on our explanations, this should make sense:

$ curl -k -X GET 'https://admin:****@192.168.1.106:6984/sample/_design/by_addr_time/_view/by_addr_time?startkey=\["CT","2016"\]&endkey=\["TX","2011"\]'
{"total_rows":25,"offset":8,"rows":[
{"id":"doc_0005","key":["CT","2016"],"value":null},
{"id":"doc_0014","key":["NY","2011"],"value":null},
{"id":"doc_0015","key":["NY","2012"],"value":null},
{"id":"doc_0016","key":["NY","2013"],"value":null},
{"id":"doc_0017","key":["NY","2014"],"value":null},
{"id":"doc_0018","key":["NY","2015"],"value":null},
{"id":"doc_0019","key":["NY","2016"],"value":null},
{"id":"doc_0020","key":["NY","2017"],"value":null},
{"id":"doc_0021","key":["NY","2018"],"value":null},
{"id":"doc_0006","key":["TX","2011"],"value":null}
]}






UPDATE




UPDATE


...我要过滤的是 doc.time ,然后将其余记录按 doc.address ...

... What I want is to filter by doc.time, then group the remaining records by doc.address ...


$ b分组$ b

那么,我们该怎么办? 问题和答案,并提供了基本思路。

So, what should we do? There is a good question and answer and provides the basic ideas.

不确定哪个主意最好,但是我实现了这样一个主意:创建了一个名为 t_red 如下所示,内置 _count reduce:

I'm not sure which idea is the best, but I implemented one idea like this: created a view named t_red like below with a builtin _count reduce:

function (doc) {
  if(doc.time && doc.address){
    emit([doc.time, doc.address], null);
  }
}

我还创建了一个名为<$ c $的视图c> a_red 具有内置 _count 减少:

Also, I created a view named a_red with a builtin _count reduce:

function (doc) {
  if(doc.address && doc.time){
    emit([doc.address, doc.time], null);
  }
}

然后我在 NodeJS上开发了以下代码查询在 2012 2015 doc.time c>,然后根据 doc.address 将结果分组,控制台日志将在代码内显示为注释。我希望这段代码会有所帮助(不要混淆!):

Then I developed the following code on NodeJS to query doc.time between 2012 and 2015 and then group the results according to the doc.address, console logs are shown inside the code as comments. I hope this code will be helpful (not confusing!):

process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0"; // Ignore rejection, becasue CouchDB SSL certificate is self-signed

const fetch=require('node-fetch')

// query "t_red" view/index
fetch(`https://admin:****@192.168.1.106:6984/sample/_design/t_red/_view/t_red?group_level=2&startkey=["2012", "AA"]&endkey=["2015", "ZZ"]`, {
    method: 'GET',
    headers: {
        'Content-Type': 'application/json',
    }
}).then(
    res=>res.json()
).then(data=>{
    let unique_addr=[]
    data.rows.map(row=>{
        console.log('row.key-> ', row.key, '  row.value-> ', row.value)

        // console log is shown below:
        //
        // row.key->  [ '2012', 'CT' ]   row.value->  1
        // row.key->  [ '2012', 'NY' ]   row.value->  1
        // row.key->  [ '2012', 'TX' ]   row.value->  1
        // row.key->  [ '2013', 'CT' ]   row.value->  1
        // row.key->  [ '2013', 'NY' ]   row.value->  1
        // row.key->  [ '2013', 'TX' ]   row.value->  1
        // row.key->  [ '2014', 'CA' ]   row.value->  1
        // row.key->  [ '2014', 'CT' ]   row.value->  1
        // row.key->  [ '2014', 'NY' ]   row.value->  1
        // row.key->  [ '2014', 'TX' ]   row.value->  1
        // row.key->  [ '2015', 'CA' ]   row.value->  1
        // row.key->  [ '2015', 'CT' ]   row.value->  1
        // row.key->  [ '2015', 'NY' ]   row.value->  1
        // row.key->  [ '2015', 'TX' ]   row.value->  1

        if(unique_addr.indexOf(row.key[1])==-1){ // Push unique addresses into an array
            unique_addr.push(row.key[1])
        }
    })

    console.log(unique_addr)

    // Console log is shown below:
    //
    // [ 'CT', 'NY', 'TX', 'CA' ]

    return unique_addr

}).then(unique_addr=>{

    // Group the unique addresses
    let group_by_address=unique_addr.map(addr=>{
        // For each unique address, do a query of "a_red" view/index
        return fetch(`https://admin:****@192.168.1.106:6984/sample/_design/a_red/_view/a_red?group_level=2&startkey=["${addr}","2012"]&endkey=["${addr}","2015"]`, {
            method: 'GET',
            headers: {
                'Content-Type': 'application/json',
            }
        }).then(
            res=>res.json()
        ).then(data=>{
            data.rows.map(row=>{console.log('row.key-> ', row.key, '  row.value-> ', row.value)})

            // Console logs related to this section of code are shown below

            //row.key->  [ 'CA', '2014' ]   row.value->  1
            //row.key->  [ 'CA', '2015' ]   row.value->  1

            //row.key->  [ 'NY', '2012' ]   row.value->  1
            //row.key->  [ 'NY', '2013' ]   row.value->  1
            //row.key->  [ 'NY', '2014' ]   row.value->  1
            //row.key->  [ 'NY', '2015' ]   row.value->  1

            //row.key->  [ 'CT', '2012' ]   row.value->  1
            //row.key->  [ 'CT', '2013' ]   row.value->  1
            //row.key->  [ 'CT', '2014' ]   row.value->  1
            //row.key->  [ 'CT', '2015' ]   row.value->  1

            //row.key->  [ 'TX', '2012' ]   row.value->  1
            //row.key->  [ 'TX', '2013' ]   row.value->  1
            //row.key->  [ 'TX', '2014' ]   row.value->  1
            //row.key->  [ 'TX', '2015' ]   row.value->  1

            let obj={}
            obj[addr]=data.rows.length // This object contains unique address and its corresponding frequency in above query
            return obj

        }).catch(err=>{
            console.log('err-> ', err)
        })
    })

    return group_by_address

}).then(group_by_address=>{
    group_by_address.map(group=>{
        group.then(()=>{
            console.log('Grouped by address-> ', group)

            // Console logs related this section of code are shown below:

            //Grouped by address->  Promise { { CA: 2 } }

            //Grouped by address->  Promise { { NY: 4 } }

            //Grouped by address->  Promise { { CT: 4 } }

            //Grouped by address->  Promise { { TX: 4 } }
        })
    })
}).catch(err=>{
    console.log('err-> ', err)
})

这篇关于CouchDB视图-在键数组上进行筛选和分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆