How to store date range data in Elasticsearch (AWS) and search for a range?


Problem description

I am trying to store hotel room availability in Elasticsearch, and then search for rooms that are available from one date until another. I have come up with two ways to store the availability data:

In the first, an availability dictionary stores every date, and the value for each date key is true or false, indicating whether the room is available on that day.

{
  "_id": "khg2uo47tyhgjwebu7624787",
  "room_type": "garden view",
  "hotel_name": "Cool hotel",
  "hotel_id": "jytu64r982u0299023",
  "room_metadata1": 233,
  "room_color": "black",
  "availability": {
    "2016-07-01": true,
    "2016-07-02": true,
    "2016-07-03": false,
    "2016-07-04": true,
    "2016-07-05": true,
    "2016-07-06": null,
    "2016-07-07": true,
    "2016-07-08": true,
    ...
    (and so on, for 365 days)
  }
}

In the second, an availability array stores only the dates on which the room is available.

{
  "_id": "khg2uo47tyhgjwebu7624787",
  "room_type": "garden view",
  "hotel_name": "Cool hotel",
  "hotel_id": "jytu64r982u0299023",
  "room_metadata1": 535,
  "room_color": "black",
  "availability": ["2016-07-01", "2016-07-02", "2016-07-04", "2016-07-05", "2016-07-07", "2016-07-08", ... (and so on, for 365 days)]
}

I want to search for all rooms that are available from from_date to to_date, which means looking into the availability dictionary or array. The date range may span up to 365 days.

How should I store this availability data so that I can perform the search above easily? I could not find any way to search across a range of dates, so any suggestions are welcome.

Please note that the items in availability may not be kept sorted, and I may have more than 100 million records to search through.

Recommended answer

One way to model this is with parent/child documents. Room documents are the parents and availability documents are their children: for each room, there is one availability document per date on which the room is available. At query time, we can then look for parent rooms that have an availability child document for every date in the searched interval (or intervals, since they may be disjoint).

Note that as soon as a room is booked, you need to remove the corresponding child document for each booked date.

Let's try this out. First, create the index:

PUT /rooms
{
  "mappings": {
    "room": {
      "properties": {
        "room_num": {
          "type": "integer"
        }
      }
    },
    "availability": {
      "_parent": {
        "type": "room"
      },
      "properties": {
        "date": {
          "type": "date",
          "format": "date"
        },
        "available": {
          "type": "boolean"
        }
      }
    }
  }
}

Then add some data:

POST /rooms/_bulk
{"index": { "_type": "room", "_id": 233}}
{"room_num": 233}
{"index": { "_type": "availability", "_id": "20160701", "_parent": 233}}
{"date": "2016-07-01"}
{"index": { "_type": "availability", "_id": "20160702", "_parent": 233}}
{"date": "2016-07-02"}
{"index": { "_type": "availability", "_id": "20160704", "_parent": 233}}
{"date": "2016-07-04"}
{"index": { "_type": "availability", "_id": "20160705", "_parent": 233}}
{"date": "2016-07-05"}
{"index": { "_type": "availability", "_id": "20160707", "_parent": 233}}
{"date": "2016-07-07"}
{"index": { "_type": "availability", "_id": "20160708", "_parent": 233}}
{"date": "2016-07-08"}
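
For more than a handful of dates, the bulk body is easier to generate than to type. The following is a minimal Python sketch, not part of the original answer: `build_bulk_body` is a hypothetical helper name, and it assumes the `room`/`availability` types and the `yyyyMMdd` child-ID convention from the sample data. With more than one room you would probably want the room number in the child ID as well, so that IDs stay unique.

```python
import json
from datetime import date


def build_bulk_body(room_num, available_dates):
    """Return an NDJSON bulk body: one room document plus one child per available date."""
    lines = [
        {"index": {"_type": "room", "_id": room_num}},
        {"room_num": room_num},
    ]
    for day in available_dates:
        # Child ID follows the yyyyMMdd convention of the sample data (an assumption).
        lines.append({"index": {"_type": "availability",
                                "_id": day.strftime("%Y%m%d"),
                                "_parent": room_num}})
        lines.append({"date": day.isoformat()})
    # The bulk API expects newline-delimited JSON with a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


body = build_bulk_body(233, [date(2016, 7, 1), date(2016, 7, 2)])
print(body)
```

The resulting string can be sent as the request body of `POST /rooms/_bulk`.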

Finally, we can start querying. First, let's say we want to find a room that is available on 2016-07-01:

POST /rooms/room/_search
{
  "query": {
    "has_child": {
      "type": "availability",
      "query": {
        "term": {
          "date": "2016-07-01"
        }
      }
    }
  }
}
=> result: room 233

Then, let's try searching for a room available from 2016-07-01 to 2016-07-03:

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 3,
      "should": [
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-01"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-02"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-03"
              }
            }
          }
        }
      ]
    }
  }
}
=> Result: No rooms

However, searching for a room available from 2016-07-01 to 2016-07-02 does yield room 233:

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 2,
      "should": [
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-01"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-02"
              }
            }
          }
        }
      ]
    }
  }
}
=> Result: Room 233

We can also search for disjoint intervals, say from 2016-07-01 to 2016-07-02 plus from 2016-07-04 to 2016-07-05:

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 4,
      "should": [
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-01"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-02"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-04"
              }
            }
          }
        },
        {
          "has_child": {
            "type": "availability",
            "query": {
              "term": {
                "date": "2016-07-05"
              }
            }
          }
        }
      ]
    }
  }
}
=> Result: Room 233

And so on. The key point is to add one has_child query per date you need to check and to set minimum_should_match to the number of dates you're checking.
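
Because the query grows by one clause per date, it is natural to build it programmatically. Here is a minimal Python sketch of that rule, assuming the `availability` child type and `date` field from the mapping above; `dates_between` and `availability_query` are hypothetical helper names, not part of any Elasticsearch client.

```python
from datetime import date, timedelta


def dates_between(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def availability_query(intervals):
    """Build a bool query requiring an availability child for every date in the intervals."""
    # A set removes dates that appear in more than one (overlapping) interval.
    days = sorted({d for start, end in intervals for d in dates_between(start, end)})
    clauses = [
        {"has_child": {"type": "availability",
                       "query": {"term": {"date": d.isoformat()}}}}
        for d in days
    ]
    # Every clause must match, so minimum_should_match equals the number of dates.
    return {"query": {"bool": {"minimum_should_match": len(clauses),
                               "should": clauses}}}


query = availability_query([(date(2016, 7, 1), date(2016, 7, 2)),
                            (date(2016, 7, 4), date(2016, 7, 5))])
```

The resulting dictionary, serialized as JSON, has the same shape as the disjoint-interval query above. A full 365-day range produces 365 has_child clauses this way.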

Update

Another option would be to use a script filter, but with 100 million documents, I'm not certain it would scale that well.

In this scenario you can keep your original design (preferably the second one, because the first would create too many unnecessary fields in your mapping), and the query would look like this:

POST /rooms/room/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "inline": "def fromDate = Date.parse('yyyy-MM-dd', from); def toDate = Date.parse('yyyy-MM-dd', to); def days = toDate - fromDate; def values = doc.availability.values; def fromIndex = values.indexOf(fromDate.time); def toIndex = values.indexOf(toDate.time); return fromIndex >= 0 && toIndex >= 0 && days == (toIndex - fromIndex)",
            "params": {
              "from": "2016-07-01",
              "to": "2016-07-04"
            }
          }
        }
      }
    }
  }
}
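
The idea behind the script is easier to see outside of Elasticsearch. The following plain-Python sketch mirrors its logic and is an illustration only: `available_for_range` is a hypothetical name. In a sorted list of distinct dates, the index gap between the two endpoints equals the number of days between them only when no date in between is missing. The sketch also checks explicitly that both endpoints are present.

```python
from datetime import date


def available_for_range(availability, start, end):
    """Return True if every date from start to end (inclusive) is in availability."""
    days = sorted(set(availability))
    if start not in days or end not in days:
        return False
    # The index gap equals the day gap only if the run of dates has no holes.
    return days.index(end) - days.index(start) == (end - start).days


available = [date(2016, 7, d) for d in (1, 2, 4, 5, 7, 8)]
print(available_for_range(available, date(2016, 7, 1), date(2016, 7, 2)))
print(available_for_range(available, date(2016, 7, 1), date(2016, 7, 4)))
```

With the sample availability list, the first call covers two consecutive available days, while the second spans 2016-07-03, on which the room is not available.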
