How to achieve sorting by any attribute of an item in DynamoDB


I have the following DynamoDB structure:

  1. I have patients, with each patient's information stored in its document.
  2. I have claims, with each claim's information stored in its document.
  3. I have payments, with each payment's information stored in its document.
  4. Every claim belongs to a patient; a patient can have one or more claims.
  5. Every payment belongs to a patient; a patient can have one or more payments.

I created only one DynamoDB table, since the AWS DynamoDB documentation recommends using a single table whenever possible. So I ended up with the following:

[Image: base table]

In this table, ID is the partition key and EntryType is the sort key. Every claim and payment stores its owner's ID. My access patterns are as follows:

  1. List all patients in the DB, paginated and sorted by creation date.
  2. List all claims in the DB, paginated and sorted by creation date.
  3. List all payments in the DB, paginated and sorted by creation date.
  4. List the claims of a particular patient.
  5. List the payments of a particular patient.

I can achieve these with two global secondary indexes. I can list patients, claims and payments sorted by their creation date by using a GSI with EntryType as a partition key and CreationDate as a sort key. Also I can list a patient's claims and payments by using another GSI with EntryType partition key and OwnerID sort key.
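The two indexes described above can be sketched as low-level Query requests. This is a minimal sketch that assumes a table name, index names, and EntryType values ("PATIENT", "CLAIM", "PAYMENT") which are all hypothetical placeholders; the functions only build request dictionaries in the shape the DynamoDB Query API accepts, so nothing is sent to AWS.

```python
from typing import Optional

# Hypothetical names: substitute your real table and index names.
TABLE_NAME = "PatientsTable"
CREATION_DATE_INDEX = "EntryType-CreationDate-index"  # GSI: PK = EntryType, SK = CreationDate
OWNER_INDEX = "EntryType-OwnerID-index"               # GSI: PK = EntryType, SK = OwnerID


def list_by_creation_date(entry_type: str, page_size: int = 25,
                          start_key: Optional[dict] = None,
                          newest_first: bool = True) -> dict:
    """Build a paginated Query for all items of one type, ordered by CreationDate."""
    params = {
        "TableName": TABLE_NAME,
        "IndexName": CREATION_DATE_INDEX,
        "KeyConditionExpression": "EntryType = :type",
        "ExpressionAttributeValues": {":type": {"S": entry_type}},
        # Results come back in sort-key order; False means descending (newest first).
        "ScanIndexForward": not newest_first,
        "Limit": page_size,
    }
    if start_key is not None:
        # Pass the previous response's LastEvaluatedKey to fetch the next page.
        params["ExclusiveStartKey"] = start_key
    return params


def list_for_patient(entry_type: str, owner_id: str) -> dict:
    """Build a Query for one patient's claims or payments via the OwnerID index."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": OWNER_INDEX,
        "KeyConditionExpression": "EntryType = :type AND OwnerID = :owner",
        "ExpressionAttributeValues": {
            ":type": {"S": entry_type},
            ":owner": {"S": owner_id},
        },
    }
```

With boto3, each dictionary can be passed as `boto3.client("dynamodb").query(**params)`; when a response contains `LastEvaluatedKey`, pass it back as `start_key` to get the next page. Each additional sort order needs another index of this shape, which is why this approach runs into the per-table GSI limit described below.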

My problem is that this approach only lets me sort by creation date. My patients and claims have many more attributes (around 25 each), and I need to sort them by each of those attributes as well. But DynamoDB limits every table to at most 20 GSIs. So I tried creating GSIs on the fly (dynamically, upon request), but that also turned out to be very inefficient, since creating a GSI copies the items into another partition (as far as I know). So what is the best way to sort patients by patient name, claims by claim description, and so on for any of their other fields?

Solution

Sorting in DynamoDB happens only on the sort key. In your data model, your sort key is EntryType, which doesn't support any of the access patterns you've outlined.

You could create a secondary index on the fields you want to sort by (e.g. creationDate). However, that pattern can be limiting if you want to support sorting by many attributes.

I'm afraid there is no simple solution to your problem. While this is trivial in SQL, DynamoDB sorting just doesn't work that way. Instead, I'll suggest a few ideas that may help get you unstuck:

  • Client-Side Sorting - Use DDB to efficiently query the data your application needs, and let the client worry about sorting it. For example, if your client is a web application, you could use JavaScript to sort the results on the fly, depending on which field the user wants to sort by.
  • Consider using KSUIDs for your IDs - I noticed most of your access patterns involve sorting by CreationDate. A KSUID (K-Sortable Unique IDentifier) is a globally unique ID that is sortable by generation time. It's a great option when your application needs to create unique IDs and sort by a creation timestamp. If you build a KSUID into your sort keys, your query results will automatically come back sorted by creation date.
  • Reorganize Your Data - If you have the flexibility to redesign how you store your data, you could accommodate several of your access patterns with fewer secondary indexes (example below).
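A minimal sketch of client-side sorting, written in Python rather than JavaScript so that all examples here share one language. It assumes the items have already been fetched from DynamoDB and converted to plain dictionaries; `sort_items` is a hypothetical helper. It can only sort what the client has fetched (for example, the current page), so a sort across the whole table would require fetching every item first.

```python
from typing import Any


def sort_items(items: list[dict[str, Any]], attribute: str,
               descending: bool = False) -> list[dict[str, Any]]:
    """Sort fetched items by any attribute; items missing it are placed last.

    Raises TypeError if the attribute holds values that cannot be compared
    with each other (e.g. a mix of strings and numbers).
    """
    present = [item for item in items if attribute in item]
    missing = [item for item in items if attribute not in item]
    present.sort(key=lambda item: item[attribute], reverse=descending)
    return present + missing
```

For example, `sort_items(patients, "name")` orders a page of patients by name, and the same call with `"CreationDate"` or any other attribute needs no extra index.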

Finally, I notice that your table example is very "flat" and doesn't appear to model the relationships in a way that supports any of your access patterns (without adding indexes). Perhaps it's just an example data set to highlight your question about sorting, but I wanted to show a different way to model your data in case you are unfamiliar with these patterns.

For example, consider your access patterns that require you to fetch a patient's claims and payments, sorted by creation date. Here's one way that could be modeled:

[Image: Patient relationships]

This design handles four access patterns:

  1. get patient claims, sorted by date created.
  2. get patient payments, sorted by date created.
  3. get patient info (name, etc.)
  4. get patient claims, payments and info (in a single query).

The queries would look like this (in pseudocode):

  1. query where PK = "PATIENT#UUID1" and SK < "PATIENT#UUID1"
  2. query where PK = "PATIENT#UUID1" and SK > "PATIENT#UUID1"
  3. query where PK = "PATIENT#UUID1" and SK = "PATIENT#UUID1"
  4. query where PK = "PATIENT#UUID1"
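The four pseudocode queries above can be checked with a small in-memory model of the patient partition. This is a sketch under assumptions: the item attributes and the `CLAIM#<id>` / `PAYMENT#<id>` sort-key formats are hypothetical (the original illustration is not available), and `query` is a hypothetical helper that only mimics DynamoDB Query semantics: exact partition-key match, an optional sort-key condition, and results in ascending sort-key order. The dictionary at the end shows how access pattern 1 might look as a real low-level Query request, with a placeholder table name.

```python
from typing import Callable, Optional

# Hypothetical items; in a real table these all live in the same DynamoDB table.
ITEMS = [
    {"PK": "PATIENT#UUID1", "SK": "PATIENT#UUID1", "name": "Jane Doe"},
    {"PK": "PATIENT#UUID1", "SK": "CLAIM#0002", "description": "X-ray"},
    {"PK": "PATIENT#UUID1", "SK": "PAYMENT#0001", "amount": 50},
    {"PK": "PATIENT#UUID1", "SK": "CLAIM#0001", "description": "Checkup"},
    {"PK": "PATIENT#UUID2", "SK": "CLAIM#0003", "description": "Another patient's claim"},
]


def query(items: list, pk: str,
          sk_condition: Optional[Callable[[str], bool]] = None) -> list:
    """Mimic a Query: match PK exactly, filter on SK, return items in ascending SK order.

    Python compares str by code point, which matches DynamoDB's UTF-8 byte
    ordering for the ASCII keys used here.
    """
    matches = [
        item for item in items
        if item["PK"] == pk and (sk_condition is None or sk_condition(item["SK"]))
    ]
    return sorted(matches, key=lambda item: item["SK"])


PATIENT = "PATIENT#UUID1"
claims = query(ITEMS, PATIENT, lambda sk: sk < PATIENT)    # 1. CLAIM# sorts before PATIENT#
payments = query(ITEMS, PATIENT, lambda sk: sk > PATIENT)  # 2. PAYMENT# sorts after PATIENT#
info = query(ITEMS, PATIENT, lambda sk: sk == PATIENT)     # 3. the patient item itself
everything = query(ITEMS, PATIENT)                         # 4. the whole partition

# Access pattern 1 as a low-level Query request (table name is a placeholder).
CLAIMS_REQUEST = {
    "TableName": "PatientsTable",
    "KeyConditionExpression": "#pk = :pk AND #sk < :sk",
    "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
    "ExpressionAttributeValues": {
        ":pk": {"S": PATIENT},
        ":sk": {"S": PATIENT},
    },
}
```

A `begins_with(#sk, :prefix)` key condition with a prefix such as `CLAIM#` is an alternative to `<` and `>` that keeps working if other item types are later added to the same partition.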

These queries take advantage of the sort keys being lexicographically sorted. When you ask DDB to fetch the PATIENT#UUID1 partition with a sort key less than "PATIENT#UUID1", it will return only the CLAIM items, because CLAIM sorts alphabetically before PATIENT. The same idea retrieves the PAYMENT items for the given patient: PAYMENT sorts after PATIENT, so a sort key greater than "PATIENT#UUID1" returns only the payments. I've used KSUIDs in this scenario, which has the added benefit of returning the CLAIM and PAYMENT items sorted by creation date!
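To see why KSUID-based sort keys come back in creation order, here is a minimal KSUID-style generator. It is a sketch of the publicly described KSUID layout (a 4-byte big-endian timestamp in seconds since a custom epoch of 1400000000, followed by 16 random bytes, base62-encoded to 27 characters), not a vetted library, and `ksuid` is a hypothetical helper. Because the timestamp occupies the most significant bytes and the fixed-width encoding uses an alphabet in ASCII order, comparing two IDs as strings compares their timestamps first.

```python
import os
import time
from typing import Optional

KSUID_EPOCH = 1_400_000_000  # KSUID timestamps count seconds from this Unix time
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # ASCII order
ENCODED_LENGTH = 27  # enough base62 digits for any 20-byte value


def _base62(value: int) -> str:
    """Encode a non-negative int in base62, left-padded to a fixed width."""
    digits = []
    while value:
        value, remainder = divmod(value, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits)).rjust(ENCODED_LENGTH, "0")


def ksuid(timestamp: Optional[float] = None, payload: Optional[bytes] = None) -> str:
    """Return a KSUID-style ID: 4-byte timestamp + 16-byte payload, base62-encoded.

    IDs created within the same second are ordered by their random payload,
    not by the exact moment they were created.
    """
    seconds = int(time.time() if timestamp is None else timestamp) - KSUID_EPOCH
    payload = os.urandom(16) if payload is None else payload
    if len(payload) != 16:
        raise ValueError("payload must be exactly 16 bytes")
    raw = seconds.to_bytes(4, "big") + payload
    return _base62(int.from_bytes(raw, "big"))


# Example sort keys for the patient partition (the CLAIM# prefix is an assumption).
older = "CLAIM#" + ksuid(timestamp=1_600_000_000)
newer = "CLAIM#" + ksuid(timestamp=1_600_000_060)
```

With sort keys built as `CLAIM#` plus a KSUID, a Query on the patient partition returns that patient's claims oldest first by default, or newest first with `ScanIndexForward` set to false.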

While this pattern may not solve all of your sorting problems, I hope it gives you some ideas of how you can model your data to support a variety of access patterns with sorting functionality as a side effect.

Article address: IT屋 » How to achieve sorting by any attribute of an item in DynamoDB